Caching dependencies with Google Cloud Build and GCS
Our test suite on Cloud Build was taking about ten minutes to run 700+ Java and 410+ JavaScript tests on every merge to master. We considered using the Kaniko cache to save the dependencies into a Docker image, but we worried about ending up with stale dependencies, since sanity-checking the contents of a Docker image is a nontrivial task. So, inspired by the cache implementation in GitHub Actions, we decided to save and restore our Gradle and npm dependencies ($HOME/.gradle and $HOME/.npm) into GCS by adding two steps to our cloudbuild.yaml:
steps:
  #
  # Restore dependencies cache
  # You probably want to use your own container image extending
  # this cloud-sdk and installing gsutil
  #
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:alpine'
    id: 'restore-cache'
    entrypoint: '/bin/sh'
    args: ['-c', './restore-cache']

  # Run other tasks here, launching JS and Java tests in parallel.

  # Backup dependencies into the cache for the next run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk:alpine'
    waitFor: ['compilation-step']
    entrypoint: '/bin/sh'
    args: ['-c', './save-cache']

options:
  env:
    # Point gradle and npm caches to a persistent location
    - GRADLE_USER_HOME=/home/cache/.gradle
    - npm_config_cache=/home/cache/.npm
  volumes:
    - name: 'cache'
      path: '/home/cache'

timeout: 900s
To make this work, we created a volume mapped to /home/cache that keeps the cached dependencies available across build steps. The ./save-cache and ./restore-cache scripts are quite straightforward:
#!/bin/bash
# save-cache: Save the gradle and npm cache to GCS.
set -e -u

# Archive each cache folder (paths are quoted in case they contain spaces)
tar -czf /tmp/gradle.tgz -C "$GRADLE_USER_HOME" .
echo "$(tar -tvzf /tmp/gradle.tgz | wc -l) files copied from $GRADLE_USER_HOME"
tar -czf /tmp/npm.tgz -C "$npm_config_cache" .
echo "$(tar -tvzf /tmp/npm.tgz | wc -l) files copied from $npm_config_cache"

echo 'Saving dependencies to gs://my_cache_bucket/'
gsutil -q -m cp /tmp/gradle.tgz /tmp/npm.tgz gs://my_cache_bucket/

# Record when the cache was saved, as seconds since the epoch
echo 'Saving timestamp to gs://my_cache_bucket/timestamp'
date +%s | gsutil -q cp - gs://my_cache_bucket/timestamp
We compressed both folders into .tgz files, and you may have noticed the additional timestamp. It lets us expire the cache so that we don't accumulate unused dependencies, for example after changing versions of Gradle or individual libraries.
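One thing to keep in mind: save-cache as written uploads a fresh timestamp on every build, so the one-month limit only kicks in once a month passes without a build. If you would rather expire the cache a month after it was first created, one option is to carry the restored timestamp forward. The sketch below is an assumption-laden variant, not part of the original scripts: timestamp_to_save is an illustrative helper, and the marker file it reads (for example /home/cache/.cache-timestamp, a name invented for this example) would have to be written by restore-cache after a successful restore.

```shell
# Sketch: decide which timestamp save-cache should upload.
# ASSUMPTION: restore-cache wrote the timestamp it restored into the
# marker file passed as $1 (a hypothetical file on the shared volume).
timestamp_to_save() {
  local marker saved
  marker="$1"
  saved=''
  if [ -f "$marker" ]; then
    saved=$(cat "$marker")
  fi
  case "$saved" in
    ''|*[!0-9]*)
      # No usable marker: the cache was rebuilt from scratch,
      # so start a new one-month window.
      date +%s
      ;;
    *)
      # Keep the creation time of the cache that was restored.
      echo "$saved"
      ;;
  esac
}
```

With this in place, the last line of save-cache could become `timestamp_to_save /home/cache/.cache-timestamp | gsutil -q cp - gs://my_cache_bucket/timestamp`, and restore-cache would run `echo "$TIMESTAMP" > /home/cache/.cache-timestamp` after extracting the archives. The marker sits outside .gradle and .npm, so it does not end up in either archive.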
#!/bin/bash
# restore-cache: Restore the gradle and npm cache from GCS.
set -e -u

# If there is a cache and the content is not older than a month
TIMESTAMP=$(gsutil cat gs://my_cache_bucket/timestamp || echo 0)
SECONDS_IN_A_MONTH=2629743
if (( $(date +%s) - $TIMESTAMP < $SECONDS_IN_A_MONTH )); then
  gsutil -q -m cp gs://my_cache_bucket/gradle.tgz gs://my_cache_bucket/npm.tgz /tmp
  mkdir -p "$GRADLE_USER_HOME" "$npm_config_cache"

  # copy gradle dependencies
  echo 'Restoring gradle cache'
  tar -xzf /tmp/gradle.tgz -C "$GRADLE_USER_HOME"
  echo "$(ls -pR "$GRADLE_USER_HOME" | grep -v / | wc -l) files restored to $GRADLE_USER_HOME"

  # copy npm dependencies
  echo 'Restoring npm cache'
  tar -xzf /tmp/npm.tgz -C "$npm_config_cache"
  echo "$(ls -pR "$npm_config_cache" | grep -v / | wc -l) files restored to $npm_config_cache"
else
  if (( $TIMESTAMP == 0 )); then
    echo 'Skipping cache restore: timestamp not found at gs://my_cache_bucket/timestamp'
  else
    echo 'Skipping cache restore: timestamp at gs://my_cache_bucket/timestamp is older than a month'
  fi
fi
If the timestamp is missing or the cache is too old, restore-cache skips the download and the build recreates the whole cache from scratch.
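The freshness check can also be pulled into a small function, which makes it easier to test and guards against an empty or non-numeric timestamp (either would break the arithmetic in the script above). This is a minimal sketch under those assumptions, not the script from this post: cache_is_fresh and its optional third argument are names introduced here for illustration.

```shell
# Sketch: succeed (exit status 0) only if the cache timestamp is still fresh.
#   $1 = cache timestamp in epoch seconds
#   $2 = maximum age in seconds
#   $3 = current time in epoch seconds (optional, defaults to now)
cache_is_fresh() {
  local ts max_age now
  ts="$1"
  max_age="$2"
  now="${3:-$(date +%s)}"
  case "$ts" in
    # Missing or non-numeric timestamp: treat it as "no cache".
    ''|*[!0-9]*) return 1 ;;
  esac
  # A timestamp of 0 is the "not found" fallback used by restore-cache.
  [ "$ts" -gt 0 ] && [ $((now - ts)) -lt "$max_age" ]
}
```

In restore-cache, the condition would then read `if cache_is_fresh "$TIMESTAMP" "$SECONDS_IN_A_MONTH"; then`, with the rest of the script unchanged.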
Caching saved us two minutes of execution time per build. The exercise also prompted us to reassess the size of our container images (saving another 30 seconds) and how many steps we could run in parallel (saving around three minutes per build). In the end, we cut the build from ten minutes to almost five, but your mileage may vary. Let us know your own results at @koliseoapi.