How To: Fast and Secure builds with Bazel Remote Cache

For the past few months, I’ve mainly used the Scala programming language. Like any other language, I needed a toolchain to compile my code into something I could run. When I started with Scala, I chose SBT. It went pretty well, but I wasn’t very happy with it – it is a bit slow, dependency management is not amazing, and there were some other issues.
So I started looking for alternatives, and I was surprised to learn how many are out there. Of course, I could use any Java tool, like Gradle or Maven – both of which support Scala. But like SBT, each has its own downsides. While looking around, I also wanted something with good support for a monorepo development strategy (as we are slowly moving our code to a monorepo). And at this point, I found Bazel, a build tool by Google.
Bazel is amazing and can do a lot of things, including building Scala source code. One of the coolest Bazel features is the built-in support for remote caching, which aims to speed up build time. For lazy developers like me, the coolest thing here is the native support for Google Cloud Storage – so I could have a “serverless” cache deployment. This all sounds simple, right? Until you ask: how secure is it? And this is where things become interesting!

What can go wrong?

Answering this question is very important – without knowing how bad things can get, we cannot decide how much we want to spend on securing them. The best way to answer it is by doing a threat modeling analysis, and I like to follow Adam Shostack’s “4 Questions Framework”. In the previous section, we answered the first question – what are we building? A cache for Bazel using Google Cloud Storage. Now let’s discuss what could go wrong.
To answer it, let’s first look at what is stored in the cache. According to the documentation:
The remote cache stores two types of data:
  • The action cache, which is a map of action hashes to action result metadata.
  • A content-addressable store (CAS) of output files.
So assuming a hacker got access to the cache, she could download all the cached build outputs. In the case of Scala code, this is compiled Java bytecode, which can easily be decompiled back into something close to the original source. So the main issue here is information disclosure – but it is not the only one.
What if a hacker can write to the cache? Done right, this would trick Bazel into downloading the malicious result and running it on your machine (or your CI machine) – leading to remote code execution. Analyzing how hard this is to pull off is out of the scope of this blog (and if you do end up writing something like this, please do let me know!).

Data Flow Analysis

So, to conclude, we want to protect the cache. Really hard. But what do we need to protect?
[Data Flow Diagram: Bazel remote cache setup, showing the two actors – the CircleCI machine and the developer machine – which need access to the cache]
We have 2 main flows here:
  • A Developer, running Bazel on her local machine (read-only)
  • The CI machine (we are using CircleCI), building our code (read/write)
The source of truth for our cache is the CI, so this is why developers need only read permissions from the cache. Only the CI will update the cache, and only for code that was pushed to source control.
Now, with a clear picture of the data flow and what can go wrong, let’s see how we can prevent it!

What are we doing about it?

The first thing to do is ensure the storage follows Google’s security documentation. In particular, I prefer using Uniform Bucket-Level Access, so that all objects always have the same permissions – ensuring no object is accidentally accessible without authentication. I also prefer to enable Public Access Prevention on the bucket. With both settings enabled, we just need to worry about creating the right IAM bindings (see the list of all available roles here):
  • Developers need only the Storage Object Viewer role. This will give them read-only access to the bucket.
  • For our CI, we can use the Storage Object Admin role, which gives the CI full access to the bucket’s content (but no permission to change bucket settings, like Public Access Prevention).
Careful! Those roles should be assigned on the specific bucket we created for the cache, not on all the buckets in the project!
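As a sketch, both bucket settings and the bucket-scoped role bindings can be applied with the gcloud CLI. The bucket name, group, and service account below are placeholders for your own values:

```
# Enforce uniform bucket-level access and block any public access.
gcloud storage buckets update gs://<cache bucket> \
  --uniform-bucket-level-access \
  --public-access-prevention

# Developers: read-only access, granted on this bucket only.
gcloud storage buckets add-iam-policy-binding gs://<cache bucket> \
  --member="group:<developers group email>" \
  --role="roles/storage.objectViewer"

# CI service account: read/write access to objects in this bucket only.
gcloud storage buckets add-iam-policy-binding gs://<cache bucket> \
  --member="serviceAccount:<ci service account email>" \
  --role="roles/storage.objectAdmin"
```

Binding the roles on the bucket (rather than on the project) is what keeps us aligned with the warning above.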

Authenticating developers

Authenticating developers is really simple – Bazel supports authentication to Google Cloud Storage with Google Application Default Credentials. So, if you have gcloud installed and authenticated, all you need to do is pass --google_default_credentials to Bazel, and it will use your credentials to authenticate to GCS.
Now, all we need is to assign the role mentioned above to all the developers (for example, via a group), and all our developers can securely access the cache.
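For illustration, the developer-side setup might look like this in a .bazelrc (the bucket name is a placeholder; --remote_upload_local_results=false keeps local builds from trying to write to the cache, since developers only have read access):

```
# .bazelrc (local development): read from the shared cache, never write to it.
build --remote_cache=https://storage.googleapis.com/<cache bucket>
build --google_default_credentials
build --remote_upload_local_results=false
```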

Authenticating CircleCI

This part is a bit trickier. Traditionally, it would involve creating a service account with static credentials and using them from the CI. This has some downsides, for example:
  • If our CI is compromised, a hacker could steal those static credentials and use them to access our bucket.
  • We need to frequently rotate those credentials, which is cumbersome.
  • We need to ensure we pass them securely to the CI server and that we clean up any temporary location we used to store them in transit (think about the machine used by the developer creating them).
Luckily, today we have a better option: identity tokens. An identity token is a unique, verifiable identifier of the specific job running on CircleCI (usually a JWT). By establishing trust between our cloud provider and CircleCI, we can use this token to authenticate to the cloud without any static credentials!
[Spongebob “Magic!” meme (Source: mememonkey)]

Setting up Identity Token

To establish trust between Google Cloud and CircleCI, we need to set up Workload Identity Federation. This can be done by running a few commands with the gcloud CLI (or with Terraform):
gcloud iam workload-identity-pools create circle-ci \
  --location="global" \
  --description="Used by CircleCI" \
  --display-name="CircleCI Pool" \
  --project=<project id>

gcloud iam workload-identity-pools providers create-oidc circle-ci \
  --location="global" \
  --workload-identity-pool="circle-ci" \
  --issuer-uri="https://oidc.circleci.com/org/<CircleCI Org Id>" \
  --allowed-audiences="<CircleCI Org Id>" \
  --attribute-mapping="google.subject=assertion['oidc.circleci.com/project-id']" \
  --project=<project id>
Those commands create the necessary resources for establishing trust between CircleCI and Google Cloud. See the docs above for more details!
To use them, we need to find the Organization Id, which can be found on CircleCI at: <name>/overview
Replace the name with your organization’s name on GitHub.
Now we need to grant a specific project on CircleCI permission on Google Cloud. This can be done from the console (if you find out how to do this with the CLI, let me know!):
  • First, go to the provider settings – it will be under (replace <> with the relevant project id):<>
  • Now click on “Grant Access”
[Screenshot: Workload Identity Pool settings in the Console]
  • And now choose the permissions required:
[Screenshot: adding permissions to a service account]
Here we can bind a service account to an identity in this pool. Create a service account, assign it the permissions we discussed above (Storage Object Admin) only on the specific bucket, and choose it in the Service account combobox.
We also want to control which projects on CircleCI can assume this service account – we do this by specifying a subject in the second combobox. The value should be the project id of the CircleCI project we want to grant permission to. This works because of the attribute mapping we set when we created the pool.
You can find the project id on the project settings page on CircleCI.
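Incidentally, the console flow above can likely also be done with the gcloud CLI. A sketch, assuming the pool name from earlier – project number, CircleCI project id, and service account email are placeholders:

```
# Allow the CircleCI project (the mapped subject) to impersonate the service account.
gcloud iam service-accounts add-iam-policy-binding <service account email> \
  --role="roles/iam.workloadIdentityUser" \
  --member="principal://iam.googleapis.com/projects/<project number>/locations/global/workloadIdentityPools/circle-ci/subject/<CircleCI project id>"
```

The subject in the member string is matched against the subject produced by the pool’s attribute mapping, which is why the CircleCI project id goes there.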

Using Identity Federation in a Job

OK, now we have established trust between a specific project on CircleCI and a specific service account on Google Cloud. How can we use it? Very simple:
- run:
    name: Login to GCP
    command: |
      echo $CIRCLE_OIDC_TOKEN > $HOME/circle_token
      gcloud iam workload-identity-pools create-cred-config \
        <pool URL> \
        --service-account=<service account email> \
        --output-file=$HOME/creds.json \
        --credential-source-file=$HOME/circle_token
      gcloud auth login --cred-file=$HOME/creds.json
What are we doing here?
  • Writing CircleCI’s identity token to a file so we can use it later to authenticate to Google Cloud.
  • Invoking a command that generates a credentials file using the identity pool we created before, the email of the service account we want to assume, and the location of the token file. You can find the service account email on the service account page in the console.
  • Logging in to gcloud with the generated credentials file.
To find the pool URL, use gcloud:
gcloud iam workload-identity-pools providers list \
  --location=global \
  --workload-identity-pool="circle-ci" \
  --project=<project id>
The name returned by this command is the URL.
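For orientation, the returned name follows Google Cloud’s resource-name format and should look roughly like this (the project number and names depend on your setup):

```
projects/<project number>/locations/global/workloadIdentityPools/circle-ci/providers/circle-ci
```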
All that is left is to tell Bazel to use the credentials file to authenticate to Google Cloud.
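Bazel accepts a credentials file via its --google_credentials flag, so the CI-side invocation might look something like this (the bucket name is a placeholder; creds.json is the file generated in the login step above):

```
bazel build //... \
  --remote_cache=https://storage.googleapis.com/<cache bucket> \
  --google_credentials=$HOME/creds.json
```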
And now Bazel builds on our CI can authenticate to Google Cloud and use the cache for faster builds!

Did we do a good job?

This is the last question in the model, and usually the most challenging one. Let’s reflect on what we did today. First, let’s look at the risks we identified:
  • Potential information disclosure if an unauthorized person gains read access to the bucket.
  • Potential remote code execution if an unauthorized person gains write access to the bucket.
We mitigated those risks by:
  • Preventing anonymous access to the bucket (and following Google’s documentation).
  • Following the Least Privilege Principle and granting only the permissions required, and only on this bucket.
  • Leveraging CircleCI’s identity token to authenticate to GCP without static credentials.
It seems like we did a pretty good job here. But it’s important to remember we can always do more. For example, what about supply chain risks – compromising the CI in order to poison the cache with malicious files and gain remote code execution?
There are always additional risks and mitigations we could implement, and this is why this question is so important: it reminds us to keep assessing what we did.

Wrapping Up

Using the Bazel remote cache, our builds are much faster now – down from ~12 minutes to ~4 minutes. I used .bazelrc to configure common Bazel settings, with one profile for the CI and one for running locally. This was inspired by TensorBoard’s setup – take a look at the repository to learn how they did it.
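As a sketch, such a split .bazelrc might look like this – names and the bucket are illustrative, and each side is selected with bazel build --config=ci or --config=local:

```
# Shared settings.
build --remote_cache=https://storage.googleapis.com/<cache bucket>

# CI: authenticate with the generated credentials file, read/write cache.
build:ci --google_credentials=/path/to/creds.json

# Local: authenticate with gcloud's application default credentials, read-only.
build:local --google_default_credentials
build:local --remote_upload_local_results=false
```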
We are now slowly moving to a very interesting future, where every machine has an identity – including our CI servers. We can use this identity to provide secure access to anything we need – and slowly move toward a truly passwordless future. There are endless opportunities for what we can do with it. Exciting times!
