So I started to look for alternatives, and I was surprised to learn how many are out there. Of course, I could use any of the Java build tools, like Gradle or Maven – both support Scala. Like SBT, each has its own downsides. While looking for other alternatives, I wanted to find something with good support for a monorepo development strategy (as we are slowly moving our code to a monorepo). And at this point, I found Bazel, a build tool by Google.
Bazel is amazing and it can do a lot of things, including building Scala source code. One of the coolest Bazel features is the built-in support for remote caching, which aims to speed up build time. For lazy developers like me, the coolest thing here is the native support for Google Cloud Storage – so I could have a “serverless” cache deployment. This all sounds simple, right? Until you ask: how secure is it? And this is where things become interesting!
## What can go wrong?

Answering this question is very important – without knowing how bad it could get, we cannot decide how much we want to spend on securing it. The best way to answer is by doing a Threat Modeling analysis, and I like to follow Adam Shostack’s “4 Questions Framework”. In the previous section, we answered the first question – what are we building: a cache for Bazel using Google Cloud Storage. Now let’s discuss what could go wrong.
To answer it, let’s first look at what is stored in the cache. According to the documentation:
The remote cache stores two types of data:
- The action cache, which is a map of action hashes to action result metadata.
- A content-addressable store (CAS) of output files.
What if a hacker can write to the cache? Assuming this is done right, it could trick Bazel into downloading the attacker’s compiled result and running it on your machine (or your CI machine) – leading to remote code execution. It is out of the scope of this blog to analyze how hard that is (and if you do end up writing something like this, please do let me know!).
## Data Flow Analysis

So, to conclude, we want to protect the cache. Really hard. But what exactly do we need to protect? We have two main flows here:
- A Developer, running Bazel on her local machine (read-only)
- The CI machine (we are using CircleCI), building our code (read/write)
Now, with a clear picture of the data flow and what can go wrong, let’s see how we can prevent it!
## What are we doing about it?

The first thing to do is ensure the storage follows Google’s security documentation. In particular, I prefer using Uniform Bucket-Level Access, so that all objects always have the same permissions – ensuring no object is accidentally accessible without authentication. I also prefer to enable Public Access Prevention on the bucket. With both settings enabled, we just need to worry about creating the right IAM roles (see the list of all available roles here):
- Developers need only the Storage Object Viewer role. This will give them read-only access to the bucket.
- For our CI we can use the Storage Object Admin role, which gives the CI full access to the bucket’s content (but no permission to change bucket settings, like Public Access Prevention).
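As a sketch, both the bucket hardening and the role bindings above can be done with `gcloud`; the bucket name, group address, and service account below are placeholders I made up, not values from this post:

```shell
# Harden the bucket: uniform permissions and no public access, ever
gcloud storage buckets update gs://<bucket name> \
  --uniform-bucket-level-access \
  --public-access-prevention

# Developers (via a Google group): read-only access to cache objects
gcloud storage buckets add-iam-policy-binding gs://<bucket name> \
  --member="group:developers@example.com" \
  --role="roles/storage.objectViewer"

# CI service account: full access to objects, but not to bucket settings
gcloud storage buckets add-iam-policy-binding gs://<bucket name> \
  --member="serviceAccount:ci-bazel-cache@<project id>.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```

Binding the roles on the bucket itself (rather than at the project level) keeps the grant scoped to just this cache.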
## Authenticating developers

Authenticating developers is really simple – Bazel supports authentication to Google Cloud Storage with Google Application Default Credentials. So, if you have `gcloud` installed and authenticated, all you need to do is pass `--google_default_credentials` to Bazel and it will use your credentials to authenticate to GCS.
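For example, a developer build that reads from the remote cache might look like this (the bucket name is a placeholder):

```shell
# Build everything, reading from the GCS-backed cache with your own gcloud login
bazel build //... \
  --remote_cache=https://storage.googleapis.com/<bucket name> \
  --google_default_credentials
```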
Now, all we need is to assign the role we mentioned above to all the developers (for example, with a group), and all our developers can securely access the cache.
## Authenticating CircleCI

This part is a bit more tricky. Traditionally, it would involve creating a service account with static credentials and using it from the CI. This has some downsides, for example:
- If our CI is compromised, a hacker could steal those static credentials and use them to access our bucket.
- We need to frequently rotate those credentials, which is cumbersome.
- We need to ensure we pass them securely to the CI server and that we clean up any temporary location we used to store them in transit (think about the machine used by the developer creating them).
## Setting up Identity Token

To establish trust between Google Cloud and CircleCI, we need to set up Workload Identity Federation. This can be done by running a few commands with the `gcloud` CLI (or with Terraform):
```shell
gcloud iam workload-identity-pools create circle-ci \
  --location="global" \
  --description="Used by CircleCI" \
  --display-name="CircleCI Pool" \
  --project=<project id>

gcloud iam workload-identity-pools providers create-oidc circle-ci \
  --location="global" \
  --workload-identity-pool="circle-ci" \
  --issuer-uri="https://oidc.circleci.com/org/<CircleCI Org Id>" \
  --allowed-audiences="<CircleCI Org Id>" \
  --attribute-mapping='google.subject=assertion["oidc.circleci.com/project-id"]' \
  --project=<project id>
```

Those commands create the necessary resources for establishing trust between CircleCI and Google Cloud. See the docs above for more details!
To use them, we need to find the Organization Id, which can be found at:
Replace the name with your organization’s name on GitHub.
Now we need to grant a specific project on CircleCI permission on Google Cloud. This can be done from the console (if you find out how to do this with the CLI, let me know!):
- First, go to the provider settings – it will be under (replace `<>` with the relevant project id):
- Now click on “Grant Access”
- And now choose the permissions required:
We also want to control which projects on CircleCI can assume this service account – we can do this by specifying a subject in the second combo box. The value should be the project id of the CircleCI project we want to grant permission to. This works because of the attribute mapping we set when we created the pool.
You can find the project id on the project settings page on CircleCI.
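For what it’s worth, there is a likely CLI equivalent of the console step above. This is an untested sketch, with the principal path reconstructed from the pool we created earlier; note that `<project number>` is the numeric Google Cloud project number, not the project id:

```shell
# Allow one CircleCI project (mapped to the subject attribute) to
# impersonate the service account via Workload Identity Federation.
gcloud iam service-accounts add-iam-policy-binding <service account email> \
  --role="roles/iam.workloadIdentityUser" \
  --member="principal://iam.googleapis.com/projects/<project number>/locations/global/workloadIdentityPools/circle-ci/subject/<CircleCI project id>"
```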
## Using Identity Federation in a Job

Ok, now we have established trust between a specific project on CircleCI and a specific service account on Google Cloud. How can we use it? Very simple:
```yaml
- run:
    name: Login to GCP
    command: |
      echo $CIRCLE_OIDC_TOKEN > $HOME/circle_token
      gcloud iam workload-identity-pools create-cred-config \
        <pool URL> \
        --service-account=<service account email> \
        --output-file=$HOME/creds.json \
        --credential-source-file=$HOME/circle_token \
        --credential-source-type=text
      gcloud auth login --cred-file=$HOME/creds.json
```

What are we doing here?
- Writing CircleCI’s identity token to a file, so we can use it later to authenticate to Google Cloud.
- Invoking `gcloud iam workload-identity-pools create-cred-config` to generate a credentials file, using the identity pool we created before, the email of the service account we want to assume, and the location of the token file.
You can find the service account email on the service account page in the console. To find the pool URL, use:
```shell
gcloud iam workload-identity-pools providers list \
  --location=global \
  --workload-identity-pool="circle-ci" \
  --project=<project name>
```
The `name` returned by this command is the URL.
All we need is to tell Bazel to use the credentials file to authenticate to Google Cloud:
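A minimal sketch of what that looks like, assuming the `creds.json` file written in the login step above:

```
bazel build //... --google_credentials=$HOME/creds.json
```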
And now Bazel builds on our CI can authenticate to Google Cloud and use the cache for faster builds!
## Did we do a good job?

This is the last question in the model, and usually the most challenging one. Let’s reflect on what we did today. First, let’s look at the risks we identified:
- Potential information disclosure if an unauthorized person gains read access to the bucket.
- Potential remote code execution if an unauthorized person gains write access to the bucket.
And the mitigations we put in place:

- Preventing anonymous access to the bucket (and following Google’s security documentation).
- Following the Least Privilege Principle and granting only the permissions required, and only on this bucket.
- Leveraging CircleCI’s identity token to authenticate to GCP without static credentials.
There are always additional risks and mitigations we could implement, and this is why this question is so important: it reminds us to keep assessing what we did.
## Wrapping Up

Using Bazel’s remote cache, our builds are much faster now – down from ~12 minutes to ~4 minutes. I used `.bazelrc` to configure common Bazel settings, with one config for the CI and one for running locally. This was inspired by Tensorboard’s setup – take a look at the repository to learn how they did it.
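A rough sketch of how such a `.bazelrc` split could look – all values here are placeholders, not our actual setup:

```
# .bazelrc – settings shared by every build
build --remote_cache=https://storage.googleapis.com/<bucket name>

# Local developers: read-only cache, personal gcloud credentials
build:local --google_default_credentials
build:local --noremote_upload_local_results

# CI: read/write cache, federated service-account credentials
build:ci --google_credentials=/home/circleci/creds.json
```

Developers then run `bazel build --config=local //...` and the CI runs `bazel build --config=ci //...`, so neither side has to remember the flags.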
We are now slowly moving to a very interesting future, where every machine has an identity – including our CI servers. We can use this identity to provide secure access to anything we need – and slowly move toward a truly passwordless future. There are endless opportunities for what we can do with it. Exciting times!