Over the past week, I worked on enabling Google Kubernetes Engine (GKE) Workload Identity on our clusters. Workload Identity is a solution for connecting a Kubernetes service account to a Google Cloud service account, and by doing so granting specific permissions to a specific workload on the cluster. While enabling Workload Identity is relatively simple, the hard question is how to enable it at scale: how do we let developers use it easily and securely?
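Concretely, the link is made of two pieces of configuration: an IAM binding that grants `roles/iam.workloadIdentityUser` on the Google service account to the member `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]`, and an `iam.gke.io/gcp-service-account` annotation on the Kubernetes service account. The Python sketch below only assembles those strings; the project, namespace, and account names in it are hypothetical placeholders, and it does not call any Google Cloud API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorkloadIdentityLink:
    """Hypothetical description of one Kubernetes SA -> Google SA link."""

    project_id: str  # project hosting the cluster (owner of the PROJECT_ID.svc.id.goog pool)
    namespace: str   # Kubernetes namespace of the workload
    ksa_name: str    # Kubernetes service account used by the workload's pods
    gsa_email: str   # Google Cloud service account the workload should act as

    def iam_role(self) -> str:
        # Role granted on the Google service account to the Kubernetes identity.
        return "roles/iam.workloadIdentityUser"

    def iam_member(self) -> str:
        # Workload Identity principal for NAMESPACE/KSA_NAME in this project's pool.
        return f"serviceAccount:{self.project_id}.svc.id.goog[{self.namespace}/{self.ksa_name}]"

    def ksa_annotations(self) -> Dict[str, str]:
        # Annotation on the Kubernetes service account telling GKE which GSA to use.
        return {"iam.gke.io/gcp-service-account": self.gsa_email}


# Hypothetical example values.
link = WorkloadIdentityLink(
    project_id="my-project",
    namespace="payments",
    ksa_name="payments-api",
    gsa_email="payments-api@my-project.iam.gserviceaccount.com",
)
print(link.iam_role(), link.iam_member())
print(link.ksa_annotations())
```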
And this is where Terraform comes in handy: with it, I can easily build an abstraction (a module) that developers can use to create all the resources required for Workload Identity. Writing this module lets me carefully choose what to expose, building a paved road for developers to follow. Finally, there are very interesting developments in the area of SAST for Terraform (see this talk, as one example), making it an even more interesting tool.
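One way to picture “choosing what to expose” is a validation layer in front of the module: the developer supplies only a name, a namespace, and a list of roles, and everything else stays fixed inside the module. The following Python sketch is a hypothetical illustration of such rules, not the module described in this post. The name pattern follows Google Cloud's service account ID rules (6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen), and the namespace pattern follows Kubernetes DNS-1123 labels; allowing only predefined `roles/...` roles is an assumed policy choice.

```python
import re
from typing import List, Sequence

# Google Cloud service account IDs: 6-30 chars, lowercase letter first,
# lowercase letters, digits and hyphens only, no trailing hyphen.
GSA_ID_RE = re.compile(r"[a-z](?:[-a-z0-9]{4,28}[a-z0-9])")
# Kubernetes namespaces are DNS-1123 labels: 1-63 lowercase alphanumerics or '-',
# starting and ending with an alphanumeric.
NAMESPACE_RE = re.compile(r"[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?")


def validate_request(name: str, namespace: str, roles: Sequence[str]) -> List[str]:
    """Return the problems with a developer's request (an empty list means OK).

    Hypothetical paved-road rules: developers choose only a name, a namespace
    and predefined roles; project, naming and the rest stay inside the module.
    """
    errors = []
    if not GSA_ID_RE.fullmatch(name):
        errors.append(f"name {name!r} is not a valid service account ID")
    if not NAMESPACE_RE.fullmatch(namespace):
        errors.append(f"namespace {namespace!r} is not a valid DNS-1123 label")
    if not roles:
        errors.append("at least one role is required")
    for role in roles:
        # Only predefined roles ("roles/...") are exposed; custom roles are rejected.
        if not role.startswith("roles/"):
            errors.append(f"role {role!r} is not a predefined role")
    return errors
```

Running checks like these in CI on every PR gives developers feedback before a reviewer even looks at the plan.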
So, I decided to try using Terraform for this. Writing the module was pretty easy (there are even existing public modules, like this one), but how would developers use it? This is where the GitHub PR flow comes in handy: using the pull request (PR) mechanism, we let everyone request permissions (self-service) while ensuring those changes go through a defined process of review and testing before they are applied. Let’s see how we can build the same flow for Terraform!
Step A: Plan
One of the most powerful concepts in Terraform is the split between “plan” and “apply”. The “plan” step lets you see what your code is going to change in production before running the “apply” step, which (as the name implies) applies those changes. Terraform even goes one step further and lets you save the plan and pass it to the apply step, so you know for sure that the changes you reviewed during planning are the changes that get applied to production.
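In CLI terms, the saved-plan workflow is `terraform plan -out=FILE` followed later by `terraform apply FILE`. The Python sketch below only builds those commands (the file name `tfplan` is arbitrary) and wraps `subprocess.run` to execute them; `-input=false` is there because a CI job cannot answer interactive prompts, and `-no-color` keeps the output easy to parse.

```python
import subprocess
from typing import List


def plan_command(plan_file: str) -> List[str]:
    # -out writes the plan to a file so the apply step can reuse it later.
    return ["terraform", "plan", "-input=false", "-no-color", f"-out={plan_file}"]


def apply_command(plan_file: str) -> List[str]:
    # Given a saved plan file, apply performs exactly the planned actions
    # and does not ask for interactive approval.
    return ["terraform", "apply", "-input=false", "-no-color", plan_file]


def run(cmd: List[str]) -> str:
    # check=True raises CalledProcessError if Terraform exits non-zero.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

In the PR flow, `run(plan_command("tfplan"))` would run on every push to the PR and `run(apply_command("tfplan"))` only after merge, in a different job, which is why the plan file has to be stored somewhere in between.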
So to build a good PR flow, we first need to run “terraform plan” on each PR and publish the changes back to the PR. This makes it easier to review the changes and understand what is going to happen. For that, we can use a small tool called “tfnotify” that parses Terraform CLI output and posts it back to GitHub.
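tfnotify handles this for us. To show roughly what such a tool does, here is a minimal Python stand-in (not tfnotify's implementation): it extracts the `Plan: X to add, Y to change, Z to destroy.` summary that `terraform plan` prints (assuming `-no-color` output), wraps the full output in a collapsible block, and prepares a request to GitHub's REST endpoint for PR conversation comments, `POST /repos/{owner}/{repo}/issues/{number}/comments`. The owner, repository, PR number, and token are placeholders, and the request is only built here, not sent.

```python
import json
import re
import urllib.request
from typing import Optional, Tuple

# Summary line printed by `terraform plan -no-color`.
SUMMARY_RE = re.compile(r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy\.")
FENCE = "`" * 3  # markdown code fence for the comment body


def summarize(plan_output: str) -> Optional[Tuple[int, int, int]]:
    """Return (add, change, destroy); (0, 0, 0) for "No changes."; None if unrecognized."""
    match = SUMMARY_RE.search(plan_output)
    if match:
        add, change, destroy = (int(g) for g in match.groups())
        return add, change, destroy
    if "No changes." in plan_output:
        return 0, 0, 0
    return None


def comment_body(plan_output: str) -> str:
    counts = summarize(plan_output)
    if counts is None:
        title = "Plan output could not be parsed"
    else:
        title = "Plan: {} to add, {} to change, {} to destroy".format(*counts)
    # Put the full output in a collapsible <details> block to keep the PR readable.
    return (
        f"### {title}\n\n<details><summary>Full output</summary>\n\n"
        f"{FENCE}\n{plan_output.strip()}\n{FENCE}\n</details>"
    )


def comment_request(owner: str, repo: str, pr_number: int, token: str, body: str) -> urllib.request.Request:
    # PR conversation comments are created through the Issues API.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Passing the result of `comment_request(...)` to `urllib.request.urlopen` would actually post the comment.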
Running it with GitHub Actions is pretty easy (see the tfnotify README for more details). As I said earlier, we need to persist the plan so we can use it later when applying. GitHub Actions supports caching, but it was easier to just upload the plan to a Google Cloud Storage bucket.
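A simple way to make the plan findable later is to key the stored object by PR number. The Python sketch below assumes a hypothetical naming scheme (`plans/pr-<number>/tfplan`) and uses the `google-cloud-storage` client library (`Client().bucket().blob()` with `upload_from_filename` and `download_to_filename`); the library is imported inside the functions so the naming logic works even where it is not installed. Since saved plan files can contain sensitive values, the bucket should be readable only by the CI identity.

```python
def plan_object_name(pr_number: int, prefix: str = "plans") -> str:
    """Hypothetical object name for a PR's saved plan.

    One object per PR: every new push re-runs plan and overwrites it, so the
    stored plan corresponds to the latest push to the PR.
    """
    if pr_number <= 0:
        raise ValueError("PR numbers are positive integers")
    return f"{prefix}/pr-{pr_number}/tfplan"


def upload_plan(bucket_name: str, pr_number: int, local_path: str) -> None:
    # Imported lazily: requires `pip install google-cloud-storage` and GCP credentials.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(plan_object_name(pr_number))
    blob.upload_from_filename(local_path)


def download_plan(bucket_name: str, pr_number: int, local_path: str) -> None:
    # Fetches the plan saved for this PR so the apply job can use it.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(plan_object_name(pr_number))
    blob.download_to_filename(local_path)
```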
Step B: Apply
Once the PR is approved and the tests we added pass, it’s time to merge it and apply the changes to production. And here is the tricky part: we need some way to link the commit that was just merged to the master branch with its PR, so we can find the plan we saved. Without that link, we would need to run plan again on master and review it again.
Luckily, GitHub Actions can trigger a build on any GitHub event, including a pull request being closed. The build runs on the code after the merge, so if our branch was not up to date with master, the build will include those extra changes (and the apply will fail, as it will not match the saved plan). On the other hand, the workflow runs in the context of the same PR, so we have the PR number and can fetch the matching plan. Finally, using tfnotify we can also post the apply output back to the same PR and update the developer about the result.
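In workflow terms, this means subscribing to `pull_request` events of type `closed` and then deciding, from the event payload, whether to apply. A `closed` event also fires when a PR is closed without merging, so the payload's `pull_request.merged` flag matters. The Python sketch below reads the payload from the file GitHub Actions points to with `GITHUB_EVENT_PATH` and returns the PR number whose plan should be applied; the `master` base-branch check is an assumption matching this post's setup.

```python
import json
import os
from typing import Any, Dict, Optional


def load_event(path: Optional[str] = None) -> Dict[str, Any]:
    # GitHub Actions writes the triggering webhook payload to GITHUB_EVENT_PATH.
    with open(path or os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        return json.load(f)


def pr_to_apply(event: Dict[str, Any], base_branch: str = "master") -> Optional[int]:
    """Return the number of the PR whose saved plan should be applied, or None."""
    if event.get("action") != "closed":
        return None
    pr = event.get("pull_request") or {}
    # "closed" also fires for PRs closed without merging; those must not be applied.
    if not pr.get("merged"):
        return None
    # Only apply changes that landed on the production branch.
    if (pr.get("base") or {}).get("ref") != base_branch:
        return None
    return pr.get("number")
```

The returned number can then be used both to download the saved plan and to post the apply output to the same PR.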
Now a single PR holds the full flow our changes go through: plan, review, tests, and apply (including timestamps!). This gives us a lot of visibility into every change to our production environment and makes it very easy to investigate later.
Using Terraform with the GitHub PR flow allows us to build really powerful flows that developers can easily use. With this flow, requesting access to Google Cloud Platform for a service is really simple: create a PR, review the changes, and merge them. There are a lot of improvements we could add here: SAST to detect overly broad permissions, better separation between prod and dev environments, and more. Now that we have a strong foundation, we can start extending it and making it even better!
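As a taste of the permissions check mentioned above, here is a deliberately simple rule in Python, far from a real SAST tool: it reads the JSON that `terraform show -json tfplan` produces and flags any planned resource whose `role` attribute is one of Google Cloud's basic roles (`roles/owner`, `roles/editor`, `roles/viewer`). Which roles count as “too broad” is an assumed policy choice.

```python
import json
from typing import List

# Basic roles apply across a whole project; treated here as too broad for a workload.
TOO_BROAD_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def broad_role_findings(plan_json: str) -> List[str]:
    """Return one finding per planned resource that would grant a basic role."""
    plan = json.loads(plan_json)
    findings = []
    for change in plan.get("resource_changes", []):
        # "after" is null for resources being destroyed; nothing is granted then.
        after = change.get("change", {}).get("after") or {}
        role = after.get("role")
        if role in TOO_BROAD_ROLES:
            findings.append(f"{change['address']}: grants {role}")
    return findings
```

In CI, running `terraform show -json tfplan > plan.json` and failing the job whenever `broad_role_findings` returns anything would stop such a PR before it reaches review.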