How To: Fast and Secure builds with Bazel Remote Cache

For the past few months, I’ve mainly used the Scala programming language. As with any other language, I needed a toolchain to compile my code into something I could run. When I started with Scala, I chose to go with SBT. It went pretty well, but I wasn’t very happy with it: it is a bit slow, its dependency management is not amazing, and it has some other issues.
So I started to look for alternatives, and I was surprised to learn how many are out there. Of course, I could use any of the Java tools, like Gradle or Maven, which both support Scala. Like SBT, each has its own downsides. While I was looking for other alternatives, I wanted to find something with good support for a monorepo development strategy (as we are slowly moving our code to a monorepo). And at this point, I found Bazel, a build tool by Google.
Bazel is amazing and it can do a lot of things, including building Scala source code. One of the coolest Bazel features is the built-in support for remote caching, which aims to speed up build time. For lazy developers like me, the coolest thing here is the native support for Google Cloud Storage, so I could have a “serverless” cache deployment. This all sounds simple, right? Until you ask: how secure is it? And this is where things become interesting!

Nginx Ingress: The Security Hero We Need!

I love Nginx Ingress! It is a very powerful Kubernetes Ingress, with so many capabilities. But I think it does not get enough appreciation in the AppSec world. Just by using Nginx Ingress, you can get so many security features almost for free. And even better, you can enable them once, and every workload in the cluster will have them! For example, you can monitor and chase after developers to enable security headers. Or you can just do it once, test it once, and forget about it. That’s it. An entire class of bugs doesn’t exist anymore. Isn’t that exciting? Let’s see what else Nginx Ingress can do for us!

Istio in Production?

Istio is one of the most popular service meshes. It can help solve many issues that surface when running a lot of microservices: things like authentication, authorization, observability and traffic routing. It all sounds really promising, so we decided to give it a try at Soluto. While deploying it on an existing cluster and enabling it on existing workloads, I faced a lot of interesting issues. Let me share some of them with you.

Keeping Prometheus in Shape

Prometheus is a great monitoring tool. It can easily scrape all the services in your cluster dynamically, without any static configuration. For me, the move from manual metrics shipping to Prometheus was magical. But, like any other technology we’re using, Prometheus needs special care and love. If not handled properly, it can easily get out of shape. Why does it happen? And how can we keep it in shape? Let’s first do a quick recap of how Prometheus works.

Prometheus Monitoring Model

Prometheus works differently from other monitoring systems: it uses a pull model rather than a push model. The push model is simple: just push metrics from your code directly to the monitoring system, for example Graphite.

The pull model is fundamentally different: the service exposes metrics on a specific endpoint, and Prometheus scrapes them once in a while (the scrape interval, which is configurable). While there are reasons to prefer the pull model over push, it has its own challenges: each metric scrape operation can take time; what happens if the scrape takes longer than the scrape interval?

For example, let’s say Prometheus is configured to scrape its targets (that’s what services are called in Prometheus language) once every 20 seconds; what will happen if one scrape takes more than 20 seconds? The result is out-of-order metrics: instead of a data point every 20 seconds, you get one whenever a scrape completes. What can we do?
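
The effect described above can be sketched with a toy model. This is only an illustration, not how Prometheus schedules scrapes internally: it assumes a scrape is due every `interval` seconds, that a new scrape cannot start before the previous one has finished, and that the data point is recorded when the scrape completes. The function names and the durations are made up for the example.

```python
def scrape_timestamps(interval, durations):
    """Return the completion time of each scrape (toy model, see assumptions above)."""
    timestamps = []
    finished_at = 0
    for i, duration in enumerate(durations):
        scheduled = i * interval
        # A scrape starts on schedule unless the previous one is still running.
        start = max(scheduled, finished_at)
        finished_at = start + duration
        timestamps.append(finished_at)
    return timestamps


def gaps(timestamps):
    """Return the spacing between consecutive data points."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Every scrape takes 1 second: data points are evenly spaced.
fast = scrape_timestamps(20, [1, 1, 1, 1])
print(gaps(fast))  # [20, 20, 20]

# The second scrape takes 25 seconds, longer than the 20-second interval.
slow = scrape_timestamps(20, [1, 25, 1, 1])
print(gaps(slow))  # [44, 1, 15]
```

With a single slow scrape, the spacing between data points is no longer 20 seconds, which is the irregularity the paragraph above describes.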


Investigating Kubernetes Nodes Disk Usage

Today, I looked at our production Kubernetes cluster dashboard and I noticed something weird:

disk usage is high - almost 80%!
(sum(node_filesystem_size) - sum(node_filesystem_free)) / sum(node_filesystem_size) * 100

Well, this looks pretty bad. This is the average disk usage of the nodes running in the cluster. On average, only 20% of the disk in each node is available. This is probably not a good sign.
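
To make the arithmetic behind that dashboard query concrete, here is a small sketch of the same calculation. The node sizes and free space are hypothetical numbers chosen for illustration, not values from the real cluster, and `disk_usage_percent` is a made-up helper name.

```python
def disk_usage_percent(sizes, frees):
    """Mirror the dashboard query: (sum(size) - sum(free)) / sum(size) * 100."""
    total = sum(sizes)
    return (total - sum(frees)) / total * 100


# Hypothetical cluster: three nodes with 100 GiB disks each.
sizes = [100, 100, 100]
frees = [25, 20, 15]

usage = disk_usage_percent(sizes, frees)
print(f"{usage:.1f}% used, {100 - usage:.1f}% free")  # 80.0% used, 20.0% free
```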
