AppSec Learning SRE principles: Metrics and Measurements
Extending Kubernetes with CRDs – The Hard Way
A Production Thanos Deployment
Solving Trust Issues at Scale
Istio in Production?
Do we really need threat modeling?
I’m a huge fan of threat modeling. It’s a very powerful tool that can uncover a lot of security issues. If you’re not familiar with it, check out my earlier post on the subject. For the past few years, I’ve been struggling with one simple question: when should we conduct threat modeling? After all, threat modeling has a price – it takes time and usually involves several people. We can’t conduct a full threat model for every feature, so we need a way to identify the “interesting” features that actually require one.
One very interesting solution to this hard problem was proposed by Izar Tarandach in this talk. In short, he proposes tagging features as “threat model worthy” and, once in a while, going over all the features with this tag and reviewing them. It’s a really interesting approach, and I highly recommend watching the entire talk. However, in my experience it’s not a silver bullet, and I want to propose an alternative approach.
Debugging iOS apps with Zaproxy
The other day I was debugging a really nasty bug that happened only in our iOS app. I was really frustrated because I couldn’t figure out why it was happening. Everything looked fine when stepping through the iOS code, but for some reason the server failed to deserialize the request body. I freaked out – nothing I tried seemed to solve the issue. If only there was an easy way to view the actual request and response, maybe I could understand what the issue was…
This is where a proxy comes in handy: a proxy can inspect the traffic and display it in an easy-to-understand manner. There are a lot of proxies you can use (like Charles (commercial) or Fiddler), but OWASP Zaproxy (ZAP) is the best open source proxy that I know of. Let’s see how easy it is to set it up:
Monitoring Kubernetes HPA Utilization
In the past few weeks, I’ve been working on migrating a legacy microservice to Kubernetes. The migration process was relatively simple – mainly porting the code from .NET Framework 4.5 to .NET Core 2.2. After making sure the service was deployed and working as expected, I started to gradually move production traffic to the new instance. The new service handled the traffic well, and I was happy – it looked like this task was about to be complete!
After a few days of gradual rollout, I felt confident enough to move all the traffic to the new service. And then it hit me: will the new service be able to handle the full production load? I mean, I configured a Horizontal Pod Autoscaler (HPA) for this service – but is that enough? Apparently not. But before I explain why, let’s do a quick recap on HPA.
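As part of the recap, here is roughly what such an HPA looks like – a minimal sketch only, assuming the autoscaling/v2 API is available in the cluster; the deployment name and the numbers are hypothetical, not the actual service’s configuration:

```yaml
# Minimal HPA sketch (hypothetical names and numbers), assuming the autoscaling/v2 API.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
spec:
  scaleTargetRef:            # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # target average CPU utilization across the pods
```

With something like this in place, Kubernetes adds or removes replicas (between minReplicas and maxReplicas) to keep the observed average CPU utilization around the target.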
Keeping Prometheus in Shape
Prometheus is a great monitoring tool. It can easily scrape all the services in your cluster dynamically, without any static configuration. For me, the move from manual metrics shipping to Prometheus was magical. But, like any other technology we use, Prometheus needs special care and love. If not handled properly, it can easily get out of shape. Why does that happen? And how can we keep it in shape? Let’s first do a quick recap of how Prometheus works.
Prometheus Monitoring Model
Prometheus works differently from most other monitoring systems – it uses a pull model rather than a push model. The push model is simple: just push metrics from your code directly to the monitoring system, for example Graphite.
The pull model is fundamentally different – the service exposes metrics on a specific endpoint, and Prometheus scrapes them periodically (the scrape interval – see here how to configure it). While there are good reasons for Prometheus to prefer pull over push, the pull model has its own challenges: each scrape operation takes time; what happens if a scrape takes longer than the scrape interval?
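As a quick illustration, the scrape interval is set in prometheus.yml, either globally or per scrape job – here’s a minimal sketch (the job name and target address are made up for the example):

```yaml
# prometheus.yml – minimal sketch; the job name and target address are hypothetical.
global:
  scrape_interval: 20s   # how often Prometheus scrapes each target
  scrape_timeout: 15s    # a single scrape is aborted if it runs longer than this

scrape_configs:
  - job_name: 'my-service'            # hypothetical job
    static_configs:
      - targets: ['my-service:8080']  # hypothetical target address
```

Note that scrape_timeout must not exceed scrape_interval.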
For example, let’s say Prometheus is configured to scrape its targets (that’s what scraped services are called in Prometheus terminology) every 20 seconds; what happens if one scrape takes more than 20 seconds? The result is out-of-order metrics: instead of getting a data point every 20 seconds, you get one whenever the scrape happens to complete. What can we do?