Solving Trust Issues at Scale

Microservices are social constructs: they can’t function without talking to other services. This raises an interesting question: do we trust all of our microservices? Not all microservices are the same: some are more sensitive – for example, services that handle personal user data or payment information. Others are user-facing and therefore riskier. We shouldn’t treat all services as equal. A robust mechanism that describes who can talk with whom is required. Let’s see how!

Adding Authentication

The first step is adding authentication – you can’t perform authorization without knowing who is talking to you. There are many ways to perform authentication; the most popular are mTLS and JSON Web Tokens (JWT) – which I’m going to focus on. While implementing JWT authentication once is not hard, re-implementing it over and over again in each microservice is tedious and error-prone. Also, not all languages have good support for JWT (see the comparison here). This is where the sidecar design pattern comes in handy – we can offload the authentication task to another container running in the same pod. Envoy Proxy is a perfect fit for this task – it can do a lot of cool stuff related to ingress and egress traffic, including handling JWT authentication (and also mTLS!). All we need to do is add a JWT authentication filter:
- name: envoy.filters.http.jwt_authn
  config: 
    providers:
      identity-server:
        issuer: http://oauth-server
        payload_in_metadata: jwt-metadata
        audiences:
        - api1
        remote_jwks:
          http_uri:
            uri: http://localhost/.well-known/openid-configuration/jwks
            cluster: identity-server
          cache_duration:
            seconds: 86400
    rules:
      - match:
          prefix: /stats/prometheus
      - match:
          prefix: /
        requires:
          provider_name: identity-server
This short code snippet configures everything we need to validate a JWT: how to fetch the signing keys used to validate the signature (remote_jwks), the required audience claim (audiences) and the required issuer (issuer). We can also specify which routes require authentication and which do not (the rules section). Very powerful. And the best part – we can add it to any application we have, no matter which language it is written in!
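To make what the filter does more concrete, here is a minimal Python sketch of the same three checks – signature, issuer and audience. Note that this is an illustrative HS256 (shared-secret) sketch, not the RS256/JWKS flow configured above, and the secret is made up:

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(data: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    # Build an HS256-signed token: header.payload.signature.
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def verify_jwt(token: str, secret: bytes, issuer: str, audience: str) -> dict:
    # The same three checks the Envoy filter performs:
    # signature, issuer (iss) and audience (aud).
    header, body, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(body))
    if payload.get("iss") != issuer:
        raise ValueError("bad issuer")
    if audience not in payload.get("aud", []):
        raise ValueError("bad audience")
    return payload
```

In production you would never re-implement this per service – that duplication is exactly what the sidecar removes.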

Adding Authorization

Now that we have authentication, it’s time to add the next layer – authorization. Envoy can add a layer of authorization by offloading this task to another service (see the docs for more details). For each incoming request, Envoy will call this service, and based on the response it will either approve or deny the request. This is very nice, but not enough on its own – we still need to implement the authorization logic! This is where another open-source project comes in – Open Policy Agent (OPA). OPA is a policy engine: you write policies using a DSL called Rego, and OPA evaluates them and returns a result (also called a decision). It might sound confusing, so let’s take a look at a short example:
package opa.demo.b

default allow = false

allow {
    input["candies"] < 5
}
This is a very simple policy that lives in the package opa.demo.b. The value of the policy is based on a special variable – input – that represents any input to the policy. The policy will return a different decision based on the input. For example, given the input candies=4, the decision will be true (4 < 5), while candies=7 will yield false. Refer to the docs for a full example. We can use the same mechanism to build our authorization policies. Let’s take a look at a simple authorization policy. This time, the input will be the incoming request. Our policies can look at various request properties (method, path, body, client IP, etc.) and based on those properties decide whether the request is allowed or not:
package common.service2service
import data.services
import input.attributes.request.http as http_request

default allow = false

jwt_payload = _value {
    verified_jwt := input.attributes.metadata_context.filter_metadata["envoy.filters.http.jwt_authn"]["fields"]["jwt-metadata"]
    _value := {
        "client_id": verified_jwt["Kind"]["StructValue"]["fields"]["client_id"]["Kind"]["StringValue"]
    }
}

allow {
    jwt_payload.client_id == "service-a"
    http_request.path == "api/v1/pii"
    http_request.method == "GET"
}
The policy is the last 5 lines of this snippet – the request will be allowed if and only if it matches all of those properties. Any other request will be denied. This gives us the ability to build a least-privilege policy, granting only the required access to each consumer. For more details about OPA’s integration with Envoy, please refer to the docs.
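If Rego is new to you, the decision logic can be read as a plain function. Here is an illustrative Python equivalent of the allow rule, assuming the request has already been reduced to the three properties the policy inspects (the real input is the full Envoy check request):

```python
def allow(client_id: str, path: str, method: str) -> bool:
    # Deny by default; allow only this exact (caller, route, method) triple,
    # mirroring the Rego rule above.
    return (
        client_id == "service-a"
        and path == "api/v1/pii"
        and method == "GET"
    )


# service-a reading PII over GET is allowed; anything else is denied.
print(allow("service-a", "api/v1/pii", "GET"))   # True
print(allow("service-b", "api/v1/pii", "GET"))   # False
print(allow("service-a", "api/v1/pii", "POST"))  # False
```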

Scaling Our Policies

While OPA is a very powerful tool, it has one downside – Rego is yet another thing devs need to learn and master. If we want to scale our authorization system, we need to create an abstraction that will make adoption easier. This can be done easily using OPA data files. A data file is simply a JSON or YAML file representing a data model that can be consumed by our policies. For example, this data file contains an array of all the clients that are allowed to access a service:
{
    "services": 
    [
        {
            "client_id": "client",
            "allowed_routes": [
                {
                    "path": "/api/v1/sensitive",
                    "method": "GET"
                }
            ]
        }
    ]
}
It’s the same data that was represented before in our policies, just this time it’s modeled in JSON instead of Rego. JSON is more familiar to devs, and can have a schema – making it a lot easier to use. Now, we can author generic policies that consume this data file:
package common.service2service
import data.services
import input.attributes.request.http as http_request

default allow = false

jwt_payload = _value {
    verified_jwt := input.attributes.metadata_context.filter_metadata["envoy.filters.http.jwt_authn"]["fields"]["jwt-metadata"]
    _value := {
        "client_id": verified_jwt["Kind"]["StructValue"]["fields"]["client_id"]["Kind"]["StringValue"]
    }
}

allow {
    jwt_payload.client_id == services[i].client_id
    http_request.path == services[i].allowed_routes[j].path
    http_request.method == services[i].allowed_routes[j].method
}
It’s almost the same policy as before, just this time there are no hardcoded values – the policy consumes the data file and uses it to evaluate the decision (note the import data.services line – this is how we import the data file!).
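For readers less used to Rego’s implicit iteration (the unbound i and j variables range over all services and routes), here is an illustrative Python equivalent of the generic rule, with the data file inlined as a plain list:

```python
# The service's data file, inlined for the example
# (in practice OPA loads it as data.services).
SERVICES = [
    {
        "client_id": "client",
        "allowed_routes": [
            {"path": "/api/v1/sensitive", "method": "GET"},
        ],
    },
]


def allow(client_id: str, path: str, method: str, services=SERVICES) -> bool:
    # Allow if any (service, route) pair matches the caller and the request,
    # just like the unification over services[i].allowed_routes[j] above.
    return any(
        svc["client_id"] == client_id
        and route["path"] == path
        and route["method"] == method
        for svc in services
        for route in svc["allowed_routes"]
    )
```

Granting a new permission is now a data change – appending a route to allowed_routes – rather than a policy change.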

Putting it All Together

So now we have a single, generic policy (maintained by our AppSec team). Each service has its own data file, representing who can access it. A permission request is just a PR to the relevant service’s data file – a change that is very easy to review (and maybe, in a later phase, even add some static analysis on top of). The only part left is how our service can consume our policies – and this is where another OPA feature comes in handy – the bundle API. Policies can be archived into policy bundles, and those bundles can be served from a bundle server (for example, S3). OPA supports hot bundle loading – so changes to our policies can be applied without restarting our service. And now we have a complete authorization system. To run it locally, clone the demo repository and follow the readme. The demo has all the components I described here, and you can play with them to get a better feel for the system. You can also use it as an implementation reference if you want to build something similar in production.
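To sketch how the service side consumes bundles, here is an example OPA configuration pointing at a bundle server (the bucket URL and bundle name are made up – adjust them to your setup). OPA polls the resource and hot-reloads the policies when the bundle changes:

```yaml
services:
  bundle-server:
    url: https://my-policy-bucket.s3.amazonaws.com

bundles:
  authz:
    service: bundle-server
    resource: bundle.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120
```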

Wrapping Up

Authorization is crucial for building a truly secure system. But it’s just as important to build an authorization system that devs will not hate (that much :)). In this post, I shared the approach we took, leveraging GitOps for authorization requests. A simple flow looks like the following:
  • Dev opens a PR, with changes to the relevant service data file.
  • The PR is reviewed by the service owner (and, for sensitive services, you might also want the AppSec team to review it).
  • Once the PR is approved and merged to master, the policies are bundled and uploaded to the bundle server. OPA will load the new policies and the changes are live in production – without restarting the service.
I’m looking forward to hearing your thoughts about this approach. Did you find it interesting? Do you want to build something similar? Please don’t hesitate to reach out! If you want to dive deeper, consider watching my talk from DevOps Days Tel Aviv 2019 – which covers this approach in depth.
