If you’re like most security pros, chances are pretty good that you’re starting to get a little frustrated with microservices, or maybe a lot. Microservice architectures — that is, architectures that compose a number of small, distributed, modular components, typically communicating over REST — are powerful from a software architect’s point of view.
Want to make a change to a component quickly without bringing the whole application down, or want to add new functionality on the fly? Microservices foster these aims. Instead of having to rebuild a large monolithic application, you can modify (or add) particular services you’re interested in independently.
The downside of this, of course, is that it can be a nightmare from a security management point of view. There are a few reasons this is so. For the security architect, it’s challenging because one of our most effective tools — application threat modeling — relies on analyzing interactions between components from an attacker’s point of view.
Doing this presupposes communication channels that remain more or less constant over time. If developers are pushing updates every five minutes — and if pathways between services change — the threat model is valid only for that point in time. If you’ve ever tried to threat model (and keep current) a rapidly evolving application that makes heavy use of microservices, you know exactly how frustrating this can be.
Catch the Wind
From an operations point of view, it’s challenging too. Under the hood, the most prevalent approach to microservice implementation is Docker with Kubernetes orchestration. This means that the containers actually running the services are designed to be ephemeral: New containers are added to accommodate load increases, and containers are redeployed to accommodate application changes or updated configurations.
To illustrate why this is challenging, let’s say you have an intrusion detection system alert, log entry, or suspicious activity from a few days ago. Which hosts/nodes exactly were involved, and what state were they in?
Trying to figure this out can be like trying to catch the wind: Those containers were probably overwritten and redeployed a few times over by the time you got there. Unless what transpired is crystal clear from the alert (and when is it ever?), your incident resolution now depends on reverse-engineering the state of a highly complex system at some point in the past.
Fortunately, one relatively recent technique that can help significantly with this is the service mesh architecture. As a design pattern, service mesh can provide great assistance to the security practitioner in a few ways. It’s powerful for developers, but equally powerful — if not more so — for those of us in the security arena.
How Service Mesh Helps
What is a service mesh? One way to think about it is as a “traffic dispatcher” for your services. When one service wants to communicate with another, there are two options for how it might do so. Option one: It knows about every other service that exists and implements the logic to talk to it. Option two: It asks someone else to do the work.
Think about it like sending a letter. If I wanted to send a letter to my cousin in Kentucky, one option is I write the letter, get in my car, drive to his house and put it in his hands. This is dependent on a bunch of things: me knowing his address, having a car available and ready to go, figuring out how to get to his house, knowing about it if he moves, etc. It’s just not efficient.
A better option would be for me to write the letter, address it, and let the post office do the work. Let them maintain the necessary information and delivery apparatus so I can focus on what I really care about: my letter getting there.
Implementation-wise, there are a number of ways to do this, but the most common approach is via the “sidecar” container. What is a sidecar container? It’s just another container — a container running a proxy that is configured specifically to route application traffic between services. That means it’s configured and deployed in such a way as to decouple the “delivery” of messages from the application logic.
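To make the decoupling concrete, here’s a minimal Python sketch of the sidecar idea. It’s illustrative only — real meshes do this at the network layer with a dedicated proxy (such as Envoy), and the service names, addresses, and routing table here are hypothetical:

```python
# Illustrative sketch of the sidecar pattern: the application hands every
# outbound message to its local sidecar, which owns all routing knowledge.
# Service names and addresses below are hypothetical.

class Sidecar:
    """Local proxy that decouples message delivery from application logic."""

    def __init__(self, routing_table):
        # In a real mesh, the control plane keeps this table current;
        # here it is just a static dict of service name -> address.
        self.routing_table = routing_table
        self.delivery_log = []  # audit trail: who talked to whom

    def send(self, destination_service, payload):
        address = self.routing_table.get(destination_service)
        if address is None:
            raise LookupError(f"unknown service: {destination_service}")
        # A real sidecar would open a (typically mTLS) connection here.
        self.delivery_log.append((destination_service, address))
        return f"delivered to {destination_service} at {address}: {payload}"


# The application knows only the logical service name, not its location.
sidecar = Sidecar({"orders": "10.0.0.12:8080", "billing": "10.0.0.27:8080"})
print(sidecar.send("billing", "invoice"))
```

Note that the application code never learns where “billing” lives — exactly the property that makes it the mesh’s job (not the developer’s) to keep delivery details current.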
From an application development point of view, the benefits should be fairly obvious: The developer can focus on business logic and not on the mechanics of “east-west” communication (that is, communication between services). From a security point of view though, there are also advantages.
Notably, it provides a hook for monitoring and other security services. This can be added without the need for adjustment to (or, in fact, even knowledge of) individual services’ application logic. So, for example, if I want to allow service A to talk only to service B using TLS and robust authentication, I can do that. Likewise, if I want to keep a record of what version of what container was talking to another one at a given point in time, I can configure it to tell me that.
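As one concrete example, in an Istio-based mesh the “service A may only talk to service B, over mutual TLS” rule can be expressed declaratively, with no change to either service’s code. The namespace, labels, and service-account names below are placeholders:

```yaml
# Require mutual TLS for all workloads in the namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo              # placeholder namespace
spec:
  mtls:
    mode: STRICT
---
# Allow only service A's identity to call service B.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-a-to-b
  namespace: demo
spec:
  selector:
    matchLabels:
      app: service-b           # placeholder label on service B's pods
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/demo/sa/service-a"]
```

The sidecar proxies enforce both policies on every connection, which is what gives the security team that east-west “hook” without touching application logic.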
Integration Considerations
If that sounds compelling to you, it should. In fact, it represents something that rarely occurs in the security world: It makes the more secure way of doing things the path of least resistance for developers, rather than the less secure way.
Developers find it compelling because they don’t have to sweat the delivery logistics of communicating with other services. At the same time, it adds security options that we’d otherwise have to enforce at the application layer.
So if your organization is considering microservices, a service mesh architecture actually can help your efforts to secure that environment. If you are using one already, having an understanding of what it is can help you get integrated into the conversation and give you tools to alleviate some of the microservice “pain points.”
The only caveat to this is that it does require a bit of prep work in learning the new toolset and adapting architectural tools to the new model. Whether you’re using Istio+Envoy, Linkerd, or something else, it first behooves you to read the docs to understand what features are available, how the toolset works, and what policy/configuration options are available to you. This is a good idea anyway, because it’s only a matter of time until you’ll need to validate that configuration.
Also, you’ll probably need to account for the new paradigm if you still intend to threat model your applications, which is always a good idea.
Specifically, it’s helpful to take a more logical view in your data flow analysis — perhaps by analyzing inputs and outputs of each service individually rather than assuming “Service A” will only ever talk to “Service B” (or, worse yet, assuming a static traffic flow between services based on what the application is doing at a given point in time).
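One way to operationalize that logical view is to model each service by what it accepts and emits, then derive the set of *possible* flows rather than hard-coding pairwise ones. A rough Python sketch of the idea — the service names and message types are hypothetical:

```python
# Sketch: model each service by its inputs and outputs, then enumerate every
# flow the mesh *could* route, instead of assuming "A only ever talks to B".
# Service names and message types below are hypothetical.

services = {
    "gateway": {"consumes": {"http_request"}, "produces": {"order_event"}},
    "orders":  {"consumes": {"order_event"},  "produces": {"invoice"}},
    "billing": {"consumes": {"invoice"},      "produces": {"receipt"}},
    "audit":   {"consumes": {"order_event", "invoice", "receipt"},
                "produces": set()},
}

def possible_flows(services):
    """Every (producer, consumer, message) triple the mesh could route."""
    flows = set()
    for src, s in services.items():
        for dst, d in services.items():
            if src == dst:
                continue
            # A flow is possible wherever an output type matches an input type.
            for msg in s["produces"] & d["consumes"]:
                flows.add((src, dst, msg))
    return flows

for flow in sorted(possible_flows(services)):
    print(flow)
```

Analyzing this derived flow set — instead of a snapshot of current traffic — keeps the threat model valid as the mesh reroutes and redeploys services underneath it.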
The point is that security professionals not only should not be scared of service mesh, but also should consider the solid arguments for actively embracing it.