In this post we cover eBPF concepts, the challenges faced when applying network policies to microservices, and how these challenges can be tackled. Finally, we look at Cilium to see how it makes eBPF simple and easy to use.

Introduction

In this section we are going to answer some important questions:

What is eBPF

Before talking about extended Berkeley Packet Filters (eBPF), let’s remove the ‘e’ for a second and talk about BPF.

BPF was first introduced in 1992 by Steven McCanne and Van Jacobson as an in-kernel virtual machine that runs filter byte-code (interpreting it or translating it to the underlying machine architecture) and provides packet filtering using a simple instruction set, first on BSD and later on Linux. Before the introduction of BPF, packet filtering facilities such as Sun’s STREAMS NIT, DEC’s Ultrix Packet Filter, SGI’s Snoop, and the CMU/Stanford Packet Filter (whose design goes back to the Xerox Alto) typically had to copy packets from the kernel to userspace before they could be filtered. Copying every packet into userspace is expensive, so this is far from an optimal way to do line-rate packet filtering, especially when traffic peaks.

To filter packets with BPF, filter expressions (e.g., “ip or tcp and port 80”) are defined, then parsed to produce byte-code. The byte-code is then attached to a tap interface and injected into the kernel in the form of native instructions after being checked by a verifier (which makes sure that the program terminates and is safe to execute).
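To make the byte-code idea concrete, here is a minimal sketch in C. The `ipv4_only` array is a classic BPF program, built with the `BPF_STMT` and `BPF_JUMP` macros from `<linux/filter.h>`, that accepts only IPv4 Ethernet frames (similar in shape to what a plain `ip` filter expression compiles to). The `run_filter` function is a hypothetical toy interpreter written only for this post: it understands just the handful of opcodes used here and is not the kernel’s implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */

/* Classic BPF byte-code: accept IPv4 Ethernet frames, drop everything else. */
const struct sock_filter ipv4_only[] = {
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),            /* A = EtherType at offset 12 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1), /* A == IPv4 ? fall through : skip 1 */
    BPF_STMT(BPF_RET | BPF_K, 262144),                 /* accept (up to 262144 bytes) */
    BPF_STMT(BPF_RET | BPF_K, 0),                      /* drop */
};

/* Toy interpreter (illustration only): returns the number of bytes to keep,
 * or 0 to drop the packet. Unknown opcodes and out-of-bounds loads drop. */
uint32_t run_filter(const struct sock_filter *prog, size_t len,
                    const uint8_t *pkt, size_t pkt_len)
{
    uint32_t a = 0; /* the accumulator register */

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *in = &prog[pc];

        switch (in->code) {
        case BPF_LD | BPF_H | BPF_ABS:  /* load 16 bits, network byte order */
            if ((size_t)in->k + 2 > pkt_len)
                return 0;
            a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case BPF_LD | BPF_B | BPF_ABS:  /* load 8 bits */
            if ((size_t)in->k >= pkt_len)
                return 0;
            a = pkt[in->k];
            break;
        case BPF_JMP | BPF_JEQ | BPF_K: /* forward-only conditional jump */
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case BPF_RET | BPF_K:           /* verdict */
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}
```

Note that jumps can only go forward, which is one reason a classic BPF program is guaranteed to terminate.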

Basically, if you have used tcpdump then you have made use of BPF: the filter that you specify is compiled to BPF byte-code and inserted into the kernel, so that only the packets matching that filter are shown.

Affan Syed

Now, let’s put the ‘e’ back on, so what is eBPF?

With BPF, we are pretty much limited to packet filtering and monitoring. BPF was therefore extended to do much more (e.g., parsing, lookup, update, modification), hence the name extended BPF (eBPF). Below are some of the features that were introduced with eBPF:

  int bpf(int cmd, union bpf_attr *attr, unsigned int size)

A feature-based comparison can be found in the table below:

bpf_vs_ebpf.PNG

The bpf() system call

When the eBPF functionality was added as of version 3.18 of the kernel, two features were the key enablers:

The bpf() system call is shown below:

// Prototype as documented in the bpf(2) man page. In the kernel source the
// system call is defined with the macro:
// SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)

int bpf(int cmd, union bpf_attr *attr, unsigned int size);
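Since glibc does not ship a `bpf()` wrapper, userspace programs usually invoke the system call through `syscall(2)`. The following is a minimal sketch under that assumption; the helper names `sys_bpf`, `fill_map_create`, and `create_hash_map` are made up for this example. It creates a hash map with the `BPF_MAP_CREATE` command. Depending on kernel configuration and privileges, the call may fail with `EPERM`.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper: there is no bpf() function in glibc, so use syscall(2). */
long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}

/* Fill the BPF_MAP_CREATE member of the union. The kernel expects all
 * unused bytes of bpf_attr to be zero, hence the memset. */
void fill_map_create(union bpf_attr *attr, uint32_t key_size,
                     uint32_t value_size, uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type    = BPF_MAP_TYPE_HASH;
    attr->key_size    = key_size;
    attr->value_size  = value_size;
    attr->max_entries = max_entries;
}

/* Returns a file descriptor for the new map, or -1 with errno set. */
int create_hash_map(uint32_t key_size, uint32_t value_size,
                    uint32_t max_entries)
{
    union bpf_attr attr;

    fill_map_create(&attr, key_size, value_size, max_entries);
    return (int)sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

A call such as `create_hash_map(sizeof(uint32_t), sizeof(uint64_t), 1024)` would ask for a map from 32-bit keys to 64-bit values; the returned file descriptor is what the other map commands take as `map_fd`.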

Let’s look at these parameters in detail:

// From https://github.com/torvalds/linux/blob/v4.11/include/uapi/linux/bpf.h#L73
enum bpf_cmd {
    BPF_MAP_CREATE,
    BPF_MAP_LOOKUP_ELEM,
    BPF_MAP_UPDATE_ELEM,
    BPF_MAP_DELETE_ELEM,
    BPF_MAP_GET_NEXT_KEY,
    BPF_PROG_LOAD,
    BPF_OBJ_PIN,
    BPF_OBJ_GET,
    BPF_PROG_ATTACH,
    BPF_PROG_DETACH,
};
union bpf_attr {
	struct { /* anonymous struct used by BPF_MAP_CREATE command */
		__u32	map_type;	/* one of enum bpf_map_type */
		__u32	key_size;	/* size of key in bytes */
		__u32	value_size;	/* size of value in bytes */
		__u32	max_entries;	/* max number of entries in a map */
		__u32	map_flags;	/* prealloc or not */
	};

	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
		__u32		map_fd;
		__aligned_u64	key;
		union {
			__aligned_u64 value;
			__aligned_u64 next_key;
		};
		__u64		flags;
	};

	struct { /* anonymous struct used by BPF_PROG_LOAD command */
		__u32		prog_type;	/* one of enum bpf_prog_type */
		__u32		insn_cnt;
		__aligned_u64	insns;
		__aligned_u64	license;
		__u32		log_level;	/* verbosity level of verifier */
		__u32		log_size;	/* size of user buffer */
		__aligned_u64	log_buf;	/* user supplied buffer */
		__u32		kern_version;	/* checked when prog_type=kprobe */
	};

	struct { /* anonymous struct used by BPF_OBJ_* commands */
		__aligned_u64	pathname;
		__u32		bpf_fd;
	};

	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
		__u32		target_fd;	/* container object to attach to */
		__u32		attach_bpf_fd;	/* eBPF program to attach */
		__u32		attach_type;
		__u32		attach_flags;
	};
} __attribute__((aligned(8)));
enum bpf_prog_type {
	BPF_PROG_TYPE_UNSPEC,
	BPF_PROG_TYPE_SOCKET_FILTER,
	BPF_PROG_TYPE_KPROBE,
	BPF_PROG_TYPE_SCHED_CLS,
	BPF_PROG_TYPE_SCHED_ACT,
	BPF_PROG_TYPE_TRACEPOINT,
	BPF_PROG_TYPE_XDP,
	BPF_PROG_TYPE_PERF_EVENT,
	BPF_PROG_TYPE_CGROUP_SKB,
	BPF_PROG_TYPE_CGROUP_SOCK,
	BPF_PROG_TYPE_LWT_IN,
	BPF_PROG_TYPE_LWT_OUT,
	BPF_PROG_TYPE_LWT_XMIT,
};
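To see how the structures above fit together, here is a minimal sketch that prepares a `BPF_PROG_LOAD` request for the smallest possible program (`r0 = 0; exit`). The names `trivial_prog`, `fill_prog_load`, and `load_trivial_prog` are hypothetical, and the choice of `BPF_PROG_TYPE_SOCKET_FILTER` is just an assumption for the example. Note how user pointers are passed as 64-bit integers in `bpf_attr`.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Two eBPF instructions: set the return register r0 to 0, then exit. */
const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};

/* Buffer that receives the verifier's messages. */
static char verifier_log[4096];

/* Fill the BPF_PROG_LOAD member of the union. */
void fill_prog_load(union bpf_attr *attr, const struct bpf_insn *insns,
                    uint32_t insn_cnt)
{
    memset(attr, 0, sizeof(*attr));
    attr->prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr->insn_cnt  = insn_cnt;
    attr->insns     = (uint64_t)(uintptr_t)insns;
    attr->license   = (uint64_t)(uintptr_t)"GPL";
    attr->log_level = 1;
    attr->log_size  = sizeof(verifier_log);
    attr->log_buf   = (uint64_t)(uintptr_t)verifier_log;
}

/* Returns a program file descriptor, or -1 with errno set
 * (for example EPERM when the caller lacks the required privileges). */
int load_trivial_prog(void)
{
    union bpf_attr attr;

    fill_prog_load(&attr, trivial_prog, 2);
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

If the verifier rejects a program, the reason ends up in the buffer that `log_buf` points to, which is usually the first place to look.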

Important use-cases for eBPF

eBPF enables important use-cases for different themes:

How to create an eBPF program?

There are two ways to write an eBPF program:

After that, the byte-code is verified, compiled by a JIT compiler for the CPU architecture at hand, and finally injected at the desired hook point (e.g., at the traffic control layer). This way, policies on packets can be enforced even before the packets are processed by the network stack.
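As an illustration of a program meant for the traffic control hook, here is a minimal sketch written in restricted C. The function name `allow_ipv4_only` and the section name `classifier` are choices made for this example. With a BPF-capable clang it could be built with something like `clang -O2 -target bpf -c prog.c -o prog.o` (extra include paths may be needed) and attached with `tc` in direct-action mode. Dropping all non-IPv4 traffic would also drop ARP, so treat this strictly as a toy policy.

```c
#include <linux/bpf.h>       /* struct __sk_buff */
#include <linux/if_ether.h>  /* ETH_P_IP */
#include <linux/pkt_cls.h>   /* TC_ACT_OK, TC_ACT_SHOT */
#include <asm/byteorder.h>   /* __constant_htons */

/* The function lives in its own ELF section so that a loader such as tc
 * can locate it inside the object file. */
__attribute__((section("classifier"), used))
int allow_ipv4_only(struct __sk_buff *skb)
{
    /* skb->protocol holds the EtherType in network byte order. */
    if (skb->protocol == __constant_htons(ETH_P_IP))
        return TC_ACT_OK;   /* let the packet continue through the stack */

    return TC_ACT_SHOT;     /* drop everything else */
}

/* Loaders read the license string from a dedicated section. */
char _license[] __attribute__((section("license"), used)) = "GPL";
```

The verdict is returned before the packet travels any further, which is the point made above about enforcing policy early.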

Microservices

Now that we understand how eBPF works, let’s look into some constructs that will give us the necessary building blocks of knowledge. In this section, we are going to:

What are Microservices and how do they work?

If you have been following the posts on my blog, you will know that I have previously discussed the definition of microservices and how you can decompose a monolith into microservices.

But to recap, microservices are an architectural style for building applications so that they are more loosely coupled and easier to modify, test, and integrate with other microservices. The separation of concerns gives you additional flexibility to scale, test, and deploy your applications independently of one another.

Usually, to successfully apply the microservice model of development, each service should do one thing and do it well. Furthermore, most microservices nowadays are based on HTTP, and each service interfaces with the outside world by exposing a path to the service it offers. For example, consider the figure below, where we have three microservices, each exposing a different path.

Since each microservice exposes a different path, and maybe even a different port, the question arises as to how we can apply security policies to such services.

Security policies with IPTables

There are two methods to apply security policies with IPTables:

If we are talking about microservices and how many of them can run on a single host, we can safely assume that setting these rules up on a per-container level is not going to be a walk in the park; needless to say, it does not scale easily. Therefore, in a container run-time such as Docker, this is how security policies and NAT rules are applied. Alternatively, with OpenStack for example, network namespaces are used to carry out the routing functionality, and each namespace therefore carries its own IPTables rule configuration.

Now that we know how IPTables rules are applied, there are two things left to discuss.

What happens when IPTables are used in a microservice architecture?

To answer the first question, let’s revisit a sample microservice application architecture through the figure below:

With IPTables, we can do things such as accepting or dropping packets/segments based on their L3/L4 parameters.

Cilium Docker Talk

What if we want to apply security policies based on application-level verbs (e.g., REST GET/PUT/POST/DELETE) or RPC paths (/service1, /service2, /service3) for each microservice? With IPTables we can, for instance, apply a policy that only allows packets with a destination port of 8080, but we cannot apply application-level policies on a per-path level.
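The difference between the two kinds of policy can be sketched in a few lines of C. Everything here is hypothetical and only models the decision logic: `l4_allowed` sees nothing but the destination port, as an L3/L4 rule would, while `l7_allowed` decides per HTTP method and path using a made-up allow-list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One application-level rule: an HTTP method plus the path it may access. */
struct l7_rule {
    const char *method;
    const char *path;
};

/* Hypothetical allow-list for the three example services. */
const struct l7_rule allow_list[] = {
    { "GET",  "/service1" },
    { "POST", "/service2" },
    { "GET",  "/service3" },
};

/* L4-style decision: only the destination port is visible. */
bool l4_allowed(unsigned short dst_port)
{
    return dst_port == 8080;
}

/* L7-style decision: the request's method and path must match a rule. */
bool l7_allowed(const char *method, const char *path)
{
    size_t n = sizeof(allow_list) / sizeof(allow_list[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(method, allow_list[i].method) == 0 &&
            strcmp(path, allow_list[i].path) == 0)
            return true;
    }
    return false;
}
```

A `DELETE /service1` request sent to port 8080 passes `l4_allowed(8080)` but is rejected by `l7_allowed("DELETE", "/service1")`, which is exactly the distinction a port-based rule cannot make.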

How efficient is it to use IPTables for microservices? To answer this question, let’s look at what happens when a packet is sent from a container that represents a microservice.

Each container (network namespace) has its own stack. Therefore, the steps that will be followed are:

In terms of hops, we need 5 hops to reach a decision on whether that particular packet should be forwarded, which is not the optimal scenario. Can we do better? Let’s have a look at eBPF policies.

Security Policies using eBPF

So IPTables is not the best solution for applying policies to microservices: it is limited to L3/L4 policies and, furthermore, it needs the whole packet to be constructed and forwarded before it can reach a decision.

With eBPF we really don’t need to go all the way to constructing the packet. eBPF programs hook into the kernel, which makes it possible to apply policies at the system call level, before even entering the stack or constructing a packet. eBPF attaches to the container network namespace; as a result, all the calls are intercepted and filtered on the spot.

DockerCon Cilium

eBPF also has a feature called maps, which allows information sharing between different BPF programs or between userspace and kernel-space. One application of this feature is sharing state between containers and proxy services.
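From the userspace side, reading and writing a map goes through the same `bpf()` system call, using the `BPF_MAP_UPDATE_ELEM` and `BPF_MAP_LOOKUP_ELEM` commands. Below is a minimal sketch; the helper names `fill_elem_attr`, `map_update`, and `map_lookup` are made up for this example, and `map_fd` is assumed to be a descriptor obtained earlier (e.g., from `BPF_MAP_CREATE` or `BPF_OBJ_GET`).

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Fill the member of bpf_attr used by the BPF_MAP_*_ELEM commands.
 * Key and value are user pointers passed as 64-bit integers. */
void fill_elem_attr(union bpf_attr *attr, int map_fd, const void *key,
                    void *value, uint64_t flags)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_fd = (uint32_t)map_fd;
    attr->key    = (uint64_t)(uintptr_t)key;
    attr->value  = (uint64_t)(uintptr_t)value;
    attr->flags  = flags;
}

/* Create or overwrite the entry for key. Returns 0, or -1 with errno set. */
int map_update(int map_fd, const void *key, const void *value)
{
    union bpf_attr attr;

    /* The kernel only reads the value here; the cast drops const. */
    fill_elem_attr(&attr, map_fd, key, (void *)value, BPF_ANY);
    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/* Copy the entry for key into value_out. Returns 0, or -1 with errno set. */
int map_lookup(int map_fd, const void *key, void *value_out)
{
    union bpf_attr attr;

    fill_elem_attr(&attr, map_fd, key, value_out, 0);
    return (int)syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

A BPF program running in the kernel can update the same map through its helper functions, so a userspace agent that calls `map_lookup` sees the state the kernel-side program wrote, and the other way around.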

DockerCon Cilium

So far we have been talking about the technologies, but we have not really discussed the tools or frameworks that make configuring such policies easy (you guessed right, I am talking about Cilium).

Cilium

So far so good. Let’s review what we have in our knowledge pocket up to this point.

Cilium Important Concepts

Since the documentation on the main page of the Cilium project is quite good, I am not going to repeat what’s there; I will rather summarize the important bits. You can have a look yourself here.

Copied from the documentation, the Cilium definition is:

Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes.

What we are really interested in, though, is how Cilium applies security policies. Before that, let’s have a look at two important constructs.

Important Cilium Constructs

Cilium Policies

Cilium can enforce security policies in different ways:

Demo

Please find below a scripted demo, which follows the instructions on the project documentation page to show how policies can be applied with Cilium along with Docker.

Important References