BLOG | OFFICE OF THE CTO

eBPF: It's All About Observability

James Hendergart
Published June 14, 2022


Every once in a while, a cool technology reaches an inflection point where real business needs intersect with that technology's strong points. This makes the technology not only plausible but practical. Berkeley Packet Filter (BPF) recently reached this point among Linux developers. BPF is an extremely efficient intercept and processing point on a Linux host, and it promises to expand to Windows servers sooner rather than later. The range of available data is vast, directly adding to full-stack visibility for Site Reliability Engineering (SRE) operations tasks. It also naturally lends itself to solving challenges in security and traffic management. The range of hooks is equally vast, providing an attractive array of trigger points for BPF programs that appeal to observability, security, and network specialists. BPF is the right technology to enable observability without breaking the bank. And it all starts with observability.

The fundamental design of BPF makes it about as efficient a way to get compute work done as you can get, whether measured in dollars or in watts. Even better, the toolchains produce the bytecode for you, so you can focus on your desired result rather than on low-level, assembly-style programming. How? Two design characteristics make BPF shine:

Instruction set design

The software design of BPF was purposely modeled after modern CPU architectures, and processor terminology is used because it accurately describes BPF's elements: BPF has registers and instructions, and they are designed for direct consumption by CPUs. BPF, based on the BSD Packet Filter design (1992), is a redesigned packet-capture filter machine better suited to today's register-based CPU architectures. That design received a natural boost in 2014 when the "extended Berkeley Packet Filter," or eBPF, was released in v3.18 of the Linux kernel. The distinction between eBPF and classic Berkeley Packet Filter (cBPF) mattered most in the early days after that release; today it is less crucial, since all supported kernel versions contain the 2014 enhancements. They are still worth noting: wider registers (moving from 32-bit to 64-bit means more work gets done per clock cycle), more registers (moving from 2 to 10 allows 1-to-1 mapping between BPF registers, modern CPU registers, and kernel ABIs), and a handful of additional instruction set enhancements that make BPF programs safer and more useful.
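To make this concrete, here is a minimal sketch of what working with BPF looks like from the developer's side, assuming the common clang/libbpf workflow: you write restricted C, and the toolchain emits eBPF bytecode for the 64-bit register machine described above. The file, section, and program names are illustrative.

/* minimal_prog.c: a minimal BPF program written in restricted C.
 * The toolchain emits the eBPF bytecode for you, e.g.:
 *
 *   clang -O2 -g -target bpf -c minimal_prog.c -o minimal_prog.o
 *   llvm-objdump -d minimal_prog.o   # inspect the emitted instructions
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int minimal_prog(struct xdp_md *ctx)
{
    /* Program logic goes here; XDP_PASS hands the packet on to the
     * normal kernel network stack untouched. */
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";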

Link Layer

BPF consists of a network tap and a packet filter. The tap operates at the data link layer as packets come off the wire to a given network interface, copying packets for the filter to interrogate. This insertion point gives BPF full visibility into the network path of the Linux host: for ingress traffic, that means before the host's network stack starts working on a packet, and for egress traffic, just before it hits the wire to leave the host. The diagram below shows BPF intersecting the ingress packet path at the data link layer.

[Figure: BPF intersecting the ingress packet path at the data link layer]
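As a hedged sketch of that tap in practice, the program below attaches at the XDP hook, which sits at the earliest ingress point of a network interface, before the host's network stack sees the packet. XDP is assumed here as the attach point, and the program name is illustrative; the bounds check is required by the in-kernel verifier.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_ingress_tap(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* The verifier requires explicit bounds checks before any access. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    /* The frame has not been touched by the host network stack yet;
     * eth->h_proto tells us which protocol header follows. */
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";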

As noted earlier, the range of available data is vast, directly adding to full-stack visibility for SRE operations tasks. In this example, we focus on some of the most commonly used fields from the IPv4 header.

[Figure: Commonly used fields from the IPv4 header]
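The snippet below is a sketch of how those fields are reached in a BPF program: it walks from the Ethernet header to the IPv4 header and reads a few of the commonly used fields. The struct iphdr field names come from the kernel headers; the program name, and XDP as the hook, are assumptions for illustration.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int read_ipv4_fields(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Commonly used IPv4 header fields (multi-byte fields are in
     * network byte order). */
    __u32 saddr = ip->saddr;     /* source IP address        */
    __u32 daddr = ip->daddr;     /* destination IP address   */
    __u8  proto = ip->protocol;  /* IPPROTO_TCP, IPPROTO_UDP */
    __u8  ttl   = ip->ttl;       /* time to live             */

    /* Silence unused-variable warnings in this sketch. */
    (void)saddr; (void)daddr; (void)proto; (void)ttl;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";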

Using this data, policies can be defined to customize packet filtering. The following policy filters for TCP packets destined for a specific IP address.

[Figure: Policy filtering TCP packets destined for a specific IP address]
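Expressed in restricted C, a sketch of that policy might look like the following. The destination address 203.0.113.10 is a placeholder from the documentation range, and dropping a match is only one possible outcome; counting or redirecting it would be equally valid policy actions.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Placeholder destination address: 203.0.113.10 (documentation range). */
#define FILTER_DADDR bpf_htonl(0xCB00710AU)

SEC("xdp")
int filter_tcp_to_daddr(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Policy: protocol is TCP AND destination IP matches -> act on it.
     * Dropping is just one choice of action for a match. */
    if (ip->protocol == IPPROTO_TCP && ip->daddr == FILTER_DADDR)
        return XDP_DROP;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";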

Leveraging eBPF for observability has a BOGO (buy one, get one) benefit: the observed data can be used for purposes beyond observing, such as traffic routing or security.

Referring back to the data set we shared earlier, the pieces used for observing the ingress traffic path of a Linux host are useful for other things as well. For example, source IP, destination IP, destination host, and port are useful both for ingress traffic routing and for limiting access.

Regardless of use, everything starts with the packet copy taken by the BPF tap. Once the copy is taken, data can be put into memory (see BPF maps) and then exported as a telemetry stream, while simultaneously being leveraged by other BPF programs that specify policy and filtering actions. This is where the branching from observability to traffic management and security occurs. To capitalize on the extreme efficiency of BPF, start with a clear picture of what data is needed and how that data will be used. In the next post in this series, my colleague Muhammad Waseem Sarwar will explore options for BPF programming at various locations in the Linux network stack.
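Until then, here is a minimal sketch of the map-and-export pattern described above: the program counts ingress packets per destination IPv4 address in a BPF hash map, which user space can read to build a telemetry stream and which other BPF programs can share to drive policy. The map and program names, and the libbpf-style map definition, are illustrative assumptions.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hash map keyed by destination IPv4 address, value is a packet count.
 * User space reads this map to export telemetry; other BPF programs
 * can share it to drive routing or access-control policy. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, __u64);
} pkts_per_daddr SEC(".maps");

SEC("xdp")
int observe_ingress(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Count this packet against its destination address. */
    __u32 key = ip->daddr;
    __u64 one = 1;
    __u64 *count = bpf_map_lookup_elem(&pkts_per_daddr, &key);
    if (count)
        __sync_fetch_and_add(count, 1);
    else
        bpf_map_update_elem(&pkts_per_daddr, &key, &one, BPF_ANY);

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";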