What Is a Data Processing Unit (DPU)?

A data processing unit (DPU) is a specialized processor designed to offload and accelerate data-centric tasks, freeing central processing units (CPUs) to run application-specific workloads. Designed to handle high-speed networking, storage requests, and security processing, DPUs are well suited to modern high-density data centers and high-performance computing (HPC) demands.

Understanding DPUs in depth

DPUs and their counterparts, infrastructure processing units (IPUs), answer a need to offload common, throughput-intensive tasks from CPUs. Offloading encryption, storage I/O operations, and high-bandwidth packet processing allows CPUs to focus on the higher-density application workloads that container-based applications, cloud or hypervisor partitioning, and compute-intensive artificial intelligence (AI) tasks require.

Key DPU functions include:

  • Networking: Building on features found in smart network interface cards (SmartNICs), DPUs can process large-volume packet flows at near line-rate speeds, offload network overlays such as VXLAN, and deliver security and application delivery controller services such as firewall support, TLS offload, load balancing, and traffic routing. DPUs can also provide entropy sources for cryptographically secure pseudorandom number generators (CSPRNGs).
  • Storage: Accelerating data transfer between hosts and storage, DPUs support advanced storage protocols such as NVMe over Fabrics (NVMe-oF), delivering the throughput that solid-state storage and hyperconverged infrastructure (HCI) require. DPUs also offload encryption/decryption, compression, and deduplication, further reducing the load on CPUs.
  • Virtualization: Offloading hypervisor and container networking tasks makes it easier to partition HCI for multiple tenants at higher workload densities, increasing infrastructure ROI.
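To make the VXLAN overlay offload above concrete, here is a minimal sketch that builds the 8-byte VXLAN header defined in RFC 7348. This is the kind of per-packet encapsulation work that, without a DPU or SmartNIC, falls to the CPU on every packet; the function name and example VNI are illustrative:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348).

    Byte 0: flags (0x08 = valid-VNI bit set), bytes 1-3 reserved,
    bytes 4-6: the 24-bit VXLAN Network Identifier, byte 7 reserved.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Two big-endian 32-bit words: flags word, then VNI shifted past
    # the reserved low byte.
    return struct.pack("!II", 0x08 << 24, vni << 8)

hdr = vxlan_header(5000)
assert len(hdr) == 8
assert hdr[0] == 0x08                               # valid-VNI flag set
assert int.from_bytes(hdr[4:7], "big") == 5000      # VNI round-trips
```

A DPU performing this offload applies (and strips) such headers in hardware at line rate, so the host CPU never touches the encapsulation.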

Benefits of using DPUs

Optimizing CPU performance for application-specific tasks in HCI and HPC environments is increasingly important as compute density and power usage become key metrics for infrastructure cost-benefit analysis. Advances in networking speed and latency reduction, improved storage performance, and the need to provide compute resources to more users further increase the non-application work demanded of CPUs. The currently accepted measures of success, adopted from the HPC industry, center on CPU density and performance.

Common processing-power datapoints include, but are not limited to:

  • CPU core count (by rack, node, or total available to users)
  • Floating point operations per second (FLOPS)1
  • Power consumption (measured in average and peak kilowatts)
  • Physical space measurement (measured in square feet or meters)
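These datapoints are typically combined into efficiency ratios. A minimal sketch, using purely hypothetical input values (not vendor figures), of how the raw metrics above reduce to ratios such as GFLOPS per watt:

```python
def efficiency_metrics(flops: float, avg_kw: float,
                       cores: int, sq_m: float) -> dict:
    """Derive common HPC-style efficiency ratios from raw datapoints."""
    watts = avg_kw * 1_000
    return {
        "gflops_per_watt": flops / 1e9 / watts,   # energy efficiency
        "flops_per_core": flops / cores,          # per-core performance
        "cores_per_sq_m": cores / sq_m,           # physical density
    }

# Hypothetical system: 2 PFLOPS, 40 kW average draw, 4,096 cores, 2 m^2
m = efficiency_metrics(flops=2e15, avg_kw=40.0, cores=4096, sq_m=2.0)
assert abs(m["gflops_per_watt"] - 50.0) < 1e-9    # 2 PFLOPS / 40 kW
```

Offloading infrastructure tasks to a DPU improves the first two ratios without changing the power or space denominators, which is why these metrics favor DPU-equipped designs.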

Long used in HPC to benchmark supercomputer performance at launch and over time, these metrics are increasingly being applied to traditional data centers as the technologies of the two industries converge.

DPUs increase CPU availability for application and compute-intensive pipelines, which can bottleneck when the CPU must also handle lower-level non-compute tasks. The problem compounds as densities and application workloads grow. By adding DPUs to data center infrastructure, operators free CPUs to deliver better per-core performance; alternatively, the reclaimed compute can be partitioned and tenanted to give more users access to system resources.
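A back-of-envelope sketch of that bottleneck, with purely illustrative rates and per-packet costs, estimating how many CPU cores line-rate packet processing alone can consume (and a DPU could reclaim):

```python
def cores_for_packet_processing(gbps: float, pkt_bytes: int,
                                cycles_per_pkt: int, core_ghz: float) -> float:
    """Estimate how many CPU cores software packet processing occupies.

    All inputs are assumptions for illustration, not measured values.
    """
    pps = gbps * 1e9 / 8 / pkt_bytes       # packets per second at line rate
    cycles_needed = pps * cycles_per_pkt   # total CPU cycles per second
    return cycles_needed / (core_ghz * 1e9)  # equivalent fully-busy cores

# Hypothetical: 200 Gb/s of 1,500-byte packets, ~5,000 cycles per packet
# for the software datapath, on 3 GHz cores -> roughly 28 cores consumed.
cores = cores_for_packet_processing(200, 1500, 5000, 3.0)
```

Even with generous assumptions, tens of cores can disappear into packet handling at modern link speeds, which is the capacity a DPU hands back to applications.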

How does F5 work with DPUs?

Building on its success with SmartNIC, ASIC, and FPGA technologies, F5 takes advantage of the processing power and inline traffic position of a DPU within the compute infrastructure to improve the workload capacity, performance, and security of HCI/HPC infrastructures.

Taking advantage of NVIDIA BlueField-3 DPUs, F5 offers multiple benefits for service providers and large enterprises looking to build out large-scale compute while maximizing resource utilization. These include but are not limited to:

  • Simplified integration: F5 combines networking, security, traffic management, and load balancing into a consolidated suite of services designed for DPU integration. It offers an integrated view of these services across HCI/HPC infrastructure, along with the observability and granular control needed to optimize compute-intensive workloads.
  • Enhanced security: F5 supports critical security features including firewall, distributed denial-of-service (DDoS) mitigation, API protection, encryption, and certificate management by offloading these functions to the DPU.
  • Improved performance: F5 accelerates networking and security, which is critical to meeting the demands required in high-density infrastructure to deliver applications at cloud scale.
  • Multi-tenancy support: F5 enables scalable multi-tenant architecture, allowing service providers to safely host multitudes of tenants on the same infrastructure while keeping their workloads and data separate and secure.

For more information on DPU and F5 integrated solutions, see the related resources.


1Benchmarks for scientific HPC traditionally measured single- or double-precision floating point performance (FP32 and FP64). Current AI workloads are often measured at half precision (FP16) or lower. Smaller-precision data types (floating point and integer) enable faster training and smaller memory footprints for language models.
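A small illustration of the footnote's memory-footprint point, using Python's struct module to compare per-element storage across precisions (the 7B parameter count is a hypothetical example, not a specific model):

```python
import struct

# Per-element storage for the floating-point precisions used in
# HPC and AI benchmarking (struct format codes: d=FP64, f=FP32, e=FP16).
sizes = {
    "FP64 (double)": struct.calcsize("d"),  # 8 bytes
    "FP32 (single)": struct.calcsize("f"),  # 4 bytes
    "FP16 (half)":   struct.calcsize("e"),  # 2 bytes
}

params = 7_000_000_000  # hypothetical 7B-parameter language model
for name, nbytes in sizes.items():
    print(f"{name}: {params * nbytes / 1e9:.0f} GB of weights")
```

Halving precision halves the weight footprint, which is why AI performance figures quoted at FP16 or below are not directly comparable to classic FP64 HPC numbers.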