How to achieve predictable scaling and fault isolation in a hybrid environment
Service providers are in a race to deliver new 5G services. Some have already launched fixed mobile services and 5G-capable hotspots, and will soon deliver edge computing to support autonomous vehicles, virtual reality, augmented reality, artificial intelligence, and more. The architecture needed to deliver these services is expensive, requiring investment in radio access, edge computing devices, and expanded network performance capabilities.
Many service providers are leveraging Network Functions Virtualization (NFV) to deliver on their 5G strategies. NFV helps them increase flexibility and agility, speed service delivery, scale faster to meet demand, and reduce total cost of ownership. By decoupling network functions from the hardware they run on, NFV allows the transition from high-cost, purpose-built hardware to lower-cost, commercial off-the-shelf (COTS) servers.
While NFV provides many benefits, distributing functionality across multiple servers and nodes often creates scaling challenges. Passing traffic through the hypervisor layer can increase latency and reduce network throughput, just as service providers need to maximize throughput for critical 5G network functions to meet service demands.
Adding more elements and scaling across nodes (horizontal scaling) can be efficient. However, stateful traffic poses another scaling challenge. Service provider traffic usually requires some degree of state (for example, subscriber information that enables policy management and billing). Each flow must therefore reach a node that holds its subscriber's state, or that state must be shared or moved between nodes, which makes horizontal scaling more difficult.
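To make the state problem concrete, consider a simple, hypothetical steering scheme (an assumption for illustration, not something this article prescribes) that pins each subscriber to a node by hashing the subscriber ID modulo the node count. The sketch below counts how many subscribers land on a different node when one node is added; every one of them would need its state migrated or shared. The subscriber ID format is also a placeholder.

```python
from __future__ import annotations

import hashlib


def node_for(subscriber_id: str, node_count: int) -> int:
    """Hypothetical steering rule: hash the subscriber ID, then take it modulo the node count."""
    digest = hashlib.sha256(subscriber_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def moved_fraction(subscriber_ids: list[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of subscribers whose assigned node changes when the node count changes.

    Each moved subscriber is one whose state (policy, billing counters, and so on)
    would have to be migrated or shared for its traffic to keep working.
    """
    moved = sum(
        1 for sid in subscriber_ids
        if node_for(sid, old_nodes) != node_for(sid, new_nodes)
    )
    return moved / len(subscriber_ids)


if __name__ == "__main__":
    subscribers = [f"imsi-{n:015d}" for n in range(100_000)]
    # With plain modulo hashing, growing from 4 to 5 nodes remaps roughly 80% of subscribers,
    # because h % 4 == h % 5 only when h % 20 is 0, 1, 2, or 3.
    print(f"moved: {moved_fraction(subscribers, 4, 5):.1%}")
```

Techniques such as consistent hashing reduce the fraction that moves, but the state of the subscribers that do move still has to follow them.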
Compute node consolidation brings multiple functional elements (usually virtual machines) together on a common compute node. This shortens the path between functions because network traffic doesn’t need to pass through multiple hypervisors and servers. The approach has several advantages: it reduces latency, enables higher throughput between elements, and allows predictable scaling of services. When higher throughput is needed, additional nodes are added until network demands are met. Because each node provides a fixed set of services with a known maximum performance level, it is easy for the provider to determine how many additional nodes are required to meet new demand. This makes consolidation well suited to the N6/SGi LAN, Evolved Packet Core (EPC) gateways, and Multi-access Edge Computing (MEC) elements, where high performance is required.
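Because every consolidated node has the same known maximum throughput, sizing reduces to a ceiling division. The sketch below assumes a hypothetical per-node capacity and an optional headroom fraction; the figures in the example are placeholders, not F5 performance numbers.

```python
import math


def nodes_required(demand_gbps: float, per_node_gbps: float, headroom: float = 0.0) -> int:
    """Number of identical consolidated nodes needed to carry demand_gbps.

    per_node_gbps is the known maximum throughput of one node (a hypothetical
    input here). headroom reserves a fraction of each node's capacity; for
    example, 0.2 keeps every node at or below 80% of its maximum.
    """
    if per_node_gbps <= 0:
        raise ValueError("per_node_gbps must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_node_gbps * (1.0 - headroom)
    return max(1, math.ceil(demand_gbps / usable))


def additional_nodes(current_nodes: int, new_demand_gbps: float,
                     per_node_gbps: float, headroom: float = 0.0) -> int:
    """How many nodes to add to an existing deployment to meet new demand."""
    needed = nodes_required(new_demand_gbps, per_node_gbps, headroom)
    return max(0, needed - current_nodes)


if __name__ == "__main__":
    # Hypothetical figures: 40 Gbps per node, 20% headroom, demand grows to 300 Gbps.
    print(nodes_required(300, 40, headroom=0.2))       # 10
    print(additional_nodes(6, 300, 40, headroom=0.2))  # 4
```

The calculation is only this simple because each node carries an identical service set; a mixed deployment would need per-function capacity figures instead.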
One common compute node consolidation scenario places security, TCP optimization, and policy enforcement on a single compute node or virtual machine, improving throughput.
Compute node consolidation helps simplify policy enforcement and billing, and delivers predictable operational and scaling performance.
A key benefit of compute node consolidation is that it provides predictable failure paths for services. Because each node carries the same set of services as every other node of its type, a node failure is predictable, well understood, and easy to identify. This simplifies fault isolation and speeds system recovery. Moreover, if the services on a colocated node are well designed, the failure of any one element triggers a failure of the whole node. This prevents the remaining elements on that node from sending traffic to elements on other nodes, which would reduce throughput and performance predictability.
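The fail-together behavior can be expressed as a health rule: a node reports healthy only when every colocated element is healthy, so a traffic distributor removes the whole node instead of letting its surviving elements reach across to other nodes. This is a minimal sketch; the health-check interface is hypothetical, and the element names simply reuse the security, TCP optimization, and policy enforcement chain mentioned earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsolidatedNode:
    """A compute node hosting a fixed chain of colocated elements (interface is hypothetical)."""
    name: str
    element_health: dict[str, bool] = field(default_factory=dict)

    def is_healthy(self) -> bool:
        # Fail together: any failed element marks the whole node as down.
        # A node with no registered elements is also treated as down.
        return bool(self.element_health) and all(self.element_health.values())

    def failed_elements(self) -> list[str]:
        return [elem for elem, ok in self.element_health.items() if not ok]


def eligible_nodes(nodes: list[ConsolidatedNode]) -> list[str]:
    """Names of nodes that may receive new flows."""
    return [node.name for node in nodes if node.is_healthy()]


if __name__ == "__main__":
    chain = ("security", "tcp-optimization", "policy-enforcement")
    nodes = [ConsolidatedNode(f"node-{i}", {e: True for e in chain}) for i in range(3)]
    nodes[1].element_health["tcp-optimization"] = False

    print(eligible_nodes(nodes))       # ['node-0', 'node-2']
    print(nodes[1].failed_elements())  # ['tcp-optimization']
```

Because every node of a type is identical, the operator only has to ask which node failed and which element triggered it, which is the fault-isolation benefit described above.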
In certain cases, a wholly virtualized, software-only approach simply cannot provide sufficient processing performance. However, there are several hardware-assisted ways to increase compute node performance for a specific purpose. Single Root Input/Output Virtualization (SR-IOV) is a specification that allows the resources of a PCI Express device to be isolated to improve performance. Using SR-IOV, a single physical PCI Express device can be shared in a virtual environment by exposing it to guests as multiple virtual functions. This direct hardware access from a compute guest greatly improves performance, particularly for network interfaces, by reducing data copies through the hypervisor layer. F5 offers optimized drivers leveraging SR-IOV for high-performance Intel and Mellanox NICs (including 100G NICs) that enable near-line-rate throughput.
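On Linux hosts, SR-IOV virtual functions (VFs) are commonly enabled through the kernel's sysfs interface: `sriov_totalvfs` reports how many VFs a physical function supports, and writing a number to `sriov_numvfs` creates that many. The helper below is a minimal sketch under those assumptions and is not part of F5's drivers. The interface name is a placeholder, the `sysfs_root` parameter exists only so the function can be exercised against a fake directory tree, and on a real host it must run as root with SR-IOV enabled in firmware.

```python
import tempfile
from pathlib import Path


def set_num_vfs(iface: str, count: int, sysfs_root: str = "/sys") -> int:
    """Enable `count` SR-IOV virtual functions on the physical NIC behind `iface`.

    Reads sriov_totalvfs, validates the request, writes sriov_numvfs, and
    returns the VF count read back afterward.
    """
    device = Path(sysfs_root) / "class" / "net" / iface / "device"
    total = int((device / "sriov_totalvfs").read_text().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{iface} supports 0..{total} VFs, got {count}")

    numvfs = device / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current != count:
        if current != 0 and count != 0:
            # The kernel rejects changing one nonzero VF count to another
            # (EBUSY), so disable the existing VFs first.
            numvfs.write_text("0")
        numvfs.write_text(str(count))
    return int(numvfs.read_text().strip())


if __name__ == "__main__":
    # Demonstration against a fake sysfs tree; on a real host you would call
    # set_num_vfs("<your-interface>", 4) as root instead.
    with tempfile.TemporaryDirectory() as root:
        dev = Path(root) / "class" / "net" / "eth-demo" / "device"
        dev.mkdir(parents=True)
        (dev / "sriov_totalvfs").write_text("8\n")
        (dev / "sriov_numvfs").write_text("2\n")
        print(set_num_vfs("eth-demo", 4, sysfs_root=root))  # 4
```

Once the VFs exist, each one can be passed through to a guest, which is where the reduced hypervisor copying described above comes from.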
In addition, compute providers have added chips and cards to handle functions that specialized hardware can accelerate. An excellent example is Intel QuickAssist Technology (QAT), which provides dedicated hardware in the Platform Controller Hub (PCH) chipset. Intel QAT hardware performs cryptographic processing and compression, reducing the load on the host CPU. This improves TLS performance, especially during TLS connection setup (the handshake).
Another way to increase performance is to offload processing to SmartNICs. A SmartNIC is a network interface card that also contains a programmable subsystem, such as a Field Programmable Gate Array (FPGA). Although this subsystem has more constrained processing and memory resources than the general compute node, it can be programmed to handle simpler functions at much higher performance. For example, layer 3/layer 4 security ACLs are well suited to SmartNIC FPGAs. F5 is planning to introduce support for SmartNIC acceleration in the future, leveraging our IP in hardware offload and acceleration.
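Layer 3/layer 4 ACLs fit a constrained FPGA because each rule is just a set of fixed-field comparisons on the packet's addresses, protocol, and port. The Python model below is a software sketch of first-match ACL evaluation to show that logic; it is not SmartNIC or FPGA code, and the rule format and default-deny policy are assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class AclRule:
    action: str                        # "permit" or "deny"
    src: Network
    dst: Network
    protocol: Optional[int] = None     # IP protocol number (6 = TCP, 17 = UDP); None matches any
    dst_ports: Optional[range] = None  # None matches any destination port

    def matches(self, src_ip, dst_ip, protocol: int, dst_port: int) -> bool:
        # Every field is a simple prefix or equality test, which is why this
        # kind of rule maps well onto fixed-function or FPGA logic.
        return (
            src_ip in self.src
            and dst_ip in self.dst
            and (self.protocol is None or self.protocol == protocol)
            and (self.dst_ports is None or dst_port in self.dst_ports)
        )


def evaluate(rules, src: str, dst: str, protocol: int, dst_port: int,
             default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default action."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        if rule.matches(src_ip, dst_ip, protocol, dst_port):
            return rule.action
    return default


if __name__ == "__main__":
    net = ipaddress.ip_network
    rules = [
        AclRule("deny", net("0.0.0.0/0"), net("10.0.0.0/8"), protocol=17, dst_ports=range(53, 54)),
        AclRule("permit", net("192.0.2.0/24"), net("10.0.0.0/8"), protocol=6, dst_ports=range(443, 444)),
    ]
    print(evaluate(rules, "192.0.2.10", "10.1.2.3", 6, 443))    # permit
    print(evaluate(rules, "192.0.2.10", "10.1.2.3", 17, 53))    # deny (first rule)
    print(evaluate(rules, "198.51.100.7", "10.1.2.3", 6, 443))  # deny (default)
```

Stateful or content-aware functions, such as TCP optimization or subscriber policy, need far more memory and logic per flow, which is why they stay on the general compute node.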
F5 has the hardware and software expertise to help service providers efficiently scale their networks to meet the demands of 5G. Our technical and service delivery teams have collaborated with service providers for years, helping them deliver high-throughput, high-bandwidth, low-latency solutions, as well as best-in-class 5G security hardware and software. Our experienced engineering and sales professionals can partner with your technology teams to develop solutions for your specific network and market needs.
For more information, visit: https://www.f5.com/solutions/service-providers/5g
To learn more about virtual infrastructure scaling, watch the webinar: https://interact.f5.com/5gwebinar-amer-5g_lp1.html