BLOG | OFFICE OF THE CTO

Add Simplicity, Security, and Performance to AI Inference with F5, Intel, and Dell

Kunal Anand
Published May 21, 2024

Organizations are eager to build new apps and workflows powered by AI, but operating them successfully can be tricky. Multiple AI frameworks and app environments create complexity for developers and security teams, who need a solution that makes fast AI inference easier to build, run, and secure.

Simplify AI development and security

Intel’s OpenVINO™ toolkit is an open source solution that accelerates AI inference while offering a smaller footprint and a write-once, deploy-anywhere approach. It helps developers create scalable and efficient AI solutions with relatively few lines of code. Developers can start with AI models trained in popular frameworks such as TensorFlow, PyTorch, and ONNX. With OpenVINO, developers first convert models to the OpenVINO format, then optionally optimize and compress them for faster responses. The model is then ready to deploy by embedding the OpenVINO runtime into the application, making it AI capable. Developers can deploy their AI-infused application via a lightweight container in a data center, in the cloud, or at the edge on a variety of hardware architectures.
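
To make that flow concrete, here is a minimal Python sketch of the convert-compile-infer pattern with the OpenVINO runtime. The model file name, target device, and input shape are assumptions chosen purely for illustration.

    import numpy as np
    import openvino as ov

    # Convert a model trained in another framework (an ONNX file is assumed here)
    # into OpenVINO's in-memory representation.
    model = ov.convert_model("resnet50.onnx")

    # Compile the converted model for a target device; "CPU" is one option,
    # and other supported device strings (such as "GPU") can be substituted.
    core = ov.Core()
    compiled_model = core.compile_model(model, "CPU")

    # Run inference on a dummy input; the shape is only an example.
    input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    results = compiled_model(input_data)
    print(results[compiled_model.output(0)].shape)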

A developer may not want to host the model alongside the application or embed it directly. The model may need to be updated from time to time, and the application may need to run multiple models to deliver its features. OpenVINO addresses this with the OpenVINO model server, a software-defined, high-performance system for serving models in a client-server architecture. Benefits of the OpenVINO model server include:

  1. Ease of Deployment: With its containerized architecture using Docker, deploying models with OpenVINO model server becomes more straightforward and scalable. It abstracts away the complexities of hardware configuration and dependencies.
  2. Scalability: OpenVINO model server can be deployed in a clustered environment to handle high inference loads and scale horizontally as needed. This scalability ensures that inference performance remains consistent even under heavy workloads.
  3. Remote Inference: OpenVINO model server supports remote inference, enabling clients to perform inference on models deployed on remote servers. This feature is useful for distributed applications or scenarios where inference needs to be performed on powerful servers while the client device has limited resources.
  4. Monitoring and Management: OpenVINO model server provides monitoring and management capabilities, allowing administrators to track inference performance and resource utilization and to manage deployed models effectively.

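As a rough illustration of remote inference, the sketch below sends a REST request from a Python client to a running model server. It assumes the server exposes its TensorFlow Serving-compatible REST API on localhost port 8000 and serves a model registered as "resnet50"; the host, port, model name, and input shape are illustrative assumptions.

    import numpy as np
    import requests

    # Assumed endpoint: an OpenVINO model server container exposing its REST
    # API on localhost:8000, serving a model registered as "resnet50".
    url = "http://localhost:8000/v1/models/resnet50:predict"

    # Build a dummy request payload; a real client would send preprocessed data.
    payload = {"instances": np.random.rand(1, 3, 224, 224).tolist()}

    response = requests.post(url, json=payload)
    response.raise_for_status()

    # The server returns its predictions as JSON.
    predictions = response.json()["predictions"]
    print(len(predictions))
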
OpenVINO simplifies the optimization, deployment, and scaling of AI models, but production deployments also need security. F5 NGINX Plus works as a reverse proxy, offering traffic management and protection for AI model servers. With high-availability configurations and active health checks, NGINX Plus can ensure that requests from apps, workflows, or users reach an operational OpenVINO model server. It also enables the use of HTTPS and mTLS certificates to encrypt communications between the user application and the model server without slowing performance.
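
A simplified NGINX Plus configuration along these lines might look like the sketch below. The upstream addresses, ports, and certificate paths are placeholders, and the health_check directive used for active health checks is an NGINX Plus feature.

    # Illustrative sketch: NGINX Plus reverse proxying two OpenVINO model
    # server instances; addresses, ports, and certificate paths are placeholders.
    upstream ovms_backend {
        zone ovms_backend 64k;
        server 127.0.0.1:9000;
        server 127.0.0.1:9001;
    }

    server {
        listen 443 ssl;

        # TLS for incoming connections, with mTLS client verification
        ssl_certificate        /etc/nginx/ssl/server.crt;
        ssl_certificate_key    /etc/nginx/ssl/server.key;
        ssl_client_certificate /etc/nginx/ssl/ca.crt;
        ssl_verify_client      on;

        location / {
            proxy_pass http://ovms_backend;

            # Active health checks (an NGINX Plus feature) steer requests away
            # from model server instances that stop responding.
            health_check interval=5s fails=3 passes=2;
        }
    }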

When deployed on the same host server or virtual machine, NGINX Plus filters incoming traffic and monitors the health of the upstream containers. It also offers content caching to speed performance and reduce work for the model server. This combination provides efficient security, but NGINX Plus and the OpenVINO model servers may compete for resources when deployed on a single CPU, which can result in slowdowns or performance degradation.

Accelerate AI model performance

Because infrastructure services such as virtual switching, security, and storage can consume a significant number of CPU cycles, Intel developed the Intel® Infrastructure Processing Unit (Intel® IPU), which frees up CPU cores for improved application performance. Intel IPUs are programmable network devices that intelligently manage system-level resources by securely accelerating networking and storage infrastructure functions in a data center. They are compatible with the Dell PowerEdge R760 Server with Intel® Xeon® processors, which delivers performance and versatility for compute-intensive workloads. Integration with the Dell iDRAC integrated management controller provides closed-loop thermal control of the IPU.

Using an Intel IPU with a Dell PowerEdge R760 rack server can increase performance for both OpenVINO model servers and F5 NGINX Plus. Running NGINX Plus on the Intel IPU provides performance and scalability thanks to the Intel IPU’s hardware accelerators. This combination also leaves CPU resources available for the AI model servers.

Running NGINX Plus on an Intel IPU also creates a security air gap between NGINX Plus and the OpenVINO model servers, which run on the host CPU. This extra layer of security protects against potential shared vulnerabilities and helps safeguard sensitive data in the AI model.

Power AI at the edge

The combined solution from F5, Intel, and Dell makes it easier to support AI inference at the edge. With NGINX Plus running on the Intel IPU, responses are faster and more reliable, supporting edge applications such as video analytics and IoT.

The solution also benefits content delivery networks through optimized caching and content delivery, and it supports distributed microservices deployments that need reliability across environments.

Accelerate AI Security and Performance with F5, Intel, and Dell

Power high-performance AI inference anywhere securely and consistently with a combined hardware and software solution. Easily deploy AI inference to data centers, clouds, or edge sites while maintaining availability and performance to support users and AI-powered apps.

Learn more about the F5 and Intel partnership at f5.com/intel.