Imagine a world where AI inference tasks not only run faster but also more securely, with minimal effort required for setup and maintenance. Sounds too good to be true? The latest Intel OpenVINO-based solution makes this a reality by integrating Intel’s E2100 “Dayton Peak” Infrastructure Processing Unit (IPU), F5 NGINX Plus, and Red Hat OpenShift. It’s a groundbreaking configuration designed for developers and enterprises looking to scale AI workloads securely and efficiently while streamlining installation and operation.
Let me take you on a deep dive into how all the pieces of this puzzle come together and why this integration is a game-changer for AI inference security and scalability.
At its heart, this setup is built to supercharge AI inference by offloading critical infrastructure tasks to the Intel IPU. This allows the host system, running the Intel OpenVINO inference server, to dedicate its resources to what really matters: delivering fast and accurate inferencing results. Paired with F5 NGINX Plus and Red Hat Enterprise Linux, OpenShift, and MicroShift, the system achieves a unique balance of performance, scalability, and security.
Here’s the core workflow: encrypted traffic flows from the AI client to NGINX Plus, which is deployed directly on the Intel IPU. NGINX Plus acts as a traffic proxy, decrypting data and securely routing it across the PCIe bus to the Intel OpenVINO inference servers hosted on the Dell R760 system. Results are then sent back through NGINX Plus for delivery to the AI client.
While the workflow itself is compelling, the architectural advantages add even more value. By shifting infrastructure tasks to the IPU, the solution delivers both performance benefits and a clear division of responsibilities for administrators.
One crucial benefit of deploying NGINX Plus on the Intel IPU is the offloading of infrastructure tasks from the host system’s CPU. Things like traffic routing, decryption, and access control—which can be resource-intensive—are handled entirely on the IPU. This means the host CPU has significantly more cycles available to focus on application-specific workloads, like running additional Intel OpenVINO inferencing models or handling resource-heavy AI processes.
In real-world scenarios, this translates to better utilization of your expensive, high-performance server hardware. Instead of being weighed down by background infrastructure tasks, the host CPU can operate at full capacity for the workloads you care about most.
Another unique benefit of the solution is the separation of infrastructure services and application workloads. By running all infrastructure tasks (NGINX Plus, network management, and access control) on the Intel IPU while keeping the Intel OpenVINO inference server on the host, we've created a clear, bright-line division of responsibilities between the infrastructure and the application.
The Intel OpenVINO application administrator manages inferencing workloads, deploys and scales AI models, and optimizes application-level performance. The infrastructure administrator, meanwhile, oversees the Intel IPU environment: managing routing, enforcing access control (via FXP rules), and configuring the NGINX Plus instance so that infrastructure services operate securely and efficiently.
This separation of duties eliminates ambiguity, strengthens organizational collaboration, and ensures that each admin can focus squarely on their respective domain of expertise.
Together, these benefits make this solution not just practical but also efficient for scaling enterprise AI workflows while keeping resource utilization and security top-notch.
One of the standout aspects of this system is how it leverages Red Hat MicroShift and OpenShift DPU Operators to make configuration and scaling practically effortless. Honestly, this kind of automation feels like magic when you see it in action. Let me break it down:
There are two clusters. The first is an OpenShift cluster running on the host system; specifically, an OpenShift worker node on the Dell R760. The second is a MicroShift cluster deployed on the Arm cores of the Intel IPU. This lightweight version of OpenShift provides the flexibility of containers without the overhead of a full Kubernetes environment.
These clusters work together through DPU operators, which do the behind-the-scenes heavy lifting. They talk to each other, exchanging data about active pods and networks. This connection is particularly important for dynamically managing security and traffic rules.
Here’s the part that really makes life easier for developers: dynamic rule creation. Previously, setting up FXP rules (used to manage access control for PCIe traffic) required manual effort and knowledge of P4 programming. Now, all you have to do is deploy your workloads, and the operators handle everything automatically:
The operator creates new FXP rules dynamically whenever appropriately tagged Intel OpenVINO inference pods are deployed. These rules allow communication across the PCIe bus, and as workloads scale up or down, the system automatically adjusts the access rules, taking the guesswork out of configuration.
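To make that concrete, here’s a rough sketch of the kind of reconciliation loop a DPU operator runs, written with the Kubernetes Python client. To be clear, this is illustrative rather than the actual operator code: the pod label and the program_fxp_rule()/remove_fxp_rule() helpers are hypothetical stand-ins for the operator’s real tagging convention and its rule-programming path on the IPU.

```python
# Illustrative sketch only: watch for tagged inference pods and react,
# the way a DPU operator reconciles access rules as workloads come and go.
# The label and the program_fxp_rule()/remove_fxp_rule() helpers are hypothetical.
from kubernetes import client, config, watch

OFFLOAD_LABEL = "dpu.example.com/offload=true"  # hypothetical pod tag


def program_fxp_rule(pod):
    # Placeholder: the real operator translates the pod's network identity
    # into an FXP rule that permits its traffic across the PCIe bus.
    print(f"allow PCIe traffic for {pod.metadata.name} ({pod.status.pod_ip})")


def remove_fxp_rule(pod):
    # Placeholder: revoke access when the pod goes away.
    print(f"revoke PCIe access for {pod.metadata.name}")


def reconcile(namespace="openvino"):
    config.load_kube_config()  # or load_incluster_config() when running as a pod
    v1 = client.CoreV1Api()
    for event in watch.Watch().stream(
        v1.list_namespaced_pod, namespace, label_selector=OFFLOAD_LABEL
    ):
        pod, etype = event["object"], event["type"]
        if etype in ("ADDED", "MODIFIED") and pod.status.pod_ip:
            program_fxp_rule(pod)
        elif etype == "DELETED":
            remove_fxp_rule(pod)


if __name__ == "__main__":
    reconcile()
```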
This level of automation means anyone—from developers to sysadmins—can focus on AI workloads without getting stuck in the weeds of infrastructure configuration.
Now let’s get into the meat of how this whole system operates for AI inferencing. Let’s take the example of recognizing animal species in images using the Intel OpenVINO deep learning deployment toolkit. Here’s what the workflow looks like, step by step:
First, encrypted image data is sent from an AI client via a gRPCS API call. NGINX Plus, running on the Intel IPU, terminates the TLS connection and acts as a traffic proxy, routing the decrypted traffic securely across the PCIe bus to the Intel OpenVINO inference servers hosted on the Dell R760. Next, the inference servers process the images using a ResNet AI model to determine the species in each picture; for example, “This is a golden retriever” or “That’s a tabby cat.” Finally, the results travel back along the same path, through NGINX Plus and onward to the AI client.
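If you’re curious what the client side of that call looks like, here’s a minimal sketch using the ovmsclient Python package to talk to an OpenVINO Model Server endpoint exposed through NGINX Plus. The endpoint address, certificate path, model name ("resnet"), and input tensor name ("0") are assumptions for illustration; check your own serving configuration and your ovmsclient version’s TLS options before reusing it.

```python
# Rough client-side sketch (endpoint, cert path, and model/input names are assumed).
import numpy as np
from PIL import Image
from ovmsclient import make_grpc_client


def classify(image_path, endpoint="inference.example.com:443", ca="server.pem"):
    # gRPCS: TLS is terminated by NGINX Plus on the IPU, so the client
    # only needs the server/CA certificate to verify the connection.
    client = make_grpc_client(endpoint, tls_config={"server_cert_path": ca})

    # Minimal ResNet-style preprocessing: 224x224 RGB, NCHW float batch of 1.
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.array(img, dtype=np.float32).transpose(2, 0, 1), 0)

    # Model name "resnet" and input name "0" are assumed serving-config values.
    probs = client.predict(inputs={"0": batch}, model_name="resnet")
    return int(np.argmax(probs))  # index into the ImageNet class list


if __name__ == "__main__":
    print("predicted class id:", classify("golden_retriever.jpg"))
```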
The system can be set up to handle multiple AI clients simultaneously processing batches of images. Even with several clients running inference requests in a loop, the system remains secure, seamless, and responsive.
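Simulating that multi-client load is straightforward: run the hypothetical classify() helper from the previous sketch concurrently, with each simulated client looping over its own batch of images.

```python
# Simulate several AI clients, each looping over a batch of images concurrently.
# Reuses the hypothetical classify() helper from the previous sketch.
from concurrent.futures import ThreadPoolExecutor

IMAGES = ["golden_retriever.jpg", "tabby_cat.jpg", "border_collie.jpg"]


def client_loop(client_id, rounds=10):
    for _ in range(rounds):
        for path in IMAGES:
            print(f"client {client_id}: {path} -> class {classify(path)}")


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(client_loop, i) for i in range(4)]
    for f in futures:
        f.result()  # surface any errors raised in the worker threads
```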
Let’s talk about one of the key benefits of this system: security. The Intel IPU doesn’t just process traffic—it actively protects the communication between the infrastructure and the inference workloads running on the host.
Here’s how it works: The IPU uses FXP rules to control traffic across the PCIe interface. Only the traffic authorized by these dynamically generated rules (managed by the DPU operators) is allowed to flow. This ensures secure communication while blocking unauthorized access to the host system. This kind of layered security helps mitigate risks, especially for enterprises processing sensitive data through AI pipelines.
To me, the magic of this solution lies in its perfect blend of performance, automation, and security. By isolating infrastructure management on the IPU while hosting inference workloads on the host machine, Intel, Red Hat, and F5 have created a setup that is both efficient and secure.
Here’s what makes this configuration a game-changer: infrastructure services offloaded to the IPU, a clean separation of duties between application and infrastructure administrators, automated FXP rule management through the DPU operators, and access control enforced on the IPU at the PCIe boundary.
This Intel OpenVINO-based solution brings together hardware and software in a way that feels effortless. Intel’s E2100 IPU, Red Hat OpenShift, and F5 NGINX Plus provide a best-in-class example of how to simplify complex AI inference pipelines while improving security and scalability.
Whether you’re a developer, infrastructure architect, or enterprise decision-maker, this solution offers a practical blueprint for managing AI workloads in a modern, containerized environment. If this has piqued your interest, don’t hesitate to reach out to Intel, F5, or Red Hat to explore how this configuration can fit into your workflow.
It’s exciting to see how this tech is evolving—and I, for one, can’t wait to see the next set of innovations. To learn more, watch my demo video on LinkedIn.