The Art of Scaling Containers: Retries

F5 Ecosystem | January 11, 2018

Lori Mac VittieDistinguished Engineer and Chief Evangelist | F5

Scaling containers is more than just slapping a proxy in front of a service and walking away. There’s more to scale than just distribution, and in the fast-paced world of containers there are five distinct capabilities required to ensure scale: retries, circuit breakers, discovery, distribution, and monitoring.

In this post on the art of scaling containers, we’ll dig into retries.

Retries

When you’re a kid playing a game, the concept of a retry is common to a lot of games. “Do over!” is commonly called out after a failure, in the hopes that the other players will let you try again. Sometimes they do. Sometimes they don’t. That rarely stops a kid from trying, I’ve noticed.

When scaling apps (or services if you prefer), the concept of a retry is much the same. A proxy, upon choosing a service and attempting to fulfill a request, receives an error. In basic load balancing scenarios this is typically determined by examining the HTTP response code. Anything other than a “200 OK” is an error. That includes network and TCP layer timeouts. The load balancer can either blindly return the failed response to the

requestor or, if it’s smart enough, it can retry the request in the hopes that a subsequent request will result in a successful response.

This sounds pretty basic, but in the beginning of scale there was no such thing as a retry. If it failed, it failed and we dealt with it. Usually by manually trying to reload from the browser. Eventually, proxies became smart enough to perform retries on their own, saving many a keyboard from wearing out the “CTRL” and “R” buttons.

On the surface, this is an existential example of the definition of insanity. After all, if the request failed the first time, why should we expect it to be successful the second (or the third, even?).

The answer lies in the reason for the failure. When scaling apps, it is important to understand the impact connection capacity has on failures. The load on a given resource at any given time is not fixed. Connections are constantly being opened and closed. The underlying web app platform – whether Apache or IIS, a node.js engine or a some other stack – has defined constraints in terms of capacity. It can only maintain X number of concurrent connections. When that limit is reached, attempts to open new connections will hang or fail.

If this is the cause of a failure, then in the microseconds it took for the proxy to receive a failed response a different connection may have closed, thereby opening up the opportunity for a retry to be successful.

In some cases, a failure is internal to the platform. The dreaded “500 Internal Server Error”. This status is often seen when server is not overloaded, but has made a call to another (external) service that failed to respond in time. Sometimes this is the result of a database connection pool reaching its limits. The reliance on external services can result in a cascading chain of errors that, like a connection overload, is often resolved by the time you receive the initial error.

You might also see the oh-so-helpful “503 Service Unavailable” error. This might be in response to an overload but as is the case with all HTTP error codes, they are not always good predictors of what actually is going wrong. You might see this in response to a DNS failure or in the case of IIS and ASP, a full queue. The possibilities are really quite varied. And again, they might be resolved by the time you receive the error, so a retry should definitively be your first response.

Of course you can’t just retry until the cows come home. Like the unintended consequences of TCP retransmission – which can overload networks and overwhelm receivers – retries eventually become futile. There is no hard and fast rule regarding how many times you should retry before giving up, but between 3 and 5 is common.

At that point, it’s time to send regrets to the requestor and initiate a circuit breaker. We’ll dig into that capability in our next post.

Featured Blog Posts

F5 accelerates and secures AI inference at scale with NVIDIA Cloud Partner reference architecture

Securing AI models and agents without compromise: How F5’s acquisition of CalypsoAI will deliver end-to-end AI runtime protection

Quantum ready: A practical guide to enabling PQC with F5

Tags: 2018

About the Author

Lori Mac VittieDistinguished Engineer and Chief Evangelist | F5

More blogs by Lori Mac Vittie

Featured Blog Posts

F5 accelerates and secures AI inference at scale with NVIDIA Cloud Partner reference architecture

Securing AI models and agents without compromise: How F5’s acquisition of CalypsoAI will deliver end-to-end AI runtime protection

Quantum ready: A practical guide to enabling PQC with F5

Related Blog Posts

F5 Ecosystem | 01/29/2026

Why sub-optimal application delivery architecture costs more than you think

Discover the hidden performance, security, and operational costs of sub‑optimal application delivery—and how modern architectures address them.

F5 NGINX,

Application Delivery & Traffic Management,

F5 on Azure,

F5 on Google Cloud Platform

F5 Ecosystem | 01/23/2026

Keyfactor + F5: Integrating digital trust in the F5 platform

By integrating digital trust solutions into F5 ADSP, Keyfactor and F5 redefine how organizations protect and deliver digital services at enterprise scale.

F5 Application Delivery and Security Platform (ADSP),

F5 NGINX,

F5 BIG-IP,

ADSP Partner Program

F5 Ecosystem | 01/20/2026

Architecting for AI: Secure, scalable, multicloud

Operationalize AI-era multicloud with F5 and Equinix. Explore scalable solutions for secure data flows, uniform policies, and governance across dynamic cloud environments.

AI,

Application Delivery,

Security,

Equinix

F5 Ecosystem | 01/09/2026

Nutanix and F5 expand successful partnership to Kubernetes

Nutanix and F5 have a shared vision of simplifying IT management. The two are joining forces for a Kubernetes service that is backed by F5 NGINX Plus.

Multicloud,

F5 NGINX,

F5 BIG-IP,

F5 Distributed Cloud Services,

Nutanix partner,

Hybrid Multicloud Application Delivery

F5 Ecosystem | 12/19/2025

AppViewX + F5: Automating and orchestrating app delivery

As an F5 ADSP Select partner, AppViewX works with F5 to deliver a centralized orchestration solution to manage app services across distributed environments.

F5 Application Delivery and Security Platform (ADSP),

Strategic Alliance,

F5 BIG-IP,

F5 NGINX

F5 Ecosystem | 11/11/2025

F5 NGINX Gateway Fabric is a certified solution for Red Hat OpenShift

F5 collaborates with Red Hat to deliver a solution that combines the high-performance app delivery of F5 NGINX with Red Hat OpenShift’s enterprise Kubernetes capabilities.

F5 NGINX,

2025