Load balancing. It’s commonly accepted that we need it, rely on it, and use it every day to scale up (and hopefully down) applications. It’s become critical infrastructure responsible for not only scaling to meet demand but ensuring the continued availability of applications and services upon which business relies for both productivity and profit.
Which is why it’s something we need to revisit. Because load balancing shouldn’t be as tactical as it’s increasingly treated by the operations folks who, more often than not, now have to provision, configure, and deploy these magical services. Load balancing, when approached strategically, can improve performance, reduce risk, and make more efficient use of the resources needed to deliver applications. Load balancers are smarter than the “plumbing” moniker they’re often forced to bear, and understanding a few key points will help ops think more carefully about how they’re using load balancing to support applications.
So without further ado, here are three things ops really needs to know about load balancing.
I’d start by mentioning that round robin is the last algorithm you should ever choose, but you already knew that, right? So we’ll skip it and tackle the more intelligent algorithms: least connections and fastest response time. These are, of course, much better choices as you strategize on how to balance performance with efficient use of resources. Each takes into consideration application characteristics (or at least the characteristics of the platforms delivering the applications) that are critical to deciding which application instance (or container, if you prefer) should receive the next request. Least connections infers that if an instance has fewer connections, it has more capacity and is thus better able to fulfill the request at hand. It chooses capacity efficiency over performance.
Fastest response time, on the other end of the spectrum, directs requests based on performance. The faster the instance responds, the more often it’s selected. Operational axioms being what they are (as load increases, performance decreases), a less burdened server will eventually respond faster and thus be chosen. While that’s a nod toward capacity efficiency, this algorithm chooses performance over capacity every time.
But now note the names of these algorithms: least and fastest. Both are relative, not absolute. If two turtles are racing down the sidewalk, one of them is faster, even though both are travelling at what we’d all call “slow” speed. The same goes for least connections: given a choice between 99 and 100 connections, 99 is certainly the lesser of the two, but the instance carrying it may be overloaded all the same.
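To make the relativity point concrete, here’s a minimal sketch of the least-connections decision in Python. The pool structure and field names are illustrative assumptions, not any particular load balancer’s API.

```python
# Hypothetical pool: each instance reports its count of open connections.
def pick_least_connections(pool):
    """Return the instance with the fewest active connections.

    Note: "least" is relative, not absolute. If every instance reports
    99 vs. 100 connections, the winner may still be overloaded.
    """
    return min(pool, key=lambda instance: instance["connections"])

pool = [
    {"name": "app-1", "connections": 100},
    {"name": "app-2", "connections": 99},   # "least" -- but hardly idle
]
print(pick_least_connections(pool)["name"])  # prints "app-2"
```

The `min()` call is the entire algorithm; nothing in it asks whether 99 connections is actually a healthy load, which is exactly the turtle-race caveat above.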
Why it matters
The way in which load balancing manages requests has a direct and immediate impact on performance and availability. Both are critical characteristics that ultimately affect customer engagement and employee productivity. Optimizing architectures inclusive of load balancing will help ensure business success in realizing higher productivity and profit goals.
Since the ascendancy of cloud and software-defined data centers, elasticity has become the way to scale applications. Elasticity requires on-demand scale up – and scale down – as a means to optimize the use of resources (and budgets). Why overprovision when you can just scale up, on demand? Similarly, high availability (HA) architectures dependent on the principles of redundancy have become almost passé. Why require idle resources on standby in the (unlikely) event the primary app instance fails? ’Tis a waste of capital and operational budgets! Out, out damned standby!
While on-demand fail and scale is a beautiful theory, in practice it isn’t quite so simple. The reality is that even virtual servers (or cloud servers, or whatever term you’d like to use) take time to launch. If you (or your automated system) wait until that primary server fails or is at capacity before you launch another one, it’s already too late. Capacity planning in cloudy environments cannot be based on the same math that worked in a traditional environment. Capacity thresholds now need to factor into the equation the rate of consumption along with the time it takes to launch another server in order to seamlessly scale along with demand.
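The new capacity math can be sketched as a simple trigger calculation: launch the next instance while there is still enough headroom to absorb growth during the launch window. All the numbers and parameter names below are illustrative assumptions, not vendor guidance.

```python
# Hedged sketch: a scale-up trigger that accounts for launch time.
def scale_up_threshold(max_capacity_rps, growth_rate_rps_per_min,
                       launch_time_min, safety_margin=0.1):
    """Return the load level (req/s) at which a new instance must be
    launched so it is ready before existing capacity is exhausted.

    headroom = demand growth expected while the new instance boots.
    """
    headroom = growth_rate_rps_per_min * launch_time_min
    return max_capacity_rps * (1 - safety_margin) - headroom

# If an instance tops out at 1000 req/s, demand grows 50 req/s per
# minute, and a replacement takes 5 minutes to come online, you must
# trigger scaling at 650 req/s -- well before the server is "full".
print(scale_up_threshold(1000, 50, 5))
```

The point of the sketch is the subtraction: traditional capacity planning triggers near `max_capacity_rps`, while elastic environments must subtract both a safety margin and the demand that will arrive during the launch window.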
And the same goes for failover. If the primary fails, it’s going to take time to launch a replacement. Time in which folks are losing connections, timing out, and probably abandoning you for a competitor or the latest cat video. An idle spare may seem like a waste, but it’s like insurance: when you do need it, you’ll be happy it’s there. In particular, if that app is responsible for customer engagement or revenue, then the cost of even a few minutes of downtime may more than make up for the cost of keeping a spare at the ready.
Interestingly, containers may address these issues with their blazing-fast launch times. If availability, performance, and cost are all equally important, it may be time to start exploring the value containers can bring in balancing all three.
Why it matters
Downtime is costly. The cause of downtime isn’t nearly as important as avoiding it in the first place. Ensuring the right architecture and failover plans are in place in the face of failure is imperative to maintaining the continued availability critical to business success.
Of all the problems that occur when an app moves from development to production, this is probably the most common and the most easily avoidable. You see, most load balancing services (all the good ones, anyway) are proxies. That means the client connects to the proxy, and the proxy connects to your app. Both connections use TCP to transport that HTTP, which means they have to obey the laws of networking. The source IP your app sees on the connection (what you think is the client IP) is actually the IP address of the proxy. If you’re doing security, authentication, or metering based on IP address, this poses a serious problem: the value you pull off the connection isn’t the one you want.
The industry has pretty much standardized on dealing with this via custom HTTP headers. The X-Forwarded-For header is probably what you’re really looking for – that’s where a proxy puts the real client IP address when it forwards requests. Unfortunately it isn’t a formal standard, just a de facto “we all kinda agreed” convention (a standardized Forwarded header does exist in RFC 7239, but X-Forwarded-For remains far more widely deployed), so you’ll need to verify how your particular proxy populates it.
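A minimal sketch of recovering the client IP behind a proxy, assuming the proxy populates X-Forwarded-For. The function name and the dict-shaped headers are illustrative, not a specific framework’s API; and remember that clients can forge this header, so only trust it when you control the proxy in front of you.

```python
def client_ip(headers, peer_addr):
    """Prefer the left-most X-Forwarded-For entry; fall back to the
    TCP peer address (which, behind a proxy, is the proxy itself)."""
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        # Conventional format: "client, proxy1, proxy2" --
        # the left-most entry is the originating client.
        return xff.split(",")[0].strip()
    return peer_addr

headers = {"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}
print(client_ip(headers, "10.0.0.2"))  # prints "203.0.113.7"
print(client_ip({}, "10.0.0.2"))       # no proxy header: "10.0.0.2"
```

Catching this fallback behavior in dev or test, rather than discovering it in production, is exactly the point of this section.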
The point is that the client IP address you’re looking for isn’t the one you think it is, and developers need to take that into account before apps that depend on it move into production and suddenly stop working.
Why it matters
Troubleshooting issues in production is far more costly than in dev or test environments. The time spent finding and fixing the problem negatively impacts project timelines and impedes the time to market so critical to competitive advantage and business success in an application world. Recognizing this common issue and addressing it in the dev or test phase can ensure a faster, more seamless deployment into production and out to the market.