This post is adapted from a presentation by Kevin Reedy of Belly Card at nginx.conf in October 2014. While that was a couple years ago, the content is still highly relevant today. You can view a recording of the presentation on YouTube.
Good morning, everybody. My name is Kevin Reedy, and I work for a company called Belly Card based out of Chicago. We have about a hundred employees and twenty engineers. Today I’m going to talk to you about how we do our load balancing, specifically in an always-changing, dynamic infrastructure.
So first I’m going to tell you about our infrastructure stack over time, starting with version 0, which I call the Dark Age. Then we’ll go over how we use Chef to configure NGINX. After that, we’ll talk about how we’ve moved away from that to configuring NGINX with data from a service discovery platform called Consul. If time permits, I have some pro tips on configuring NGINX, especially when it comes to configuration management.
So even though I’m going to talk to you about Chef and Consul, I want to put out a disclaimer that nothing in this talk is specific to them. Chef, Puppet, Ansible, Salt, CFEngine, and all the rest are great products and you should be using one of them. We enjoy Chef, and I’m sure a lot of other people do too.
The same goes with service discovery. Consul, etcd, and ZooKeeper are all great, so use what works for you.
What is Belly? Essentially it’s a digital replacement for a punch card at a coffee shop where you buy ten drinks and get one free. We put an iPad in the store, customers scan in using their phone or a physical card with a QR code, we track their points, give them rewards, and we actually build marketing tools on top of that.
But I’m not here to talk to you about Belly as a company. If you’re interested in that, you can just go to our website. I’m going to talk to you about our infrastructure stack.
So version 0. Back when there were three developers and before I worked there, Belly was a monolithic Ruby on Rails application, which meant that every API call and part of our website went to a single Rails app. It was backed by MySQL, MongoDB, and Memcached, and we deployed it on Heroku, which is a great place to start.
So version 1 – A New Hope. We decided Heroku wasn’t going to meet our needs anymore, so we started deploying with something called Capistrano, which is essentially a “better than a shell script” tool written in Ruby. It helps you deploy things.
We had about six servers on Amazon Web Services EC2. That was a static number, but we would bump it up if we knew there was something big happening and back down if we needed to. In front of that we had an AWS Elastic Load Balancer.
So then a project came up where we wanted to separate our home page from the rest of the API. We wanted to do this so we could move faster with our home page and have different developers working on it.
The big buzzword at the time – and still today – is service‑oriented architecture.
So version 2 of our stack was to enable service‑oriented architecture. Our first plan was to separate the home page from the API. That seems really simple – we just have www.bellycard.com and api.bellycard.com.
But it turned out to be really complex and not simple whatsoever, because some of our mobile applications were statically written to use www.bellycard.com/api and /api‑assets. Our email providers and all of our click links that had already been emailed to people had /eo for “email open” and /ec for “email click”. Our Facebook app was configured to /facebook/callback.
So that meant that AWS Elastic Load Balancer no longer gave us the flexibility we needed to route this traffic to different applications. So NGINX came to save the day. Specifically, we could have multiple upstreams and rewrite rules to route traffic to our different applications.
In this very basic example we define an upstream block with six servers for the API and a block with two servers for the home page. There are also two server blocks [defining virtual servers]. On one we listen for api.bellycard.com and route all of that traffic to the API servers. On the other we listen on www.bellycard.com and route things accordingly.
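Roughly, that configuration looked something like the sketch below (the IP addresses are placeholders, not Belly’s real servers):

upstream api {
    # Six API servers (addresses are illustrative)
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
    server 10.0.0.4:8080;
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
}

upstream homepage {
    # Two home page servers
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
}

server {
    listen 80;
    server_name api.bellycard.com;
    location / {
        proxy_pass http://api;
    }
}

server {
    listen 80;
    server_name www.bellycard.com;

    # Legacy paths that still have to reach the API
    location /api      { proxy_pass http://api; }
    location /facebook { proxy_pass http://api; }

    location / {
        proxy_pass http://homepage;
    }
}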
This worked great. It allowed our developers to move fast. Our frontend people were working on one project and not being blocked by our API. So we deployed a third service. I believe it was for integration with Apple’s Passbook. Then a fourth – we moved all of our Facebook functionality out. Our fifth was to deal with user events on our website.
Then someone decided it was a great idea to write that in Node.js, so now we had five distinct services in two languages. They were all deployed independently without much configuration management, and we were manually editing NGINX configs, which I’m sure is how a lot of people start.
We needed a better way to scale this process, so we did that with Chef.
Chef is a configuration management suite. It’s written in Ruby and you can write your actual configuration management in Ruby, so it’s great for developers. learnchef.com is an amazing resource for learning about Chef; it can teach you way more about Chef than I can in 45 minutes.
I will go over some of the basics just so we’re all on the same page, though.
The first kind of building block in Chef is a recipe. Chef uses a lot of clever names: the command-line tool is called knife, configuration goes in recipes, and everything lives in a cookbook. That makes it kind of fun and easy to figure out and provides a great metaphor, but it also makes it really hard to Google for anything.
So here’s the most basic recipe I could think of: install the nginx package, and start and enable the nginx service.
You can also have file resources. For example, I want to write the content Hello World! to the file /usr/share/nginx/html/index.html.
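Those resources come out to only a few lines of Chef (a sketch, not the exact slide):

# Install the nginx package, then enable and start the service
package 'nginx'

service 'nginx' do
  action [:enable, :start]
end

# Write a static index page
file '/usr/share/nginx/html/index.html' do
  content 'Hello World!'
end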
You can also build that into a templating system. Here I have a couple of variables, g and w, for Hello and Kevin, and I can pass these variables into a template file, which is in ERB format (Ruby’s templating language).
So instead of just writing out the content manually, we can write a template for the greeting and who to greet.
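In Chef terms, that is a template resource in the recipe plus an ERB file in the cookbook, along the lines of the following sketch (file names are assumptions):

# recipes/default.rb: pass the greeting and the name into the template
template '/usr/share/nginx/html/index.html' do
  source 'index.html.erb'
  variables(g: 'Hello', w: 'Kevin')
end

# templates/default/index.html.erb
<%= @g %>, <%= @w %>!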
The last major building block is a cookbook, which is a collection of recipes, files, and templates, as well as resources, providers, and libraries, which are more complex resources. There are tons of open source and other cookbooks at https://supermarket.getchef.com, including ones for NGINX, MySQL, Java, apt and yum, even managing your ssh_known_hosts files. Pretty much all the main building blocks of what you’re trying to do in configuration management can already be found in a cookbook.
We’ll go through a very quick NGINX example with Chef where we’re going to install NGINX and configure it by changing a couple of variables. We’ll use Chef to discover where our application is running, and then configure NGINX to route traffic to our application.
So in this example, the first thing we do is set some node attributes. For every node, you can set attribute values by appending a JSON document. We’re going to use a third-party NGINX cookbook that looks for the settings of these attributes. We’re setting worker_connections to a certain value, the worker_rlimit_nofile directive, what event type we’re going to use in NGINX, and even client_max_body_size.
After that, we include two recipes from the NGINX cookbook. The first is nginx::repo, which sets up either an apt or yum repository depending on your operating system. Then [the nginx recipe] installs NGINX. You can also install NGINX from source, but using packages is faster.
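Using the community nginx cookbook’s attribute names, that part of the recipe might look something like this (the specific values are placeholders):

# Attributes read by the third-party nginx cookbook
node.default['nginx']['worker_connections']   = 4096
node.default['nginx']['worker_rlimit_nofile'] = 8192
node.default['nginx']['event']                = 'epoll'
node.default['nginx']['client_max_body_size'] = '50m'

# Set up the apt/yum repository, then install and start NGINX
include_recipe 'nginx::repo'
include_recipe 'nginx'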
Next we take advantage of Chef’s search capabilities. When you run Chef, you can either run it as an agent on the machine, or you can run it in solo mode which essentially runs through it like a script. When you have an agent on a machine, you can query the Chef server for nodes and other attributes.
Here we’re asking Chef to give us a list of all of the nodes that have the recipe belly‑api attached to them. After that, we template out a site file called /sites‑enabled/api. Its source is the template [file called api.erb that] I’ll show you next. The variables we pass in are the nodes we searched for just before.
The last new thing here is a notifies statement to reload the nginx service. This says that whenever this template changes, I want you to reload this service.
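Put together, the search, the template resource, and the notifies statement look roughly like this sketch:

# Find every node that runs the belly-api recipe
nodes = search(:node, 'recipe:belly-api')

# Render the site config; reload NGINX whenever the rendered file changes
template '/etc/nginx/sites-enabled/api' do
  source 'api.erb'
  variables(servers: nodes)
  notifies :reload, 'service[nginx]'
end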
So our template file here is pretty simple. We have an upstream called api and in ERB we iterate over all of the servers we pass through as a variable, specifying the server by IP address and port 8080. We even throw in a comment with the server name. When an operator is looking for what’s going on with this service it’ll be a nice comment for them.
We also define a server block to listen on api.bellycard.com and just pass all traffic to the api upstream.
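The api.erb template described here comes out to something like this sketch:

upstream api {
  <% @servers.each do |server| %>
  server <%= server['ipaddress'] %>:8080; # <%= server.name %>
  <% end %>
}

server {
  listen 80;
  server_name api.bellycard.com;
  location / {
    proxy_pass http://api;
  }
}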
I also want to provide a real‑world example. It might be too small for you guys to read, but you can kind of just get a feel for how complicated your Chef recipes can get. This is the actual template and recipe from some of our production servers today.
So now back to our stack. I told you we had deployed about five applications and that was going great. We now had a way to route traffic to them, so our developers just took hold of this, and today we’re up to about 40 Ruby applications that are deployed onto our servers using Chef. Traffic is routed to them completely dynamically using NGINX.
Our developers don’t really have to do anything other than edit one recipe to include the name of their new service. All of the traffic is routed by NGINX and dynamically configured with Chef.
We also had about ten frontend JavaScript applications that are just static files that we deploy on S3 with a CDN, [Amazon] CloudFront, in front of it. We served index.html out of the S3 bucket, and the rest of the assets were over CloudFront. We also had about another five apps that developers just pushed to Heroku because it was a one‑off thing that eventually became production and nobody was really told about it.
[The five apps on Heroku] were separate – completely outside of our infrastructure – until a beautiful day in April when the Heartbleed bug was announced. A lot of people scrambled. Heartbleed was a bug in OpenSSL that allowed people to potentially extract your private keys.
Since we use Chef, we were able to upgrade our OpenSSL infrastructure across all of our boxes as soon as a patch was available. I believe it was deployed in about 20 minutes everywhere in our entire infrastructure. However, AWS (and thus Heroku) took about 24 hours to patch their Elastic Load Balancers.
This prompted us to actually move all of our load balancing and SSL termination into NGINX regardless of where it was deployed – both S3 and Heroku. You take a slight penalty for doing so, but at least on that day in April it was worth it to us.
So version 3 of our stack went from being “SOA all the things” to “SSL all the things”. All of our traffic to bellycard.com and its subdomains passes through our load balancers. Our assets are still actually served from CloudFront on a different domain for performance; we don’t want the overhead of SSL there, and we’re only in a single data center on the East Coast.
Version 4 – the buzzword everyone’s talking about this year is Docker. Since Docker was at, I think, version 0.5, our developers have been excited to start using it.
The reasons [to use Docker] are simple.
It greatly simplifies deploying applications. There are no recipes to write. You don’t need to worry about how you’re going to do asset compilation. It doesn’t matter if you’re writing your app in Ruby when somebody else is writing their app in Python.
Essentially it separates the responsibilities of Development and Operations. Your developers can worry about code, what libraries to use, even which Linux packages to install. Operations can worry about things like: where am I going to deploy this? How am I going to capture logging from it? How do I monitor it? Once you’ve solved those problems in operations, every application will work, regardless of what it’s written in.
A bonus of this is that it allows our developers to easily spin up dependent services on their own laptops. As I mentioned we’re up to about 55 different applications, many of which are now in Docker containers.
Say you’re working on a new service for sending out email campaigns. It relies on the email service. So on your development machine you would typically download the code, run bundle install, start it up in the background, and “oh – but actually that relies on another service, the user service”.
That service relies on something else and something else. So actually the new email‑campaign service relies on about five other services plus MySQL, Elasticsearch, Redis, and the list goes on.
With Docker we can actually build those containers from the same Chef recipes that we use in production and provide our developers with containers they can download extremely quickly.
This speeds up both development and deployment.
So in version 4 we went from deploying our applications using Chef to deploying them in containers. There are tons of solutions to do this and I’m sure there will be even more.
The first is CoreOS and Fleet. CoreOS is a stripped-down Linux operating system that runs a service discovery engine called etcd on top of it. Fleet is a way for you to define things like “I want this container running five times and I don’t want it running on the same machines as this other service”.
There’s another called Deis that is similar to Heroku in that it’s got a git-push interface. There’s Apache Mesos and Google’s Kubernetes; Flynn is another one that’s just come out recently. We actually wrote our own because when we started this voyage most of these tools weren’t at a place where we could use them.
Ours is called Jockey, and it’s worked out really great because our deployment process is extremely opinionated. What’s great about Docker is that it essentially provides a set of common APIs for containerization. We don’t have to worry about a lot of things we would if we were just using Linux’s LXC containers.
We plan on open sourcing it, but as I mentioned deploys are really opinionated and what works for us might not work for you.
Also our deployment strategy kind of changed when we moved to Docker. Previously we were doing zero-downtime deploys by deploying a new version of the application in place. This means that if we had that user-facing service on four machines, we would use Chef to pull down the new version, install it on top of [the old version], and then send a signal to the web server to reload. Unicorn is a great Ruby web server for doing that.
However, containers are immutable. Once you have a running container, if you want to make a new build you should make a new container. You can kind of attach to it, run extra commands, and then save it – kind of like you would in Git. But really if you want to get the most out of Docker, your container should be completely immutable.
So that means we had to figure out a new deployment strategy because we’re no longer able to deploy in place. There are a few out there. Blue‑green is an interesting one. This is where if you’ve got an application running on four machines, you bring it up on four new machines and have your load balancer point to the new machines. Once you’re sure it’s actually working the way you expect, you can tear down your blue, and your green becomes the blue. Amazon’s got some really great tooling for doing this with AWS.
There’s also “canary” deploys where I deploy one version of the new machine, monitor it, and if it passes metric checks and health checks then I can continue the deployment.
The difference with containers here is you’re going to actually tear them down and spin up new ones, so you need some kind of system to track the location of each application running in your infrastructure.
Chef is actually starting to become a really good way to do this. There’s a cookbook that’s being merged in called Chef Metal that allows you to actually treat containers like any other resource you would in your Chef infrastructure.
There’s also service discovery. ZooKeeper is a very popular tool that’s used all over Hadoop. [Airbnb] has got a great product called SmartStack that allows you to do deployments and keep track of them in ZooKeeper.
I mentioned etcd, which is essentially a distributed key‑value store with a REST interface. This is great compared to ZooKeeper, which is more of a Java client library, because you can now communicate with etcd from any of your applications. A lot of those open source Platform‑as‑a‑Service things I mentioned earlier – like Deis and Flynn and even Google’s Kubernetes – use etcd. If you’re using CoreOS, etcd is already there for you and built in.
Another one that has come out, also REST-based, is called Consul. It’s released by HashiCorp, the people who make Vagrant and Packer and tons of other awesome tools. Consul is actually what we decided to use.
Here’s a diagram of the very basics of Consul. You can see there are multiple data centers, each with three servers. In Data Center 1 there are also clients. What’s interesting is that the server agents elect a leader, and all of your clients are actually aware of not only all the servers but also all the other clients. They then use a gossip protocol to communicate with all the other agents.
It reminds me of an old AT&T commercial where you tell two friends who tell two friends and they tell two friends and the message spreads that way throughout your entire network of nodes. [Editor – Well, close: it was a commercial for Faberge Organics shampoo.] This is compared to etcd or ZooKeeper where you essentially just have server agents and your clients connect to them.
With the Consul way of doing things, we actually run a Consul client on pretty much every server: everywhere we deploy Docker and everywhere we run our load balancers. So you can always talk to the API on localhost port 8500.
Just like etcd, [Consul] is a distributed key‑value store. You can set and get keys and they’ll maintain linearizability so you can make sure they’re always consistent. There are some gotchas there, but their documentation on where those problems are is phenomenal. However, unlike etcd they treat service discovery as a first‑class citizen.
Instead of just key‑value pairs, you can also register services, for example an API. When you register a service, you tell Consul what node it’s running on. This means that when we deploy a container, we also tell Consul to register this service API on this host as well.
On top of that Consul has distributed health checks, so now you can define Nagios‑style health checks right inside your service‑discovery platform. When you decide to query Consul, you can ask for a list of all locations where the API service is running. You can send a flag for “all” or “only passing” or “only unhealthy”.
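As a rough sketch of Consul’s HTTP API (the service name, port, and health endpoint here are examples): you register the service and its check with the local agent, and later ask the health endpoint for only the passing instances.

# Register the api service and an HTTP health check with the local agent
curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "Name": "api",
  "Port": 8080,
  "Check": { "HTTP": "http://localhost:8080/health", "Interval": "10s" }
}'

# List only the healthy instances of the api service
curl http://localhost:8500/v1/health/service/api?passing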
It also has support for multiple data centers. I mentioned you should be running that Consul client everywhere. When you start it up, you pass in a parameter specifying the data center. If you have a Data Center East and a Data Center West, you can query your local Consul client for an address without having to be data-center-aware.
We actually use the data center as a boundary for our staging and production environments. Our configuration for an application in staging might say, “ask me where MySQL.services.consul is”. Whether that’s actually running on a staging node or a production node, you’ll get the correct value without ever having to be data‑center[–aware] or multienvironment‑aware.
The other thing that’s great about it (and also etcd) is it has long polling on its HTTP interface. This means you can query and ask for a list of all the servers that are running this service, but also keep the connection open for 60 seconds and be notified right away if anything changes. The polling time can actually go all the way up to 10 minutes. This means you can get instant notification of any changes to your infrastructure.
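A blocking (long-polling) query is just the same request with the last index you saw passed back in (the index value below is made up):

# The first response includes an X-Consul-Index header
curl -i http://localhost:8500/v1/health/service/api?passing

# Pass that index back; this call blocks for up to 60 seconds, or returns as soon as something changes
curl "http://localhost:8500/v1/health/service/api?passing&index=1234&wait=60s"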
The other thing I mentioned already is that you scale out by running client agents everywhere. This is great because you can always talk to localhost.
The big question, though, is: now that I’ve got this data living in a distributed service‑discovery platform, how do I configure NGINX from the data?
Thankfully there’s a tool called Consul Template. It was literally released yesterday, which made me change my slides for today from this point on. It used to be called “Consul HAProxy”, which was a terrible name because it wasn’t actually specific to HAProxy.
Consul Template gives you a very simple interface to build templates to subscribe to changes in services, the health checks of those services, and even a key‑value store. This is so you can store settings like [the number of] NGINX workers inside the key‑value store and regenerate your config when that changes.
The big ability here is to generate files from templates, which sounds really similar to what we were doing with Chef before. The bigger difference here is it uses the Golang templating language as opposed to ERB, which is Ruby. Once you change a file you can optionally call a command on the change, for example service nginx reload.
So here’s an example of running Consul Template at the command line. You point it to your actual Consul host, which in this case is consul‑prod.example.com. In a real production environment, I would suggest running Consul everywhere and connecting to localhost.
Then you pass in a template file, which is a Golang template; in our case we’re passing in /etc/consul-templates/api. Then you pass in what file to actually write out, in our case /etc/nginx/sites-enabled/api. Then a command to call: service nginx reload.
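Spelled out as a single command, that’s roughly:

consul-template \
  -consul consul-prod.example.com:8500 \
  -template "/etc/consul-templates/api:/etc/nginx/sites-enabled/api:service nginx reload"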
The Consul template looks really similar to a slide I had before, except instead of ERB it’s Golang. So we define two upstreams here. One’s called api and one’s called homepage. We have a range statement for the service API, and there are built-in variables for grabbing the IP and port of where that service is actually running.
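A sketch of that Consul Template file, using Consul Template’s built-in service function:

upstream api {
  {{ range service "api" }}
  server {{ .Address }}:{{ .Port }}; # {{ .Node }}
  {{ end }}
}

upstream homepage {
  {{ range service "homepage" }}
  server {{ .Address }}:{{ .Port }};
  {{ end }}
}

server {
  listen 80;
  server_name api.bellycard.com;
  location / {
    proxy_pass http://api;
  }
}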
What’s great about Consul Template and using it this way is that the convergence on service changes is fast. Previous to deploying with Docker, we had Chef running about every 5 or 10 minutes depending on the machine. That was fast enough for us because we were deploying in place.
Now that we’re building a more advanced deployment system, we need to have notification of those changes in the seconds rather than minutes. We actually get it in the milliseconds, on average probably about 200 milliseconds.
However, configuration management is still needed to generate templates. We can’t just replace Chef with Consul. We still need to figure out how we’re going to deploy Consul and Consul Template and everything.
So we are actually using Chef templates to deploy our Consul templates, which seems a little convoluted. What we do first in our Chef recipe is query our Consul host for a list of all services. Then we generate a Consul template [with the template and source statements], pass in our SSL cert and key locations, and reload consul-template. [Editor – Mr. Reedy initially referred to consul-haproxy, which appears in the first notifies statement on the slide. He then corrected himself and clarified that the slide predates the release of Consul Template.]
We also write out a config file for Consul Template and we pass in [variables specifying] the Consul host, source, destination, and reload command. This is really the guts of how we’re using Chef and Consul together to get instantaneous updates to our NGINX configs.
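In outline (the paths, template names, and Consul host here are assumptions rather than the slide’s exact contents), that Chef recipe is doing something like this:

require 'json'
require 'net/http'

# Ask Consul for the list of all registered services
services = JSON.parse(
  Net::HTTP.get(URI('http://localhost:8500/v1/catalog/services'))
).keys

# Render one Consul Template per service and restart consul-template on changes
services.each do |name|
  template "/etc/consul-templates/#{name}" do
    source 'consul-site.ctmpl.erb'
    variables(
      service:  name,
      ssl_cert: "/etc/ssl/#{name}.crt",
      ssl_key:  "/etc/ssl/#{name}.key"
    )
    notifies :restart, 'service[consul-template]'
  end
end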
Next I’ve got some pro tips about Chef, Consul, and NGINX configuration in general.
The first tip is if you’re using Chef search to find where your services are running, you should look into Chef partial search. The problem with Chef search is it makes a copy of the entire node, including all of the attributes. If you save a lot of things to your node’s attributes you’re gonna have to pull those over the wire while talking to your Chef server, and also store them in memory.
If the Chef search returns hundreds or thousands of servers, you can easily see how you could run out of memory for your actual recipe. Partial search allows you to return only the keys you care about.
For example, we first had
nodes = search(:node, 'recipe:belly-api')
That search returns every single piece of information about every object. This includes information on its disk space, RAM, CPU, and all the applications that have written their own node attributes.
Instead, we can use partial search where we make a new hash called search_keys and give it a list of keys we care about – in this case only the name and IP address. Then we hit the very similar API called partial_search and pass in the additional [keys] parameter.
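With the partial_search API, that looks roughly like this sketch:

# Only pull back the node name and IP address for each match
search_keys = {
  'name' => ['name'],
  'ip'   => ['ipaddress']
}

nodes = partial_search(:node, 'recipe:belly-api', keys: search_keys)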
The second thing that bit us when we moved to load balancing in front of S3 and Heroku is that NGINX uses DNS to resolve the hostnames in upstream blocks only at configuration reload. This means that if you are setting an upstream server to s3.amazonaws.com or anything.herokuapp.com and it changes, your load balancer doesn’t know about the new IP address until the next reload.
The way we’ve gotten around that is to actually do the DNS resolution in Chef and then trigger the NGINX reload when that changes. This led to a second problem: [the order of addresses in a DNS response] is not always the same, so make sure you also sort the response before you write it into your template.
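A minimal sketch of that approach inside a Chef recipe, using Ruby’s Resolv (the hostname and file names are examples):

require 'resolv'

# Resolve and sort the addresses ourselves, so the rendered template (and
# therefore the NGINX reload) only changes when the set of IPs really changes
s3_ips = Resolv.getaddresses('s3.amazonaws.com').sort

template '/etc/nginx/sites-enabled/assets' do
  source 'assets.erb'
  variables(servers: s3_ips)
  notifies :reload, 'service[nginx]'
end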
The other issue when you start deploying tons of services is you want to avoid stale configuration. [Editor – Mr. Reedy pauses the presentation briefly to answer a question from the audience.] In Apache, NGINX, and other UNIX things, a common configuration pattern is to have a conf.d folder and then load everything from it. So if I deploy services A, B, and C, I make service-a.conf, service-b.conf, and service-c.conf.
Then I decide to replace service B with D, and now I have four configs in the NGINX configuration folder even though service B no longer exists. You might end up with conflicts between B and D because they have similar hostnames.
There are two ways we’ve targeted doing this [removing stale configuration] in the past. With templates you can add a :delete action, but this became kind of unwieldy really quickly. We essentially had a list of old services that we would always try to delete if they existed. That didn’t really scale out.
Another thing you can do is render all of your site configs to a single file. A lot of people have referred to this as “move idempotency upward”. Essentially, instead of having an NGINX config that loads from multiple sources, throw it all in your nginx.conf.
That does make it a little unwieldy when you’re actually logging into the box, but hopefully we’re getting to the point where you’re not troubleshooting a config file live in production. If it’s still a little too unwieldy in your Chef recipes, what we do is have one nginx.conf that loads a series of different sites. For example one of our site config files lists about 30 services, and then we have one for the Heroku one, one for all the S3 ones, etc.
So if you render all of your site configs to a single file or a small number of files, you don’t need to worry about cleaning up after yourself. If from Chef’s point of view a service goes away, Chef just stops rendering it to the actual file.
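As a sketch, the single rendered file is just one ERB template iterating over whatever services Chef currently knows about (the names here are illustrative):

<%# sites.erb: every internal service rendered into one file %>
<% @services.each do |name, servers| %>
upstream <%= name %> {
  <% servers.each do |server| %>
  server <%= server['ipaddress'] %>:8080;
  <% end %>
}

server {
  listen 80;
  server_name <%= name %>.bellycard.com;
  location / {
    proxy_pass http://<%= name %>;
  }
}
<% end %>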
Q: How has Consul worked out for you so far?
A: We’ve loved Consul, and the thing I really like about it compared to something like etcd is that there are services that continue to be built on top of it. I don’t have to worry about service discovery. In fact, if you have legacy applications and you deploy (let’s say) your MySQL servers using Consul, Consul also provides a DNS interface.
Your application doesn’t need to be aware of Consul; it can just query “what address do I get from mysqldb.services.consul?”. In this case, Consul is the DNS server. You set up your DNS infrastructure so that if the TLD [top-level domain] ends in .consul, the query is routed to Consul.
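One common way to wire that up (this is the standard dnsmasq approach from Consul’s documentation, not necessarily Belly’s exact setup) is to forward the .consul domain to the local agent’s DNS port:

# /etc/dnsmasq.d/10-consul
# Send every *.consul query to the local Consul agent (DNS listens on port 8600 by default)
server=/consul/127.0.0.1#8600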
It’s scaled out great for us. We’re running about three nodes in production as server agents and about 60 clients. We haven’t hit any hiccups. They just keep adding features and don’t break backward compatibility, so upgrading has always been great.
Q: Do you use NGINX Plus?
A: We don’t. We’ve looked into it for exactly this, when we started working with Consul. If we didn’t need a templating system, it would be great because we could just have a script that listens to that HTTP long polling and then speaks to the NGINX API on your behalf. The only issue there is you probably also want a templating system in case NGINX were to restart. It would reload your upstream configurations right there.
There’s also some work other people have done with NGINX Lua to query Consul directly. I haven’t seen a good implementation yet, because they tend to fall into one of two buckets. The first is they query Consul on every HTTP request, which is slow. The second is they query it only when the configuration is loaded, and you end up with the same problem you have with DNS. So the templating approach is best for now, but I wouldn’t be surprised if we see a way to do it with NGINX Lua in the future.
Q: Would you ever move away from Chef?
A: There isn’t a great way built into Consul to maintain those templates, and we’re already deploying NGINX with Chef. Our recipes around there are hundreds, if not thousands, of lines long at this point for all the other things we’re doing with NGINX including logging, tracing, and some OAuth. I don’t see us moving away from Chef in our infrastructure and replacing it, but it’s a great augmentation and good way to kind of get rapid notifications.
Q: How much overhead does Chef provide?
A: Because most of our machines are doing a single task, our actual Chef overhead is pretty low. RAM usage is low while it’s sitting idle. It actually sits in a timer and can fork off. Other people also run it in a cron if they don’t care about how fast it converges. We see a little bump in CPU in some of our heavier recipes, but nothing we can’t just deploy another server to fix.
Q: How much have you done with CloudFormation?
A: We’re not using CloudFormation yet. It’s on our radar, but one change we’ve made is to utilize a lot more Auto Scaling groups. The way that’ll work is when an instance comes up, it’ll have user data that contains its Chef run list.
A new machine will come up into a cluster with a key already based on its image, so it’s able to talk to our Chef server, register itself, and pull down the cookbooks it needs. That solves most of the CloudFormation stuff for us because we’re not using a lot of other AWS tools that would go hand in hand with that.
Q: How do you deal with a server being destroyed?
A: If a server goes away in an Auto Scaling group, you can use Amazon SNS [Simple Notification Service] to get a notification of that. We have a very simple service that listens for those notifications and then cleans up the Chef server.
Q: Have you used OpsWorks from AWS?
A: We haven’t. We’ve been using the hosted Chef from the beginning and it’s worked great for us. I’m not sure about this, but I believe that OpsWorks is more like Chef Solo, where you’re using it just for provisioning. We still run Chef Agent on almost all of our machines, and use those same Chef recipes to generate Docker containers for development.
"This blog post may reference products that are no longer available and/or no longer supported. For the most current information about available F5 NGINX products and solutions, explore our NGINX product family. NGINX is now part of F5. All previous NGINX.com links will redirect to similar NGINX content on F5.com."