As part of our SaaS-based Control-plane, we have built and run our own global backbone (AS35280), using multiple 100G and 400G links between our PoPs.
This gives us complete control over end-to-end connectivity between our regional edges, and it also allows us to provide the same high-performance, low-latency connectivity to our customers across their private data centers, edge sites, public cloud VPCs (AWS, Azure, GCP), and SaaS providers.
Our European footprint was already pretty good, with presence in Paris, London, Amsterdam, and Frankfurt, but existing and new customers required a new PoP in Lisbon, Portugal.
This was all agreed at the beginning of 2020, and deployment was planned for Q3 2020. Of course, this was pre-COVID-19 :)
With the crisis, we saw a lot more traffic (and also DDoS attacks, but more on that in a future blog post) on our backbone, and so did our customers.
They asked us to deploy before Q3, because they needed this PoP ASAP — more precisely, before the end of May. And since we at Volterra are nice guys, and also because we like challenges, we looked carefully at the time needed to meet the customer demand:
Knowing that it was early April, this looked fine, and we decided to carry on and launch the project, even though it was really the worst possible time to do it, due to:
Deploying a new PoP is not only about routers, switches, and cables. You also need to:
With the ongoing crisis, getting the required hardware in time was impossible. So we decided to reuse equipment we had available, mostly from our lab. This was an acceptable trade-off (e.g. the routers used would be Juniper QFX10K instead of the planned MX10K).
Staging, which we usually do in a datacenter (because of the power and rack space needed, but also… noise!), would have to be done at home because of the lockdown. Raphaël, our CTO of Infrastructure, had a big enough office room (including a 60-amp contract, which proves useful when you boot equipment that draws up to 16 amps!), so he would do the whole staging by himself, which also avoided involving other staff and having them go out.
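As a quick sanity check, the power arithmetic behind the home staging can be sketched from the figures above (a 60-amp supply, equipment drawing up to 16 amps at boot); the "one device at a time" margin is our own illustrative assumption, not a statement about the actual staging process.

```python
# Quick power-budget check for home staging, using the figures from the
# text: a 60-amp supply and equipment that can draw up to 16 amps at boot.
SUPPLY_AMPS = 60
BOOT_DRAW_AMPS = 16  # worst-case draw per device at power-on

# How many such devices can safely be powered on at the same time
# without exceeding the supply (integer division, no safety margin).
max_simultaneous_boots = SUPPLY_AMPS // BOOT_DRAW_AMPS
print(max_simultaneous_boots)  # → 3
```

In practice you would also leave headroom for lighting, laptops, and inrush spikes, so booting devices one at a time remains the safer approach.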
Once everything had been configured and tested multiple times, we shipped to Lisbon:
Even though we were confident in our setup (and had remote access via OOB or our backbone anyway), this was still the first time a new PoP would be deployed not by us directly, but by someone else 😅
We use the same rack design all around the world, and the goal was to be consistent and have the same setup for this new Lisbon PoP.
So we had to be extremely precise with the instructions we gave to the Equinix remote hands, so that they could simply “follow the guide”.
Below is part of the procedure we sent to Equinix, so that they could easily rack and connect everything.
There are a lot of components to deal with: not only the hardware devices (routers, switches, firewalls, servers) but also the cabling and, more importantly, the switch and server ports to connect the cables to.
As you can see below, the procedure is as detailed as possible, bearing in mind that the Equinix technicians have lots of installations to do, so the more precise we are, the better!
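To illustrate the idea, instructions like these can be rendered from a structured port map so that every step is unambiguous. This is a minimal sketch with hypothetical device and port names, not the actual Lisbon procedure or layout.

```python
# Minimal sketch: render numbered cabling instructions for remote hands
# from a structured port map. All device and port names below are
# hypothetical examples, not the real Lisbon PoP design.

CABLES = [
    # (cable type, from device, from port, to device, to port)
    ("fiber LC-LC", "qfx-1", "et-0/0/0", "qfx-2", "et-0/0/0"),
    ("copper Cat6", "qfx-1", "ge-0/0/47", "oob-switch", "ge-0/0/1"),
]

def render_instructions(cables):
    """Turn the port map into numbered, unambiguous steps."""
    steps = []
    for i, (ctype, dev_a, port_a, dev_b, port_b) in enumerate(cables, 1):
        steps.append(
            f"Step {i}: connect a {ctype} cable from "
            f"{dev_a} port {port_a} to {dev_b} port {port_b}."
        )
    return steps

for line in render_instructions(CABLES):
    print(line)
```

Keeping the port map in one structured source also means the same data can later drive verification, instead of maintaining the design in two places.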
Yes! Installation began on May 5th, with all the devices racked and powered, and no hardware failures. We were lucky, or maybe our experience meant shipment and packaging were done properly, or maybe both; in any case, everything worked fine.
The day after, Equinix technicians took care of cabling (copper/fiber), and at 11:30 PM, we could ping our Lisbon PoP from Paris!
The installation was completed on May 7th, with the final tasks carried out, such as configuring the PDUs, cross-connecting the OOB ports, and checking the IXP ports end-to-end. Our switch and firewall configurations were even fully functional; we didn’t have to ask Equinix for any configuration changes.
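An end-to-end port check like the one above boils down to comparing what the devices actually see against the intended cabling design. This is a hedged sketch of that idea: the device names and the "observed" data are hypothetical, and in practice the observed neighbor map would come from the equipment itself (e.g. LLDP neighbor output).

```python
# Minimal sketch: verify that observed link neighbors match the intended
# cabling design. Device/port names and the "observed" data below are
# hypothetical; a real check would pull the observed map from the devices
# (e.g. via LLDP neighbor information).

# Intended design: (local device, local port) -> (remote device, remote port)
DESIGN = {
    ("qfx-1", "et-0/0/0"): ("qfx-2", "et-0/0/0"),
    ("qfx-1", "ge-0/0/47"): ("oob-switch", "ge-0/0/1"),
}

def check_cabling(design, observed):
    """Return a list of mismatches between the design and observation."""
    problems = []
    for local, expected in design.items():
        seen = observed.get(local)
        if seen is None:
            problems.append(f"{local}: no neighbor seen (cable missing?)")
        elif seen != expected:
            problems.append(f"{local}: expected {expected}, saw {seen}")
    return problems

# Example: one cable is plugged into the wrong port.
observed = {
    ("qfx-1", "et-0/0/0"): ("qfx-2", "et-0/0/0"),
    ("qfx-1", "ge-0/0/47"): ("oob-switch", "ge-0/0/2"),
}
for p in check_cabling(DESIGN, observed):
    print(p)
```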
The final installation looks like this:
As we are super demanding, we are not 100% satisfied: for example, the rear panel of the rack is not as clean as we would like it to be. But we’ll fix that once the crisis settles down and we can travel to Portugal again.
Even though we are extremely happy and proud that we met the challenge, we like to step back and reflect on what worked, and especially on what can be improved.
Why did that work?
What can be improved?
We presented this deployment during the first remote RIPE meeting (RIPE 80), you can watch the recording here: