Most enterprises are transforming their IT operations, moving from on-premises to cloud and from physical to virtual. With the onset of COVID-19, nearly every company is accelerating its modernization journey.
Such is the case for CSG, a 35-year-old provider of customer engagement services for the telecommunications and cable industry. The company equips North America’s largest cable providers with digital solutions to more efficiently manage customer relationships, billing, and operations.
A few years ago, CSG was straining under the limitations of aging processes, compounded by a lack of engagement from new application owners. Just as CSG helps its customers modernize, it was time to evolve its own operations.
CSG’s vice president of software engineering, Erica Morrison, was tasked with helping CSG’s Operational Engineering team establish a DevOps organization and culture—even though she had previously only worked on the software side of the house.
“It’s a shock to the system for a developer to be suddenly dropped into the ops world, to see front and center the challenges they face,” she says. “It definitely made me more appreciative of what all of our operations teams do.”
Bringing development best practices to what was historically an ops-only team positioned CSG to tackle a host of challenges: leveraging capabilities from F5 that they hadn’t previously used, adding in new tools, and adopting an enterprise licensing agreement that provided needed flexibility.
Erin Garrigan was the scrum master for the team at the time. Now a supervisor, she recounts several initiatives that added up to what CSG envisioned as a five-year or even ten-year plan back in 2016.
“From a technical standpoint, we had too many manual processes around changes, and not enough visibility into who was doing what,” she says. “Many different teams had access to our devices, and we didn’t have the robust controls we needed around that.”
These weren’t the only issues. There was instability across the infrastructure, and a lack of general health monitoring and alerting meant the team was often unaware when devices were in a bad state until they heard about problems from clients. Modernizing the hardware was another important concern.
The team’s first and largest project was getting everything into source code using F5 iApps. Because CSG’s processes had been manual, the team started by instituting a nightly export of device configurations, which gave them visibility into the configs. Eventually they progressed to a new paradigm in which the source code drives what’s on the devices.
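The value of a nightly export comes from comparing one snapshot with the next. The following is a minimal sketch of that idea in Python, not CSG's actual tooling: the function name `config_drift` and the sample configuration text are hypothetical, and a real export would be read from files rather than inline strings.

```python
import difflib


def config_drift(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two exported configuration snapshots."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous_export",
            tofile="current_export",
            lineterm="",
        )
    )


# Hypothetical snapshots; the text is illustrative, not a real device export.
last_night = "pool app_pool\n  member 10.0.0.1:443\n  member 10.0.0.2:443\n"
tonight = "pool app_pool\n  member 10.0.0.1:443\n  member 10.0.0.3:443\n"

# Print every line that changed between the two nightly exports.
for line in config_drift(last_night, tonight):
    print(line)
```

An empty result means the configuration did not change overnight; any other result is a starting point for asking who changed what.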
“Within a year we created over 100 iApps from manual changes and the infrastructure-as-code concept,” Garrigan says. “The sheer volume of effort to codify every manual setup into iApps was substantial, but we created some tooling and tackled it over time.”
With the infrastructure now defined as code, the team could break out the functions of its apps. Operational Engineering supports dozens of application teams, in a dynamic server environment that receives multiple change requests per day. Implementing a self-service process allowed internal consumers from CSG’s application teams to use a sandbox BIG-IP device to configure changes, check them, and push them through a pipeline for validation and code review. They also created another tool that allows those users to push the changes into production themselves.
“At that point, we were really offering load balancing and application delivery-as-a-service,” says Phil Todd, CSG’s director of software development. “We use Jenkins to drive most of our automation self-service functions, as well as our reporting functions. And we’ve written some of our own C# code to implement that functionality behind the scenes.”
Driving visibility into those changes was another critical need across CSG’s large and diverse set of applications. In their previously manual world, the application teams could be stomping on each other’s toes with overlapping changes. Finding bad code was a more complex puzzle than it needed to be.
“There would just be something funky in the environment,” Garrigan says. “We wouldn’t know why it was wonky, only that somebody must have done something.”
The manual processes also meant that, once an issue was identified, the team would often have to track down the individual who made the change to fully understand it.
To solve the problem, the team implemented F5 BIG-IQ and began changing the change process itself, introducing automated reports on system health and the overall impact of changes. They also created a Grafana dashboard that monitors more than a thousand endpoints to support validation of changes. With their configuration now as code, along with the automation built around deployments, CSG could gain real visibility into all of the changes that had been made.
According to Todd, this has led to one of the biggest differences between CSG’s environment today and the manual processes they had in the past—if a change breaks something, the mean time to repair can now be minutes, whereas before it could be hours while the team investigated and resolved the issue.
“Logging into Kibana logs all of the changes—what version was deployed and what the previous version was,” he says. “So without sitting there and questioning why it’s not working, we just deploy the previous version of the code simply by pushing the button in Jenkins.”
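The rollback Todd describes depends on every deployment record carrying both the deployed version and the one it replaced. As a rough illustration only, assuming a simple in-memory log rather than CSG's Kibana and Jenkins setup, the lookup could be sketched like this. The `Deployment` record, the `rollback_target` function, and the application names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    app: str
    version: str
    previous_version: Optional[str]  # None for the first deployment of an app


def rollback_target(log: list[Deployment], app: str) -> Optional[str]:
    """Return the version to redeploy for `app`, or None if there is none."""
    for record in reversed(log):  # the newest record is last in the log
        if record.app == app:
            return record.previous_version
    return None


# Hypothetical deployment history, oldest first.
log = [
    Deployment("billing", "1.4.0", None),
    Deployment("portal", "2.0.1", "2.0.0"),
    Deployment("billing", "1.5.0", "1.4.0"),
]

print(rollback_target(log, "billing"))
```

Because the previous version is recorded at deploy time, finding the rollback target is a lookup rather than an investigation, which is what makes a repair time measured in minutes plausible.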
The next evolution was to address scalability, flexibility and stability of the infrastructure. Although they were running dozens of physical hardware devices, including F5 VIPRION systems, the majority of applications were flowing through just two: one for external traffic from the Internet and another for internal traffic.
That concentration resulted in large failure groups, which represented a bigger risk to the organization in the event of a failure. “If one of those devices went down, it essentially impacted every product and every customer,” says Todd.
At the same time, CSG’s applications were starting to move into the company’s private cloud as well as the public clouds, but the system had limited ability to expand into those environments, and it didn’t allow for migration to AWS.
Virtualization was instrumental in tackling these issues as well. Having infrastructure-as-code in iApps provided the flexibility to reduce the overall failure group size, and thereby establish more application-specific load balancing. It also opened the door to ultimately evolve into the public cloud when needed.
“We had an outage recently and it impacted one product,” Morrison says. “A little over a year ago it would have affected all the products. With more instances broken out into smaller groups, we also have the ability to fail over if we need to, and that blast radius is very small. So the investment in stability has already resulted in some good news for our internal and external customers.”
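The effect of smaller failure groups can be put in numbers. The sketch below is purely illustrative: the group layouts, application names, and the `blast_radius` function are hypothetical and do not reflect CSG's real topology. It treats blast radius as the share of all applications served by the group that fails.

```python
def blast_radius(groups: dict[str, set[str]], failed_group: str) -> float:
    """Fraction of all applications affected when one failure group goes down."""
    all_apps = set().union(*groups.values())
    if not all_apps:
        return 0.0
    return len(groups[failed_group]) / len(all_apps)


# Hypothetical layout: most applications share two large devices.
consolidated = {
    "external": {"app_a", "app_b", "app_c"},
    "internal": {"app_d", "app_e", "app_f"},
}

# Hypothetical layout: each application has its own load-balancing group.
split = {name: {name} for name in ("app_a", "app_b", "app_c", "app_d", "app_e", "app_f")}

print(f"consolidated: {blast_radius(consolidated, 'external'):.0%}")
print(f"split:        {blast_radius(split, 'app_a'):.0%}")
```

With six applications, losing one of two shared devices affects half of them, while losing one application-specific group affects a sixth.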
CSG gained further operational flexibility and cost savings by moving to F5 BIG-IP Virtual Editions and a new F5 enterprise licensing agreement (ELA). Formerly the group functioned primarily within one data center at the company’s Omaha headquarters. Its disaster recovery solution consisted of simply standing up its services in a third-party data center if events necessitated it.
When the team built out a second data center a couple of years ago, they were positioned to explore high availability across those data centers, but their former licensing agreement and physical hardware limited their options. With fixed hardware, the company faced challenges with scalability, availability, and the ability to expand into public cloud.
“Moving to an F5 enterprise license agreement and the virtual solution gave us the freedom to stand up what we want, when we want and add that second data center,” says Todd. “And it’s given us the freedom to explore and be very responsive to our internal customers’ needs.”
Today the team has more flexibility to evolve and achieve high availability for various services.
Having modernized their environment, CSG’s Operational Engineering team now has room to plan for the future. The team wants to tap further into F5 capabilities for security, put together reference architectures for using BIG-IP and Amazon Web Services (AWS), and bring in new capabilities with NGINX. And with an ELA, the company has been able to shift existing budget to new architectures and feature sets.
Morrison is also working with CSG’s product owners on a new roadmap to build on what they’ve accomplished so far, improving system monitoring and alerting, adding new monitors, and developing a center of excellence model for application delivery.
From a big-picture standpoint, she says, it’s a nice place to be. At this point the team has realized almost every project from its original five-year vision. With the partnerships, self-service functionality and configurations they’ve built, all that’s left is to hand over more ownership of apps to the app teams themselves.
“For the first time we’re saying, what do we do next?” she says. “We’ve made so much progress that we’re no longer digging ourselves out. Now it’s a matter of being proactive and building on the solid foundation we’ve established.”
