How I Designed an Open Source HTTPS Checker

Editor’s Note: F5 Labs is a threat research and analysis team within F5 Networks. As a relatively small team of researchers, evangelists, and writers who produce vendor-neutral threat-related content, we look forward every summer to the opportunity to bring in a college intern to help us with special research and data analysis projects. In the summer of 2020, we were excited to welcome Katie Newbold, a Computer Science major from Johns Hopkins University, to build an HTTPS checker (scanner). This is her internship story.

The letters T-L-S meant virtually nothing to me before starting my internship with F5 Labs. I knew I was coming to intern for a cybersecurity company but in all honesty, I had never even taken a class in anything remotely related to security. I didn’t really know what I was getting into.

Security in University Computer Science

My university computer science curriculum does not offer too many classes related to security. I think this is probably true across many schools in the U.S. because most computer science programs are designed to take in students who have no experience coding and build them up to a fully functional level in just four years. When professors have to introduce core concepts of computer science and coding like building classes, allocating computer memory, understanding different data structures and their use cases, and the importance of commenting your code, there isn’t a lot of time left in the last year and a half of school to cover elective topics like secure coding or web security in general. Therefore, a lot of learning about cybersecurity comes from hands-on experience, like internships.

Meeting F5 at the Grace Hopper Conference

I came to F5 through Grace Hopper Celebration, the largest annual conference for women in computer science in the U.S. with over 30,000 attendees. The conference is a whirlwind of networking, interviews, more networking, and workshops. It’s a completely unique space to be in, and incredibly uplifting because, for many of us, it is the first time that we are surrounded by only women in the tech space. Technology, and especially security, is very male dominated, which, by nature, can make it less welcoming or appealing to women. Having the opportunity to connect with other women who had the same goals and aspirations as me and speak with companies and recruiters who were passionate about increasing the diversity of their workforce was a very impactful experience.

It was through the recruiters that I learned about F5 and initially came to love the company because of its dedication to female leadership and the Girls Who Code program, which helps foster a love of technology for young girls in the Seattle area. I knew after my interview for the F5 Labs team that I would be joining a very unique group of security experts and I was excited about the company’s dedication to the interns and making it an educational and rewarding summer experience.

My First Days at F5

The first few days of work, I spent a lot of time trying to figure out what TLS was and how it worked. This included teaching myself what I could about computer networks and learning about the TCP/IP model. David Warburton, senior threat research evangelist on the F5 Labs team, spent multiple hours with me on Zoom doing whiteboard drawings and answering questions like, “So, how do the server and client agree on the same key again?”

The F5 Labs TLS Telemetry Report

David has over 20 years of experience in IT and security. In 2018, F5 Labs’ annual TLS Telemetry Report became his responsibility and he was tasked with collecting the data that goes into the report. After some exploration into many tools, he found that SSLyze, an open source SSL/TLS scanner, provided the best information for the report. However, it was not designed to handle hundreds of thousands of scan requests. If one scan crashed the program, all the data would be lost because the data is not written to individual files. Using SSLyze, he was able to collect enough data for the 2018 report, but he needed a more appropriate and efficient tool to gather the amount of data needed for the 2019 report.

Looking at the SSLyze Library

Thus began the journey to build a new open source HTTPS checker/scanner, Cryptonice (unnamed at the time). David introduced me to the sslyze project on GitHub and described what he was looking for in a new TLS scanner. First and foremost, the output needed to be improved so that it could be read into a database (that is, it needed to be single-lined), and the scanner needed to be able to handle more than one domain name at a time. Reading someone else’s code is never super easy, and I initially spent most of my time trying to figure out how the sslyze code worked. The API documentation was helpful, and link-following on GitHub was key to increasing my understanding of how the whole library worked together.

Down to Business

Four days after my first day of work, I dove into the code. With David being in the U.K., eight hours ahead of me, I spent a large amount of time online googling concepts and terms related to TLS and sslyze. My list of things to google looked like this:

Key derivation and key stretching
DTLS = Datagram TLS
- TLS for UDP
AES (Advanced Encryption Standard) encryption protocol
- Authenticated symmetric encryption
Elliptic Curve Cryptography
Merkle-Damgard
SHA-1
SHA-2 (most common)

I spent a solid month or more working on the scanner. If you’re interested in reading about my day-by-day tasks and progress, jump to my daily entries.

Code Complete

Once the library was live and code was at version 0.0.3, we turned our attention to promotion of the tool. We recorded a presentation for the internal F5 Global Tech Summit and wrote articles for F5 Labs (introducing Cryptonice) and DevCentral (how to use Cryptonice). We presented in another webinar for the Americas on July 23, which generated a lot of engagement with Cryptonice and helped us identify bugs and enhancements. The library was initially not compatible with Linux systems, so we ironed that out and made a Windows executable.

Lessons Learned

Building Cryptonice has taught me a lot of new skills, both technical and non-technical. I learned a lot about TLS and how the handshake works, and what the information stored in a certificate tells you. I understand the importance of encryption on the web for simple data protection. There are thousands of data points that are transferred in any connection, and attackers are constantly trying to extract data from insecure connections. TLS provides an important frontline defense against attacks.

Building a Python Library

I also learned which components of code are necessary to build a complete Python library. Piecing together the library took a lot of Google searches and finding answers to related problems on Stack Overflow, extrapolating a little bit, and trying out a solution I had guessed. Trial and error testing is a huge part of code development because some solutions work while others don’t. Oftentimes I couldn’t even figure out why. I benefitted a ton from pinging Malcolm Heath, another senior threat research evangelist on the F5 Labs team, or scheduling a meeting with David or Malcolm to verbalize the roadblock I was up against. Sometimes describing the problem out loud crystallized the solution, and hearing suggestions and getting a fresh set of eyes on the issue made the problem-solving less frustrating.

Appreciating the Python Universe

I also came to appreciate the Python “universe” and the seemingly infinite number of packages that make Python development possible. Cryptonice itself makes use of sslyze, nassl, tls-parser, urllib3, cryptography, pathlib, and dnspython, to name a few packages. The interconnectedness of Python packages can cause problems, like the incompatibility issues we ran into with sslyze and nassl on some Linux systems, but it can also help streamline the code because a lot of helper functions already exist.

Released to the Universe as Open Source

I was also able to solidify my understanding of all the concepts related to building Cryptonice because David and I did a lot of presentations on it. Being asked to talk about the tool pushed me to speak succinctly and in a straightforward manner to make the project understandable to others, even people who don’t code themselves.

Back to School

Working with the F5 Labs team on Cryptonice has been not only a fantastic learning experience but also a very formative one. I felt personally responsible for problems that arose in the code because the project was largely my work, which pushed me to be a better coder and problem solver. Cryptonice will hopefully continue to be a beneficial tool in the security space to examine TLS configuration, and I’m proud to have worked on a new open source project. May the code continue to improve through the work of both myself and others! If you haven’t seen it yet, please read our article, Introducing the Cryptonice HTTPS Scanner.

My Developer Diary: Stardate May 22, 2020 through June 26, 2020

Friday, May 22

Update: Using the api_sample.py starter code from sslyze, we can now read in a csv plaintext file with a list of URLs, connect to each hostname and scan for the desired information (currently just getting raw certificate information), and write the results of the scan out to a JSON file that is titled web_ip.json (address varies per website obv). Also got rid of all the newline characters that were making the certificate un-decodable. Updated file is on the tlsscanner GitHub as well as some of the sample results.
Next steps: Parse the raw certificate into all the information that it generates and add the relevant key-value pairs to the data dictionary so it can all be written out in one long (one-line) JSON file
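
The workflow in this entry (read a list of hostnames, fetch each certificate, write one single-line JSON file per host) can be sketched roughly as below. This is not the original code, which built on sslyze's api_sample.py; it uses only the Python standard library. The function names and the `<hostname>_<ip>.json` file naming are assumptions made for illustration.

```python
import csv
import io
import json
import socket
import ssl


def read_hostnames(csv_text):
    """Return the first column of every non-empty row in a CSV string."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def flatten_pem(pem):
    """Drop the BEGIN/END armor lines and all newlines, leaving one base64 string."""
    lines = [line.strip() for line in pem.splitlines()]
    return "".join(line for line in lines if line and not line.startswith("-----"))


def fetch_certificate(hostname, port=443):
    """Connect to the host and return (ip_address, PEM certificate text)."""
    ip_address = socket.gethostbyname(hostname)
    pem = ssl.get_server_certificate((hostname, port))
    return ip_address, pem


def scan_to_file(hostname):
    """Scan one host and write the result to its own JSON file."""
    ip_address, pem = fetch_certificate(hostname)
    record = {
        "hostname": hostname,
        "ip": ip_address,
        "certificate": flatten_pem(pem),
    }
    filename = f"{hostname}_{ip_address}.json"  # hypothetical naming scheme
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(record, handle)  # no indent argument, so the output is one line
    return filename
```

Calling `scan_to_file(name)` for each entry returned by `read_hostnames(...)` would reproduce the loop described above. Writing one file per host also means a failure on one host does not discard results already collected for the others.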

Tuesday, May 26

Update: Trying to figure out how the sslyze code parses the certificate and spits out the information like SHA1 Fingerprint, Common name, issuer, serial number, etc. I think the code I need is in the CertificateDeploymentAnalysisResult class and I thought I would get the information through an OCSP Response object, but a lot of the websites don't provide that information I guess so I'm kind of confused as to how else it is being derived. The sslyze documentation says that all of the certificates are parsed using the x509 module in the cryptography library, so if sslyze becomes too confusing I might just turn to that. However, I'm not sure how to create a Certificate object because I essentially want to send it a raw certificate and have it give me back the information like version, status, etc.
Next steps: Figure out best way to parse the raw certificate --> maybe David will know how to do this? Either through sslyze code and classes or by redoing the x509 stuff...

Wednesday, May 27

Update: Figured out how to parse the certificate (using x509.Certificate stuff) and get information from it like not_valid_before, not_valid_after, SHA, serial number, and each of those items is stored as a key:value pair in the JSON output. Output also collects data on compression, injection vulnerabilities, heartbleed, early data support, etc. Big wins for today!
Next steps: Restructure the api_sample.py file (and also probably rename it) to have helper functions and a much smaller main function. Check in with David to see if he has requests for the JSON output.
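
To illustrate the idea of turning certificate fields into flat key:value pairs, here is a minimal sketch. It deliberately swaps in a different technique from the one in the diary: instead of the cryptography library's x509 module, it flattens the dictionary that the standard library's `ssl.SSLSocket.getpeercert()` returns. The function names and output keys are assumptions chosen for the example.

```python
import json
import ssl
from datetime import datetime, timezone


def to_iso(cert_time):
    """Convert a getpeercert() time such as 'Jan  1 00:00:00 2021 GMT' to ISO 8601."""
    seconds = ssl.cert_time_to_seconds(cert_time)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def name_to_dict(name):
    """Flatten getpeercert()'s nested tuples of (key, value) pairs into a dict."""
    return {key: value for rdn in name for key, value in rdn}


def summarize_certificate(cert):
    """Build a flat dictionary from a getpeercert() result.

    getpeercert() only returns a populated dict when the certificate was
    validated during the handshake.
    """
    subject = name_to_dict(cert.get("subject", ()))
    issuer = name_to_dict(cert.get("issuer", ()))
    return {
        "common_name": subject.get("commonName"),
        "issuer": issuer.get("commonName"),
        "serial_number": cert.get("serialNumber"),
        "not_valid_before": to_iso(cert["notBefore"]),
        "not_valid_after": to_iso(cert["notAfter"]),
        "subject_alt_names": [
            value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"
        ],
    }


def certificate_json_line(cert):
    """Serialize the summary as a single line of JSON."""
    return json.dumps(summarize_certificate(cert), sort_keys=True)
```

Converting the dates to strings before serializing keeps `json.dumps` from having to deal with datetime objects at all.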

Thursday, May 28

Update: Reformatted JSON output to get rid of \ character before every double quote as well as redundant quotes at beginning and end (issue lay in using the json.dumps format to handle datetime object; ended up fixing it by just converting the datetime object to a string before adding it to the dictionary). Also tried to restructure api_sample.py today but found that restructuring it made it run a lot slower (not entirely sure why this is, but I think queueing all of the scan commands is costly. Ex: running original structure takes about 1:30 for 6 hostnames, but restructured file takes 4:30 for same 6 hostnames).
Next steps: Still need to break file into smaller helper functions but need to double check how objects will be passed between functions as to not lose data.
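
The symptoms described in this entry (a backslash before every double quote, plus an extra pair of quotes around the whole output) are what you get when a value is JSON-encoded twice. The original code isn't shown, so the following is only a guess at the mechanism, with made-up function names, alongside the fix the entry describes: convert datetime objects to strings first, then serialize once.

```python
import json
from datetime import datetime, timezone

record = {
    "hostname": "example.com",
    "not_valid_after": datetime(2021, 1, 1, tzinfo=timezone.utc),
}


def encode_twice(rec):
    """The pitfall: serialize to a string, then serialize that string again."""
    inner = json.dumps(rec, default=str)
    return json.dumps(inner)  # wraps the text in quotes and escapes every "


def encode_once(rec):
    """The fix: turn datetimes into strings, then serialize a single time."""
    clean = {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in rec.items()
    }
    return json.dumps(clean)
```

Printing `encode_twice(record)` next to `encode_once(record)` makes the difference easy to see: only the second parses back into a dictionary with a single `json.loads` call.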

Friday, May 29

Update: JSON output is drastically improved with more data like IP address, preferred cipher suite, highest TLS version handled, and outputs information for individual certificates (program was previously overriding data in the dictionary). Certificate information also handles certificate subject common name. The program also reads in parameters and targets from the same JSON file (see server/samplejob_tls2.json for example file) but is not yet able to have the parameters dictate what commands the program actually runs. Ditched the csv file with a list of hostnames to instead just include them as a target in the JSON file.
Next steps: Need to get the SubjectAlternativeName data for each certificate. Get the program to cleanly figure out which parameters we actually want it to run (ideally I won't have to use some huge if-else block but I'm not really sure), and decide what unique identifier we will want for the output so that it can be better inputted into an ELK stack (see notes with David and Malcolm for more info). Also, I still want to split the file into multiple files or just functions so that it's not one function that is 300 lines long. Maybe I can have a chat with Malcolm about best practice in python to make this happen. I don’t need any extra classes or anything so maybe I just need to split the work into more functions rather than files? Also now that we are playing around with providing scan commands, we might want to take a look at some of the other command line stuff that SSLyze can handle like --slow_connection or --sni=SNI (idk if this is necessary or not, ask David's opinion)

Monday, June 1

Update: More meetings than normal today so not a ton of time to work on the code. Nonetheless, restructured the code a little to move some chunks into smaller helper functions that are called from main. Still working on getting the scan command list to dynamically grow or shrink based on user input. Added curve name for EllipticCurveKeyAlgorithm.
Next steps: add meta data for each scan to the json output (see issue on GitHub for suggestions from David), figure out how to access the subject alternative names (buried somewhere in an OID extension for the certificate)

Tuesday, June 2

Update: Scan commands can now be included in the JSON input file and program will only run with the ones desired. Restructured main function more to move some of the code into helper functions. Added SAN (admittedly in raw format) so that can be captured along with other certificate information. Added scan meta data at bottom of JSON output.
Next steps: Clean up the SAN data a little more if possible, and ask David for further instruction!
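
One common way to run only the requested scans without a huge if-else block is a dispatch table: a dictionary that maps each command name to the function that performs it. The sketch below assumes a job file with `targets` and `scans` keys and uses placeholder scan functions; none of this is taken from the actual scanner or from sslyze.

```python
import json


# Placeholder scan functions. A real scanner would do network work here.
def scan_certificate_info(host):
    return {"checked": host, "scan": "certificate_info"}


def scan_heartbleed(host):
    return {"checked": host, "scan": "heartbleed"}


def scan_session_renegotiation(host):
    return {"checked": host, "scan": "session_renegotiation"}


# The dispatch table: command name -> function.
SCANS = {
    "certificate_info": scan_certificate_info,
    "heartbleed": scan_heartbleed,
    "session_renegotiation": scan_session_renegotiation,
}


def run_job(job_text):
    """Run the scans named in a JSON job description against each target."""
    job = json.loads(job_text)
    requested = job.get("scans") or list(SCANS)  # default: run everything
    unknown = [name for name in requested if name not in SCANS]
    if unknown:
        raise ValueError(f"unknown scan commands: {unknown}")
    return {
        target: {name: SCANS[name](target) for name in requested}
        for target in job["targets"]
    }
```

Adding a new scan then means writing one function and adding one entry to `SCANS`; the loop in `run_job` does not change.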

Wednesday, June 3

Update: Added extra certificate info, parsed SAN, moved optional tests to separate block. Added session_renegotiation, session_resumption, session_resumption_rate and http_header results to file.
Next steps: Might want to break main function down a little more as it grew a lot today. Include a check at the top for the www. prefix and add it if it doesn't exist. Add geolocation for IP (make a helper function)

Thursday, June 4, and Friday, June 5

Update: Integrated TLS Scan, HTTP Headers scan and created a redirection function to handle any redirects before sending an updated hostname to either web scan function. Fixed some bugs with the JSON output to restructure it a little.
Next steps: Integrate DNS testing and HTTP2 testing into the main function to run those tests as well. Shouldn't be too hard. Look at other bugs that David put on GitHub and go through the list to fix them!
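
A redirection helper of the kind mentioned in this entry might look roughly like the sketch below: follow Location headers until a non-redirect response arrives, then hand the final hostname to the scan functions. The names `follow_redirects` and `head_fetch` are hypothetical, and the real logic in the scanner may differ. The fetch step is passed in as a function so the redirect logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_fetch(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=5)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=5)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url, fetch=head_fetch, max_hops=5):
    """Return (final_hostname, list_of_urls_visited)."""
    visited = []
    for _ in range(max_hops):
        visited.append(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return urlsplit(url).hostname, visited
        url = urljoin(url, location)  # handles relative Location values
    raise RuntimeError(f"more than {max_hops} redirects starting from {visited[0]}")
```

Capping the number of hops guards against redirect loops, and keeping the list of visited URLs makes it possible to record the redirect chain in the scan output.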

Monday, June 8

Update: Integrated DNS and HTTP2 testing (though my computer doesn't have any certs stored on it so http2 doesn't work), fixed parsing errors related to cookies and Alt-Svc. Code is basically ready to be put on AWS and tested to get some sort of gauge on pricing.
Next steps: Hash out details regarding redirection with David. Currently it seems as though we might not be able to cleanly catch redirection data. Integrate virtualenv into the code so we can avoid issues with one computer vs another (ie Katie not having the necessary certs)

Tuesday, June 9

Update: Fixed little errors/things in code but it is largely done. Decided with the help of Malcolm that we probably don’t have to worry about hostnames in languages that aren't English because the utf8 encoding of everything in Python 3 can probably handle it.
Next steps: Play around with virtualenv to figure out how that works, also S3 buckets.
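
One detail worth checking on non-English hostnames: Python 3 strings hold them without trouble, but DNS itself carries internationalized names in an ASCII form (Punycode, with an `xn--` prefix) rather than as UTF-8. Python's standard library ships an `idna` codec for this conversion, and its networking functions often apply it automatically when given a str hostname. A minimal sketch, with a made-up helper name:

```python
def to_ascii_hostname(hostname):
    """Convert a Unicode hostname to the ASCII form used in DNS lookups.

    Uses Python's built-in 'idna' codec, which implements the older
    IDNA 2003 rules.
    """
    return hostname.encode("idna").decode("ascii")
```

For example, `to_ascii_hostname("bücher.example")` gives `xn--bcher-kva.example`, while a plain ASCII name such as `example.com` comes back unchanged. Normalizing targets this way before scanning keeps file names and JSON keys consistent.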

Thursday, June 11

Update: Added S3 capabilities (sending data to scan-tls-us bucket right now). Further discussion needed about splitting the code into more modules to handle mass scan vs ad-hoc and where we might want to send data for each type. Added "preferred_cipher_suite" for each of the cipher suites as additional data for the JSON output. Looked into turning the program into a Python library. Started a TLS Scanner write up (something that can eventually be published along with the code/link to code or whatever we end up producing) outlining the importance of the work, why we created a new scanner and how we went about doing that.
Next steps: Possible creation of a Python library to allow for separate use cases? Package would include all the modules as well as the main scanner.py function, and then users could spend their time just getting the output to be the way they want it (into S3 bucket for internal purposes, into HTML/JSON/something else for individual domain name scans). Discuss with David the intricacies of S3 output (like how we want to specify where it goes based on location (side note, what if we don’t have the geolocation data?), subdirectories within the bucket, adding a UUID, etc).

Friday, June 12

Update: Created a library (moved relevant files into new folder and worked with those) and uploaded it to test.pypi.org to try to test it out but got some errors with that. Named it f5labsscanner, but I'm not sure how to access any of the information? So maybe I built it wrong…
Next steps: Make the library actually work, create the Lambda function deployment package for the other version of the scanner.

Monday, June 15

Update: Created f5labsscanner Python library (currently hosted on test.pypi.org and not pypi.org), waiting for the go-ahead to do that in case there are legal issues with the name or something. Started doing testing to find bugs in the code and fixed some errors as they came up. Scanning the full suite of (TLS) tests is definitely a time-consuming operation. Only got through about 75 sites in roughly an hour. We will definitely need multiple instances of this function running to get through a mass scan in any reasonable amount of time!
Next steps: Continue doing error checking and look into Jupyter notebook stuff for data display.
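
The entry above points toward running several scans at once. As an illustration only (the project's notes mention multiple instances and AWS Lambda, not threads), the same idea can be sketched on one machine with a thread pool. Each result is written to its own file as soon as it finishes, so one failing host does not lose the rest. The function name and file layout are assumptions.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def scan_many(targets, scan, out_dir, workers=8):
    """Run scan(target) concurrently and save each result as <target>.json.

    Returns a dict of {target: error message} for the scans that failed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scan, target): target for target in targets}
        for future in as_completed(futures):
            target = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # one bad host must not stop the batch
                failures[target] = str(exc)
                continue
            path = out_dir / f"{target}.json"
            path.write_text(json.dumps(result), encoding="utf-8")
    return failures
```

Threads suit this workload because a TLS scan spends most of its time waiting on the network. At 75 sites per hour for a single sequential run, eight workers would in principle bring that closer to 600 per hour, though real throughput depends on the remote servers and on any rate limiting.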

Tuesday, June 16

Update: Troubleshooting with David regarding redirects (he's working on the logic to get it right), so testing was momentarily halted for the day. Changed library input so that the user only has to specify the domain name; everything else is added by default. Produces JSON output and now I am working on formatting the output into some HTML file. Currently thinking of getting the necessary data into a pandas DataFrame and then putting it into HTML format, but this doesn't really allow for a lot of customization so it will probably look pretty ugly at the end of the day. I also looked into Jupyter for data display, but it doesn't seem to be more streamlined than the DataFrame (though it is probably a little prettier and there is the ability to add markdown text). Worth a longer discussion with the team.
Next steps: Pick testing back up again to get the code to a good place where it can fail gracefully for almost any scenario.

Thursday, June 18

Update: Generated cleaner console output and updated library to have some more of the error checking results. Date put on calendar for TLS webinar.
Next steps: Probably need to finish testing, need to write some documentation for the code (what the functions do, how to call them, etc.).

Tuesday, June 23

Update: Added command line input capabilities to the code and updated the README. Tried installing it into a virtual environment but there are issues with the dependencies (they don't get installed when they aren't already on the computer, as they were on mine).
Next steps: Fix the issue with the dependencies.
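
Command line input for a tool like this is usually handled with the standard library's argparse module. The sketch below is not Cryptonice's actual interface; the program name, flag names, and defaults are all invented to show the pattern of requiring only a domain name and making everything else optional.

```python
import argparse


def build_parser():
    """Create a parser where only the domain is required."""
    parser = argparse.ArgumentParser(
        prog="scanner",  # hypothetical program name
        description="Scan a domain's HTTPS configuration.",
    )
    parser.add_argument("domain", help="domain name to scan, e.g. example.com")
    parser.add_argument(
        "--scans",
        nargs="+",
        default=["all"],
        help="names of the scans to run (default: all)",
    )
    parser.add_argument(
        "--json-out",
        action="store_true",
        help="also write the results to a JSON file",
    )
    return parser
```

With this layout, `build_parser().parse_args(["example.com"])` yields the defaults, and any flag the user adds overrides just that one setting.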

Wednesday, June 24

Update: Did more testing of the library to fix bugs and worked on slides for the Tech Summit presentation.
Next steps: Publish the library to PyPI.

Thursday, June 25

Update: After significant waffling, we officially named the library cryptonice! Published the library to PyPI, and all dependencies work (tested in a virtual env so there was a clean slate). Moved library code into public repo on F5-Labs GitHub name so it can be publicly available and downloaded.
Next steps: presentations (and minor bug fixing as they come up)

Friday, June 26

Update: Removed psutil and added some more dependencies into requirements.txt so that it would build. Based on team testing, it seems to be working OK. Changed input a little so that default is just the domain name, and customization comes if any of the other commands are specified.
Next steps: Work on articles regarding the project to be published on DevCentral, F5 Labs, etc.

July

Update: The library is live on PyPI and we’ve done two live presentations. One presentation was for the webinar for the Americas, and the other was on Buu Lam’s YouTube show. After the webinar, we were alerted to issues in compatibility with Ubuntu and CentOS, and also some problems with package dependencies that the sslyze library had.
Next steps: Make library compatible with Linux systems and build the Windows executable. Complete AWS Lambda Cryptonice construction.

Recommendations

HTTPS is used everywhere now. This means more ciphers, keys, and certificates to manage. And with the increasing adoption of DevOps, the speed of change and deployment is constantly increasing. Encryption standards are constantly evolving, so it’s crucial to stay up to date with current best practices. Many privacy and security gaps still exist, even when TLS is deployed correctly. Keep your TLS configuration current and correct, and use Cryptonice to verify it.