Finding new ways to collect information about a network, and limiting the metadata exposed to others, is a constant struggle in research, as this data can be used for both benign and malicious purposes. In this article, I’ll show how exposed data can even be used to identify potentially malicious servers.
In our paper, my fellow researchers and I from the Technical University of Munich, Germany, together with researchers from the Huawei Munich Research Center, Germany, used the Transport Layer Security (TLS) stack (the combination of configuration, implementation, and hardware) to fingerprint servers. The paper was recently presented at the Network Traffic Measurement and Analysis Conference (TMA) 2022. We fingerprinted 28 million servers from top- and block-lists over 30 weeks and achieved a precision of more than 99% in detecting Content Delivery Network (CDN) and Command and Control (CnC) servers.
We assume active TLS fingerprinting will be used in the future to:
- Enhance existing intrusion detection systems: Servers seen in network flows are fingerprinted on demand and compared against a database of fingerprints of known malicious actors.
- Conduct Internet-wide measurements: Security researchers use fingerprinting to find previously unknown threats.
- Monitor one’s own servers: Deviations from a fingerprint baseline can indicate an unintended software change or even a malware infection.
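For the monitoring use case, a baseline check can be as simple as comparing two snapshots. The following is a minimal sketch, assuming fingerprints are already available as opaque strings per server; the function name and the example addresses and fingerprint values are hypothetical.

```python
from __future__ import annotations


def find_deviations(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Compare current fingerprints of own servers against a recorded baseline.

    Both arguments map a server identifier (e.g. an IP address) to its
    fingerprint. Servers that appear on only one side are reported as well,
    with None standing in for the missing fingerprint.
    """
    deviations = {}
    for server in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(server), current.get(server)
        if before != after:
            deviations[server] = (before, after)
    return deviations


# Hypothetical fingerprints for two monitored servers.
baseline = {"192.0.2.10": "fp-A", "192.0.2.11": "fp-B"}
current = {"192.0.2.10": "fp-A", "192.0.2.11": "fp-C"}
print(find_deviations(baseline, current))  # {'192.0.2.11': ('fp-B', 'fp-C')}
```

A flagged server is only a starting point: a legitimate software update changes the fingerprint just as a compromise might.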
TLS fingerprinting is an approach to collecting TLS characteristics and building a database that maps fingerprints to something not directly related to them, such as the software or actor behind a server:
| Fingerprint (truncated) | Label |
|---|---|
| 771_1301_… | Apple web server |
| 771_1302_… | Default Nginx docker image |
| 770_cf_… | TrickBot CnC server |
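In its simplest form, such a database is an exact-match lookup from fingerprint to label. The sketch below is a hedged illustration, not the authors’ data model: it stores every label seen with a fingerprint, so that a lookup can reveal when a fingerprint is ambiguous. The class name and the keys (`fp-1`, `fp-2`) are hypothetical placeholders, since the real fingerprints are truncated in the table.

```python
from __future__ import annotations

from collections import defaultdict


class FingerprintDB:
    """In-memory map from a TLS fingerprint to every label observed with it."""

    def __init__(self) -> None:
        self._labels: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, fingerprint: str, label: str) -> None:
        self._labels[fingerprint].add(label)

    def lookup(self, fingerprint: str) -> set[str]:
        # An empty set means the fingerprint is unknown; more than one label
        # means the fingerprint alone cannot identify the server.
        return set(self._labels.get(fingerprint, ()))


db = FingerprintDB()
db.add("fp-1", "Default Nginx docker image")
db.add("fp-2", "TrickBot CnC server")
print(db.lookup("fp-1"))  # {'Default Nginx docker image'}
print(db.lookup("fp-3"))  # set()
```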
The TLS protocol has grown into a complex ecosystem; therefore, a lot of metadata about client and server capabilities needs to be exchanged during the handshake. Several related works have used this metadata to passively fingerprint clients in network flows. However, we can also fingerprint a server by actively conducting a handshake with it, as seen in Figure 1.
It turned out that a single handshake did not yield enough information about a server. Therefore, we performed 10 handshakes per server, each with one of the general-purpose Client Hellos (CHs) that we designed empirically in an experiment.
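Python’s standard `ssl` module can illustrate the idea of sending several different Client Hellos, though only coarsely: it controls the offered protocol versions and the cipher list, but not, for example, the extension order. The three client configurations below are hypothetical stand-ins for the ten empirically designed CHs, and the `version/cipher` string is a deliberately reduced feature set. This is a sketch under those assumptions, not the authors’ scanner.

```python
from __future__ import annotations

import socket
import ssl

# Hypothetical client configurations: (name, highest TLS version offered,
# OpenSSL cipher string for TLS 1.2 and below, or None for the defaults).
CLIENT_CONFIGS = [
    ("tls13-default", ssl.TLSVersion.TLSv1_3, None),
    ("tls12-aesgcm", ssl.TLSVersion.TLSv1_2, "ECDHE+AESGCM"),
    ("tls12-chacha20", ssl.TLSVersion.TLSv1_2, "ECDHE+CHACHA20"),
]


def make_context(max_version: ssl.TLSVersion, ciphers: str | None) -> ssl.SSLContext:
    """Build one client configuration; each yields a different Client Hello."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # We only observe what the server negotiates, so certificate checks are off.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.maximum_version = max_version
    if ciphers is not None:
        ctx.set_ciphers(ciphers)  # does not affect TLS 1.3 cipher suites
    return ctx


def probe(host: str, port: int, ctx: ssl.SSLContext, timeout: float = 5.0) -> str:
    """Run one handshake and return 'version/cipher', or 'fail' on error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return f"{tls.version()}/{tls.cipher()[0]}"
    except (ssl.SSLError, OSError):
        # A refused or failed handshake is itself a distinguishing feature.
        return "fail"


def collect_features(host: str, port: int = 443) -> list[str]:
    """One handshake per client configuration, always in the same order."""
    return [probe(host, port, make_context(v, c)) for _, v, c in CLIENT_CONFIGS]
```

Calling `collect_features("192.0.2.1")` would return one feature string per configuration. Keeping the order fixed matters, because the per-handshake results are only comparable across servers when they are aligned by Client Hello.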
The final server fingerprint consists of the truncated features extracted from each of these handshakes, as shown in Figure 2.
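To make the idea of a per-server fingerprint concrete, here is a minimal sketch that encodes each handshake and concatenates the results in the fixed Client Hello order. The encoding (version in decimal, cipher suite and extension types in hex, `_` and `/` as separators) is hypothetical and only loosely inspired by the truncated values in the table above; the paper’s actual features and format are richer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HandshakeFeatures:
    """Hypothetical per-handshake features; the paper's feature set is richer."""
    version: int                 # negotiated protocol version, e.g. 0x0303
    cipher: int                  # negotiated cipher suite code point
    extensions: tuple[int, ...]  # extension types, in the order the server sent them


def encode(features: Optional[HandshakeFeatures]) -> str:
    """Encode one handshake; None marks a failed handshake."""
    if features is None:
        return "fail"
    exts = "-".join(f"{e:x}" for e in features.extensions)
    return f"{features.version}_{features.cipher:x}_{exts}"


def server_fingerprint(results: Sequence[Optional[HandshakeFeatures]]) -> str:
    """Concatenate per-handshake encodings in the fixed Client Hello order."""
    return "/".join(encode(r) for r in results)


# TLS 1.3 ServerHello: legacy version 0x0303, TLS_AES_128_GCM_SHA256,
# extensions supported_versions (0x2b) and key_share (0x33).
tls13 = HandshakeFeatures(version=0x0303, cipher=0x1301, extensions=(0x2B, 0x33))
print(server_fingerprint([tls13, None]))  # 771_1301_2b-33/fail
```

Recording the extension order, rather than just the set of extensions, is what lets implementation-specific behaviour show up in the fingerprint.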
CDNs are ideal for evaluating our approach because each is a single actor deploying TLS servers at a large scale, which is exactly what we want to detect, and together they provide a large number of verifiable data samples. For this measurement, we combined Autonomous System (AS), HTTP header, and X.509 certificate data to generate a ground truth. Then, we assigned a server to a CDN if it had a fingerprint previously observed for that CDN. This assignment was unambiguous because the fingerprints of different CDNs did not overlap. Figure 3 shows the results of this classification.
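The assignment step can be sketched as learning a fingerprint-to-CDN map from the ground truth and then applying it to scanned servers. This is a hedged illustration with hypothetical function names and documentation-range addresses. In the measurement the CDN fingerprints did not overlap, but the sketch defensively drops any fingerprint seen with more than one CDN so that assignments stay unambiguous.

```python
from __future__ import annotations

from collections import defaultdict


def build_cdn_map(ground_truth: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Learn fingerprint -> CDN from ground truth (server IP -> (fingerprint, CDN))."""
    cdns_per_fp: defaultdict[str, set[str]] = defaultdict(set)
    for fingerprint, cdn in ground_truth.values():
        cdns_per_fp[fingerprint].add(cdn)
    # Keep only fingerprints seen with exactly one CDN, so that every
    # assignment made from this map is unambiguous.
    return {fp: next(iter(cdns)) for fp, cdns in cdns_per_fp.items() if len(cdns) == 1}


def assign_cdn(scanned: dict[str, str], cdn_map: dict[str, str]) -> dict[str, str]:
    """Assign each scanned server (IP -> fingerprint) to a CDN if its fingerprint is known."""
    return {ip: cdn_map[fp] for ip, fp in scanned.items() if fp in cdn_map}


ground_truth = {
    "192.0.2.1": ("fp-1", "Cloudflare"),
    "192.0.2.2": ("fp-1", "Cloudflare"),
    "198.51.100.1": ("fp-2", "Fastly"),
}
cdn_map = build_cdn_map(ground_truth)
print(assign_cdn({"203.0.113.5": "fp-2", "203.0.113.6": "fp-9"}, cdn_map))
# {'203.0.113.5': 'Fastly'}
```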
We used precision and recall as performance metrics. Put simply, precision is the share of detections that were correct, and recall is the share of CDN servers in the ground truth that we were able to detect. The detection worked surprisingly well for Cloudflare and Fastly. We assume the lower metrics for Akamai and Alibaba are due to their more diverse networks, as they provide more additional services within their networks than other CDNs.
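For reference, both metrics reduce to set arithmetic on server identifiers for a single CDN. A minimal sketch with made-up server names:

```python
from __future__ import annotations


def precision_recall(detected: set[str], ground_truth: set[str]) -> tuple[float, float]:
    """Precision and recall of a set of detections against a ground-truth set."""
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# 3 of 4 detections are correct; 3 of 5 ground-truth servers were found.
print(precision_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))  # (0.75, 0.6)
```

In this simplified form, every detected server missing from the ground truth counts as a false positive, which also penalises correctly detected servers that the ground truth simply does not cover.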
Interestingly, this approach enabled us to detect several CDN servers outside of the CDN networks. Sometimes, this was expected (Akamai) and sometimes not (Cloudflare). For a few of these off-net servers, we saw indicators of third-party reverse proxies tunnelling the traffic to Cloudflare.
Detecting CnC servers was more complex because, most of the time, we could not unambiguously assign a fingerprint to CnC servers. Therefore, we computed a score ∈ [0,1] that expresses how certain we are that a detected server is a CnC server. This score reflects how often we saw a fingerprint on top-list servers compared with block-list servers.
If the score was above a tuneable threshold, we classified the server as a CnC server, as shown in Figure 4.
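The paper’s exact score may differ; the sketch below assumes a simple stand-in: the fraction of a fingerprint’s observations that came from block-list servers. The function names and the default threshold of 0.9 are hypothetical and would need tuning, as the text notes.

```python
from __future__ import annotations

from collections import Counter


def cnc_scores(observations: list[tuple[str, str]]) -> dict[str, float]:
    """Map each fingerprint to its share of block-list observations.

    observations holds (fingerprint, source) pairs, where source is "top"
    for top-list servers and "block" for block-list servers.
    """
    block, total = Counter(), Counter()
    for fingerprint, source in observations:
        if source not in ("top", "block"):
            raise ValueError(f"unknown source: {source!r}")
        total[fingerprint] += 1
        if source == "block":
            block[fingerprint] += 1
    return {fp: block[fp] / n for fp, n in total.items()}


def is_cnc(fingerprint: str, scores: dict[str, float], threshold: float = 0.9) -> bool:
    """Flag a server if its fingerprint's score exceeds the tunable threshold."""
    return scores.get(fingerprint, 0.0) > threshold  # unknown fingerprints are not flagged


observations = [("A", "block"), ("A", "block"), ("A", "top"), ("B", "top"), ("C", "block")]
scores = cnc_scores(observations)
print(is_cnc("C", scores), is_cnc("A", scores))  # True False
```

Raising the threshold trades recall for precision: fewer fingerprints qualify, but those that do were rarely seen on top-list servers.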
Figures 4 and 5 show that CnC servers can be detected with fingerprinting, and that detection works even better when combined with additional data sources such as HTTP headers. This is how we expect such fingerprinting to be used: in conjunction with all the information known about a server, or as a single feature in a more complex detection pipeline.
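One simple way to combine a TLS fingerprint with another data source is to score on a compound key rather than on the TLS fingerprint alone. The sketch below is an assumption-laden illustration: the HTTP feature (a hashed, case-folded order of response header names) and the scoring as a block-list share are hypothetical choices, not the paper’s method.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def header_key(header_names: list[str]) -> str:
    """Hypothetical HTTP feature: hashed, case-folded order of response header names."""
    joined = ",".join(name.lower() for name in header_names)
    return hashlib.sha256(joined.encode()).hexdigest()[:8]


def combined_scores(
    observations: list[tuple[str, list[str], str]],
) -> dict[tuple[str, str], float]:
    """Block-list share per (TLS fingerprint, HTTP header key) pair.

    Each observation is (tls_fingerprint, header_names, source), where
    source is "top" or "block".
    """
    block, total = Counter(), Counter()
    for tls_fp, headers, source in observations:
        key = (tls_fp, header_key(headers))
        total[key] += 1
        if source == "block":
            block[key] += 1
    return {key: block[key] / n for key, n in total.items()}
```

For example, if TLS fingerprint `A` appears twice on block-listed servers with headers `Server, Date` and once on a top-list server with `Date, Server, Content-Type`, the TLS fingerprint alone scores 2/3, while the two combined keys score 1.0 and 0.0 and separate cleanly.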
How to actively fingerprint TLS servers
If you want to perform large-scale active TLS fingerprinting, several aspects should be considered: the tools, the available resources, the impact on the scanned servers, and how to build the fingerprint database.
Currently, we are not aware of any up-to-date, publicly available database that contains fingerprints of malicious servers. Therefore, you must build this database yourself.
If resources are not an issue, TLS server debugging tools such as testssl.sh can collect fingerprintable information. However, we observed testssl.sh performing hundreds of handshakes per server. We also found that its output would need to be sanitized, and that it (and presumably other tools) sometimes neglects fingerprintable, implementation-specific features such as the extension order. Therefore, using these tools for Internet-wide scans would be time-consuming and ethically questionable, as the scan may be interpreted as a Denial of Service (DoS) attack.
JARM is a related open-source tool, published by Salesforce, that actively fingerprints the TLS stack. Like our approach, it uses ten TLS handshakes per server, and at the time of writing, censys.io collects JARM fingerprints. However, JARM’s effectiveness leaves room for improvement. Therefore, we implemented our own approach and extended the TUM goscanner with TLS fingerprinting functionality. We compared the effectiveness of both tools, as shown in Figure 6.
In Figure 6, we can see that both the empirical design of the CHs and the additional features we extract result in a more fine-grained differentiation among servers. Our procedure also uniquely identifies more CnC servers.
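A comparison of two fingerprinting tools can be approximated with two simple counts on the same labelled servers: how many distinct fingerprints each tool produces, and how many servers end up with a fingerprint that maps to a single label. These metrics and names are hypothetical illustrations; the paper’s evaluation may define granularity differently.

```python
from __future__ import annotations

from collections import defaultdict


def granularity(fingerprints: dict[str, str], labels: dict[str, str]) -> tuple[int, int]:
    """Return (distinct fingerprints, servers whose fingerprint maps to one label).

    fingerprints maps server -> fingerprint produced by one tool;
    labels maps server -> ground-truth label (e.g. "cnc" or "benign").
    """
    labels_per_fp: defaultdict[str, set[str]] = defaultdict(set)
    for server, fp in fingerprints.items():
        labels_per_fp[fp].add(labels[server])
    unambiguous = sum(1 for fp in fingerprints.values() if len(labels_per_fp[fp]) == 1)
    return len(labels_per_fp), unambiguous


labels = {"s1": "cnc", "s2": "benign", "s3": "cnc"}
coarse = {"s1": "x", "s2": "x", "s3": "x"}  # one fingerprint for everything
fine = {"s1": "p", "s2": "q", "s3": "p"}    # splits the benign server off
print(granularity(coarse, labels), granularity(fine, labels))  # (1, 0) (2, 3)
```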
You can test this for yourself as we have open-sourced our TLS server fingerprinting tool.
The tool is designed to be part of a more comprehensive measurement pipeline, in which large-scale DNS resolution is offloaded to dedicated tools and no state is kept between handshakes, so that handshakes can be parallelized.
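A rough sketch of this stateless design: because handshakes share no state, every (address, Client Hello) pair can be an independent task. The sketch assumes addresses were already resolved by a separate DNS tool and takes the handshake function as a parameter; `probe` is a hypothetical placeholder, not the goscanner API. It also omits the rate limiting that a real scan of third-party servers should add.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Hypothetical handshake function: (ip, client_hello_index) -> feature string.
Probe = Callable[[str, int], str]


def scan(ips: Iterable[str], n_hellos: int, probe: Probe, workers: int = 64) -> dict[str, list[str]]:
    """Run every (ip, Client Hello) handshake as an independent task."""
    tasks = [(ip, i) for ip in ips for i in range(n_hellos)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in task order, so features stay aligned
        # with the Client Hello index even though tasks run concurrently.
        results = list(pool.map(lambda task: probe(*task), tasks))
    fingerprints: dict[str, list[str]] = {}
    for (ip, _), feature in zip(tasks, results):
        fingerprints.setdefault(ip, []).append(feature)
    return fingerprints


def fake_probe(ip: str, hello: int) -> str:
    return f"{ip}:{hello}"  # stands in for a real handshake


print(scan(["192.0.2.1"], 3, fake_probe, workers=4))
# {'192.0.2.1': ['192.0.2.1:0', '192.0.2.1:1', '192.0.2.1:2']}
```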
This post has shown how TLS stack fingerprinting can be efficiently conducted and applied to provide valuable security-related insights. For more detail, read the paper. Let us know if you have any questions about the research, or the tool, in the comments below.
Markus Sosnowski is a Research Associate at Technische Universität München with an interest in network architectures and technologies, network modelling, trustworthiness of network entities, and Internet-wide measurements.
This article is based on a research paper by Markus Sosnowski, Johannes Zirngibl, Patrick Sattler, Georg Carle (TU Munich), Claas Grohnfeldt, Michele Russo, and Daniele Sgandurra (Huawei Technologies Munich). The paper received the Best Paper Award at the Network Traffic Measurement and Analysis Conference (TMA) 2022.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.