Certifiably vulnerable: Using Certificate Transparency logs for target reconnaissance

X.509 certificates are indispensable for modern-day web browsing. Chances are you are receiving this blog post over a TLS connection, which was successfully initiated because your browser approved the presented certificate.

This approval, in turn, finds its foundation in a chain of trust. This means that either the certificate itself or one of its ‘parent’ certificates is present in your browser’s root store, signifying the browser trusts one of the signing parties.

Unfortunately, trust is a human quality, and humans can fail. Badly. We all still remember the infamous DigiNotar case of 2011, where a Certificate Authority (CA) was compromised and issued malicious certificates for google.com.

Although we cannot fix humans, we can put extra measures in place to minimize the risk of having wrongly issued certificates operational in the wild. In comes Certificate Transparency (CT), a concept introduced by Google in 2013.

A short background on Certificate Transparency

The basic concept of CT is to have a public append-only log that — ideally — collects all issued X.509 certificates.

An organization can monitor these logs and react immediately whenever a certificate that it did not request appears for one of its domains. In general, a certificate gets appended to a log by the CA responsible for issuing the certificate.
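The monitoring idea can be sketched in a few lines of Python. This is a minimal illustration, not the interface of any real CT log or monitoring service: the `LogEntry` record, its fields, and the idea of tracking requested certificates by serial number are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LogEntry:
    """Simplified, hypothetical view of one certificate seen in a CT log."""
    dns_names: Tuple[str, ...]  # names the certificate is valid for
    issuer: str                 # issuing CA, as a plain string
    serial: str                 # certificate serial number


def covers_domain(name: str, domain: str) -> bool:
    """True if `name` is `domain` itself or one of its subdomains."""
    name = name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # treat a wildcard like its parent name
    domain = domain.lower().rstrip(".")
    return name == domain or name.endswith("." + domain)


def unexpected_certificates(
    entries: Iterable[LogEntry], monitored_domain: str, requested_serials: Set[str]
) -> List[LogEntry]:
    """Return logged certificates for our domain that we never requested."""
    return [
        e for e in entries
        if any(covers_domain(n, monitored_domain) for n in e.dns_names)
        and e.serial not in requested_serials
    ]


# Illustrative data only; names, issuers, and serials are made up.
entries = [
    LogEntry(("www.example.org",), "Example CA", "01"),
    LogEntry(("*.example.org",), "Other CA", "02"),
    LogEntry(("example.org.evil.test",), "Other CA", "03"),
]
alerts = unexpected_certificates(entries, "example.org", {"01"})
print([e.serial for e in alerts])
```

Matching on the full domain suffix (with a leading dot) avoids flagging look-alike names such as `example.org.evil.test`, which belong to a different domain.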

To incentivize this process, many web browsers now require certificates to be included in at least two different CT logs for the browser to accept (that is, trust) the certificate.

CT logs as a data set

While being able to monitor maliciously issued certificates is a good thing, publicly logging all certificates unfortunately exposes more data than one might like. Since every renewal adds a new entry to the log, an adversary can gauge whether a website is being actively maintained, and hence whether it has been kept up to date with the latest security patches.

This raises the question: can CT logs be used for target reconnaissance?

To investigate this, my fellow researchers and I from the University of California, Santa Barbara, Delft University of Technology, and Max Planck Institute for Informatics executed two large-scale measurement studies. The first, a passive measurement study, deploys several honeypot websites and examines whether pushing a certificate to one or more CT logs increases incoming traffic to the website.

The second, an active measurement study, explores different techniques to extract vulnerable websites from CT log data.

The following post highlights the key results of our research.

Figure 1: Setup for the passive measurement study.

Does CT induce more scanning traffic to your website?

The setup for this is, in essence, quite basic. We deploy a DNS server and populate it with records for a set of randomized domain names. On a separate server, we host each domain and expose a basic website accessible through HTTPS.

Certificate-wise, we evenly split our set of websites into two groups. One group has its certificates pushed to one or more CT logs, and the other group (called the control group) self-signs each certificate. The purpose of the control group is to get a baseline of backscatter and Internet scans that we can filter out of our experiment group. We do this for both IPv4 and IPv6. Once deployed, we simply wait and observe the traffic.
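One simple way to picture the baseline filtering is shown below. It is a sketch under a stated assumption: a source that also contacts the control group is treated as generic background scanning and removed from the CT group's traffic. The post does not say which filtering criterion the study actually used, and the `Hit` record and its fields are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One observed request; fields are illustrative, not the study's schema."""
    src_ip: str
    domain: str


def filter_baseline(ct_hits: Iterable[Hit], control_hits: Iterable[Hit]) -> List[Hit]:
    """Drop CT-group hits whose source also contacted the control group."""
    baseline_sources = {h.src_ip for h in control_hits}
    return [h for h in ct_hits if h.src_ip not in baseline_sources]


def hits_per_domain(hits: Iterable[Hit]) -> Counter:
    """Count the remaining hits for each domain."""
    return Counter(h.domain for h in hits)


# Illustrative traffic using documentation address ranges.
control = [Hit("198.51.100.7", "ctrl-a.example")]
ct = [
    Hit("198.51.100.7", "ct-a.example"),  # also seen in control: background noise
    Hit("203.0.113.9", "ct-a.example"),
    Hit("2001:db8::1", "ct-b.example"),
]
remaining = filter_baseline(ct, control)
print(hits_per_domain(remaining))
```

Because the control group's names never appear in a CT log, whatever survives this filter is traffic that plausibly learned about the domain from the log.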

As intuition suggests, the CT group indeed receives noticeably more traffic than our control group. Moreover, when we renew our certificates, we observe a clear surge in incoming network traffic, both to our DNS server and to our HTTPS server. Interestingly, our control group received no traffic on IPv6, whereas the CT group received over 2,700 packets, demonstrating that CT logs are indeed used for IPv6 scanning.

Figure 2: Incoming DNS traffic over time, both for IPv4 and IPv6. We notice clear surges each time we renew our certificates.

Is scanning websites from CT logs worthwhile?

As mentioned above, having a public log of certificates can reveal more than just maliciously issued certificates. Not only does CT provide a pretty neat and up-to-date list of websites currently in production, but it also reveals certificate renewal cycles, which could potentially expose development patterns.

Observing such development patterns can then aid an adversary in finding potentially vulnerable targets. Concretely, if a website that has been renewing its certificate regularly suddenly stops doing so, one could hypothesize that development for that website has ceased. If the administrator then forgets to take down the website itself, the software stack becomes old and unmaintained, and eventually misses the latest security updates.
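The renewal pattern described above can be expressed as a small heuristic. This is only a sketch: the function name, the use of the median gap as the "typical" renewal interval, and the `slack` and `min_history` thresholds are assumptions chosen for illustration, not the criteria used in the study.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable


def renewal_lapsed(
    issued_at: Iterable[datetime],
    now: datetime,
    slack: float = 2.0,     # assumed: tolerate up to two typical intervals
    min_history: int = 3,   # assumed: need a few renewals to call it a cycle
) -> bool:
    """Heuristic: has a previously regular renewal cycle stopped?"""
    times = sorted(issued_at)
    if len(times) < min_history:
        return False  # too little history to speak of a regular cycle
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    typical = median(gaps)
    # Lapsed if the time since the last issuance far exceeds the usual gap.
    return (now - times[-1]) > typical * slack


# Illustrative timeline: three certificates issued 90 days apart.
start = datetime(2020, 1, 1)
history = [start + timedelta(days=90 * i) for i in range(3)]

print(renewal_lapsed(history, now=start + timedelta(days=250)))  # 70 days since last issuance
print(renewal_lapsed(history, now=start + timedelta(days=400)))  # 220 days since last issuance
```

A lapsed cycle on its own says nothing about whether the site is still reachable, so an adversary (or a defender auditing their own estate) would still need to check that the host answers.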

We looked specifically for this type of renewal pattern in the Google Pilot CT log and found millions of websites advertising an expired certificate. Following up on these results, we did a vulnerability assessment of this set of websites and compared it against a control group of websites with valid certificates.

In particular, we studied two CVEs for Apache and two for Nginx and found that the expired group was indeed more vulnerable to all these CVEs compared to our control group, lending more weight to the hypothesis that these websites are unmaintained and have therefore missed the latest security updates.

Group	Apache CVE-2018	Apache CVE-2021	Nginx CVE-2018	Nginx CVE-2021
Expired	69.19%	79.31%	56.16%	89.47%
Control	48.95%	73.42%	34.36%	87.47%

Table 1: Vulnerability analysis for each experiment group.
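To make the comparison in Table 1 easier to read, the following sketch computes the gap between the expired and control groups in percentage points. The column labels are copied from the table as given, and the dictionary layout is simply a convenient assumption for the example.

```python
from decimal import Decimal

# (expired %, control %) per column of Table 1, labels as they appear there.
rates = {
    ("Apache", "CVE-2018"): (Decimal("69.19"), Decimal("48.95")),
    ("Apache", "CVE-2021"): (Decimal("79.31"), Decimal("73.42")),
    ("Nginx", "CVE-2018"): (Decimal("56.16"), Decimal("34.36")),
    ("Nginx", "CVE-2021"): (Decimal("89.47"), Decimal("87.47")),
}

# Percentage-point gap: how much more often the expired group is vulnerable.
gaps = {key: expired - control for key, (expired, control) in rates.items()}

for (server, cve), gap in gaps.items():
    print(f"{server} {cve}: +{gap} percentage points")
```

The gap is roughly 20 to 22 percentage points for the two columns labelled CVE-2018 and about 2 to 6 points for those labelled CVE-2021, so the expired group is more exposed in every column, but by differing margins.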

Should we abolish CT?

Describing ways to abuse CT for malicious purposes might suggest that CT should be removed, both as a policy and as a technology. However, we argue that the benefits of CT still outweigh these costs. For example, the increase in incoming scanning traffic observed in our passive measurement study was never large enough to take down a server or network.

Furthermore, the presence of unmaintained websites online is not directly caused by CT, but is rather a problem on the administrator side. As such, administrators should be aware of this technology, and expect a slight increase in scans when their certificate is present in one or more CT logs.

Read the details of this research in our paper.

Author Bio

Stijn Pletinckx is a first-year PhD student in the SecLab at the University of California, Santa Barbara. His research focuses on the intersection of network security and Internet measurements, often incorporating concepts of web security as well. His work aims to empirically study the Internet landscape within a security context.

Thanh-Dat Nguyen, Tobias Fiebig, Christopher Kruegel, and Giovanni Vigna contributed to this work.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.