I cannot imagine that the creators of SSL (Secure Sockets Layer) and its successor, Transport Layer Security (TLS), had the current scale of SSL/TLS in mind back in the 1990s, let alone knew how heavily we would come to rely on it to secure our daily affairs.
The Public Key Infrastructure (PKI) for the web has outgrown the scope of its initial design, and the number of trusted root certificates included in common root stores is counted in the hundreds. The number of trusted intermediate certificates signed by these roots is even tougher to keep track of, as the intermediate certificates are presented to TLS clients during the handshake instead of being stored on the client computer.
Getting an overview of the certificates signed by all these roots and intermediates has traditionally not been feasible, as there has been no public record of them. To make matters worse, not all certificates are deployed on publicly reachable servers, so they cannot all be detected by scanning the public IP space.
This situation has led to problems, as certificates sometimes do get issued without the consent of the domain owner.
In a famous incident in September 2011, the Dutch certificate authority (CA) DigiNotar was hacked. The attackers used the CA to issue hundreds of fraudulent certificates that could be used for man-in-the-middle attacks or other malicious purposes. At the time, there was no independent way to track which certificates had been issued.
Four years later, in September 2015, Google called out Symantec for issuing unauthorized certificates for Google domains. The ensuing investigation revealed thousands of mis-issued certificates and led to restrictions on Symantec certificates in Chrome; it also had heads rolling within Symantec. The mis-issuance was brought to light by Certificate Transparency, a project designed for exactly this purpose and recently deployed by Google, among others.
So what is Certificate Transparency (CT), and how does it work?
The idea behind CT is beautifully simple: all TLS certificates are logged in public log servers, and clients refuse to honour certificates that are not present in at least a subset of trusted logs. These logs provide a record of the certificates that have been issued and, just as importantly, help to establish which certificates have not been issued.
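The client-side rule can be illustrated with a minimal sketch, assuming each log is modelled as an in-memory set of SHA-256 certificate fingerprints. The names `CTLog` and `client_accepts` are hypothetical, and so are the thresholds of two logs run by two distinct operators. Real browsers do not query the logs during the handshake; instead they check Signed Certificate Timestamps (SCTs), signed promises of inclusion that logs hand out on submission and that accompany the certificate.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the DER-encoded certificate, used as its identifier."""
    return hashlib.sha256(cert_der).hexdigest()


@dataclass
class CTLog:
    """Hypothetical stand-in for a CT log: who runs it and which
    certificate fingerprints it has recorded."""
    name: str
    operator: str
    entries: set[str] = field(default_factory=set)

    def add(self, cert_der: bytes) -> None:
        self.entries.add(fingerprint(cert_der))

    def contains(self, cert_der: bytes) -> bool:
        return fingerprint(cert_der) in self.entries


def client_accepts(cert_der: bytes, trusted_logs: list[CTLog],
                   min_logs: int = 2, min_operators: int = 2) -> bool:
    """Honour a certificate only if enough trusted logs, run by enough
    distinct operators, contain it (both thresholds are assumptions)."""
    hits = [log for log in trusted_logs if log.contains(cert_der)]
    operators = {log.operator for log in hits}
    return len(hits) >= min_logs and len(operators) >= min_operators


if __name__ == "__main__":
    cert = b"placeholder DER bytes"  # not a real certificate
    logs = [CTLog("log-a", "Operator A"), CTLog("log-b", "Operator B")]
    print(client_accepts(cert, logs))  # False: not logged anywhere yet
    for log in logs:
        log.add(cert)
    print(client_accepts(cert, logs))  # True: two logs, two operators
```

Counting distinct operators rather than just distinct logs means that no single organization can satisfy the check on its own.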
Public certificates that are not logged should not exist, and if they do, they should not be trusted by clients. This lets domain owners verify that they are the only ones in possession of TLS certificates for their domains (or address the situation if they are not).
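A domain owner's check can be sketched as follows: scan the logged entries, keep those whose names fall under the owner's domain, and flag any that the owner did not request. The `LogEntry` record, its fields, and the helper names are hypothetical simplifications of what a real log entry contains, and the set of known fingerprints is assumed to come from the owner's own records.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical, simplified view of one logged certificate."""
    fingerprint: str
    dns_names: tuple[str, ...]
    issuer: str


def covers(name: str, domain: str) -> bool:
    """True if a certificate name (possibly a wildcard) falls under domain."""
    name = name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # flag wildcards issued under the domain as well
    domain = domain.lower().rstrip(".")
    return name == domain or name.endswith("." + domain)


def unexpected_certs(entries: Iterable[LogEntry], domain: str,
                     known_fingerprints: set[str]) -> list[LogEntry]:
    """Logged certificates for `domain` that the owner does not recognise."""
    return [
        e for e in entries
        if any(covers(n, domain) for n in e.dns_names)
        and e.fingerprint not in known_fingerprints
    ]


if __name__ == "__main__":
    entries = [
        LogEntry("aa11", ("www.example.com",), "Our CA"),
        LogEntry("bb22", ("*.example.com",), "Unknown CA"),
        LogEntry("cc33", ("example.org",), "Other CA"),
    ]
    for e in unexpected_certs(entries, "example.com", {"aa11"}):
        print(e.fingerprint, e.issuer)  # prints: bb22 Unknown CA
```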
Google has taken the lead in pushing the adoption of CT, and its influence over some of the largest domains on the Internet, combined with its role as a major browser vendor, places it in an almost unique position to do so. Google is driving standardization (RFC 6962) and browser enforcement, and also operates several of the largest public CT log servers.
At the time of writing, Google and Mozilla have CT policies, with varying degrees of implementation and enforcement in their respective browsers. Both policies require certificates to be logged to CT logs operated by multiple organizations.
Certificate Authorities are therefore incentivized to operate their own transparency logs; otherwise they are forced to rely on logs run by their competitors. Each log decides which root certificates submitted certificate chains must be rooted in, and refuses submissions that chain to other roots. A log can therefore refuse entries from competitors, although a more generous acceptance policy is encouraged.
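The root-based acceptance rule can be shown in a minimal sketch. It assumes a submitted chain is given as a list of certificate fingerprints ordered from leaf to root; `LogPolicy` and `accepts` are hypothetical names, and a real log would also verify every signature in the chain rather than trusting the claimed root.

```python
from dataclasses import dataclass, field


@dataclass
class LogPolicy:
    """Hypothetical acceptance policy of a CT log: the root certificate
    fingerprints that submitted chains must end in."""
    accepted_roots: set[str] = field(default_factory=set)

    def accepts(self, chain: list[str]) -> bool:
        """`chain` lists fingerprints leaf first, root last. This sketch
        only checks the root; signature verification is omitted."""
        if len(chain) < 2:
            return False  # assume at least a leaf and a root are required
        return chain[-1] in self.accepted_roots


if __name__ == "__main__":
    chain = ["leaf", "intermediate-y", "root-y"]  # placeholder fingerprints
    restrictive = LogPolicy({"root-x"})           # only its own CA's root
    generous = LogPolicy({"root-x", "root-y"})    # also a competitor's root
    print(restrictive.accepts(chain))  # False
    print(generous.accepts(chain))     # True
```

The difference between the two policies is nothing more than the size of the accepted root set, which is why a log operator can exclude competitors simply by leaving their roots out.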
Public logs are increasingly becoming important
Based on the number and type of entries in the public logs, and on how these develop over time, a few patterns emerge in how the logs are used.
Notably, the Google-operated logs are populated by crawling the Internet and adding any certificates encountered. Populating a log this way does little to help validate certificates for TLS sessions, although such logs can be valuable for assessing which certificates are available on the Internet.
The remaining logs are mainly operated by Certificate Authorities and follow a common pattern: they start small with a number of test certificates and are then populated with a large share of Extended Validation (EV) certificates. This large share is explained by the fact that EV is the certificate type that Google's CT policy requires to be logged in order for Chrome to show the added visual cues to its users.
CT logs are a great source of data on the certificates used on the Internet, with no need to monitor traffic or crawl public servers. As the logs fill up with more data over time, they will become increasingly important data sources for researchers. Just imagine if there were a record of all certificates ever seen on the Internet!
This work is part of a larger study that was presented at PAM2017.
Josef Gustafsson is an independent researcher working in collaboration with The Institute of Technology at Linköping University, Linköping, Sweden.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.