Am I talking to you?
In a networked world that’s an important question.
For example, where I’m located, when I look up the DNS name
www.google.com I get the IPv6 address
2404:6800:4006:813::2004. This implies that when I send an IPv6 packet to this destination address I will reach a Google-operated server. Right? Well, most of the time that’s probably a reasonable assumption, but it’s not always true. A packet’s adventure through the Internet is well beyond my direct control and I have no idea if my packet might be captured and sent elsewhere, accidentally or maliciously. This risk is true for every online service.
Of course, the implications for such potential misdirection vary according to the nature of the service. So, let’s get personal. What about my bank? When I enter the URL of my bank, how do I know the resultant session is a session with my bank?
Most of the time and for most users the Internet ‘just works’, so it may seem like a silly question, but there are times when trust turns into credulity. Exactly what are you trusting?
Let’s look at the DNS name transforms for some banks’ online services as an example:
Commonwealth Bank:
www.commbank.com.au. 3599 IN CNAME prd.akamai.cba.commbank.edgekey.net.
prd.akamai.cba.commbank.edgekey.net. 300 IN CNAME e7049.x.akamaiedge.net.
e7049.x.akamaiedge.net. 20 IN A 126.96.36.199

Westpac:
www.westpac.com.au. 10800 IN CNAME dplenhq18279.cloudfront.net.
dplenhq18279.cloudfront.net. 60 IN A 188.8.131.52
dplenhq18279.cloudfront.net. 60 IN A 184.108.40.206
dplenhq18279.cloudfront.net. 60 IN A 220.127.116.11
dplenhq18279.cloudfront.net. 60 IN A 18.104.22.168

ANZ:
www.anz.com.au. 1200 IN CNAME jtyncho.x.incapdns.net.
jtyncho.x.incapdns.net. 30 IN A 22.214.171.124

NAB:
www.nab.com.au. 7200 IN CNAME www.nab.com.au.edgekey.net.
www.nab.com.au.edgekey.net. 300 IN CNAME e11252.x.akamaiedge.net.
e11252.x.akamaiedge.net. 20 IN A 126.96.36.199
In none of these cases does the bank’s DNS name map directly to an IP address of servers within the bank’s own name space. Each of these banking services uses platforms operated by Akamai, Amazon’s CloudFront, or Incapsula. No doubt there are good reasons why the banks choose to outsource their service platforms to a hosting company, including DDoS defence and cost-effective service resiliency in a hostile environment. The days of running your own services on in-house operated infrastructure are long gone for these online service enterprises.
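As a sketch of what these aliasing transforms mean in practice, a resolver simply follows the chain until it reaches an address record. The snippet below walks a CNAME chain over an illustrative, hard-coded snapshot of the NAB records above (not a live DNS lookup):

```python
# A minimal sketch of following a CNAME chain, as a resolver does.
# The record data here is an illustrative snapshot, not live DNS.
records = {
    "www.nab.com.au.": ("CNAME", "www.nab.com.au.edgekey.net."),
    "www.nab.com.au.edgekey.net.": ("CNAME", "e11252.x.akamaiedge.net."),
    "e11252.x.akamaiedge.net.": ("A", "126.96.36.199"),
}

def resolve(name: str, max_depth: int = 10) -> str:
    """Follow CNAME records until an address record is reached."""
    for _ in range(max_depth):
        rtype, rdata = records[name]
        if rtype == "A":
            return rdata
        name = rdata          # follow the alias to the next name
    raise RuntimeError("CNAME chain too long")
```

The point is that the name I typed and the server that finally answers are linked only by this chain of aliases, none of which I control or verify.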
The question remains — how can these cloud service platforms convince my browser they are acting on behalf of my bank?
The way trust is inferred in this situation is through public key cryptography:
- The service provider creates a certificate signing request containing the public key part of a public/private key pair, signs it, and passes this request to any one of several trusted Certification Authorities (CAs).
- The CA checks the validity of the service provider and the domain name that it wants to be associated with.
- If the CA is satisfied it will sign a public key certificate with its own private key and pass this certificate back to the service provider.
My browser opens a secure connection to the named service using the Transport Layer Security (TLS) protocol. In the initial part of the TLS handshake, the browser tells the server the original domain name it wants to connect to, via the Server Name Indication (SNI) field. In response, the server provides my browser with a copy of this public key certificate, together with a piece of handshake data signed using the associated private key.
The browser is thus communicating with an endpoint that has knowledge of the private key that the trusted CA was prepared to associate with the domain name of the service. If this service delivery endpoint demonstrates that it holds the named service’s private key, the browser is prepared to believe that this is the genuine service, regardless of whatever aliasing transforms were applied in resolving the DNS name and wherever the IP address hosting the service may be.
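Using Python’s standard library, this browser-side behaviour can be sketched as follows. The hostname is a stand-in, and the helper is illustrative rather than what any particular browser does; the default SSL context enforces both chain validation against the trusted CA set and a match between the certificate and the SNI name:

```python
import socket
import ssl

# A sketch of what the browser does: open a TLS session, sending the
# original domain name via SNI, and let the library validate the
# certificate chain against the locally trusted CAs.
hostname = "www.example.com"   # stand-in for the bank's name

context = ssl.create_default_context()   # loads the trust anchor set
# check_hostname requires the certificate to match the SNI name,
# regardless of which IP address the DNS resolution produced.
assert context.check_hostname
assert context.verify_mode == ssl.CERT_REQUIRED

def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect with SNI and return the validated peer certificate."""
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

The library fails the handshake if the presented certificate does not chain to a trusted CA or does not name the SNI host, which is exactly the trust decision described above.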
What could possibly go wrong?
Well, because you asked:
- We are often incapable of keeping a secret. Anyone who learns your private key can impersonate you and nobody else can tell the difference.
- The relationship or authority that a public key certificate is supposed to attest might be subverted and the wrong party might be certified by a CA.
- The CA might have been hacked, compromised, socially engineered, or otherwise coerced to issue certificates to the wrong party under false pretences. Nobody else witnesses the transaction between the applicant and the CA that results in a certificate. We just have to trust that the CA is operating with integrity.
- The architecture of the distributed trust system used by the Internet’s PKI makes the system itself only as trustable as the most vulnerable CA. It doesn’t matter how good your CA might be. If every user can be duped by a falsely issued certificate from a corrupted CA, then the damage has been done.
- Trust Anchors (TAs) are distributed ‘blind’ by browser vendors and are axioms of trust. Things certified by a trusted CA are automatically valid. Where a CA fails to maintain robust procedures and issues certificates under compromised conditions, we as users and end consumers of the security framework are exposed without any conscious buy-in. It just happens as a function of the existence of the CA’s point of trust in our system’s software. We either must be experts to know how to flush these out, or rely on others to help us update our circle of credulity.
- When certificates are issued in error there is also the ability for the CA to revoke a certificate. However not all clients check the revocation status of a certificate, so my client might still trust a certificate that the CA has revoked. There is also the issue of how to manage the situation of a corrupted CA that issues spurious revocations as part of a denial-of-service attack.
Each time these incidents occur we blame the errant CA. Sometimes we eject them from the trusted CA set, in the belief that these actions will fully restore our collective trust in this obviously corrupted framework. But there is trust and there is credulity, and here we’ve all been herded into the credulity pen.
We’ve seen two styles of response to these structural problems with the Internet’s PKI. One is to try and fix these problems while leaving the basic design of the system in place. The other is to run away and try something completely different.
The fix it crew have come up with many ideas over the years. Much of the work has concerned CA ‘pinning’.
The problem is that the client doesn’t know which CA issued the authentic certificate. If any of the other trusted CAs have been coerced or fooled into issuing a false certificate, then the user would be none the wiser when presented with this fake certificate. With around one hundred generally trusted CAs out there, this represents an uncomfortably large attack surface. You don’t have to knock them all off to launch an attack. Just one. Any will do. This vulnerability has proved to be a tough problem to solve in a robust manner.
The fixers want to allow the certificate subject to be able to state, in a secure manner, which CA certified them. That way, an attacker who can successfully subvert a CA can only forge certificates issued by this subverted CA. Obviously, it doesn’t solve the problem of errant CAs, but it limits the scope of damage from everyone to a smaller subset.
The various pinning solutions proposed so far rely on an initial leap of faith in the form of ‘trust on first use’.
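A minimal sketch of the trust-on-first-use idea, with an in-memory pin store and hypothetical names (any real implementation would persist the pins and handle deliberate key rollover):

```python
# A minimal sketch of 'trust on first use' pinning: the first key
# fingerprint seen for a name is recorded, and later sessions must
# present the same key. The pin store is an in-memory dict here.
pins: dict[str, str] = {}

def check_pin(name: str, key_fingerprint: str) -> bool:
    """Accept on first use; thereafter require a matching pin."""
    if name not in pins:
        pins[name] = key_fingerprint   # the initial leap of faith
        return True
    return pins[name] == key_fingerprint
```

The weakness is visible in the first branch: if the very first contact is with an impostor, the impostor’s key is what gets pinned.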
One deployed pinning solution is effective, namely the incorporation of the public key fingerprint for several domain names into the source code of Google’s Chrome browser. While this works for Google’s domain names when the user is a Chrome user, it doesn’t work for anyone else, so it’s not a generally useful solution to the pinning problem inherent in a very diverse distributed trust framework.
HTTP Public Key Pinning (HPKP) (RFC 7469) requires a hash of the ‘real’ public key to be included in an HTTP header of the delivered web content. If you trust the web content you could trust the key. If you trust the key, then you could trust the web content. As the RFC itself conceded, it’s not a perfect defence against man-in-the-middle (MITM) attacks, and it’s not a defence against compromised keys. If an attacker can intrude in this initial HTML exchange, then the user can still be misled.
Rolling your pinned key also implies rolling the content that references the key or upgrading the entire set of deployed applications that incorporate the hash of the key.
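For illustration, an RFC 7469 pin is the base64 encoding of the SHA-256 digest of the certificate’s SubjectPublicKeyInfo (SPKI) in DER form. The SPKI bytes below are a placeholder rather than a real key:

```python
import base64
import hashlib

# RFC 7469 pins are the base64 of the SHA-256 digest of the
# certificate's SubjectPublicKeyInfo (SPKI) in DER form.
# These bytes are a placeholder, not a real key.
spki_der = b"\x30\x82\x01\x22placeholder-spki-bytes"

def spki_pin(spki: bytes) -> str:
    """Compute the pin-sha256 value for a DER-encoded SPKI."""
    return base64.b64encode(hashlib.sha256(spki).digest()).decode()

pin = spki_pin(spki_der)
# Delivered as, for example:
#   Public-Key-Pins: pin-sha256="<pin>"; max-age=5184000
```

Because the pin is a hash of the key itself, rolling the key necessarily changes the pin, which is exactly why rolling the key drags the published content (or the deployed applications) along with it.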
Certificate Transparency uses a different approach that requires all valid certificates to be published in a public log. Without evidence of such publication in the form of a Signed Certificate Timestamp (SCT), which may be delivered in a TLS extension, through Online Certificate Status Protocol (OCSP) stapling, or embedded in the certificate itself, and which the browser is supposed to validate, the certificate should not be regarded as trustable. This does not directly address the pinning problem by itself, nor does it address the rogue CA problems, but it is intended to expose rogue actions that would otherwise go undetected.
We can confidently predict that errant CA incidents will continue to occur. But the worrisome observation is that the CA space is changing.
Rather than many CAs each with a proportionate share of the total volume of issued certificates we are seeing aggregation and consolidation in the CA space. If the strength of the CA system in this PKI is the number and diversity of CAs, then this particular strength is being rapidly eroded, and the CA space is rapidly centralizing toward just one CA. Obviously, that’s the Let’s Encrypt CA, and their product is short-term free certificates based on automated proof-of-control tests.
Perhaps this value destruction in the CA space is not only inevitable but long overdue. Users are generally completely unaware which CA issues a certificate, and a good case can be made that they shouldn’t need to care. If the user can’t tell the difference between a free CA using automated checks and an expensive CA that may (or may not) have performed more rigorous checks, then why bother with expensive tests?
If our entire security infrastructure for the Web PKI is based on the convenient fiction that spending more money to obtain precisely the same item somehow imbues this item with magical security powers, then this PKI is truly in a bad place.
The DNS is truly magical — it’s massive, fast, timely, and it seems to work despite being subject to persistent hostile attacks of various forms and magnitudes. And finally, after 20 years of playing around, we have DNSSEC. When I query your DNSSEC-signed zone I can choose to reassure myself that the answer I get from the DNS is authentic, timely, and unaltered. The only thing I need to trust is my local copy of the root zone Key-Signing Key (KSK) — not a hundred or so trust points that don’t back each other up, creating a hundred or more points of vulnerability, but a single anchor of trust.
The DNS is almost the exact opposite of the PKI. In the PKI, each CA has a single point of publication and offers a single service point. The diverse nature of the Internet PKI means that CAs do not back each other up, nor do they avail themselves of massively replicated service infrastructure. When I want to pose an OCSP query I can’t ask any CA about the revocation status of a given certificate; I must ask only the CA that issued the certificate. The result is many trusted CAs but a very limited set of CA publication points, each of which is a critical point of vulnerability.
The DNS uses an antithetical approach. A single root of a name hierarchy, but with the content massively replicated in a publication structure that avails itself of mutual backup.
DNSSEC has a single anchor of trust but with many ways to retrieve the data. Yes, you can manage your zone with a single authoritative server and a single unicast publication point and thereby create a single point of vulnerability, but you can also use multiple secondary services, anycast-based load sharing, and short TTLs, giving the data publisher some degree of control over local caching behaviours.
Let’s not put these public keys in a PKI as a derivation of trust. Instead, let’s put these public keys in the DNS. After all, the thing we are trying to associate securely is a public key to be used in TLS with a domain name. Why must we have these middleware notaries called CAs? Why not just put the key in the DNS and eliminate the intermediary?
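As a sketch of what ‘the key in the DNS’ looks like, a DANE TLSA record of the form ‘_443._tcp.example.com. IN TLSA 3 1 1 <hex>’ (usage DANE-EE, selector SPKI, matching type SHA-256) lets the client match the presented key directly against the DNS-published digest. The SPKI bytes below are placeholders, not a real key:

```python
import hashlib

# DANE matching for a TLSA 3 1 1 record: usage 3 (DANE-EE),
# selector 1 (SPKI), matching type 1 (SHA-256). The association
# data is the hex SHA-256 digest of the server's SPKI.
# Placeholder bytes stand in for a real key.
presented_spki = b"placeholder-spki-from-tls-handshake"
tlsa_assoc_data = hashlib.sha256(presented_spki).hexdigest()

def dane_match(spki: bytes, assoc_data_hex: str) -> bool:
    """Check the presented SPKI against the TLSA association data."""
    return hashlib.sha256(spki).hexdigest() == assoc_data_hex

assert dane_match(presented_spki, tlsa_assoc_data)
```

No CA appears anywhere in this check; the trust derives entirely from DNSSEC validation of the TLSA record itself.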
DANE was always going to be provocative to the CA industry. Predictably, they were vehemently opposed to the concept. There was strong resistance to adding DANE support into browsers: DNSSEC was insecure, the keys used to sign zones were too short, but the killer argument was ‘it takes too much time to validate a DNS answer’. Which is true. The query-intensive DNSSEC validation process operated at a time scale that set new benchmarks in slow application behaviour. It wasn’t geologically slow, but it certainly wasn’t fast either. No doubt a DNSSEC validator could’ve been made a little faster by ganging up all the DNSSEC validation queries and sending them in parallel, but even then there would still be a noticeable time penalty to perform DNSSEC validation.
In response to this criticism, the DNSSEC folk came up with a different approach to validation.
Rather than parallel queries, they proposed DNSSEC chained responses as additional data (RFC 7901). This approach relies on the single DNSSEC trust anchor. Each signed name has a unique validation path so the queries to retrieve the chain of interlocking DNSKEY and DS records are predictable. It’s not the queries that are important, nor to whom these queries are addressed. What’s important are the responses. Because these responses are themselves DNSSEC-signed it does not matter how the client gets these responses as DNSSEC validation will verify that these are authentic.
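The predictability of that validation path can be sketched as follows, under the simplifying assumption that every label boundary is a zone cut (real delegations may skip labels):

```python
# A sketch of why the chain is predictable: for a signed name, the
# interlocking DS/DNSKEY records needed for validation follow
# directly from the zone cuts implied by the name itself.
# Simplifying assumption: each label boundary is a zone cut.
def validation_chain(name: str) -> list[str]:
    labels = name.rstrip(".").split(".")
    zones = ["."]
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    chain = ["./DNSKEY"]          # trust anchor: the root zone KSK
    for zone in zones[1:]:
        chain.append(f"{zone}/DS")      # signed by the parent zone
        chain.append(f"{zone}/DNSKEY")  # signs records in this zone
    return chain
```

Because every response in this chain carries its own RRSIG, a server can precompute and bundle the whole set, and the client can validate it no matter how it arrived.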
It’s quite feasible for an authoritative server to bundle these responses together and hand them back with the response to the original query as additional data. Chaining is a nice idea as it removes any additional query overhead for DNSSEC validation. However, it’s likely that these chained responses will be large, and very large DNS responses that rely on UDP fragmentation become a point of strain. Yes, UDP truncation and TCP fallback can work, but that adds a further two round-trip times to the DNS transaction. If the entire point of the exercise was to make all this faster, then it hardly appears like forward progress! However, with the current fascination with TLS variants of the DNS, namely DNS over TLS (DoT), DNS over HTTPS (DoH), and DNS over QUIC (DoQ), adding a chained DNSSEC validation package as additional data in a TCP/TLS response would be quite feasible.
However, the DNS has gone into camel-resistance mode these days and adding new features is regarded with suspicion bordering on paranoia. So far, DNS vendors have not implemented RFC 7901 support, which is a shame, because eliminating the time penalty for validation makes the same good sense as multi-certificate stapled OCSP responses (RFC 6961 and RFC 8446). It’s often puzzling to see one community, TLS, say that a concept is a good idea and see the same concept shunned in another community, namely the DNS developers.
But that’s not all. It is also argued that within the DNS resolution process DNSSEC validation is implemented in the wrong way, as validation is commonly performed in the recursive resolver and not directly within the end client’s system in the stub resolver. The problem here is that the end client has no reason to implicitly trust any recursive resolver. Similarly, there are no grounds whatsoever to believe an open unencrypted UDP exchange between a stub and recursive resolver is not susceptible to a MITM attack. Currently, DNSSEC validation in the DNS is just the wrong mode, they claim. There is a strong element of truth here. A more robust implementation of DNSSEC validation would have every endpoint performing DNSSEC validation for themselves.
Maybe the DNS Camel folk have a point. Perhaps we should try and spread the load. Why not have the application that wants to use DNS-provided data perform their own independent validation of this data? That way we can circumvent the issues with validation only being performed in DNS recursive resolvers and circumvent the issues with DNS and transport protocols, at the same time. This appears to be the motivating thought process that lies behind the publication of RFC 9102.
RFC 9102 — TLS DNSSEC Chain Extension
The document was originally posted as an individual Internet Draft in 2015. It was submitted to the IETF’s TLS Working Group and adopted as a Working Group draft a year later. However, progress within the TLS Working Group appeared to grind to a halt in 2018, after seven revisions of the working draft. The document was then further developed over the ensuing three years and published via the RFC Editor’s Independent Submission stream as an experimental specification, RFC 9102.
Placing the DNSSEC validation chained response into the TLS handshake can circumvent the issues with DNS transports and large responses, and delivers the information directly to the intended client (‘Here’s why you can trust this proffered public key’). However, DNS transport issues and implicit trust in validating recursive resolvers were not the entirety of the concerns with DNSSEC, and simply removing these transport issues from the DNS still leaves some critical questions unanswered.
It is claimed that DNSSEC’s commonly used cryptography is too weak, as there is a common belief that everyone signs DNSSEC zones using RSA-1024 with SHA-1 hashes. These days, that’s not a very strong crypto setting. While we are looking at weaknesses, we should note that there is no explicit form of revocation of bad data in the DNS. There is no clear equivalent of transparency associated with the signing of DNS zones, or with the derivation of authority for the publication of DS and NS records in parent zones. While Certificate Transparency may be a relatively weak security measure in the Web PKI, there appears to be no equivalent of even that in the DNS. Bad deeds in the DNS can go largely unnoticed and unremarked.
There is a problem with stapled DNSSEC chain data in that a man-in-the-middle can strip the stapled TLS extension, as there is no proof of existence for TLS extensions. RFC 9102 attempts to mitigate this issue with the ExtSupportLifetime element of the TLS extension, whose value represents the number of hours that the TLS server commits to serving this extension in the future.
It’s unlikely that RFC 9102 represents a breakthrough of this impasse between the Web PKI model of trusted CAs and the DANE/DNSSEC model of placing the entirety of the trust infrastructure into the DNS.
It’s possible that the Web PKI proponents have a good point when they argue that perhaps it’s unwise to pin the entire Internet security framework onto a single key, the DNS root zone KSK. Maybe it would be more resilient to use additional TAs so that we are not vulnerable to a single point of potential failure.
On the other hand, the Web PKI has largely adopted automated certificate issuance based on DNS proof-of-control tests, and this underlying reliance on a largely opaque transaction between the CA and the DNS to validate a certificate signing request is little different from the DANE/DNSSEC environment.
Perhaps the only difference here is that the client is directly performing the validation of the data when using stapled DNSSEC validation data. Otherwise, the client is merely validating an otherwise untestable attestation made by the CA that they performed a DNSSEC validation operation at some point in the past and the client should simply believe the CA’s attestation. Neither seems like a truly comfortable position to be in.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.