All Internet communication relies on the Domain Name System (DNS), which maps a human-readable name to an IP address before two endpoints establish a connection to exchange data.
Most DNS queries and responses are transmitted in cleartext (Do53), making them vulnerable to eavesdropping and traffic analysis. Past work has demonstrated that DNS queries can reveal sensitive information, such as browsing activity or user activity in a smart home.
To mitigate some of these privacy risks, two protocols have recently been proposed: DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH). Rather than sending queries and responses as cleartext, these protocols establish encrypted tunnels between clients and resolvers. This fundamental change has implications for the performance of the DNS, as well as for content delivery.
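To make the difference between the transports concrete, here is a minimal sketch of how a DNS query is packaged for a DoH GET request as described in RFC 8484. The function names are hypothetical, and the Cloudflare endpoint URL is used only as an illustration; this is not the code used for the measurements in this post, and it sends nothing over the network.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Build a minimal DNS query (one question, recursion desired) in wire format."""
    # Header: ID, flags (0x0100 sets the RD bit), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, name: str) -> str:
    """Encode the query as unpadded base64url in the `dns` parameter (RFC 8484)."""
    wire = build_dns_query(name)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


if __name__ == "__main__":
    # Illustrative endpoint; the URL is only constructed, not requested.
    print(doh_get_url("https://cloudflare-dns.com/dns-query", "www.example.com"))
```

The same wire-format bytes would be sent as a single UDP datagram with Do53, whereas DoT and DoH carry them inside a TLS session over TCP. That change of transport is what the rest of this post is concerned with.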
Our team of researchers at Princeton University sought to measure how encrypted transports for the DNS affect end-user experience in web browsers. Specifically, we measured DNS response times and page load times for a week across different resolvers (Cloudflare, Google, Quad9, and a university network), websites, and network conditions.
From our measurements, we found that TCP enables DoH and DoT to outperform Do53 in page load times on a university network, despite higher response times. This is particularly the case when there is significant packet loss.
Measuring page load times with Cloudflare’s resolver
To understand the relationship between network conditions and DNS protocols, we measured page load times while using Cloudflare’s recursive resolver (1.1.1.1).
In this post, we focus on page load times using Cloudflare’s resolver across different network conditions, because it performed the best. Our pre-print paper discusses the results for all resolvers.
We measured page load times by controlling Mozilla Firefox through Selenium, visiting websites from the Tranco list, and recording how long each took to load. We visited the top 1,000 websites and the websites ranked 99,000 to 100,000, which represent highly optimized websites and less optimized but still relevant websites, respectively.
We obtained the page load times by inspecting HTTP Archive objects (HARs), which we extracted from Mozilla Firefox through a custom extension. Importantly, HARs include timings for the on-load event, which triggers when a browser finishes loading and rendering a page, as well as timings for the individual components of each request that the browser made.
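As a rough illustration of what a HAR offers, the sketch below pulls the on-load time and the per-request DNS timings out of a HAR object. It relies on the field names from the HAR 1.2 format (`pageTimings.onLoad` and `timings.dns`, both in milliseconds, with -1 meaning "not available"). The function name and the sample data are made up for this example and are not taken from our measurement pipeline.

```python
from typing import Any


def summarize_har(har: dict) -> dict:
    """Return the on-load time and aggregate DNS timing for the first page in a HAR."""
    log: Any = har["log"]
    on_load_ms = log["pages"][0]["pageTimings"].get("onLoad", -1)
    # Keep only requests where the browser reported a DNS timing (-1 means not applicable).
    dns_times = [
        entry["timings"]["dns"]
        for entry in log["entries"]
        if entry.get("timings", {}).get("dns", -1) >= 0
    ]
    return {
        "on_load_ms": on_load_ms,
        "dns_total_ms": sum(dns_times),
        "requests_with_dns_timing": len(dns_times),
    }


# Made-up sample data in HAR 1.2 shape, for illustration only.
SAMPLE_HAR = {
    "log": {
        "pages": [{"id": "page_1", "pageTimings": {"onContentLoad": 800, "onLoad": 1500}}],
        "entries": [
            {"pageref": "page_1", "timings": {"dns": 40, "connect": 30, "wait": 120}},
            {"pageref": "page_1", "timings": {"dns": -1, "connect": -1, "wait": 60}},
            {"pageref": "page_1", "timings": {"dns": 25, "connect": 20, "wait": 90}},
        ],
    }
}

if __name__ == "__main__":
    print(summarize_har(SAMPLE_HAR))
```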
Figure 1 — Comparison of page load times between protocols using Cloudflare’s resolver on a university network and an emulated 4G network.
Figure 1 compares page load times across a university network and an emulated 4G network. Each plot shows a cumulative distribution function for the difference in page load times between two protocols on a given network.
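For readers who want to build a similar plot from their own data, the following sketch computes per-website differences in page load time between two protocols and turns them into an empirical CDF. The function names, the sign convention (first protocol minus second), and the sample numbers are assumptions made for this example; they are not taken from our analysis code or results.

```python
def paired_differences(times_a: dict, times_b: dict) -> dict:
    """Per-website difference in page load time (a minus b), for sites measured with both."""
    common = times_a.keys() & times_b.keys()
    return {site: times_a[site] - times_b[site] for site in common}


def empirical_cdf(values) -> list:
    """Return (value, fraction of samples <= value) pairs, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


if __name__ == "__main__":
    # Made-up page load times in milliseconds, for illustration only.
    doh = {"a.example": 1200, "b.example": 900, "c.example": 1500, "d.example": 700}
    do53 = {"a.example": 1000, "b.example": 1000, "c.example": 1500, "e.example": 800}
    diffs = paired_differences(doh, do53)
    for value, fraction in empirical_cdf(diffs.values()):
        print(f"{value:>6} ms  {fraction:.2f}")
```

With this sign convention, a curve concentrated around zero means the two protocols are hard to tell apart, while mass to the left of zero means the first protocol loaded pages faster.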
Interestingly, page load times with Cloudflare DoH, DoT, and Do53 are indistinguishable from each other on a university network. These results stand in contrast to our naïve expectation that page load times for DoT and DoH would be slower than Do53 due to additional latency for individual requests (see Section 4.1 in our paper).
We also found that on the 4G network, DoT still performs indistinguishably from Do53, and DoH performs slightly worse than Do53.
Figure 2 — Comparison of page load times between protocols using Cloudflare’s resolver on an emulated, lossy 4G network and a 3G network.
Figure 2 compares page load times across an emulated lossy 4G network and a 3G network. On the lossy 4G network, DoT performs slightly better than Do53, with page load times that are between 100ms and 1s faster. However, as throughput decreases and loss increases on the 3G network, DoT and DoH no longer match Do53 in page load times.
Transport protocols greatly affect performance
We believe that TCP enables DoH and DoT to outperform Do53 in page load times, despite higher response times. This is particularly the case when there is significant packet loss.
For example, the default timeout for Do53 requests in Linux is 5 seconds (configurable through the timeout option in /etc/resolv.conf), which is the earliest time after which a failed DNS query can be retransmitted. However, depending on the TCP configuration, DoT and DoH packets may be automatically retransmitted after only 2x the round-trip-time latency to the recursive resolver. Thus, when a lost DNS query blocks the rendering of a page, DoT and DoH may be able to recover it more quickly than Do53.
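A back-of-the-envelope sketch of this argument is shown below. It simply encodes the two figures mentioned above: a 5 second stub-resolver timeout for Do53 and a TCP retransmission after roughly 2x the round-trip time. The function names and the 50ms round-trip time are assumptions for illustration. Real TCP stacks derive their retransmission timeout from smoothed RTT estimates and enforce a minimum value, so treat the result as an order-of-magnitude comparison rather than a prediction.

```python
# Default Do53 stub-resolver timeout on Linux, in seconds.
DO53_DEFAULT_TIMEOUT_S = 5.0


def do53_recovery_delay(timeout_s: float = DO53_DEFAULT_TIMEOUT_S) -> float:
    """Earliest time at which a lost Do53 query is retried by the stub resolver."""
    return timeout_s


def tcp_recovery_delay(rtt_s: float, rtt_multiplier: float = 2.0) -> float:
    """Simplified model: TCP retransmits a lost segment after a multiple of the RTT."""
    return rtt_multiplier * rtt_s


def recovery_speedup(rtt_s: float) -> float:
    """How many times sooner the simplified TCP model retries compared with Do53."""
    return do53_recovery_delay() / tcp_recovery_delay(rtt_s)


if __name__ == "__main__":
    rtt = 0.05  # assumed 50 ms round trip to the recursive resolver
    print(f"Do53 retry after {do53_recovery_delay():.1f} s")
    print(f"TCP retry after  {tcp_recovery_delay(rtt):.2f} s")
    print(f"Ratio: {recovery_speedup(rtt):.0f}x")
```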
Considering the ubiquity of the web and the multitude of DNS queries that a single website may cause, we need to further explore how different transport protocols affect DNS performance, and, in turn, page load times and user experience.
Of course, sending DNS packets over TCP is not a new concept, as RFC 1035 described how to do so in 1987. However, if major browser vendors enable DoH or DoT by default, then all DNS traffic for hundreds of millions of users will be sent over TCP.
This may have profound and unforeseen performance impacts. As such, we have a huge opportunity to improve the DNS by studying the performance of different transport protocols, and how the DNS performs over these transport protocols at scale.
Austin Hounsel is a PhD student in Computer Science at Princeton University, who’s interested in Internet measurements, privacy, and censorship.