Can the DNS support encryption without enabling centralization?

By Austin Hounsel on 13 Oct 2021

Category: Tech matters


DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) improve the privacy of DNS queries and responses. While encryption is a positive thing, deployment of these protocols has, in some cases, resulted in further centralization of the DNS, introducing new challenges. In particular, centralization has consequences for performance, privacy, and availability. A potentially greater concern is that it has become more difficult for clients to control their choice of recursive resolver, particularly for IoT devices.

Can the DNS architecture support encryption without enabling centralization?


It’s a question that we at Princeton University, Ruhr-University Bochum, and the University of Chicago collaborated to answer in a recent paper presented at ANRW ’21. In this post I want to share our work on supporting decentralized name resolution, while preserving the benefits of encrypted DNS. I’ll also touch on a preliminary evaluation of the performance impact of distributing DNS queries on web page load times.

Query distribution strategies

There are several potential strategies for distributing DNS queries across multiple recursive resolvers to reduce centralization. This list is not meant to be exhaustive; rather, it represents initial discussions of this problem space.

Random distribution — In the simplest strategy, clients send each query to a resolver chosen at random from a defined set of resolvers (R), so each resolver handles, on average, 1/|R| of the client’s queries. Recovery from failure is also simple: If a client doesn’t receive a response from a resolver, it can retransmit the query to another randomly chosen resolver.

Round-robin distribution — Clients could also distribute queries sequentially across a set of resolvers (R), so that each resolver is assigned 1/|R| of the client’s queries. This strategy ensures that queries are evenly distributed over multiple resolvers, but it allows multiple resolvers to receive queries for the same domain name over time. As with random distribution, round-robin distribution may provide clients with more resilience to failure.

Hash-based distribution — In a hash-based distribution strategy, second-level domain names (SLDs) could be hashed to index into a list of resolvers. With this strategy, a client’s queries for different names under the same SLD are all sent to the same resolver, and no two resolvers receive queries for the same SLD from that client. However, some resolvers may receive a larger share of queries than others. Furthermore, this strategy may be less robust to failure: If a resolver fails, users may not be able to resolve certain domain names, which may be difficult for them to debug.
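To make the three strategies concrete, here is a minimal Python sketch of how a stub resolver might choose a resolver for each query. It is an illustration under stated assumptions, not the code from our dnscrypt-proxy fork: the resolver labels, the function names, and the naive "last two labels" SLD extraction are all hypothetical, and a real implementation would likely consult the Public Suffix List.

```python
import hashlib
import itertools
import random

# Hypothetical resolver labels; any set of encrypted DNS resolvers could stand in.
RESOLVERS = ["resolver-a", "resolver-b", "resolver-c", "resolver-d"]


def pick_random(resolvers, rng=random):
    """Random distribution: each query goes to a uniformly chosen resolver."""
    return rng.choice(resolvers)


def pick_fallback(failed, resolvers, rng=random):
    """After a timeout, retry with a random resolver other than the failed one."""
    return rng.choice([r for r in resolvers if r != failed])


def make_round_robin(resolvers):
    """Round-robin distribution: return a picker that cycles through the list."""
    cycle = itertools.cycle(resolvers)
    return lambda: next(cycle)


def naive_sld(qname):
    """Last two labels of a name (assumption: ignores suffixes such as co.uk)."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def pick_by_hash(qname, resolvers):
    """Hash-based distribution: the hash of the SLD indexes into the list."""
    digest = hashlib.sha256(naive_sld(qname).encode("utf-8")).digest()
    return resolvers[int.from_bytes(digest[:8], "big") % len(resolvers)]
```

Because `pick_by_hash` depends only on the SLD, `www.example.com` and `mail.example.com` always map to the same resolver, which is also why a single resolver failure makes a fixed subset of names unresolvable under this strategy.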

Performance impact of distributing queries

We forked the open-source dnscrypt-proxy stub resolver to implement and evaluate these query distribution strategies with four popular recursive resolvers.

We performed page loads using these strategies from several Amazon EC2 vantage points for 20 days. We also performed page loads with each recursive resolver for comparison. We measured the top 1,000 websites on the Tranco top list for December 12th, 2020. Our paper provides more details about our methodology.

Figure 1 shows page load times for each query distribution strategy, and Figure 2 shows page load times using each individual resolver.

Figure 1 — Page load times from each vantage point using query distribution models.
Figure 2 — Page load times from each vantage point using individual resolvers.

Table 1 and Table 2 show median page load times.


Table 1 — Median page load times (in seconds) from each vantage point using each query distribution strategy.

N. Virginia | 2.05s | 2.18s | 2.03s | 2.12s

Table 2 — Median page load times (in seconds) from each vantage point using a single resolver for all DNS queries.

We found that for most vantage points, each strategy performs similarly in terms of median page load times. The largest performance gap was in Oregon, where the hash strategy was 150ms slower than the round-robin strategy. The largest difference between any two strategies was smaller at the other vantage points: 50ms in California, 10ms in Ohio, and 110ms in Northern Virginia. We observed similar behaviour when performing page loads with each individual resolver.

We also sought to understand whether distributing queries across multiple recursive resolvers could negatively affect Content Delivery Network (CDN) localization. To do so, we extracted domain names from HTTP requests in the HTTP Archive’s requests_desktop table for October 2020. We then used information provided by the HTTP Archive to determine which CDN, if any, hosts the content for each domain name. For each request hosted on either Cloudflare’s or Google’s CDN, we resolved the domain name twice (once with Cloudflare’s resolver and once with Google’s) and measured the latency of TCP and TLS connection setup to the resolved IPs from a 500 Mbps residential fiber connection.
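As a rough illustration of this kind of measurement, the following Python sketch times the TCP handshake and the TLS handshake separately for a single connection. It is not our measurement tooling: the function name `measure_setup` and its parameters are hypothetical, and it assumes the IP address was obtained beforehand from whichever resolver is under test.

```python
import socket
import ssl
import time


def measure_setup(hostname, ip, port=443, use_tls=True, timeout=5.0):
    """Return (tcp_seconds, tls_seconds) for one connection to ip:port.

    hostname is used for SNI and certificate validation; ip is the address
    a given resolver returned for it. tls_seconds is None if use_tls is False.
    """
    start = time.perf_counter()
    sock = socket.create_connection((ip, port), timeout=timeout)
    tcp_done = time.perf_counter()
    try:
        if not use_tls:
            return tcp_done - start, None
        context = ssl.create_default_context()
        # wrap_socket performs the TLS handshake on the connected socket.
        with context.wrap_socket(sock, server_hostname=hostname):
            tls_done = time.perf_counter()
        return tcp_done - start, tls_done - tcp_done
    finally:
        sock.close()
```

Repeating the call for the address returned by each resolver, and summing the two values, gives one sample of combined setup time per resolver and CDN pair.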

Figure 3 shows the cumulative distribution function (CDF) for combined TCP and TLS setup times for a given resolver and CDN.

Figure 3 — TCP and SSL setup times to CDN servers operated by Cloudflare and Google. Each line shows setup times when a particular DNS resolver is used for either Cloudflare or Google hosted content.
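For readers less familiar with CDFs, an empirical CDF like the one plotted here can be built from a list of setup-time samples in a few lines of Python. This is a generic sketch; the function name `empirical_cdf` is hypothetical and is not taken from our analysis code.

```python
def empirical_cdf(samples):
    """Pair the i-th smallest sample with the cumulative fraction (i + 1) / n."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]
```

Plotting the resulting pairs as a step function for each resolver and CDN combination yields one line per combination; two lines that overlap closely indicate that the choice of resolver made little difference to setup time.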

In our experiment, distributing queries over multiple resolvers did not noticeably affect CDN localization. When either Google’s resolver or Cloudflare’s resolver is used to resolve Google-hosted content, TCP and TLS setup times follow the same distribution. The distributions differ slightly between the two resolvers for Cloudflare-hosted content, but they remain very similar overall.

Where do we go from here?

Our preliminary evaluation suggests that clients can distribute DNS queries across a set of popular recursive resolvers without performance degradation. Future work should explore various alternative strategies for resolving and distributing encrypted DNS queries to further reduce DNS centralization. Our work provides a starting point for evaluating various strategies.

There are additional questions related to addressing DNS centralization that we believe are worth exploring. For example, although the network research community has some consensus that DNS centralization is an issue, there has been little discussion of how to formalize the privacy risks. In particular, it may be useful to develop a framework for understanding the various risks and remediations associated with a resolver having access to a larger number of DNS queries. It may also be useful to perform survey- or lab-based studies to investigate user understanding of major encrypted DNS deployments and their perceived impact on privacy.

Our paper provides a more in-depth discussion of the challenges surrounding DNS centralization.

Austin Hounsel is a PhD student in Computer Science at Princeton University. Generally speaking, he is interested in Internet measurements, privacy, and censorship.
