The use of network traffic encryption technologies, such as HTTPS/TLS, has been on the rise since obtaining a TLS certificate became easier and free of charge. However, even when encryption is enabled, users’ online activities still leak through domain names, which are exposed in DNS queries and responses and in the Server Name Indication (SNI) extension of TLS.
Several domain name encryption technologies have been proposed to address this issue, including DNS over TLS (DoT, RFC 7858), DNS over HTTPS (DoH, RFC 8484), and Encrypted Client Hello (ECH). In this post, we discuss our findings on the privacy benefits these technologies offer, assessed from the perspective of web co-location.
No more plaintext domain names? IP addresses can still reveal a lot!
In an ideal future where domain name encryption is fully deployed (at the time of writing, both DoT and DoH have been standardized, while ECH is still under active development as an Internet Draft), destination IP addresses are the only remaining information about the communicating parties that is visible to on-path observers. Revealing a user’s visited site is straightforward if the destination IP address hosts only that particular domain. However, when a destination IP address serves many domains, an adversary has to ‘guess’ which one is being visited.
We performed an active DNS measurement of 7.5M domains (curated from the Alexa and Majestic top lists) to characterize the relationship between these domains and their hosting IP addresses. We then quantified the privacy gain offered by domain name encryption using k-anonymity, where the degree of anonymity is determined by the number of domains co-located on the same IP address(es).
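To make the metric concrete, the sketch below computes a k value for every domain from a hypothetical domain-to-IP resolution table. The post does not spell out the exact definition, so two assumptions are made here: the anonymity set of an IP address is the set of domains that resolve to it, and a domain hosted on several IPs is assigned the smallest of those sets (a conservative choice, since a connection to the least-shared IP narrows the candidates the most). The function names and the documentation-range addresses are illustrative only.

```python
from collections import defaultdict


def domains_per_ip(resolutions: dict[str, set[str]]) -> dict[str, int]:
    """Count how many distinct domains resolve to each IP address."""
    counts: defaultdict[str, int] = defaultdict(int)
    for ips in resolutions.values():
        for ip in ips:  # each domain's IPs form a set, so no double counting
            counts[ip] += 1
    return dict(counts)


def k_anonymity(resolutions: dict[str, set[str]]) -> dict[str, int]:
    """Assumed per-domain k: the smallest anonymity set among the domain's IPs."""
    per_ip = domains_per_ip(resolutions)
    return {
        domain: min(per_ip[ip] for ip in ips)
        for domain, ips in resolutions.items()
        if ips  # domains that did not resolve get no k value
    }


if __name__ == "__main__":
    resolutions = {
        "a.example": {"192.0.2.1"},  # one-to-one mapping: no privacy gain
        "b.example": {"192.0.2.2"},
        "c.example": {"192.0.2.2"},
        "d.example": {"192.0.2.2", "192.0.2.3"},
    }
    print(k_anonymity(resolutions))
    # {'a.example': 1, 'b.example': 3, 'c.example': 3, 'd.example': 1}
```

Here d.example shows why the minimum matters: it shares 192.0.2.2 with two other domains, but a connection to 192.0.2.3 identifies it uniquely.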
Our analysis shows that 20% of the domains studied will not gain any privacy benefit at all, since they have a one-to-one mapping between their hostname and IP address. On the other hand, about 30% of the domains will gain a significant privacy benefit with a k-anonymity value greater than 100, as these domains are co-hosted with more than 100 other domains.
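The two percentages above are the shares of domains at either end of the k distribution. A minimal sketch of that summary, assuming per-domain k values have already been computed; the function name and the toy input are hypothetical and are not the study’s data.

```python
def privacy_summary(k_values: dict[str, int], threshold: int = 100) -> tuple[float, float]:
    """Return (share of domains with k == 1, share of domains with k > threshold)."""
    if not k_values:
        raise ValueError("no domains to summarize")
    n = len(k_values)
    # k == 1: the IP address maps to exactly one domain, so encryption hides nothing
    no_gain = sum(1 for k in k_values.values() if k == 1) / n
    # k > threshold: the domain hides among many co-hosted domains
    high_gain = sum(1 for k in k_values.values() if k > threshold) / n
    return no_gain, high_gain


if __name__ == "__main__":
    toy = {"a.example": 1, "b.example": 1, "c.example": 5,
           "d.example": 150, "e.example": 3000}
    print(privacy_summary(toy))  # (0.4, 0.4)
```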
The two ends of the privacy spectrum
Next, we investigate the top-ten hosting providers that offer the highest co-location degree per IP address. As shown in the third column of Table 1, the number of unique IP addresses observed for each of these providers is very low, with half of them hosting all of their domains on a single IP address.
Using the Hurricane Electric BGP Toolkit, we confirmed that these are small providers, many of which manage fewer than 10K IP addresses. As for the popularity of the domains they host (last column), the highest rank among them is only the 386th position (Squarespace), while more than half of these providers host only domains ranked well below the top 10K. In short, small providers tend to co-locate a large number of less popular domains on a small number of IP addresses.
| Median k | Organization | Unique IPs | Highest rank |
|---|---|---|---|
| 3,311 | AS19574 Corporation Service | 2 | 1,471 |
| 2,740 | AS15095 Dealer Dot Com | 1 | 80,965 |
| 2,690 | AS40443 CDK Global | 1 | 68,310 |
| 946 | AS39570 Loopia AB | 6 | 19,238 |
| 516 | AS10668 Lee Enterprises | 4 | 3,211 |
Table 1: Top hosting providers offering the highest median k-anonymity per IP address.
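Both tables summarize each organization by a median k and a count of unique IPs. The post does not detail the aggregation, so this sketch assumes the median is taken over the per-IP domain counts of each organization, and that an IP-to-organization mapping (for example, one derived from BGP origin AS data) is already available. The function name is hypothetical, and the organizations use AS numbers reserved for documentation (RFC 5398).

```python
from statistics import median


def provider_stats(
    domains_per_ip: dict[str, int], ip_to_org: dict[str, str]
) -> dict[str, tuple[float, int]]:
    """Group per-IP domain counts by organization -> (median k, unique IPs)."""
    by_org: dict[str, list[int]] = {}
    for ip, count in domains_per_ip.items():
        org = ip_to_org.get(ip)
        if org is None:
            continue  # skip IPs that could not be attributed to an organization
        by_org.setdefault(org, []).append(count)
    return {org: (median(counts), len(counts)) for org, counts in by_org.items()}


if __name__ == "__main__":
    counts = {"198.51.100.1": 3000, "198.51.100.2": 3600,
              "203.0.113.1": 2, "203.0.113.2": 5, "203.0.113.3": 9}
    orgs = {"198.51.100.1": "AS64500 SmallHost", "198.51.100.2": "AS64500 SmallHost",
            "203.0.113.1": "AS64501 BigHost", "203.0.113.2": "AS64501 BigHost",
            "203.0.113.3": "AS64501 BigHost"}
    print(provider_stats(counts, orgs))
    # {'AS64500 SmallHost': (3300.0, 2), 'AS64501 BigHost': (5, 3)}
```

The pattern in the tables falls out of this aggregation: a provider packing many domains onto a few IPs gets a large median k, while one spreading domains across many IPs ends up with a small one.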
We then analysed the co-location degree offered by the major providers that account for the largest numbers of unique IP addresses. Table 2 lists the top-20 hosting and CDN providers, each with more than 5,000 unique IP addresses observed. Unlike small hosting providers, these major providers host more popular domains: the most popular domain hosted by each of them is within the top 10K.
However, in contrast to small providers, the co-location degree per IP address offered by these providers is quite low. Except for Cloudflare, which has the highest median co-location degree (16), all other providers have a single-digit co-location degree. As a result, domains hosted by these providers gain a much lower level of privacy.
| Median k | Organization | Unique IPs | Highest rank |
|---|---|---|---|
| 16 | AS13335 Cloudflare, Inc. | 64,285 | 112 |
| 5 | AS16509 Amazon.com, Inc. | 47,786 | 37 |
| 5 | AS46606 Unified Layer | 27,524 | 1,265 |
| 3 | AS16276 OVH SAS | 22,598 | 621 |
| 3 | AS24940 Hetzner Online GmbH | 21,361 | 61 |
| 4 | AS26496 GoDaddy.com, LLC | 16,415 | 90 |
| 2 | AS14061 DigitalOcean, LLC | 11,701 | 685 |
| 3 | AS14618 Amazon.com, Inc. | 11,008 | 11 |
| 6 | AS32475 SingleHop LLC | 10,771 | 174 |
| 2 | AS26347 New Dream Network | 10,657 | 1,419 |
| 7 | AS15169 Google LLC | 9,048 | 1 |
| 3 | AS63949 Linode, LLC | 8,062 | 2,175 |
| 4 | AS8560 1&1 Internet SE | 6,898 | 2,580 |
| 3 | AS32244 Liquid Web, L.L.C | 6,412 | 1,681 |
| 3 | AS19551 Incapsula Inc | 6,338 | 1,072 |
| 4 | AS36351 SoftLayer Technologies | 6,005 | 483 |
| 3 | AS16625 Akamai Technologies | 5,862 | 13 |
| 4 | AS34788 Neue Medien Muennich | 5,679 | 7,526 |
| 6 | AS9371 SAKURA Internet Inc. | 5,647 | 1,550 |
| 3 | AS8075 Microsoft Corporation | 5,360 | 20 |
Table 2: Top CDN and hosting providers with the highest number of observed IP addresses.
These two results represent the two ends of the privacy spectrum with respect to the information revealed by destination IP addresses.
On the right-hand side of the CDF plot in Figure 2, less popular domains are hosted by smaller providers with a handful of IP addresses, and therefore benefit from a higher co-location degree.
However, on the left-hand side of the plot, more popular domains are hosted by providers managing a much larger number of IP addresses. These domains suffer from a lower co-location degree and thus cannot gain any significant privacy benefit from domain name encryption.
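The popularity contrast between the two sides of the plot can be checked on any dataset by comparing the median k of popular domains with that of the long tail. This sketch assumes per-domain k values and top-list ranks are already available, and treats unranked domains as unpopular; the 10K cutoff mirrors the “top 10K” boundary used in the tables, and the function name and toy data are hypothetical.

```python
from statistics import median
from typing import Optional


def median_k_by_popularity(
    k_values: dict[str, int], ranks: dict[str, int], cutoff: int = 10_000
) -> tuple[Optional[float], Optional[float]]:
    """Median k for domains ranked within `cutoff` vs. all other domains.

    Domains missing from `ranks` are treated as unpopular (an assumption).
    """
    unranked = cutoff + 1
    popular = [k for d, k in k_values.items() if ranks.get(d, unranked) <= cutoff]
    tail = [k for d, k in k_values.items() if ranks.get(d, unranked) > cutoff]
    return (
        median(popular) if popular else None,
        median(tail) if tail else None,
    )


if __name__ == "__main__":
    k_values = {"top-a.example": 2, "top-b.example": 4,
                "tail-a.example": 900, "tail-b.example": 1200, "tail-c.example": 50}
    ranks = {"top-a.example": 15, "top-b.example": 2400,
             "tail-a.example": 80_000, "tail-b.example": 150_000}
    print(median_k_by_popularity(k_values, ranks))  # (3.0, 900)
```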
Domain owners and hosting providers can help improve the privacy benefits of domain name encryption
Users may have different privacy expectations, not all domains on the Internet are equally privacy-sensitive, and domains do not receive the same number of visitors across different geographic locations.
To maximize the privacy benefit offered by domain name encryption, website owners may want to seek hosting services from the (unfortunately quite few) providers that maximize the number of domains co-hosted per IP address.
Hosting providers, on the other hand, can help maximize the privacy benefits of DoT/DoH (DoTH) and ECH by making domain-to-IP mappings less predictable. This can be achieved by co-locating many domains on the same IP address(es) and by rotating hosting IP addresses more frequently.
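The post does not prescribe a rotation mechanism. One hypothetical way a provider could combine both suggestions is to serve all customer domains from a small shared pool of addresses and derive each domain’s address from a keyed hash of the domain name and the current time epoch. The HMAC-SHA256 scheme, the function name, the epoch numbering, and the addresses below are illustrative assumptions, not part of the study.

```python
import hashlib
import hmac


def assign_ip(domain: str, epoch: int, pool: list[str], secret: bytes) -> str:
    """Pick an IP for `domain` from a shared pool; the choice changes every epoch.

    Keying the hash with a provider-held secret keeps the mapping unpredictable
    to outsiders while staying deterministic for the provider's own DNS servers.
    """
    if not pool:
        raise ValueError("IP pool must not be empty")
    message = f"{domain}|{epoch}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


if __name__ == "__main__":
    pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
    secret = b"hypothetical-provider-secret"
    for epoch in (1, 2):
        print(epoch, assign_ip("shop.example", epoch, pool, secret))
```

With N domains and a pool of P addresses, each address is shared by roughly N/P domains, and advancing the epoch reshuffles which domains share an address. A scheme like this only works if every address in the pool can terminate TLS for every domain, and the epoch should be no shorter than the DNS TTL served, so that cached answers still point to an address that serves the domain.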
For more technical details of our study, the full paper and our presentation at the 2020 ACM Asia Conference on Computer and Communications Security are available in the ACM Digital Library.
Hoàng Nguyên Phong is a PhD candidate at Stony Brook University and a research fellow at the Citizen Lab — University of Toronto. His research interests encompass online privacy and Internet measurement.
This is joint work with Arian Akhavan Niaki, Nikita Borisov, Phillipa Gill and Michalis Polychronakis.
Nice work! So it seems like Cloudflare is leading in terms of adopting these new technologies to provide better privacy to its customers. Other hosting providers should also take part in implementing and adopting these new protocols, especially SNI encryption.
Thank you, Argo. Yes, Cloudflare is indeed ahead in supporting these new privacy-enhancing technologies. I believe that we will soon see HTTPS resource records with ECH configuration supported on Cloudflare’s network. FYI: https://blog.cloudflare.com/speeding-up-https-and-http-3-negotiation-with-dns/
How does this impact middlebox/proxy decryption capabilities, where policy depends on being able to inspect the SNI or FQDN?
@DC Thank you for your question! This is a real concern for many network operators, and we discuss it in Section 7.2 of our paper (https://arxiv.org/pdf/1911.00563.pdf). I am pasting the paragraph that addresses this exact question below.
While providing many security and privacy benefits, domain name encryption can be a “double-edged sword” for network administrators who want to have full visibility and control over domain resolutions in the networks under their responsibility. Until now, the operation of firewalls, intrusion detection systems, and anti-spam or anti-phishing filters has benefited immensely from the domain name information extracted from network traffic, as is evident from the series of works that employ DNS data to detect domain name abuses and malicious online activities. Under a full DoH/DoT and ESNI deployment, this visibility will be lost, and systems based on domain reputation and similar technologies will be severely impacted. Since many malicious domains hide themselves by sharing hosting addresses with other innocuous and unpopular websites, it will be challenging to detect and block them. A possible solution would be to rely solely on TLS proxying using custom provisioned certificates, in order to regain the visibility lost to ESNI and DoH/DoT, which is already a common practice of transparent SSL/TLS proxies. Although this will defeat any privacy benefits of these technologies, this may be an acceptable tradeoff for corporate networks and other similar environments.