Web co-location and its impact on the privacy benefits of domain name encryption

Published on 5 Apr 2021

Category: Tech matters

The use of network traffic encryption technologies, such as HTTPS/TLS, is rising now that obtaining a TLS certificate has become easier and free of charge. However, even when encryption is enabled, users’ online activities still leak through domain names, which are exposed via DNS queries/responses and the Server Name Indication (SNI) extension of TLS.

Several domain name encryption technologies have been proposed to address this issue, including DNS over TLS (DoT, RFC 7858), DNS over HTTPS (DoH, RFC 8484), and Encrypted Client Hello (ECH). In this blog post, we discuss our findings on the privacy benefits these technologies offer, viewed from the perspective of web co-location.
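To make the DoH case concrete, here is a minimal sketch of how a DNS query can be packed into an RFC 8484 GET request, so that the domain name travels inside HTTPS instead of as plaintext DNS. The function names (`build_dns_query`, `doh_get_path`) are ours, and the `/dns-query` path is only the path used in the RFC's own examples; a real resolver publishes its own URI template. The sketch builds the request path only and sends nothing over the network.

```python
import base64
import struct


def build_dns_query(name, qtype=1, query_id=0):
    """Build a minimal DNS query in wire format (one question, class IN).

    qtype=1 is an A record. RFC 8484 suggests an ID of 0 so that
    HTTP caches can treat identical queries as identical requests.
    """
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, others 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_path(name, path="/dns-query"):
    """Return the request path for a DoH GET query (path is an assumption)."""
    wire = build_dns_query(name)
    # RFC 8484 requires base64url encoding with the padding removed.
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{path}?dns={encoded}"


path = doh_get_path("www.example.com")
print(path)
```

An on-path observer of such a request sees only a TLS connection to the resolver's IP address; the queried name is inside the encrypted HTTP request.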

Figure 1 — Domain name encryption prevents any on-path observers from seeing the domain name information exposed as plaintext on the wire.

No more plaintext domain names? IP addresses can still reveal a lot!

In an ideal future where domain name encryption is fully deployed, destination IP addresses are the only remaining information about the communicating parties that is visible to on-path observers. (Currently, both DoT and DoH have been standardized, while ECH is still under active development as an Internet Draft.) Revealing a user’s visited site is straightforward if the destination IP address hosts only that particular domain. However, when a given destination IP address serves many domains, an adversary has to ‘guess’ which one is being visited.
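As a rough illustration of what 'guessing' means here, assume (our simplification, not a claim from the study) that an observer considers all k domains sharing the destination IP address equally likely. The chance of naming the visited domain correctly is then:

```latex
\Pr[\text{correct guess} \mid \text{destination IP}] = \frac{1}{k},
\qquad
k = 1 \Rightarrow 100\%, \quad
k = 16 \Rightarrow 6.25\%, \quad
k = 100 \Rightarrow 1\%.
```

In practice co-hosted domains differ widely in popularity, so an adversary who weights guesses by popularity would do better than 1/k; the uniform case is best read as an upper bound on the privacy gained.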

We performed an active DNS measurement of 7.5M domains (curated from the Alexa and Majestic top lists) to characterize the relationship between these domains and their hosting IP addresses. We then quantified the privacy gain offered by domain name encryption using k-anonymity, where a domain's k value reflects how many domains are co-located on the same IP address(es).
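The co-location degree can be sketched in a few lines, given a mapping from each domain to the set of IP addresses it resolved to. This is a minimal illustration rather than the study's pipeline: the helper names are ours, the addresses come from documentation ranges, and for domains with several IP addresses we take the smallest per-IP count as a worst-case assumption.

```python
from collections import defaultdict


def domains_per_ip(domain_to_ips):
    """Invert {domain: {ip, ...}} into {ip: {domain, ...}}."""
    ip_to_domains = defaultdict(set)
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            ip_to_domains[ip].add(domain)
    return ip_to_domains


def k_anonymity(domain_to_ips):
    """Return {domain: k}, with k the number of domains on the same IP.

    Assumption: when a domain resolves to several IPs, use the smallest
    per-IP count, i.e. the least private address it can be reached on.
    """
    ip_to_domains = domains_per_ip(domain_to_ips)
    return {
        domain: min(len(ip_to_domains[ip]) for ip in ips)
        for domain, ips in domain_to_ips.items()
        if ips  # skip domains that did not resolve
    }


# Toy resolution results, not measurement data.
resolved = {
    "solo.example": {"192.0.2.1"},
    "a.example": {"198.51.100.7"},
    "b.example": {"198.51.100.7"},
    "c.example": {"198.51.100.7", "203.0.113.9"},
}
k = k_anonymity(resolved)
print(k)
```

Here `solo.example` has k = 1 (a one-to-one mapping, so the IP address alone identifies it), while `a.example` and `b.example` share an address with two other domains and get k = 3.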

Figure 2 — CDF of the k-anonymity value (co-location degree) of all studied domains.

Our analysis shows that 20% of the domains studied will not gain any privacy benefit at all, since they have a one-to-one mapping between their hostname and IP address. On the other hand, about 30% of the domains will gain a significant privacy benefit with a k-anonymity value greater than 100, as these domains are co-hosted with more than 100 other domains.
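Shares like these can be read off a list of per-domain k values. The sketch below uses ten made-up values, chosen only so that the two shares come out at 20% and 30%; they are not the study's data, and the helper names are ours.

```python
def share(values, predicate):
    """Fraction of values for which predicate(value) is true."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)


def empirical_cdf(values):
    """Return sorted (k, fraction of domains with value <= k) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for position, value in enumerate(ordered, start=1):
        cdf[value] = position / n  # a later duplicate overwrites an earlier one
    return sorted(cdf.items())


# Toy k values for ten domains (illustrative only).
k_values = [1, 1, 2, 3, 8, 20, 60, 150, 150, 400]

no_benefit = share(k_values, lambda k: k == 1)      # one-to-one mappings
high_benefit = share(k_values, lambda k: k > 100)   # large anonymity sets
print(no_benefit, high_benefit)
print(empirical_cdf(k_values))
```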

The two ends of the privacy spectrum

Next, we investigated the top ten hosting providers that offer the highest co-location degree per IP address. As shown in the third column of Table 1, the number of unique IP addresses observed for each provider is very low, with half of them hosting all of their observed domains on a single IP address.

Using the Hurricane Electric BGP Toolkit, we confirmed that these are small providers, many of them managing fewer than 10K IP addresses. Looking at the popularity of the domains they host (last column), the highest rank is only 386th (Squarespace), and more than half of these providers host no domain ranked inside the top 10K. In short, small providers tend to co-locate a large number of less popular domains on a small number of IP addresses.

| Median k | Organization | Unique IPs | Highest rank |
|---|---|---|---|
| 3,311 | AS19574 Corporation Service | 2 | 1,471 |
| 2,740 | AS15095 Dealer Dot Com | 1 | 80,965 |
| 2,690 | AS40443 CDK Global | 1 | 68,310 |
| 1,338 | AS32491 Tucows.com | 1 | 22,931 |
| 1,284 | AS16844 Entrata | 1 | 96,564 |
| 946 | AS39570 Loopia AB | 6 | 19,238 |
| 824 | AS54635 Hillenbrand | 1 | 117,251 |
| 705 | AS53831 Squarespace | 23 | 386 |
| 520 | AS12008 NeuStar | 2 | 464 |
| 516 | AS10668 Lee Enterprises | 4 | 3,211 |

Table 1 — Top hosting providers offering the highest median k-anonymity per IP address.
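A per-provider summary of this kind can be sketched as a group-by over per-IP domain counts. The record layout, the function name and the sample values below are our assumptions for illustration (the AS numbers and addresses are reserved for documentation); this is not the study's code or data.

```python
from collections import defaultdict
from statistics import median


def median_k_by_provider(ip_records):
    """Summarize (provider, ip, domain_count) records per provider.

    Returns {provider: (median domains per IP, number of unique IPs)}.
    """
    per_provider = defaultdict(dict)
    for provider, ip, domain_count in ip_records:
        per_provider[provider][ip] = domain_count
    return {
        provider: (median(counts.values()), len(counts))
        for provider, counts in per_provider.items()
    }


# Toy records: one small provider with a single busy IP address,
# one larger provider spreading few domains over several addresses.
records = [
    ("AS64500", "192.0.2.1", 2740),
    ("AS64501", "198.51.100.1", 3),
    ("AS64501", "198.51.100.2", 5),
    ("AS64501", "198.51.100.3", 40),
]
summary = median_k_by_provider(records)
print(summary)
```

Using the median rather than the mean keeps a single heavily shared address (40 domains in the toy data) from hiding the fact that most of a provider's addresses host only a few domains.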

We then analysed the co-location degree offered by the major providers that account for the largest numbers of unique IP addresses. Table 2 lists the top 20 hosting and CDN providers, each with more than 5,000 unique IP addresses observed. Unlike small hosting providers, these major providers host more popular domains: the most popular domain hosted by each of them ranks within the top 10K.

However, in contrast to small providers, the co-location degree per IP address offered by these providers is quite low. Except for Cloudflare, which has the highest median co-location degree at 16, all providers have a single-digit co-location degree. As a result, domains hosted on these providers gain a much lower level of privacy.

| Median k | Organization | Unique IPs | Highest rank |
|---|---|---|---|
| 16 | AS13335 Cloudflare, Inc. | 64,285 | 112 |
| 5 | AS16509 Amazon.com, Inc. | 47,786 | 37 |
| 5 | AS46606 Unified Layer | 27,524 | 1,265 |
| 3 | AS16276 OVH SAS | 22,598 | 621 |
| 3 | AS24940 Hetzner Online GmbH | 21,361 | 61 |
| 4 | AS26496 GoDaddy.com, LLC | 16,415 | 90 |
| 2 | AS14061 DigitalOcean, LLC | 11,701 | 685 |
| 3 | AS14618 Amazon.com, Inc. | 11,008 | 11 |
| 6 | AS32475 SingleHop LLC | 10,771 | 174 |
| 2 | AS26347 New Dream Network | 10,657 | 1,419 |
| 7 | AS15169 Google LLC | 9,048 | 1 |
| 3 | AS63949 Linode, LLC | 8,062 | 2,175 |
| 4 | AS8560 1&1 Internet SE | 6,898 | 2,580 |
| 3 | AS32244 Liquid Web, L.L.C | 6,412 | 1,681 |
| 3 | AS19551 Incapsula Inc | 6,338 | 1,072 |
| 4 | AS36351 SoftLayer Technologies | 6,005 | 483 |
| 3 | AS16625 Akamai Technologies | 5,862 | 13 |
| 4 | AS34788 Neue Medien Muennich | 5,679 | 7,526 |
| 6 | AS9371 SAKURA Internet Inc. | 5,647 | 1,550 |
| 3 | AS8075 Microsoft Corporation | 5,360 | 20 |

Table 2 — Top CDN and hosting providers with the highest number of observed IP addresses.

These two results illustrate the two ends of the privacy spectrum in terms of the information revealed by destination IP addresses.

On the right-hand side of the CDF plot in Figure 2, less popular domains are hosted on smaller providers with a handful of IP addresses, and therefore benefit from a higher co-location degree.

However, on the left-hand side of the plot, more popular domains are hosted on providers managing a much larger number of IP addresses. These domains have a lower co-location degree and thus cannot gain any significant privacy benefit from domain name encryption.

Domain owners and hosting providers can aid in improving the privacy benefits of domain name encryption

Different users may have different privacy expectations, and not all domains on the Internet are equally privacy-sensitive, nor do they attract the same number of visitors in different geographical locations.

To maximize the privacy benefit offered by domain name encryption technologies, website owners may want to seek hosting services from the (unfortunately quite few) providers that maximize the number of co-hosted domains per IP address.

Hosting providers, on the other hand, can help maximize the privacy benefits of DoT, DoH, and ECH by making domain-to-IP mappings less predictable. This can be achieved by co-locating many domains under the same IP address(es) and by rotating hosting IP addresses more frequently.
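One way to picture both recommendations together is a keyed, per-epoch assignment of domains to a small shared pool of IP addresses. The scheme below is a hypothetical sketch of ours, not something described in the study or used by any provider we know of; the names, the pool and the secret are all illustrative.

```python
import hashlib
import hmac


def assign_ip(domain, epoch, ip_pool, secret):
    """Pick an IP for a domain in a given epoch (hypothetical scheme).

    The keyed hash makes the choice stable within an epoch but hard to
    predict for anyone who does not hold the secret.
    """
    message = f"{epoch}:{domain}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(ip_pool)
    return ip_pool[index]


def mapping_for_epoch(domains, epoch, ip_pool, secret):
    """Return {domain: ip} for one rotation epoch."""
    return {d: assign_ip(d, epoch, ip_pool, secret) for d in domains}


# Many domains share a small pool (documentation addresses).
ip_pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
domains = [f"site{i}.example" for i in range(300)]
secret = b"demo-only-secret"

epoch_0 = mapping_for_epoch(domains, 0, ip_pool, secret)
epoch_1 = mapping_for_epoch(domains, 1, ip_pool, secret)
moved = sum(1 for d in domains if epoch_0[d] != epoch_1[d])
print(f"{moved} of {len(domains)} domains changed IP between epochs")
```

A small pool keeps the co-location degree high, while a shorter epoch shortens the window in which a previously observed domain-to-IP mapping remains valid. A real deployment would also have to account for DNS TTLs, connection reuse and load balancing, which this sketch ignores.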

For more technical details on our study, the full paper and our presentation at the 2020 Asia Conference on Computer and Communications Security are available in the ACM Digital Library.

Hoàng Nguyên Phong is a PhD candidate at Stony Brook University and a research fellow at the Citizen Lab — University of Toronto. His research interests encompass online privacy and Internet measurement.

This is joint work with Arian Akhavan Niaki, Nikita Borisov, Phillipa Gill and Michalis Polychronakis.

3 Comments

  1. Argo Rotaru

    Nice work! So it seems like Cloudflare is leading in terms of adopting these new technologies to provide better privacy to its customers. Other hosting providers should also take part in implementing and adopting these new protocols, especially SNI encryption.

  2. DC

    How does this impact middle box / proxy decryption capabilities where policy is determined on being able to inspect the SNI or FQDN?
