Who controls the Internet?

By Jan Schaumann on 13 Jan 2023

Category: Tech matters


Figure 1 — Adapted from XKCD 2347 ‘Dependency’.

Why yes, the Internet is resting on a foundation of duct tape and WD40 — it is known. And the DNS is the mother of all cornerstones that, if knocked out, would quickly lead to the fall of western civilization. (And yes, it is a hard requirement to use this XKCD cartoon to illustrate this.) But at least it’s not quite as fragile as, say, whois, so yay!

But while the DNS root servers are known to be distributed, I thought it might be interesting to examine the immediate levels up from the root, and so I went to analyse the diversity or centralization of the authoritative nameservers for the generic Top-Level Domains (gTLDs) and the second-level domains in those gTLDs.

To perform this analysis, I started with the root zone, which (as of November 2022) contains 1,485 TLDs. As I discussed previously, just what exactly you find in there is already utterly fascinating, but for our purposes here, let’s note that you can then request access to all of the gTLD zone files via ICANN’s Centralized Zone Data Service, which got me access to 1,165 zones in total. In addition, you can obtain the .gov zone from CISA’s GitHub repository, as well as .arpa from most of the root servers:

$ dig +noall +answer +onesoa @f.root-servers.net arpa. AXFR | more
arpa.                   86400   IN      SOA a.root-servers.net. nstld.verisign-grs.com. 2022111501 1800 900 604800 86400
arpa.                   518400  IN      NS        m.ns.arpa.
arpa.                   518400  IN      NS        c.ns.arpa.
arpa.                   518400  IN      NS        f.ns.arpa.
[...]

This leaves us missing the .edu, .int, .mil and .post TLDs, which are not generally available (if you know how to get access, please let me know).

For the country code Top-Level Domains (ccTLDs), it’s a lot more difficult to gain access. Most operators do not provide public access, although some do — you can AXFR some of them or gather some published data from others. Commercial services exist that sell you zone data, but it seems to me that this data ought to be public, so I excluded ccTLDs from my analysis for the time being.

Anyway, with 1,168 total zone files adding up to around 7GB of data (of which the .com zone accounts for 4.8GB alone!), I went ahead and used a variety of shell scripts and some Perl glue to parse out the NS records to then see just what domains those are in, that is, who controls them.
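The scripts themselves are not shown here, so the following is only a minimal sketch of that parsing step in shell and awk. It assumes zone-file lines of the form "owner TTL IN NS target."; the function name ns_domains and the tiny hardcoded list of two-label suffixes are hypothetical, and a real run would more likely consult the Public Suffix List.

```shell
# ns_domains: read zone-file lines on stdin and print the parent domain
# of every NS target, one per line.
# Assumption: the "domain" is the last two labels of the nameserver name,
# or the last three when it ends in one of a few hardcoded two-label
# suffixes (illustrative only, not a complete list).
ns_domains() {
    awk '
    BEGIN { two["co.uk"] = 1; two["com.cn"] = 1 }
    {
        # Look for the NS type directly after class IN, wherever it sits.
        for (i = 2; i < NF; i++) {
            if (toupper($(i-1)) == "IN" && toupper($i) == "NS") {
                name = tolower($(i+1))
                sub(/\.$/, "", name)      # drop the trailing root dot
                n = split(name, l, ".")
                if (n < 2) { print name "."; next }
                keep = 2
                suffix = l[n-1] "." l[n]
                if (n >= 3 && (suffix in two)) keep = 3
                out = l[n]
                for (k = n - 1; k > n - keep; k--) out = l[k] "." out
                print out "."
                next
            }
        }
    }'
}
```

Something like "ns_domains < root.zone | sort | uniq -c | sort -rn" would then give a per-domain tally.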

The root

The DNS root zone itself is served by 13 root authorities, and as such is obviously and trivially diverse. The 13 authorities are managed by twelve root operators — nine US organizations (including three US government entities), of which one (Verisign) operates two roots, one Swedish company (Netnod), one organization in Japan (WIDE), and one headquartered in the Netherlands (RIPE NCC). Obviously, all are in the same domain (root-servers.net):

Figure 2 — NS records for the root zone.
Figure 3 — Domains containing those NS records.

Now for the root itself, this illustration is, of course, a bit silly, but it gives you an idea of what I’m looking for in this analysis. And things do get a bit more interesting once we process all the NS records from the root zone itself, where we find 7,507 total NS records across 5,625 unique name servers, which looks reasonably diverse:

$ awk '/IN[ \t]*NS[ \t]/ { print $NF }' root.zone | wc -l
    7507
$ awk '/IN[ \t]*NS[ \t]/ { print $NF }' root.zone | sort | uniq -c | wc -l
    5625
$ awk '/IN[ \t]*NS[ \t]/ { print $NF }' root.zone | sort | uniq -c | sort -rn | head -20
 119 ac4.nstld.com.
 119 ac3.nstld.com.
 118 ac2.nstld.com.
 118 ac1.nstld.com.
  47 l.gmoregistry.net.
  47 k.gmoregistry.net.
  47 b.gmoregistry.net.
  47 a.gmoregistry.net.
  46 ns-tld5.charlestonroadregistry.com.
  46 ns-tld4.charlestonroadregistry.com.
  46 ns-tld3.charlestonroadregistry.com.
  46 ns-tld2.charlestonroadregistry.com.
  46 ns-tld1.charlestonroadregistry.com.
  27 anycast9.irondns.net.
  27 anycast24.irondns.net.
  27 anycast23.irondns.net.
  27 anycast10.irondns.net.
  21 j.zdnscloud.com.
  21 i.zdnscloud.com.
  21 g.zdnscloud.com.

But if you look closer, you’ll notice that many of the nameservers are in the same domain, so if we then flatten the whole thing, we see a bit more of a centralization. For example, 6.3% of the NS records are under nstld.com, which is operated by Verisign:

Figure 4 — Nameserver diversity by domain.
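The flattening behind a chart like this can be sketched as a small pipeline. This is an illustration rather than the script actually used: it assumes one nameserver domain per line on standard input, and the function name domain_share is made up for this example.

```shell
# domain_share: read one nameserver domain per line on stdin and print
# "count percent domain", largest share first.
domain_share() {
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 | awk '
        { count[NR] = $1; name[NR] = $2; total += $1 }
        END {
            # Percentages are relative to the total number of input lines.
            for (i = 1; i <= NR; i++)
                printf "%d %.1f%% %s\n", count[i], 100 * count[i] / total, name[i]
        }'
}
```

Fed with the parent domain of every NS target in the root zone, the first line of output would correspond to the largest slice of the chart.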

But thinking about this distribution a bit more, you quickly realize that the gTLDs do not carry equal weight, since not all of them have the same footprint. As you may guess, the .com zone has more records than some of the other zones. More specifically, .com has over 164 million NS records, making up 73% of all the NS records in all the gTLDs.

Figure 5 — Number of NS records by gTLD.

The NS records for .com are in the gtld-servers.net domain, but so are, for example, .net's. Similarly, the NS records for .org and .info are in the same domain, so we can flatten this data a little bit more:

Figure 6 — Number of NS records by gTLD NS.

In other words, almost 80% of all NS records across all gTLDs are under the gtld-servers.net domain, and thus the control of Verisign — the same Verisign that also operates two roots.

Ok, so this is the representation of the NS records for the gTLDs within the root zone, but what about the NS records for all the second-level domains within the gTLDs? Parsing all 1,168 zone files, we end up with 2,699,827 unique nameservers that we can group under 1,063,092 domains:

Figure 7 — NS diversity in gTLDs by domain.

This shows a notable centralization of the NS records found in all gTLD zones, with domaincontrol.com accounting for roughly 20% alone.

Another thing that seems interesting here is that some of the cloud companies offering DNS services are choosing to use a larger number of NS records even across, in the case of AWS, thousands of second-level domains in several TLDs:

$ grep awsdns- domain-counts.full | head
52221 awsdns-02.org.
49614 awsdns-23.net.
49264 awsdns-49.com.
48276 awsdns-05.co.uk.
46392 awsdns-35.org.
45955 awsdns-53.com.
45593 awsdns-19.net.
44409 awsdns-25.com.
44176 awsdns-22.co.uk.
44140 awsdns-45.org.
$ grep -c awsdns- domain-counts.full 
978

The data now show that out of the over 534M NS records across a little over 1M domains:

  • 43% of all NS records (roughly 230M) are served by only 165 nameservers found in just 10 domains
  • 52% (~ 278M) are served by 255 nameservers in just 20 domains
  • 75% (~ 401M) are served by 1,580 nameservers in just 100 domains
  • 99% (~ 529M) are served by 345,000 nameservers in 6,000 domains
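Coverage figures of this kind can be derived from a per-domain count file. A minimal sketch, assuming input lines of the form "count domain" (the layout of the domain-counts.full output shown above); top_share is a hypothetical name, not one of the scripts used for the analysis.

```shell
# top_share N: read "count domain" lines on stdin and print the
# percentage of all records that the N largest domains account for.
top_share() {
    LC_ALL=C sort -k1,1nr | awk -v n="$1" '
        { total += $1; if (NR <= n) top += $1 }
        END { if (total > 0) printf "%.1f\n", 100 * top / total }'
}
```

For example, "top_share 20 < domain-counts.full" would print the share held by the 20 largest domains.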

Let’s look at these 20 domains and see who controls them, and thus who serves over half of all the NS records in all the gTLDs. They are:

1. domaincontrol.com is Wild West Domains GoDaddy US (AS44273)
2. googledomains.com is Google US (AS15169)
3. cloudflare.com is Cloudflare US (AS13335)
4. ui-dns.* is IONOS United Internet AG DE (AS8560)
5. registrar-servers.com is Namecheap US (AS397213)
6. wixdns.net is Wix.com Ltd. IL (AS15169)
7. hichina.com is Alibaba Cloud Computing CN (AS37963)
8. dns.com is Comodo Xcitum US (AS21859, AS133775 CN)
9. awsdns-* is Amazon Web Services US (AS16509)
10. nsone.net is NS1 US (AS62597)
11. namebrightdns.com is NameBright Turn Commerce Inc. US (AS14618)
12. gname-dns.com is Gname SG (AS13335)
13. name-services.com is Enom US/CA (AS15348)
14. dnsowl.com is NameSilo US (AS13335)
15. squarespacedns.com is Squarespace US (AS63911)
16. worldnic.com is Network Solutions LLC US (AS13335)
17. bluehost.com is Newfold Digital, Inc US (AS13335)
18. name.com is Donuts Inc Identity Digital US (AS62597)
19. myhostadmin.net SG (AS38283)
20. wordpress.(org|com) is Automattic Inc. US (AS2635)

You may notice that of these 20 organizations, 15 are US entities, two are Chinese, one is German, one is Israeli, and there’s one from Singapore, giving you an idea of which governments could, in theory at least, exert control over what percentage of the Internet.

Another interesting thing to point out here is that even though the domains are registered by different organizations, the nameservers in use may actually be operated from a different entity’s networks. In particular, it looks like several of the nameservers in these domains are running out of, fronted by, or otherwise utilizing Cloudflare’s network, while Wix seems to be using Google Cloud (I’m guessing) to run their name servers.

name.com is owned by Identity Digital, the rebranding of the merged Donuts and Afilias registries (that I’ve previously discussed), which also operates a significant number of TLDs.

All in all, this is a sign that perhaps we should take a look at the Autonomous System (AS) numbers the various nameservers are in, and so, a few thousand lookups later:

Figure 8 — NS diversity in gTLDs by AS.

That’s right: around 34% of the nameservers behind the majority of NS records resolve to IP addresses in Cloudflare’s AS13335, and over half of all are ultimately served from only four Autonomous Systems: Cloudflare (AS13335), Alibaba (AS37963), GoDaddy (AS44273), and IONOS (AS8560). That hints at the other big load-bearing infrastructure pillar that also remains largely insecure by default.
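How the AS lookups were done is not shown here. One possible approach, sketched below under that caveat, is Team Cymru's DNS-based IP-to-ASN mapping, where the octets of an IPv4 address are reversed and queried as a TXT record under origin.asn.cymru.com. The function names cymru_qname and ip_to_asn are hypothetical, and ip_to_asn needs dig and network access.

```shell
# cymru_qname: build the query name for an IPv4 address by reversing
# its octets under origin.asn.cymru.com. Prints nothing for other input.
cymru_qname() {
    echo "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".origin.asn.cymru.com." }'
}

# ip_to_asn: print the first origin AS announced for an IPv4 address.
# The TXT answer looks like "ASN | prefix | country | registry | date".
ip_to_asn() {
    qname=$(cymru_qname "$1")
    [ -n "$qname" ] || return 1
    dig +short TXT "$qname" | tr -d '"' | awk '{ print "AS" $1; exit }'
}
```

Each nameserver would first be resolved to its addresses (for example with dig +short A), and every address then passed to ip_to_asn.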

And while that is interesting by itself, we can refine it the same way as before, when we weighed the nameservers serving the gTLDs against how many domains they support: rather than looking only at the NS diversity in the raw gTLDs, we should also weigh domains by how much they matter. After all, control of google.com or facebook.com surely counts more than, say, monkeyjungle.com.

So, what do people do when they want to look at popular domains? They go for the ‘Alexa Top 1 Million Domains’ list, of course! Only… Alexa was bought by Amazon, and in a sign of ‘who controls the Internet’, Amazon promptly shut it down. (As of 8 November 2022, the actual list was still available, but it looks like it has since been restricted.) Of course, there are other, similar lists (like, for example, the Cisco Umbrella or the Majestic Million), all of which intersect to some degree but remain distinct because of the heuristics their data collection relies on. For this reason, researchers provide Tranco, a normalized Top 1 Million list (see their paper for more details), which I’ve used for this project.

Iterating over that full list and looking up the NS records for 1M domains then yields a breakdown of 2,636,294 total NS records in 119,291 domains, as well as the insight that spreadsheets are surprisingly bad at handling large data sets, even of simple text data:

Figure 9 — Top 1M domains, according to the Tranco list.
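The iteration over the list can be sketched in a few lines of shell. This is an assumption-laden illustration, not the code behind the figure: it expects the list as "rank,domain" CSV lines, the function names tranco_domains and lookup_ns are hypothetical, and lookup_ns needs dig and network access.

```shell
# tranco_domains: read "rank,domain" CSV lines on stdin, print the domains.
# A trailing carriage return, if present, is stripped.
tranco_domains() {
    awk -F, 'NF >= 2 { sub(/\r$/, "", $2); print $2 }'
}

# lookup_ns: for each domain on stdin, print "domain nameserver" pairs,
# one line per NS record returned.
lookup_ns() {
    while read -r domain; do
        dig +short NS "$domain" | awk -v d="$domain" '{ print d, tolower($0) }'
    done
}
```

A full run would be "tranco_domains < top-1m.csv | lookup_ns", where top-1m.csv is a placeholder file name; at one million domains, running lookups in parallel and rate limiting them would matter in practice.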

So, in the top 1M domains we see a distribution very similar to the one in our analysis of all of the NS records in all of the gTLDs: more than half of the NS records used by the top one million domains are found in just 20 of the 120K domains, served by only 1,740 nameservers.

The top ten NS record domains are represented by the usual suspects (Cloudflare, Amazon, GoDaddy, Akamai, DigiCert, Google, Microsoft, Alibaba, Network Solutions, and Namecheap), although the list is not identical to the one we observed for all of the gTLD records.

Also noteworthy is that the distribution across NS domains shifts somewhat when you look at the top 100 domains (Azure, AWS, Google, Akamai), the top 1,000 domains (AWS, Akamai, NS1, Google), the top 10K domains (AWS, Akamai, Cloudflare, NS1) and the full top 1M (Cloudflare, Amazon, GoDaddy, Akamai), suggesting that more of the less popular sites use Cloudflare than do the higher ranked sites.

At the same time, when we do the same breakdown by AS as before (with many thanks to our friends at Team Cymru), we notice even greater centralization:

Figure 10 — NS diversity in 1M domains by AS.

Out of almost 10,000 IP addresses covering 75% of the top one million domains’ NS records, over 40% again land in Cloudflare’s AS13335, with most of the others being mere ‘also-rans’.

Ok, so that’s a whole lot of pie charts, and learning that there is indeed a fair bit of centralization at the gTLD level of the DNS will not come as a surprise to many. However, crunching those numbers still provides some useful insights. So, if we wanted to answer the question “Who controls the Internet?”, then I think that we may find multiple answers:

1. Verisign — In addition to operating two of the DNS root authorities, Verisign also controls the gtld-servers.net domain, which we’ve seen above is home to a whopping 80% of all gTLD NS records! Take out Verisign, and the Internet will have a bad day.

2. A handful of large companies — The usual suspects. With 43% of all NS records in all gTLDs and 44% of those in the top 1M in a combined 14 domains, any one of those could exert significant control over large chunks of the Internet. But amongst those companies, a few stand out:

3. GoDaddy — Owner of the aptly named domaincontrol.com domain, GoDaddy is responsible for 20% of all NS records in all gTLDs alone.

4. Cloudflare — Responsible for 20% of NS records in the top one million domains, Cloudflare also provides the IP space home to a total of 40% of those NS records.

What this centralization means in practice and whether, for example, the US government could realistically exert control over the root operators and companies discussed here, is a different story altogether. But no matter how you look at it, the Internet seems less distributed or decentralized than one might wish, as many businesses and organizations appear to concentrate on a handful of registries and cloud service providers.

We don’t have a single point of failure just yet, but I do see multiple points of calamity with an increasing blast radius…

Jan Schaumann is a Distinguished Infrastructure Security Architect, and Adjunct Professor of Computer Science, with an interest in information security and the overall health of the internet, as well as the safety and privacy of its users. You can follow Jan on Twitter and Mastodon.

This post is adapted from the original at Jan’s Blog and is a version of a talk given at ICANN’s 5th DNS Symposium in Brussels, Belgium, in November 2022.


