Who reads your email?

By Jan Schaumann on 5 Apr 2023

Category: Tech matters

This is the second blog post on the topic of the centralization of the Internet. The first post, discussing the diversity of authoritative name servers, can be found here.

According to various statistics, there are somewhere around 330 billion emails being sent every day, approximately 3.82 million per second. Who reads all these emails?

Ok, ok, nobody does. Who would want to? Most of it is spam anyway. But, given how personal email is, how much we rely on email for business, how useful email can be in legal discovery, and, most importantly, how — over 40 years after RFC 821 was published — we still use a clear text protocol and have no realistic solution for end-to-end encryption of this private content… given all that, who could read that email if they wanted to? Ah, well, that’s another question altogether.

The Simple Mail Transfer Protocol (SMTP) uses MX records in the DNS to identify which server(s) it should hand the mail off to. It used to be common for domain owners to run their own mail server, but it turns out that doing that well while efficiently combating spam (both incoming and outgoing), email abuse, and the ever-increasing traffic volume is not that easy. And what do we do when things aren’t easy? We pay somebody else to do it for us — to the cloud!

In 2023, chances are that, regardless of the domain in question, your personal and/or business email is actually handled by organizations like Google, Microsoft, Yahoo, Apple, Yandex, or GMX. But even if these are your email service providers, it’s also quite likely that your domain uses another layer in front of that, which provides spam-, malware-, and data-loss prevention (DLP) and filtering features. Popular service providers here include Proofpoint, Barracuda, Sophos, Trustwave, and some other offerings from big-name companies as well as ones you likely have never heard of.

So let’s take a look at which of these various companies are fronting the most domains and could, in theory, read your email!

Methodology

Much like I did when I looked at NS record diversity, I went through all the generic Top-Level Domains (gTLDs) zone files, again leaving out country code Top-Level Domains (ccTLDs), extracted all second-level domains, and then went to work with nothing but my little, trusty bind9 caching resolver running on my personal VPS.

Performing millions of parallel DNS lookups leads to some interesting problems in different areas, which are probably worth a separate blog post all on their own.

For each gTLD zone file, I extracted the full list of domains within that TLD, defined as any unique label in the zone file with an NS record. This yielded a grand total of approximately 203 million domain names: >164 million in .com alone, with all other gTLDs adding up to roughly 39 million domain names. For each of those domains, I then performed DNS lookups for its MX records, and a few million queries later I ended up with a whole bunch of mail server Fully Qualified Domain Names (FQDNs).
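As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes one record per line in the form `owner [ttl] [class] type rdata` with fully qualified owner names; the function name `domains_with_ns` and that layout are assumptions for the example, not a description of the tooling actually used.

```python
def domains_with_ns(zone_lines, tld):
    """Return the set of second-level domains under `tld` that own an NS record.

    Assumes one record per line: owner [ttl] [class] type rdata,
    with fully qualified owner names (hypothetical input layout).
    """
    suffix = "." + tld.strip(".").lower() + "."
    found = set()
    for raw in zone_lines:
        # Drop comments, normalise case, split into whitespace-separated fields.
        fields = raw.split(";", 1)[0].lower().split()
        if len(fields) < 3:
            continue
        owner, rest = fields[0], fields[1:]
        # Skip the optional TTL and class to reach the record type.
        while rest and (rest[0].isdigit() or rest[0] == "in"):
            rest = rest[1:]
        if not rest or rest[0] != "ns":
            continue
        if not owner.endswith(suffix):
            continue  # e.g. the TLD apex itself
        label = owner[: -len(suffix)]
        # Keep only direct children of the TLD.
        if label and "." not in label:
            found.add(label + suffix)
    return found


sample = [
    "com. 172800 in ns a.gtld-servers.net.",
    "example.com. 172800 in ns ns1.example.net.",
    "example.com. 172800 in ns ns2.example.net.",
    "EXAMPLE2.COM. IN NS ns.example.org.",
    "ns1.example.com. 172800 in a 192.0.2.1",
]
print(sorted(domains_with_ns(sample, "com")))
```

Using a set means a domain with several NS records is only counted once, which matches the "unique label" definition above.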

A single domain may, of course, have multiple MX records that may or may not be in the same domain (which itself may or may not be within the original domain):

$ dig +short mx netmeister.org        # <--+ 1 MX
50 panix.netmeister.org.              # <--+ within the same domain

$ dig +short mx akamai.com            # <--+ 4 MX
20 mx0b-00190b01.pphosted.com.        #    |
10 mxa-00190b01.gslb.pphosted.com.    #    | all in a different domain
10 mxb-00190b01.gslb.pphosted.com.    #    |
20 mx0a-00190b01.pphosted.com.        # <--+

$ dig +short mx twitter.com           # <--+ 5 MX
30 ASPMX3.GOOGLEMAIL.com.             #    + in two different domains
20 alt1.aspmx.l.google.com.           #    | (owned by the same org)
10 aspmx.l.google.com.                #    |
30 ASPMX2.GOOGLEMAIL.com.             #    +
20 alt2.aspmx.l.google.com.           # <--+

$ dig +short mx whynot.coffee         # <--+ 4 MX
10 mailin.mx-hub.cz.                  #    | in four different domains
10 mailin.mx-hub.eu.                  #    | in four different TLDs
10 mailin.mx-hub.sk.                  #    |
10 mailin.mx-hub.net.                 # <--+
$ 

So we need to flatten the data a bit and reduce the individual MX servers to their second-level domain. With the help of some perl and the Public Suffix List, I mapped the approximately 30 million unique MX servers listed for the 203 million domains into around 21 million second-level domains.
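The reduction step can be sketched in Python as follows. The suffix set here is a tiny hand-picked stand-in for the Public Suffix List (the real list has thousands of entries plus wildcard and exception rules, which this sketch ignores), and `registrable_domain` is a hypothetical helper name.

```python
# Tiny stand-in for the Public Suffix List; an assumption for illustration only.
SAMPLE_SUFFIXES = {"com", "net", "org", "cz", "eu", "sk", "uk", "co.uk", "il", "co.il"}


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Reduce a mail server name to the label directly below its public suffix."""
    labels = hostname.rstrip(".").lower().split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so that "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the name is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix


twitter_mx = [
    "ASPMX3.GOOGLEMAIL.com.",
    "alt1.aspmx.l.google.com.",
    "aspmx.l.google.com.",
    "ASPMX2.GOOGLEMAIL.com.",
    "alt2.aspmx.l.google.com.",
]
print({registrable_domain(h) for h in twitter_mx})
```

Applied to the examples above, the four akamai.com records collapse into a single domain (pphosted.com), while the twitter.com records collapse into two.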

So… who does read, or rather host, everybody’s email?

Stats by MX

No MX

As noted above, I found approximately 30 million unique mail servers, but of course, not every domain has an MX record. In that case, SMTP assumes an ‘implicit MX’ and attempts to deliver the mail to the IP address (if any) of the bare domain name.
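The fallback rule can be expressed as a small Python sketch. It works on already-resolved data (a list of `(preference, hostname)` pairs and a list of addresses for the bare domain); the function name and argument shapes are assumptions for the example, and it sorts equal preferences deterministically where a real sender would typically randomise among them.

```python
def delivery_targets(domain, mx_records, address_records):
    """Return the hosts a sending server would try, in order.

    mx_records:      list of (preference, hostname) tuples, possibly empty
    address_records: list of IP addresses of the bare domain, possibly empty
    """
    if mx_records:
        # Lower preference values are tried first.
        return [host for _, host in sorted(mx_records)]
    if address_records:
        # Implicit MX: no MX record, but the domain itself has an address.
        return [domain]
    return []  # nothing to deliver to


print(delivery_targets("netmeister.org", [(50, "panix.netmeister.org.")], []))
print(delivery_targets("example.com", [], ["192.0.2.10"]))
print(delivery_targets("example.com", [], []))
```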

As it turns out, having no explicit MX record is indeed the most common configuration: almost 119 million domains (58% of all domains) lack any such resource record. Of those, 76 million (64%) do have an IP address and thus could at least theoretically receive mail. Reverse-resolving those IP addresses, we find that 28.8 million are AWS IPs (in the amazonaws.com., awsglobalaccelerator.com., and cloudfront.net. domains), 18 million are Google’s (in 1e100.net. and googleusercontent.com.; 34.102.136.180 alone is used by 12.8 million domains), and 7.3 million are Wix’s (wixsite.com).

That leaves around 42 million domains that have no means of accepting mail, simply because they have neither an MX record nor an IP address. However, there are other ways for a domain owner to signal that the domain does not accept mail: 1.5 million (or 0.7% of all) domains have their MX set to localhost (and 425 to localhost.localdomain), which of course is a bit of a janky way of telling folk not to bother you. Because this isn’t quite ideal, we now have a much better way of expressing that a domain does not want any mail: the ‘Null MX’ No Service Resource Record, specified in RFC 7505. That is, simply set a single MX record with a preference number of 0 and a zero-length label (.):

$ host -t mx livemediastreaming.com
livemediastreaming.com mail is handled by 0 .
$ 

This approach appears to be marginally more popular than using localhost: around two million, or just about 1%, of all domains have a Null MX record set. It also has the advantage that it can help combat impersonation without having to specify an SPF policy, since a receiving mail server can reject mail upon encountering an undeliverable MailFrom/From address.
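To make the different "no mail, please" signals concrete, here is a minimal Python sketch that sorts a domain's MX set into the buckets discussed above. The bucket names and the function `classify_mx` are made up for this example; the input is assumed to be a list of `(preference, exchange)` tuples.

```python
def classify_mx(mx_records):
    """Classify a domain's MX set; mx_records is a list of (preference, exchange)."""
    if not mx_records:
        return "no-mx"
    # RFC 7505 Null MX: exactly one record, preference 0, zero-length label.
    if len(mx_records) == 1:
        pref, host = mx_records[0]
        if pref == 0 and host == ".":
            return "null-mx"
    exchanges = [host.rstrip(".").lower() for _, host in mx_records]
    if all(e in ("localhost", "localhost.localdomain") for e in exchanges):
        return "localhost"
    return "regular"


print(classify_mx([(0, ".")]))                       # the livemediastreaming.com case
print(classify_mx([(10, "localhost.")]))
print(classify_mx([(50, "panix.netmeister.org.")]))
```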

So all in all, just about 46 million domains or around 23% of all domains do not have any way of getting mail.

Number of MX records

Now let’s take a look at the ~40% (approximately 81 million) of domains with MX records. Most domains have between one and five mail exchange records, but of course, there are outliers: 464 domains have more than ten MX records, 28 more than 20, and four domains have over 100! For example, the ever so aptly named everymailbox.com domain has 398 MX records, whiteinbox.net has 253, and rm02.net has 235. All of these MX records have the same priority, suggesting they are trying to aim for some DNS round-robin load balancing here.

gaodong.com is another outlier: 123 MX records with 117 distinct priorities, similar to connectingdonors.net, which has 59 records with unique priorities from 1 to 58.
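A quick way to tell these patterns apart is to compare the number of records with the number of distinct priorities. The following Python sketch does that for a list of `(preference, hostname)` tuples; `priority_profile` and the `round_robin` flag are hypothetical names used for illustration.

```python
def priority_profile(mx_records):
    """Summarise how a domain spreads its MX records across priorities."""
    prefs = [pref for pref, _ in mx_records]
    distinct = len(set(prefs))
    return {
        "records": len(prefs),
        "distinct_priorities": distinct,
        # Several records sharing one priority suggests DNS round-robin.
        "round_robin": len(prefs) > 1 and distinct == 1,
    }


whynot_coffee = [
    (10, "mailin.mx-hub.cz."),
    (10, "mailin.mx-hub.eu."),
    (10, "mailin.mx-hub.sk."),
    (10, "mailin.mx-hub.net."),
]
twitter = [
    (30, "ASPMX3.GOOGLEMAIL.com."),
    (20, "alt1.aspmx.l.google.com."),
    (10, "aspmx.l.google.com."),
    (30, "ASPMX2.GOOGLEMAIL.com."),
    (20, "alt2.aspmx.l.google.com."),
]
print(priority_profile(whynot_coffee))
print(priority_profile(twitter))
```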

And then there are domains that spread their MX records across multiple second-level domains, although some of them are clearly misconfigured and include what appear to be non-FQDN names as well as some that simply don’t resolve at all:

$ host -t mx trustedomain.com
trustedomain.com mail is handled by 10 imtat4.       # these appear to be
trustedomain.com mail is handled by 5 imta6.         # non-fqdn names under
trustedomain.com mail is handled by 5 imta21.        # the trustedomain.com domain
[... 40 more records like that ...]
$ host -t mx dabafunk.xyz
dabafunk.xyz mail is handled by 10 mail.dabafunk.xyz.
dabafunk.xyz mail is handled by 0 smtp.dabafunk.xyz.
dabafunk.xyz mail is handled by 2 mail.bhargo.       # similarly, some are non-fqdn
dabafunk.xyz mail is handled by 1 smtp.wesak.
dabafunk.xyz mail is handled by 1 smtp.maitreya.
dabafunk.xyz mail is handled by 1 smtp.shamballa.
dabafunk.xyz mail is handled by 2 mail.wesak.        # but others don't resolve
dabafunk.xyz mail is handled by 1 smtp.bhargo.
dabafunk.xyz mail is handled by 2 mail.maitreya.
$ 

And my favourite: moshelasky.net, which set MX records for a number of completely unrelated and necessarily mutually exclusive big-name domains, basically saying ‘go give my mail to Cisco, and if that doesn’t work out, try Microsoft, Intel, Google, Yahoo… whatever’:

$ host -t mx moshelasky.net
moshelasky.net mail is handled by 70 mail.facebook.com.
moshelasky.net mail is handled by 100 mail.thunderbird.com.
moshelasky.net mail is handled by 100 mail.yahoo.com.
moshelasky.net mail is handled by 90 mail.pirisoft.com.
moshelasky.net mail is handled by 30 mail.moshelasky.com.
moshelasky.net mail is handled by 40 mail.moshelasky.net.
moshelasky.net mail is handled by 100 mail.walla.co.il.
moshelasky.net mail is handled by 20 mail.outlook.com.
moshelasky.net mail is handled by 50 mail.intel.com.
moshelasky.net mail is handled by 80 mail.grc.com.
moshelasky.net mail is handled by 100 mail.mailchimp.com.
moshelasky.net mail is handled by 100 mail.digicert.com.
moshelasky.net mail is handled by 100 mail.noip.com.
moshelasky.net mail is handled by 100 mail.google.com.
moshelasky.net mail is handled by 60 mail.microsoft.com.
moshelasky.net mail is handled by 100 mail.windows.com.
moshelasky.net mail is handled by 10 mail.cisco.com.
$ 

Valid MX records

But OK, let’s look at the domains with reasonable MX records. In the 30 million unique servers listed, we expect to see several of the popular email and hosting providers’ mail servers, but of course, less popular domains will have their own MX records that are likely to be unique. In fact, almost 98% of these mail servers are globally unique, making only a single appearance. Of the other 380k mail servers, around 2k appear more than 1,000 times. The top 20 most frequently used mail servers are shown in Table 1.

| Rank | Number of instances | Hostname | Company / organization |
|---|---|---|---|
| 1 | 10.3M | mailstore1.secureserver.net. | GoDaddy Hosted Mail |
| 2 | 10.3M | smtp.secureserver.net. | GoDaddy Hosted Mail |
| 3 | 9.6M | aspmx.l.google.com. | Google |
| 4 | 9.5M | alt1.aspmx.l.google.com. | Google |
| 5 | 9.5M | alt2.aspmx.l.google.com. | Google |
| 6 | 6.7M | alt3.aspmx.l.google.com. | Google |
| 7 | 6.7M | alt4.aspmx.l.google.com. | Google |
| 8 | 3.9M | eforward1.registrar-servers.com. | Namecheap |
| 9 | 3.9M | eforward5.registrar-servers.com. | Namecheap |
| 10 | 3.9M | eforward4.registrar-servers.com. | Namecheap |
| 11 | 3.9M | eforward2.registrar-servers.com. | Namecheap |
| 12 | 3.9M | eforward3.registrar-servers.com. | Namecheap |
| 13 | 2.7M | aspmx2.googlemail.com. | Google |
| 14 | 2.7M | aspmx3.googlemail.com. | Google |
| 15 | 1.1M | mx3.mail.ovh.net. | OVH / OVH Groupe SAS |
| 16 | 804K | mx01.1and1.com. | IONOS / United Internet AG |
| 17 | 802K | mx00.1and1.com. | IONOS / United Internet AG |
| 18 | 793K | mx4.mail.ovh.net. | OVH / OVH Groupe SAS |
| 19 | 784K | mail.h-email.net. | Unknown / parked domains? |
| 20 | 784K | smtpin.rzone.de. | Strato AG / United Internet AG |

Table 1 — The top 20 most frequently used mail servers.

You can see an obvious trend in Table 1: Google’s mail servers are rather popular (although not the most popular), and of course, chances are that domains that have, for example, alt1.aspmx.l.google.com. as one MX will likely have also alt2.aspmx.l.google.com. as a second record. This suggests that we can gain more insights by reducing them to their domain name.

MX record domains

To better understand who the operators of these mail servers are, I flattened the data such that a domain that contains MX records pointing to, say, aspmx.l.google.com., alt1.aspmx.l.google.com., and smtp.secureserver.net. would be counted once each for the domains google.com and secureserver.net.
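The per-domain flattening described in this paragraph could look roughly like the Python sketch below. It is only an illustration: `naive_sld` simply keeps the last two labels, which is an assumption that breaks for suffixes such as co.uk (a real run would need the Public Suffix List), and `count_mx_domains` is a hypothetical name.

```python
from collections import Counter


def naive_sld(hostname):
    """Keep the last two labels; a simplification that ignores multi-label suffixes."""
    return ".".join(hostname.rstrip(".").lower().split(".")[-2:])


def count_mx_domains(mx_by_domain, reduce=naive_sld):
    """Count each MX domain at most once per queried domain."""
    counts = Counter()
    for hosts in mx_by_domain.values():
        # The set collapses several servers in the same MX domain into one hit.
        counts.update({reduce(h) for h in hosts})
    return counts


mx_by_domain = {
    "example.com": [
        "aspmx.l.google.com.",
        "alt1.aspmx.l.google.com.",
        "smtp.secureserver.net.",
    ],
    "example.net": ["aspmx.l.google.com."],
}
print(count_mx_domains(mx_by_domain))
```

Dropping the set and updating the counter with every reduced hostname instead would count individual MX records rather than domains, which yields much larger totals for providers that hand out several records per customer.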

This reduces our data set to 21 million unique domains; the top 20 domains in which we find the most MX records are shown in Table 2.

| Rank | Number of instances | Domain | Company / organization |
|---|---|---|---|
| 1 | 46.7M | google.com. | Google |
| 2 | 22.5M | secureserver.net. | GoDaddy Hosted Mail |
| 3 | 19.7M | registrar-servers.com. | Namecheap |
| 4 | 7.4M | outlook.com. | Microsoft |
| 5 | 6.9M | googlemail.com. | Google |
| 6 | 3.4M | ovh.net. | OVH / OVH Groupe SAS |
| 7 | 2.4M | mailspamprotection.com. | SiteGround |
| 8 | 1.8M | hostedemail.com. | Tucows / OpenSRS |
| 9 | 1.7M | 1and1.com. | IONOS / United Internet AG |
| 10 | 1.6M | zoho.com. | Zoho Corporation |
| 11 | 1.6M | jellyfish.systems. | Namecheap |
| 12 | 1.3M | one.com. | One.com |
| 13 | 1.3M | qq.com. | Tencent QQ |
| 14 | 1.2M | ionos.com. | IONOS / United Internet AG |
| 15 | 1M | gandi.net. | Gandi SAS / Your.Online |
| 16 | 1M | rzone.de. | Strato AG / United Internet AG |
| 17 | 992K | kundenserver.de. | IONOS / United Internet AG |
| 18 | 973K | 123-reg.co.uk. | 123 Reg |
| 19 | 835K | h-email.net | Unknown / parked domains? |
| 20 | 765K | oxcs.net | EuroDNS / Datacenter Group |

Table 2 — The top 20 domains in which we find the most MX records.

Note: h-email.net appears to be a domain used primarily or exclusively for parked domains by, for example, Parking Crew. A peculiarity of the domain is its SPF record (ip6:fd96:1c8a:43ad::/48 -all), which allows only traffic on an IPv6 Unique Local Address (ULA), despite mail.h-email.net having only IPv4 addresses that belong to Digital Ocean and Hetzner Online GmbH.
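Whether an SPF record only authorizes Unique Local Addresses can be checked with Python's standard `ipaddress` module. This is a minimal sketch: it only looks at `ip6:` mechanisms and ignores everything else in the record (`ip4:`, `include:`, and so on), and the helper names are made up for the example.

```python
import ipaddress

# IPv6 Unique Local Addresses live in fc00::/7.
ULA = ipaddress.ip_network("fc00::/7")


def spf_ip6_networks(spf):
    """Extract the networks named in ip6: mechanisms of an SPF record string."""
    nets = []
    for term in spf.split():
        term = term.lstrip("+-~?")  # strip an optional qualifier
        if term.lower().startswith("ip6:"):
            nets.append(ipaddress.ip_network(term[4:], strict=False))
    return nets


def ip6_all_ula(spf):
    """True if the record has ip6: mechanisms and all of them fall within the ULA range."""
    nets = spf_ip6_networks(spf)
    return bool(nets) and all(n.subnet_of(ULA) for n in nets)


print(ip6_all_ula("ip6:fd96:1c8a:43ad::/48 -all"))
print(ip6_all_ula("ip6:2001:db8::/32 -all"))
```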

Obviously, we can combine some of the domains by company or organization to better reflect the concentration of the mail servers. With that, we find that Google takes the lion’s share with about 34%, followed by GoDaddy at around 14%, Namecheap at 13.5%, and Microsoft trailing behind with about 4.7% (Figure 1). Note that these percentages are not quite accurate, since they are computed only over those mail servers that are used by 1,000 or more domains. That cut-off trims the overall set of 21 million somewhat, but the proportional dominance of the top domains remains.
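The grouping itself is simple addition, as the Python sketch below shows for the first few rows of Table 2 plus Namecheap's second domain. The figures are the rounded values from the table (in millions), and the domain-to-organization mapping is the one given there; `totals_by_org` is a hypothetical helper name.

```python
from collections import defaultdict

# (MX domain, instances in millions, organization), rounded values from Table 2.
TABLE2_ROWS = [
    ("google.com", 46.7, "Google"),
    ("secureserver.net", 22.5, "GoDaddy"),
    ("registrar-servers.com", 19.7, "Namecheap"),
    ("outlook.com", 7.4, "Microsoft"),
    ("googlemail.com", 6.9, "Google"),
    ("jellyfish.systems", 1.6, "Namecheap"),
]


def totals_by_org(rows):
    """Sum the per-domain counts for each organization."""
    totals = defaultdict(float)
    for _domain, millions, org in rows:
        totals[org] += millions
    return dict(totals)


for org, millions in sorted(totals_by_org(TABLE2_ROWS).items(), key=lambda kv: -kv[1]):
    print(f"{org}: {millions:.1f}M")
```

Because the inputs are already rounded, the sums can differ from the figures quoted elsewhere by a tenth of a million or so.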

Figure 1 — Distribution of the top MX domains listed in Table 2, by organization.

Note that Figure 1 covers second-level domains under the gTLDs only and excludes ccTLDs. This necessarily skews the findings: we expect European economies, for example, to favour non-American service providers.

Spot-checking 100,000 domains each from .ch, .fr, and .se — three of the only 17 ccTLD zone files / domain name listings I was able to access — shows OVH and Gandi ahead of Google in .fr, Hostpoint AG and Infomaniak in the top three in .ch, and the Swedish One.com not surprisingly taking the top spot in .se, but a full analysis of all ccTLD zones would obviously be needed to get a complete view.

Stats for top 1M domains

Looking at all domains tells us which mail servers are listed most frequently, but that of course includes hundreds of thousands if not millions of parked domains, spam domains, one-time or dormant domains, and so on. Let’s instead look at the Tranco Top 1 Million list and see if our distribution changes.

For those 1 million domains, we find around 433k distinct MX servers in 230k domains. The top 20 mail server domains there are slightly different from those for all domains:

| Rank | Number of domains | Domain | Company / organization |
|---|---|---|---|
| 1 | 138K | google.com. | Google |
| 2 | 94K | outlook.com. | Microsoft |
| 3 | 59K | googlemail.com. | Google |
| 4 | 15.8K | yandex.net. | Yandex LLC |
| 5 | 13K | mimecast.com. | Mimecast Limited |
| 6 | 12.8K | pphosted.com. | Proofpoint, Inc. |
| 7 | 9.5K | qq.com. | Tencent QQ |
| 8 | 9.2K | registrar-servers.com. | Namecheap |
| 9 | 8K | secureserver.net. | GoDaddy Hosted Mail |
| 10 | 5.7K | barracudanetworks.com. | Barracuda Networks |
| 11 | 5.5K | zoho.com. | Zoho Corporation |
| 12 | 4.7K | amazonaws.com. | Amazon Web Services, Inc. |
| 13 | 4.4K | emailsrvr.com. | Rackspace Technology |
| 14 | 4.1K | yandex.ru. | Yandex LLC |
| 15 | 3.7K | iphmx.com. | Cisco IronPort Hosted MX |
| 16 | 3.5K | mail.ru. | VK / Mail.ru Group |
| 17 | 3.5K | ovh.net. | OVH / OVH Groupe SAS |
| 18 | 3.4K | mailspamprotection.com. | SiteGround |
| 19 | 3K | ppe-hosted.com. | Proofpoint, Inc. |
| 20 | 3K | beget.com. | Beget LLC |

Table 3 — The top 20 mail server domains from the Tranco Top 1 Million list.

Note, in economies where ‘gmail’ was already trademarked, Google uses the googlemail.com domain. This includes, for example, the UK, Germany, Russia, and Poland.

Figure 2 — Distribution of the top 1M domains’ mail servers listed in Table 3.

We observe that amongst the top 1M domains, many outsource mail not just to the big providers (Google and Microsoft together account for 60% of all!), but often add another layer of email protection via different, more specialized service providers such as Proofpoint, Barracuda Networks, or Cisco / IronPort. Those may then well also hand the mail to, for example, Google or Microsoft, further increasing their share, but that remains opaque to us from the outside.

Summary

In summary, some of the information we were able to pull out of our MX data collection includes:

  • 58% of all domains (119 million) have no MX record (42 million of those have no IP).
  • 1% of all domains (~2 million) use an RFC 7505 ‘Null MX’ (0 .).
  • 0.7% of all domains (~1.5 million) use localhost.
  • 40% of all domains (81 million) have an MX record, yielding around 30 million unique records in 21 million unique domains.
  • 98% of those are unique, and around 380k mail servers are used by more than one domain.
  • ~2,000 mail servers are used by >1,000 domains each; the most frequently used MX records are GoDaddy’s mailstore1.secureserver.net. and smtp.secureserver.net. (used by 10.6 million domains each) and Google’s aspmx.l.google.com. (used by 9.6 million domains).
  • 34% of all domains (53.7 million) use one of Google’s mail servers, 14% (22.5 million) one of GoDaddy’s, 13.5% (~21.3 million) one of Namecheap’s.
  • For the Top 1M domains, over 60% use Google’s (41%) and Microsoft’s (20%) mail servers.
  • Many mail protection services dominate the remainder.

So all in all, the answer to the question of who can read your email pretty much boils down to — yep — ‘Google and Microsoft’. Even if your domain doesn’t use one of their mail servers, chances are that whoever you are sending mail to does.

To be fair, these companies are going to be doing a much better job at running and securing your email than you are, and outsourcing this critical functionality often makes good sense. And yet, this is another example of the continuously increasing centralization of the Internet. Our businesses, just like our personal online lives, are concentrated in the hands of just a few companies.

Jan Schaumann is a Distinguished Infrastructure Security Architect, and Adjunct Professor of Computer Science, with an interest in information security and the overall health of the internet, as well as the safety and privacy of its users. You can follow Jan on Twitter and Mastodon.

This post is adapted from the original at Jan’s Blog.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
