TXT records are perhaps the most flexible type of Domain Name System (DNS) record available, but have you ever wondered how they’re really used? To find out, I examined the TXT records of 1 million domains to see if there’s any rhyme or reason to how people employ this quirky, open-ended record type.
The TXT talk
Probably the most common way that the DNS is described is as a way to map domains to IP addresses. But this only tells part of the story; domains can have all sorts of records associated with them other than just IP addresses. These different resource record types, known as RRTYPEs, are used for things like aliasing a domain to another one (CNAME and DNAME records), specifying which mail servers to use for a domain (MX records), or securing domains against poisoning attacks with DNSSEC (DS, DNSKEY, and so on) — and a whole bunch of other things.
One of the most common and versatile types is the TXT record. A TXT record can contain arbitrary text, a domain can have multiple TXT records associated with it, and each TXT record can itself contain multiple strings. They’re often used to configure mail security for a domain, verify domain ownership, and record various bits of information.
Because they’re arbitrary though, they can really be used for anything. So I decided to take a closer look, by checking the TXT records for 1 million domains and seeing what I could find.
Due to their flexibility, it’s hard to predict how TXT records will be used. But where did they come from? In 1987, TXT records were first defined in RFC 1035 as “descriptive text”. The only note provided was that “the semantics of the text depends on the domain where it is found”. So that left it pretty open.
Later, in 1993, RFC 1464 was published: “Using the Domain Name System To Store Arbitrary String Attributes”. This formalized the use of TXT records to store configuration settings for a domain in the format ‘key=value’. While this format isn’t required when using TXT records in this way, it definitely seems to have become the most common method used since then.
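As a rough illustration of the ‘key=value’ convention, the shell sketch below splits a TXT string at its first ‘=’. The function name split_kv is made up for this example, and it deliberately skips the backquote escaping that RFC 1464 defines for attribute names, so treat it as a simplification rather than a full parser.

```shell
# Split a TXT string into attribute name and value at the first "=".
# Assumption: RFC 1464's backquote escaping of "=" in names is ignored.
split_kv() {
    record=$1
    case $record in
        *=*)
            key=${record%%=*}     # everything before the first "="
            value=${record#*=}    # everything after the first "="
            printf '%s\t%s\n' "$key" "$value"
            ;;
        *)
            return 1              # not in key=value form
            ;;
    esac
}

split_kv 'v=spf1 include:amazon.com ~all'
# prints: v<TAB>spf1 include:amazon.com ~all

split_kv 'Example text here' || echo "not key=value"
# prints: not key=value
```

Splitting at the first ‘=’ rather than the last keeps values such as base64 strings, which may themselves contain ‘=’, in one piece.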
TXT records are also commonly used by other RFCs. Because of their flexibility, they can serve as a place to store details about a protocol or framework, as in RFC 7208 (Sender Policy Framework). RFCs like this specify the exact values and format that should be used, as an alternative to defining a whole new RRTYPE, as was done for LOC.
Notes on the format of TXT records
Your basic TXT record looks like you’d expect — a simple character string:
“Example text here”
Domains can also have multiple TXT records associated:
“Example”
“text”
“here”
There is no way to specify any kind of order, so multiple records can be returned in a different order each time you ask for them.
But as mentioned earlier, it’s also valid to specify multiple strings within a single record:
“one” “two” “three”
Something I discovered while writing this is that RFC 7208 has an interesting definition of how this usage is interpreted:
3.3. Multiple Strings in a Single DNS Record
As defined in [RFC1035], Sections 3.3 and 3.3.14, a single text DNS record can be composed of more than one string. If a published record contains multiple character-strings, then the record MUST be treated as if those strings are concatenated together without adding spaces. For example:
IN TXT "v=spf1 .... first" "second string..."
is equivalent to:
IN TXT "v=spf1 .... firstsecond string..."
TXT records containing multiple strings are useful in constructing records that would exceed the 255-octet maximum length of a character-string within a single TXT record.
I think this applies to all TXT records, but it might apply only to Sender Policy Framework (SPF) records. If I’m right, then this doesn’t appear to be a well-known fact, because several domains out there specify multiple strings but seem to assume that a space will be added between them when they are concatenated. One particular example comes from one of the DNS giants of the Internet, Akamai, which has the following set for akamai.net:
"This" "is" "not" "the" "nameserver" "you" "are" "looking" "for."
Which, according to RFC 7208, should end up as:
"Thisisnotthenameserveryouarelookingfor."
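The concatenation rule from RFC 7208 can be sketched in shell as follows. The function name join_txt_strings is hypothetical, and the sketch assumes the strings are written in presentation format, separated by single spaces, with no escaped quotes inside them.

```shell
# Join the character-strings of one TXT record with no separator,
# following the rule in RFC 7208, section 3.3.
# Assumptions: input looks like "one" "two"; no escaped quotes (\") inside.
join_txt_strings() {
    printf '%s\n' "$1" | sed -e 's/^"//' -e 's/"$//' -e 's/" "//g'
}

join_txt_strings '"v=spf1 .... first" "second string..."'
# prints: v=spf1 .... firstsecond string...

join_txt_strings '"This" "is" "not" "the" "nameserver" "you" "are" "looking" "for."'
```

Anyone who wants a space between words therefore has to put it inside one of the quoted strings, for example "This " "is " and so on.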
TXT work (source data and methodology)
I wrote a pretty basic shell script that I’m not particularly proud of, but that worked well enough to go through two different lists of top domains and check the TXT records for each entry:
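#!/bin/bash
# Look up the TXT records for every domain listed in domains.txt.
domainsfile="domains.txt"
while IFS= read -r domain
do
    echo "--"
    echo "[$(date)] CHECKING DOMAIN $domain"
    echo "--"
    # -W 5 sets a 5 second timeout; the trailing dot makes the name fully qualified
    host -W 5 -t txt "$domain".
done < "$domainsfile"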
The general intent was to keep the output human-readable, because I wanted to be able to look through it myself, while still making it easy to parse for totals and more.
I then ran this and captured the output:
./get-txt-records.sh > host.out 2>&1
This produces output in the file host.out for each domain that looks like this:
--
[Mon 03 Apr 2023 04:12:36 PM BST] CHECKING DOMAIN amazonaws.com
--
amazonaws.com descriptive text "pf2vv39dfkf9tszsg5lggfs6tp6bkjn4"
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
amazonaws.com descriptive text "spf2.0/pra include:amazon.com ~all"
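Output in this shape is straightforward to turn into something machine-readable. The sketch below is one possible approach, not the script actually used for this analysis: a hypothetical extract_txt function that keeps only the answer lines and prints the domain and the record separated by a tab. It assumes every answer line contains the literal phrase " descriptive text ", as in the sample above.

```shell
# Print "domain<TAB>record" for each TXT answer found on stdin.
# Separator lines, timestamps and error messages are skipped because
# they do not contain the " descriptive text " marker.
extract_txt() {
    awk '
        BEGIN { marker = " descriptive text " }
        {
            pos = index($0, marker)
            if (pos > 0) {
                domain = substr($0, 1, pos - 1)
                record = substr($0, pos + length(marker))
                printf "%s\t%s\n", domain, record
            }
        }
    '
}

# Demo using the sample output shown above.
extract_txt <<'EOF'
--
[Mon 03 Apr 2023 04:12:36 PM BST] CHECKING DOMAIN amazonaws.com
--
amazonaws.com descriptive text "pf2vv39dfkf9tszsg5lggfs6tp6bkjn4"
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
amazonaws.com descriptive text "spf2.0/pra include:amazon.com ~all"
EOF
```

Against the real file, something like extract_txt < host.out | cut -f1 | sort | uniq -c would then give a count of TXT records per domain.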
The script took about 48 hours per run, against a nameserver whose cache hadn’t been specifically warmed for these queries. The first run used the Tranco list generated on 10 March 2023. This wasn’t as interesting as I’d hoped, though, I think because the list consists entirely of effective Second-Level Domains (eSLDs, which I call ‘parent domains’). So I ran it again with the Cisco top 1M domains list downloaded on 13 March 2023.
I did a lot of checking with just cut, grep, and so on, but then I wrote a little script to import all records into an SQLite database to make things easier.
All of the files mentioned here, including scripts, the SQLite database, and files created while looking at record lengths, unique records, and more, have been uploaded to GitHub under a Creative Commons Zero licence:
TXT by numbers
|Number of TXT records||765,650|
|Number of unique TXT records||595,398|
|Domains with TXT records||584,244|
|Domains without TXT records||415,756|
|Average number of TXT records per domain (that has them)||1.31 (765,650 / 584,244)|
|Longest TXT record||7,886 characters|
|Second-longest TXT record||5,498 characters|
|Total length of all TXT records concatenated||49,813,321 characters|
|SPF1 records with “include”||131,892|
|SPF1 records with just “v=spf1 -all”||8,444|
|key=value TXT records||630,317|
|Empty records with just “”||183|
|Empty records with spaces (“\s+”)||12|
|Empty records with just “~”||109|
|Verification / confirmation records||402,230|
|68 characters||170,941 (mostly Google site verification records)|
|13 characters||48,763 (mostly MS= TXT records)|
|32 characters||48,648 (random strings)|
|59 characters||33,205 (Facebook verifications)|
|26 characters||25,559 (random strings)|
|Swear words not appearing in domains||0|
|Embedded DNS records (“IN …”)||225|
|Mentions of “ALIAS for” another domain||363|
|References to tickets||28|
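Figures like the record-length rows in the table can be produced with a short tally. The sketch below is only an illustration of the idea, not the method used for the numbers above: a hypothetical length_tally function that reads one record per line, with the surrounding quotes already stripped, and prints each length with its count, most common first.

```shell
# Tally record lengths from stdin (one unquoted record per line).
# Output lines are "count length", sorted by count, highest first.
length_tally() {
    awk '{ count[length($0)]++ } END { for (n in count) print count[n], n }' |
        sort -k1,1nr -k2,2n
}

printf '%s\n' aaaa bbbb cc | length_tally
# prints:
# 2 4
# 1 2
```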
So, what are TXT records used for exactly? Well, we can see that key-value settings are the most common use case, with domain verification records being the majority of those. SPF records also make a strong showing, as well as a lot of seemingly random fixed-length records that are probably being used for encoding data somehow.
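To make those categories a little more concrete, here is a minimal classifier sketch. The function name, the category labels and the choice of patterns are my own assumptions for illustration; the counts in the table were not necessarily produced this way. The input is a single record with its quotes removed.

```shell
# Roughly sort one TXT record (quotes removed) into a category.
# The verification prefixes are those used by Google, Microsoft and Facebook.
classify_txt() {
    case $1 in
        '')                     echo empty ;;
        'v=spf1 -all')          echo spf-deny-all ;;
        'v=spf1'|'v=spf1 '*)    echo spf ;;
        google-site-verification=*|MS=*|facebook-domain-verification=*)
                                echo verification ;;
        *=*)                    echo key-value ;;
        *)                      echo other ;;
    esac
}

for record in 'v=spf1 include:amazon.com ~all' 'pf2vv39dfkf9tszsg5lggfs6tp6bkjn4' ''
do
    printf '%s\t%s\n' "$(classify_txt "$record")" "$record"
done
```

The order of the patterns matters: SPF and verification records are also in key=value form, so the more specific patterns have to come before the generic one.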
But overall, they really are used for anything and everything. There are some patterns we can pick out, but the lack of rigid rules means that the freedom to put whatever you like in a TXT record has been liberally accepted by the Internet as a whole. Which, if you ask me, is a good thing. Having something in the DNS that can act as a config store, notes field, playground for new standards, or even the basis for file storage (not that this would really be recommended), has meant that we haven’t had to wait for standards to catch up in order to continue making use of this wonderful system that underpins a fundamental part of the Internet.
Peter Lowe is the FIRST DNS Abuse Ambassador and Co-Chair of the DNS Abuse SIG. He has worked in or around Internet protocols since joining the second Internet cafe in the UK in 1995, co-hosts the Not So Critical Update podcast, and maintains one of the popular blocklists used by ad blockers and tracking prevention software.
Adapted from the original post on RIPE Labs.