The joy of TXT

By Peter Lowe on 19 Jul 2023

Category: Tech matters

TXT records are perhaps the most flexible type of Domain Name System (DNS) record available, but have you ever wondered how they’re really used? To find out, I examined the TXT records of 1 million domains, looking for any rhyme or reason in how people employ this quirky, open-ended record type.

The TXT talk

Probably the most common way that the DNS is described is as a way to map domains to IP addresses. But this only tells part of the story; domains can have all sorts of records associated with them other than just IP addresses. These different resource record types, known as RRTYPEs, are used for things like aliasing a domain to another one (CNAME and DNAME records), specifying which mail servers to use for a domain (MX records), or securing domains against poisoning attacks with DNSSEC (DS, DNSKEY, and so on) — and a whole bunch of other things.

One of the most common and versatile types is the TXT record. A TXT record can contain any arbitrary text, a domain can have multiple TXT records associated with it, and each TXT record can in turn contain multiple strings. They’re often used to configure mail security for a domain, to verify domain ownership, and simply to record various bits of information.

Because they’re arbitrary though, they can really be used for anything. So I decided to take a closer look, by checking the TXT records for 1 million domains and seeing what I could find.

TXTual history

Due to their flexibility, it’s hard to predict how TXT records will be used. But where did they come from? In 1987, TXT records were first defined in RFC 1035 as “descriptive text”. The only note provided was that “the semantics of the text depends on the domain where it is found”. So that left it pretty open.

Later, in 1993, RFC 1464 was published: “Using the Domain Name System To Store Arbitrary String Attributes”. This formalized the use of TXT records to store configuration settings for a domain in the format ‘key=value’. While this format wasn’t required when using TXT records in this way, it definitely seems to have become the most common method used since then.
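
As a rough illustration of that convention, the shell sketch below splits a TXT string into key and value at the first equals sign. The split_kv function is hypothetical, not part of any tool used in this study, and it deliberately skips the quoting rules RFC 1464 defines for attribute names.

```bash
#!/bin/bash
# Hypothetical helper: split one RFC 1464-style TXT string into key and
# value at the first "=". Simplification: it ignores the quoting rules
# RFC 1464 defines for attribute names, so keys containing "=" won't parse.
split_kv() {
    local rec="$1"
    case "$rec" in
        *=*) ;;                  # has at least one "=", carry on
        *) return 1 ;;           # not a key=value record
    esac
    local key="${rec%%=*}"       # text before the first "="
    local value="${rec#*=}"      # text after the first "=" (may contain more)
    printf '%s\t%s\n' "$key" "$value"
}

split_kv "v=spf1 include:amazon.com ~all"
# prints: v<TAB>spf1 include:amazon.com ~all
```

Splitting at the first "=" rather than the last matters for values that themselves contain equals signs, such as tokens ending in base64 padding.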

TXT records also turn up in other RFCs. Because of their flexibility, they can serve as a place to store details about a protocol or framework, as in RFC 7208 (Sender Policy Framework). RFCs like this specify the exact values and format that should be used, as an alternative to defining a whole new RRTYPE, as was done for LOC.

Notes on the format of TXT records

Your basic TXT record looks like you’d expect — a simple character string:

"Example text here"

Domains can also have multiple TXT records associated:

"Example"
"text"
"here"

There is no way to specify any kind of order, so multiple records can be returned in a different order each time you ask for them.

But as mentioned earlier, a single record can also contain multiple strings:

"one" "two" "three"

Something I discovered while writing this is that RFC 7208 has an interesting definition of how this usage is interpreted:

3.3.  Multiple Strings in a Single DNS Record

   As defined in [RFC1035], Sections 3.3 and 3.3.14, a single text DNS
   record can be composed of more than one string.  If a published
   record contains multiple character-strings, then the record MUST be
   treated as if those strings are concatenated together without adding
   spaces.  For example:

      IN TXT "v=spf1 .... first" "second string..."

   is equivalent to:

      IN TXT "v=spf1 .... firstsecond string..."

   TXT records containing multiple strings are useful in constructing
   records that would exceed the 255-octet maximum length of a
   character-string within a single TXT record.

I think this applies to all TXT records, but it might only be meant for Sender Policy Framework (SPF) records in particular. If I’m right, then this doesn’t appear to be a well-known fact, because several domains out there specify multiple strings but seem to assume that a space will be added between them when they are concatenated. One example comes from one of the DNS giants of the Internet, Akamai, which has the following set for akamai.net:

"This" "is" "not" "the" "nameserver" "you" "are" "looking" "for."

Which, according to RFC 7208, should end up as:

"Thisisnotthenameserveryouarelookingfor."

Move along.
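
Applying the RFC 7208 rule mechanically is easy enough to sketch in the shell. The hypothetical txt_concat function below assumes a single record in the quoted presentation format that tools such as dig +short print, with strings separated by one space and no escaped quotes inside them:

```bash
#!/bin/bash
# Join the character-strings of one TXT record the RFC 7208 (SPF) way:
# concatenate them with no separator.
# Assumptions: input is one record in quoted presentation format, strings
# are separated by a single space, and no string contains an escaped quote.
txt_concat() {
    printf '%s\n' "$1" |
        sed -e 's/^"//' \
            -e 's/"$//' \
            -e 's/" "//g'    # drop the outer quotes, then every '" "' separator
}

txt_concat '"This" "is" "not" "the" "nameserver" "you" "are" "looking" "for."'
# prints: Thisisnotthenameserveryouarelookingfor.
```

The same function turns the RFC's own example, "v=spf1 .... first" "second string...", into "v=spf1 .... firstsecond string...". Records containing escaped quotes or other backslash escapes would need a proper parser rather than three sed substitutions.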

TXT work (source data and methodology)

I wrote a pretty basic shell script that I’m not particularly proud of, but that worked well enough to go through two different lists of top domains and check the TXT records for each entry:

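#!/bin/bash
# Look up the TXT records for each domain listed, one per line, in domains.txt
domainsfile="domains.txt"

# IFS= and -r keep each line exactly as written; the extra test also
# processes a final line that lacks a trailing newline
while IFS= read -r domain || [ -n "$domain" ]
do
    echo "--"
    echo "[$(date)] CHECKING DOMAIN $domain"
    echo "--"

    # The trailing dot makes the name fully qualified; -W 5 sets a 5 second timeout
    host -W 5 -t txt "$domain".
done < "$domainsfile"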

The intent was to keep the output human-readable, because I wanted to be able to look through it myself, while still making it easy to parse for totals and other statistics.

I then ran this and captured the output:

./get-txt-records.sh > host.out 2>&1

For each domain, this writes a block like the following to host.out:

--
[Mon 03 Apr 2023 04:12:36 PM BST] CHECKING DOMAIN amazonaws.com
--
amazonaws.com descriptive text "pf2vv39dfkf9tszsg5lggfs6tp6bkjn4"
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
amazonaws.com descriptive text "spf2.0/pra include:amazon.com ~all"
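
To show how that layout can be parsed for totals, here is a minimal awk sketch, wrapped in a hypothetical count_txt shell function, that counts TXT result lines and the number of distinct domains with at least one. It keys on the CHECKING DOMAIN header lines and assumes every result line contains " descriptive text ", as in the excerpt above:

```bash
#!/bin/bash
# Sketch: tally TXT records in host.out, the log produced by the script above.
# Assumes the layout shown: a "CHECKING DOMAIN <name>" header per domain,
# followed by zero or more "<name> descriptive text ..." result lines.
count_txt() {
    awk '
        /CHECKING DOMAIN / { domain = $NF; next }   # remember the queried domain
        / descriptive text / {
            records++                               # one result line per TXT record
            if (!(domain in seen)) {                # first record for this domain
                seen[domain] = 1
                with_txt++
            }
        }
        END { printf "records=%d domains_with_txt=%d\n", records, with_txt }
    ' "${1:-host.out}"
}
```

Run against just the amazonaws.com excerpt above, count_txt would print records=3 domains_with_txt=1. Tracking the header rather than the name at the start of each result line means that, if host follows an alias and reports the target's name, the record is still credited to the domain that was queried.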

The script took about 48 hours to run each time, against a resolver whose cache hadn’t been warmed with the results beforehand. The first run used the Tranco list generated on 10 March 2023. This wasn’t as interesting as I’d hoped, though, I think because the list contained only effective second-level domains (eSLDs, which I call ‘parent domains’). So I ran it again with the Cisco top 1M domains list downloaded on 13 March 2023.

I did a lot of checking with just cut, grep, and so on, but then I wrote a little script to import all records into an SQLite database to make things easier.
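
That import script isn't reproduced here, but as a rough illustration of the step, the hypothetical awk-based function below flattens host.out into one domain<TAB>record line per TXT record, a shape that is easy to bulk-load into a two-column SQLite table or to keep slicing with cut and grep:

```bash
#!/bin/bash
# Illustrative only (not the original import script): flatten host.out
# into "domain<TAB>record" lines, one per TXT record, for bulk loading.
# Assumes the host.out layout shown earlier.
host_out_to_tsv() {
    awk '
        /CHECKING DOMAIN / { domain = $NF; next }       # queried domain
        / descriptive text / {
            rec = $0
            sub(/^[^ ]+ descriptive text /, "", rec)    # keep only the record text
            print domain "\t" rec
        }
    ' "${1:-host.out}"
}
```

For example, host_out_to_tsv host.out > records.tsv. The record text keeps the quotes exactly as host printed them, so multi-string records such as Akamai's stay distinguishable; some CSV/TSV importers treat double quotes specially, so it is worth checking how your loader handles them.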

All of the files mentioned here, including scripts, the SQLite database, and files created while looking at record lengths, unique records, and more, have been uploaded to GitHub under a Creative Commons Zero licence:

TXT by numbers

Number of TXT records                                     765,650
Number of unique TXT records                              595,398
Domains with TXT records                                  584,244
Domains without TXT records                               415,756
Average number of TXT records per domain (that has them)  1.31 (765,650 / 584,244)
Longest TXT record                                        7,886 characters
Second-longest TXT record                                 5,498 characters
Total length of all TXT records concatenated              49,813,321 characters
Table 1 — TXT record totals.
DMARC1 records                        4,218
SPF1 records                          164,459
SPF1 records with “include”           131,892
SPF1 records with just “v=spf1 -all”  8,444
SPF2 records                          5,091
SPF3 records                          808
Table 2 — DMARC and SPF records.
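
Counts like those in Table 2 mostly come down to pattern matching on record prefixes. Here is a minimal awk sketch, wrapped in a hypothetical spf_stats shell function; it assumes one record per line with the quotes removed and multiple strings already joined, and its patterns are illustrative rather than the ones behind the published numbers:

```bash
#!/bin/bash
# Sketch: SPF tallies in the spirit of Table 2. Reads one TXT record per
# line, already unquoted and concatenated, from a file or stdin.
# The matching rules are assumptions, not the author's exact queries.
spf_stats() {
    awk '
        $0 ~ /^v=spf1 / || $0 == "v=spf1" {       # SPF version 1 record
            spf1++
            if ($0 ~ /include:/) with_include++   # record uses the include: mechanism
            if ($0 == "v=spf1 -all") deny_all++   # -all alone: no host may send mail
        }
        END { printf "spf1=%d include=%d deny_all=%d\n", spf1, with_include, deny_all }
    ' "${1:--}"
}
```

Requiring a space or the end of the record after "v=spf1" follows RFC 7208's note that the version section is terminated by either a space or the end of the record, so something like "v=spf10" is not counted. A record of just "v=spf1 -all" authorises no hosts at all, which is the usual way to say a domain sends no mail.
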
key=value TXT records              630,317
Empty records with just “”         183
Empty records with spaces (“\s+”)  12
Empty records with just “~”        109
Table 3 — key=value TXT records.
Verification / confirmation records  402,230
Table 4 — Verification / confirmation records.
google-site-verification  170,225
MS                        56,160
Facebook                  28,273
Globalsign                17,396
Apple                     17,201
Table 5 — Top 5 verification records.
Record length   Records   Notes
68 characters   170,941   mostly Google site verifications
13 characters    48,763   mostly MS= TXT records
32 characters    48,648   random strings
59 characters    33,205   Facebook verifications
26 characters    25,559   random strings
Table 6 — Fixed-length TXT records.
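
Spotting fixed-length records like these is essentially a histogram of record lengths. A minimal sketch, using a hypothetical length_histogram function and again assuming one unquoted record per line:

```bash
#!/bin/bash
# Sketch: most common TXT record lengths, reading one unquoted record per
# line from a file or stdin. Prints "count length", most frequent first;
# the optional second argument sets how many lengths to show (default 5).
length_histogram() {
    local top="${2:-5}"
    awk '{ n[length($0)]++ }                        # tally records by length
         END { for (len in n) printf "%d %d\n", n[len], len }' "${1:--}" |
        sort -k1,1nr -k2,2n |                       # by count, then by length
        head -n "$top"
}
```

For example, length_histogram records.txt 10 shows the ten most common lengths. The numbers line up with simple arithmetic: "google-site-verification=" is 25 characters, so a 68-character record of that kind carries a 43-character token. Depending on the awk implementation and locale, length() may count bytes rather than characters for non-ASCII records.
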
URLs                                    708
Non-HTTP URLs                           16
Email addresses                         4,487
Hello worlds                            3
Greetings                               14
Swear words not appearing in domains    0
<script> tags                           1
Embedded DNS records (“IN …”)           225
Mentions of “ALIAS for” another domain  363
Security code                           769
Please                                  27
References to tickets                   28
Table 7 — Miscellaneous observations.

Conclusion

So, what are TXT records used for exactly? Well, we can see that key=value settings are the most common use case, with domain verification records making up the majority of those. SPF records also make a strong showing, as do a lot of seemingly random fixed-length records, which are probably being used to encode data somehow.

But overall, they really are used for anything and everything. There are some patterns we can pick out, but in the absence of rigid rules, the Internet as a whole has liberally embraced the freedom to put whatever you like in a TXT record. Which, if you ask me, is a good thing. Having something in the DNS that can act as a config store, notes field, playground for new standards, or even the basis for file storage (not that this would really be recommended) has meant that we haven’t had to wait for standards to catch up in order to keep making use of this wonderful system that underpins a fundamental part of the Internet.

Peter Lowe is the FIRST DNS Abuse Ambassador and Co-Chair of the DNS Abuse SIG. He has worked in or around Internet protocols since joining the second Internet cafe in the UK in 1995, co-hosts the Not So Critical Update podcast, and maintains one of the popular blocklists used by ad blockers and tracking prevention software.

Adapted from the original post on RIPE Labs.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
