At IETF 108, for a meeting that was originally meant to take place in my hometown, a few of us developed and presented what the IETF calls a Technology Deep Dive, focused on the DNS.
During preparations, it soon became evident that a single session would not be enough to cover the topic, so we deferred a lot of content to a second session to be delivered at IETF 110.
I was tasked with putting together a section on ‘Myths and misconceptions’ and this text is a companion to that presentation, which is now available to view online.
Until the (more or less) recent explosion of new DNS RFCs (aka the ‘DNS Camel’, loosely referring to the straw that broke the camel’s back), I used to think of the DNS as something similar to chess, with a fairly simple set of rules that developed into complex systems when deployed. However, the protocol is both at the core of the Internet and over 30 years old, and its development started in much simpler times and within a small circle of colleagues. The documents that described it took for granted a lot of shared knowledge in that initial community and were, at times, somewhat liberal in their level of detail and precision (see the section on wildcards, for instance).
As a result, a few things were not always as clear as they might have been. This, in time, led to some myths and misconceptions. We’ve tried to address a few of those in the IETF Deep Dive.
Everyone wants a pony
The DNS has enjoyed astounding success and universal deployment on the Internet. It is at the very core of the start of pretty much every transaction on the Internet and has been able to scale in what appears to be a boundless manner.
As a result, many people try to resolve their new technology deployment barriers by ‘just putting it in the DNS’ (there’s a mug for that). This is just fine, if you know what you’re doing and how you’re doing it.
Instigated by the IETF’s own documents, some technologies saw the TXT Resource Record as the way to address their needs.
That would work reasonably well if your application was the only one in the world, but the moment you show something nice to people, everyone wants one too. So, you can add your own. The DNS does not limit you to a single TXT record for a name, and if you want one of your own you can just add another TXT record, formatted using your own secret sauce!
However, the DNS defines the concept of a Resource Record Set as the indivisible set of all resource records of the same type (in this case, TXT) present at a given name. What this means is that if you want to get a particular set of TXT records for a domain for your particular application, instead you get all of the TXT records present at that name. The task is then to filter out the irrelevant ones before processing the resulting sub-set that, in theory, is intended for your application.
Here is a small example; a DNS query for TXT records in a well-known domain. At least four different applications are in use:
google.com. 3569 IN TXT "v=spf1 include:_spf.google.com ~all"
google.com. 269 IN TXT "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com. 3569 IN TXT "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com. 3569 IN TXT "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com. 269 IN TXT "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
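The filtering step an application has to perform on such an answer can be sketched in a few lines of Python. This is a minimal sketch: the helper name `records_for_app` and the idea of matching on a leading prefix are assumptions made for illustration, since each application defines its own TXT format.

```python
# TXT RRset for google.com, copied from the answer shown above.
TXT_RRSET = [
    "v=spf1 include:_spf.google.com ~all",
    "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e",
    "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8=",
    "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95",
    "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289",
]


def records_for_app(rrset, prefix):
    """Return only the TXT strings that start with this application's prefix.

    Hypothetical helper: the DNS always hands back the whole RRset,
    so the filtering has to happen in the application.
    """
    return [txt for txt in rrset if txt.startswith(prefix)]


spf = records_for_app(TXT_RRSET, "v=spf1")
docusign = records_for_app(TXT_RRSET, "docusign=")
print(len(TXT_RRSET), "records fetched,", len(spf), "needed for SPF")
```

Every application that shares the name pays for the transfer of all five records, even though it discards most of them.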
A better way
If you need to use the DNS, write down your thoughts and needs for your application in an Internet draft with the specification, and a standing panel of subject-matter experts will review it reasonably quickly.
In the absence of any special-processing requirements (which would require new code in the DNS servers), what you are asking for is a simple key/value fetch that associates a domain name with some data, in much the same way people are using TXT records. In that case, you would be up and running in a very short time, and the entire installed base of standards-compliant DNS servers will treat the new record transparently, so your application/service/technology can then consume the information.
And a most definitely worse way
Some applications have tried to solve the problem of clashing record names or types, or similar problems, by creating their own name space or sub-space.
In the first case, you are no longer using the Internet’s DNS, so perhaps this is not the way to achieve universal deployment for your new technology.
In the second case, a new domain subtree from a Top-Level Domain (TLD) is created, either by applying for a new TLD to be included in the Internet’s DNS with ICANN, or by just making one up.
This latter option, making one up, causes two types of problems: first, you are not in the Internet namespace and second, if you (or anyone else for that matter) ever wants this name to be included in the Internet’s DNS name space, it will probably cause a name collision.
In the meantime, people and applications that think your name is a DNS name (because it looks like one) will have a hard time trying to figure out why things don’t work, because they are not included in your pseudo-private ‘special’ name space.
Getting a new ICANN TLD is, however, both expensive and time consuming.
None of the options here are good and you are better off working with the option in the previous section. If you want to distinguish a value attribute for a particular purpose, the lowest friction path is to go through the process of obtaining a new DNS Resource Record type.
Are (domain) names for humans?
Today domain names are all about marketing, positioning in search results, and so on, but that is not how it started, and this has a few consequences.
Originally, domain names and host names were meant to identify organizations and machines, for service connection and delivery, in a way that humans could remember. Before DNS, these were stored in a text file. There was a centrally produced file that got distributed, with some frequency, to all participants but one could just as well have one’s own local version with local differences.
The DNS defined labels as eight-bit clean strings of bytes, but both applications and implementations of DNS software interpreted name labels as ASCII characters (a sign of the times). Not only that, but domain names were also deemed to be case insensitive for comparison purposes.
These domain names were used by many applications and many protocols at once, so it would be horrendously confusing if the same host system was called ‘Chris’ on DECnet, ‘Alice’ on SNA, and ‘Sam’ on the Internet. So, the names of hosts were constrained to fit within the requirements of many network protocols and many applications. Accordingly, we adopted the lowest common denominator approach to the rules of how to build names.
This resulted in a limited character set being available for use in domain names and when the Internet went fully global, there were sustained requests to expand the DNS with the ability to represent names in all the world’s scripts.
After the failed attempt of the bit-label introduction, Internationalized Domain Names (IDN) were invented, and then reinvented, and reinvented again. These days most international scripts (and other assorted symbols such as emojis) can be represented in the DNS through an encoding mechanism (known by the fancy name of punycode), which transforms the panoply of Unicode characters into ASCII strings that good-old DNS software is happy with. For instance, ‘João’ becomes the punycode (fun!) string ‘xn--joo-nla’.
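As a small illustration, Python's standard library ships an `idna` codec that performs this transformation. It implements the older IDNA 2003 rules, so treat this as a sketch of the idea and not as a reference for current registry practice. The domain `joão.example` is made up for the example.

```python
# Encode a Unicode name to the ASCII form carried in the DNS, and back.
unicode_name = "joão.example"  # hypothetical name, for illustration only

ascii_name = unicode_name.encode("idna")  # each label is converted separately
round_trip = ascii_name.decode("idna")

print(ascii_name)   # b'xn--joo-nla.example'
print(round_trip)   # joão.example
```

Only the label that contains a non-ASCII character gets the `xn--` prefix. The plain ASCII label passes through untouched.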
While this sounds great, and is good progress, we soon realized that any new opportunity could be used for bad as well as for good and the bad actors usually move quicker than the good ones.
I am referring to the way we present those names to the user. Turns out that glyphs (the visual representation of a given Unicode ‘character’ in each font) sometimes look the same even if they are representing different ‘character’ codes (aka code points). This can be used to trick users into connecting to sites other than their intended endpoints, because it is near impossible, without help from the web browser or application, to visually differentiate the intended from the forged name, and if you can make one domain name look precisely like another then bad things can follow. In these examples: http://google.com, http://gοοɡle.com, http://gοogle.com, which one takes you to the real Google? Do any of them?
It is useful to understand that the DNS is not a search engine. It answers questions precisely. It does not find ‘close matches’. The only level of name transformation in the DNS is case-insensitivity; a query for Www.ExamPLE.com will produce the same result as a query for www.example.com. But that’s it.
The problem with attempting to map Unicode into the DNS is that there are many ways to represent the same pattern of glyphs on a screen. But while they may all look the same, in the DNS they are all different domain names. Odd things can happen, such as the name ‘dots….example.com’, which may look like a badly formed domain name until you realise that it uses the Unicode ellipsis character and is stored in the DNS as the validly formed DNS name ‘xn--dots-tc7a.example.com’.
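The gap between what the eye sees and what the DNS compares can be sketched in Python. The helper `same_dns_name` is hypothetical and models only the ASCII case-folding rule described above. The raw `punycode` codec is used for the ellipsis label because it encodes code points exactly as given, without normalizing them first.

```python
def same_dns_name(a, b):
    """Compare two names the way the DNS does: ASCII case-insensitively, nothing more."""
    def fold(name):
        # Only A-Z are folded; every other code point must match exactly.
        return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)
    return fold(a.rstrip(".")) == fold(b.rstrip("."))


real = "google.com"
lookalike = "g\u03bfogle.com"  # second letter is GREEK SMALL LETTER OMICRON

print(same_dns_name("Www.ExamPLE.com", "www.example.com"))  # True
print(same_dns_name(real, lookalike))                        # False

# The ellipsis label from the text, encoded without any normalization.
label = "dots\u2026".encode("punycode")
print(b"xn--" + label)  # b'xn--dots-tc7a'
```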
The debate on how to make things better across a wide diversity of languages, scripts, and glyphs will continue for a bit longer, as in forever.
Wildcards in the DNS
Did you know the DNS has wildcards? You know, what geeks use to match multiple character strings according to defined patterns. Well, it does, sort of. The DNS does have one, and only one, wildcard feature, but being the DNS, even one is enough to create confusion in people’s minds. So much so that there was an entire RFC (published 19 years after the original introduction of the DNS) devoted to explaining this to the experts themselves. How can this possibly be?
Well, the ‘*’ character only stands for a wildcard, as opposed to being the plain asterisk ‘*’ character, if it appears on its own in a label, AND, if that label is the leftmost label in the DNS name. Anywhere else, or combined with anything else in a label, the asterisk character is just another character. It is probably not a terribly good idea to include it in a name though, as it can create some amount of confusion.
Another peculiarity of DNS wildcards is that, even though the wildcard is a single label with nothing to its left, it can match names with any number of additional labels. For instance, the name ‘*.example.com’ would match a query for ‘a.example.com’ but it would also match ‘e.d.c.b.a.example.com’. This is in contrast with wildcards in DNS names used in X.509 domain name certificates, commonly used in HTTPS servers, that only allow a wildcard to match one level of label. An X.509 certificate for ‘*.example.com’ would match the name ‘a.example.com’, but not match ‘e.d.c.b.a.example.com’.
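The contrast between the two matching rules can be sketched as follows. Both function names are hypothetical, and the DNS version is deliberately simplified: it ignores the rule that a wildcard only applies when no more specific name exists in the zone.

```python
def labels(name):
    """Split a name into lower-cased labels, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")


def dns_wildcard_matches(wildcard, qname):
    """'*' as the whole leftmost label covers one OR MORE labels below the parent."""
    w, q = labels(wildcard), labels(qname)
    if w[0] != "*":
        return False
    parent = w[1:]
    return len(q) > len(parent) and q[len(q) - len(parent):] == parent


def x509_wildcard_matches(wildcard, hostname):
    """In a certificate name, '*' stands for exactly ONE label."""
    w, h = labels(wildcard), labels(hostname)
    return w[0] == "*" and len(h) == len(w) and h[1:] == w[1:]


for name in ("a.example.com", "e.d.c.b.a.example.com"):
    print(name,
          dns_wildcard_matches("*.example.com", name),
          x509_wildcard_matches("*.example.com", name))
```

The first name matches under both rules. The second matches only under the DNS rule.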
When what you ask for doesn’t exist
Sometimes you query the DNS for information and the answer comes back empty. There are several ways this can happen.
First, you might have encountered an error at some of the queried servers. This will be made clear by the server returning an error code, such as REFUSED, NOTIMP (not implemented) or SERVFAIL (this is the catch-all error code that says ‘Yes, I’m working, and on any other day at some other time I might’ve given back an answer, but not today and not right now. And, no, I’m not going to tell you why’).
Oddly enough, none of these errors is terribly useful and many DNS implementations, upon receiving such DNS error responses, interpret them as saying that the particular server or resolver that you’ve asked is not answering you, but you may have better luck if you repeat the question to another server, if one exists, or another recursive resolver if you have another one configured.
In some cases, such as DNSSEC validation being unable to validate the response, the more appropriate response would be: ‘This is bad, as the response I received cannot be validated. I’m withholding the response so as not to mislead you’. The IETF is currently working on extending the palette of DNS error messages.
However, you can also get back two different types of answers with no data in them, with different meanings:
- Getting NXDOMAIN (originally ‘Name Error’ but not usually referred to by that name) means that the DNS name does not exist, at all, at least not at that time, and neither do any other names ‘below’ it, meaning that there are no subdomains for this name.
- Getting NOERROR and no data, usually referred to as a NODATA response, can mean one of two things:
  - One possibility is that this response means there is no data for the exact question you asked but there might be other data. For instance, if you ask for the IPv6 address pointed to by the name but the DNS only has an IPv4 address at that name, it means that the data TYPE you asked for does not exist but there is data for other TYPEs associated with that name.
  - A second possibility is that the exact name you asked for does not exist but there are subdomains that do exist. For instance, imagine that ‘a.b.example.com’ exists in the DNS but ‘b.example.com’ does not. When asking for ‘b.example.com’ you get an empty NOERROR response. The ‘b.example.com’ name is now referred to as an ‘empty non-terminal’. In the DNS tree containing all the names, it represents a node that is empty (has no data associated) but from which further branches of the tree hang.
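The difference between NXDOMAIN and the two flavours of NODATA can be sketched with a toy zone held in a Python dictionary. The zone contents and the `lookup` function are assumptions for illustration. A real server also handles delegations, wildcards and names outside its zones, none of which is modelled here.

```python
# Toy zone: only the apex and 'a.b.example.com' hold data.
ZONE = {
    "example.com": {"NS": ["ns1.example.com"]},
    "a.b.example.com": {"A": ["192.0.2.1"]},
}


def lookup(qname, qtype):
    """Return (rcode, answers) for a query against the toy zone."""
    qname = qname.rstrip(".").lower()
    if qname in ZONE:
        # The name exists; an empty list here is NODATA for that TYPE.
        return "NOERROR", ZONE[qname].get(qtype, [])
    # No data at the name itself: is anything stored below it?
    if any(owner.endswith("." + qname) for owner in ZONE):
        return "NOERROR", []  # empty non-terminal, also NODATA
    return "NXDOMAIN", []


print(lookup("a.b.example.com", "A"))     # data
print(lookup("a.b.example.com", "AAAA"))  # NODATA: other TYPEs exist
print(lookup("b.example.com", "A"))       # NODATA: empty non-terminal
print(lookup("c.example.com", "A"))       # NXDOMAIN
```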
CNAMEs are bad neighbours
Another type of data defined in the original DNS specification that still causes no end of trouble for people is the CNAME resource record. The CNAME, an abbreviation for CANONICAL NAME, is the DNS way of doing an alias. If a name is represented in the DNS by a CNAME, then what the DNS is telling you is that the ‘real’ name with the ‘real’ data, is the CNAME’s target name and the name you asked for is just an alias.
This means there can NOT be any other DNS data at the original name because it is not the real name, so a CNAME precludes the existence of any other data at the original name. No data at all: not a SOA record, an NS record, or any other data associated with this name. Nothing. This implies that there can NOT be a CNAME at the apex of a zone (such as at the ‘example.com’ name) since a zone apex must, by definition, have an SOA record, and one or more NS records, to list its name servers.
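The rule lends itself to a simple zone check. The sketch below uses a hypothetical `cname_violations` helper over a dictionary-based zone, and it ignores the DNSSEC-related records that are permitted to sit alongside a CNAME.

```python
def cname_violations(zone):
    """Return owner names that hold a CNAME together with any other record type."""
    return sorted(
        owner for owner, rrsets in zone.items()
        if "CNAME" in rrsets and len(rrsets) > 1
    )


# Hypothetical zone data: a CNAME at the apex collides with SOA and NS.
zone = {
    "example.com": {
        "SOA": ["ns1.example.com. hostmaster.example.com. 1 7200 3600 1209600 3600"],
        "NS": ["ns1.example.com."],
        "CNAME": ["cdn.example.net."],
    },
    "www.example.com": {"CNAME": ["cdn.example.net."]},
}

print(cname_violations(zone))  # ['example.com']
```

The `www` name passes because the CNAME is the only data there. The apex can never pass, because the SOA and NS records cannot be removed.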
Having said that, you might encounter CNAME records at the zone apex on the Internet and it seems to work. This is only possible because the DNS implementors have long stood by Postel’s Robustness Principle (aka Postel’s Law), which goes something like: be conservative in what you do, be liberal in what you accept from others.
CNAMEs are very commonly used in the area of content hosting, where the client’s service name is mapped into the Content Distribution Network’s namespace using a CNAME, and the inability to perform this mapping at the zone apex is a major source of frustration, particularly as it appears to mostly work, even if the DNS specification says ‘no’. There have been various attempts to ‘fix’ this, including proposals for an ANAME record, a more generic alias form. The latest incarnation of a ‘fix’ is the SVCB record, which attempts to generalize the alias form in the DNS.
What does ANY mean in the DNS?
This is another query type that causes some people to be surprised. ‘ANY’ is a type of query that was defined in the original specification, and was precisely defined, but many of us appeared to think it actually meant something else!
This is because ANY is not ALL.
The ANY query type literally means ‘give me any information you have’ associated with the domain name I’m asking about. If the responder has all the information, for instance, if you were to ask an authoritative server for the name using the ANY query type, then you will get the lot, or ‘ALL’.
However, if the question is posed to a recursive resolver, it may have some information associated with the name in its cache from previous queries it has answered. That is likely to be not all the information, because it didn’t need to get all the information to answer those earlier questions (if it was asked about the A records but not the AAAA records, it only fetched the ones it needed, and that is what sits in its cache). So when asked about ANY, it will provide any-information-it-has, not all-there-might-be, elsewhere.
The ANY query type has been abused in UDP-based DDoS attacks, as the response is often far larger than the original query. These days there is a trend to reinterpret ANY to mean ‘provide something, but preferably as little as possible’.
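The ‘ANY is not ALL’ behaviour can be sketched with a toy resolver. The class `ToyResolver`, the zone data and the decision to fetch everything when the cache is empty are all assumptions for illustration. TTLs and the newer minimal-response behaviour are left out.

```python
# What the authoritative server holds for the name (hypothetical data).
AUTHORITATIVE = {
    "example.com": {
        "A": ["192.0.2.1"],
        "AAAA": ["2001:db8::1"],
        "MX": ["10 mail.example.com."],
    },
}


class ToyResolver:
    """Recursive resolver that caches only the RRsets it has had to fetch."""

    def __init__(self):
        self.cache = {}

    def query(self, name, qtype):
        cached = self.cache.setdefault(name, {})
        if qtype == "ANY":
            if not cached:
                # Nothing cached yet: pass the ANY query on and keep the lot.
                cached.update(AUTHORITATIVE[name])
            # Otherwise answer with whatever is already in the cache.
            return dict(cached)
        if qtype not in cached:
            cached[qtype] = AUTHORITATIVE[name].get(qtype, [])
        return {qtype: cached[qtype]}


resolver = ToyResolver()
resolver.query("example.com", "A")            # an earlier client asked for A only
any_answer = resolver.query("example.com", "ANY")
print(sorted(any_answer))                     # ['A']
print(sorted(AUTHORITATIVE["example.com"]))   # ['A', 'AAAA', 'MX']
```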
What transport does the DNS use? UDP? TCP? Anything else?
These days the answer to this question is ‘all of the above’. Since its initial specification, the DNS was meant to use UDP and TCP as valid transports, with UDP being the primary choice and TCP being limited for use as a backup, to be invoked when UDP had issues (which has been the case more often than one might think), and for certain specific operations.
From the beginning, not allowing the DNS to use TCP in a network was a good way of getting into trouble needlessly, yet we deployed a whole collection of firewalls that blocked TCP port 53.
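One of the ‘issues’ that sends a client from UDP to TCP is a response too large to fit, which the server signals by setting the truncation (TC) bit in the header. The sketch below shows how a client might detect that bit and how a message is framed for TCP, where each message is preceded by a two-byte length. The function names are hypothetical and no network traffic is involved.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags field


def is_truncated(response):
    """True if a UDP response has the TC bit set, meaning: retry over TCP."""
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message):
    """DNS over TCP prefixes every message with its length as two bytes."""
    return struct.pack("!H", len(message)) + message


# Hand-built 12-byte header: ID 0x1234, flags QR|TC|RD|RA, all counts zero.
truncated_header = struct.pack("!HHHHHH", 0x1234, 0x8380, 0, 0, 0, 0)

print(is_truncated(truncated_header))        # True
print(frame_for_tcp(truncated_header)[:2])   # b'\x00\x0c'
```

If TCP port 53 is blocked, the retry that follows a truncated answer has nowhere to go, and the name simply fails to resolve.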
Today, UDP is still the main transport for the DNS, though it is now being challenged both by the DNS’s own evolution with its corresponding growth in response sizes, and by the appearance of new transport options such as DNS over HTTPS (DoH), DNS over TLS (DoT) and eventually DNS over QUIC and no doubt others to come.
This is an area that is quite interesting to follow right now because new transport options can potentially provide an interesting and viable path for DNS evolution, beyond the addition of what increasingly feel like hacks and patches today.
Do all DNS servers answer the same question with the same data?
In short, nope! Although, mostly, they do, except when they don’t!
It is often the case that engineers refer to the DNS as a loosely coherent, distributed database. The looseness of the coherency was meant to describe the time that elapses between the publication of new data by a primary server and its propagation to the other authoritative servers for the zone. It also acknowledges that caching servers will retain the data in a response for some time, with the duration of this retention set by the Time To Live (TTL) value that was provided with the data. If things change at the authoritative servers while the cache still has non-expired data, there will be a difference for some time, which will self-correct upon cache expiration, hence the ‘loose coherence’ characterization.
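The self-correcting window can be sketched with a tiny TTL cache. The class `TtlCache` is hypothetical, and the clock is passed in explicitly so the timeline is easy to follow. Real resolvers track time themselves and apply further limits on TTLs.

```python
class TtlCache:
    """Keep an answer until its TTL runs out."""

    def __init__(self):
        self._store = {}

    def put(self, key, value, ttl, now):
        # Remember the value together with its absolute expiry time.
        self._store[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            self._store.pop(key, None)
            return None  # expired or never cached: ask upstream again
        return entry[0]


cache = TtlCache()
key = ("www.example.com", "A")
cache.put(key, "192.0.2.1", ttl=300, now=0)

# Suppose the authoritative data changes at t=100. The cache cannot know.
stale = cache.get(key, now=200)    # still the old address
expired = cache.get(key, now=300)  # None, so the next lookup gets fresh data
print(stale, expired)
```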
More recently, however, the coherency of the DNS has been altered by ingenious uses of the DNS that provide different answers depending on who is asking, a consequence of the appearance of CDNs and their aim to steer users to delivery points as close to them as possible. A good example is the use of the EDNS Client Subnet (ECS) extension in DNS queries, where metadata relating to the location of the original client is added to a DNS query. This allows a DNS server to generate a response that is ‘tailored’ to the querier, such as the address of a server that is the ‘closest’ to the client.
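A minimal sketch of such tailoring follows, using Python's `ipaddress` module. The site table, the addresses (all taken from documentation ranges) and the function `tailored_answer` are assumptions for illustration, not a description of how any particular CDN works.

```python
import ipaddress

# Hypothetical mapping from client networks to the nearest delivery point.
SITES = {
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.10",
    ipaddress.ip_network("203.0.113.0/24"): "192.0.2.20",
}
DEFAULT_SITE = "192.0.2.1"


def tailored_answer(client_subnet):
    """Pick an address based on the client subnet carried in the query."""
    subnet = ipaddress.ip_network(client_subnet)
    for network, address in SITES.items():
        if subnet.subnet_of(network):
            return address
    return DEFAULT_SITE


# The same question gets a different answer depending on who is asking.
print(tailored_answer("198.51.100.0/24"))
print(tailored_answer("203.0.113.128/25"))
print(tailored_answer("192.0.2.0/24"))
```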
Lest you think that to be a good idea, I should remind you of the caveat attached to the Client Subnet specification: “We recommend that the feature be turned off by default in all nameserver software.”
Not everything published in an RFC is necessarily a good thing to use!
The DNS: Loosely coherent and loosely defined
The loose coherence of the DNS is a slightly undesirable property that everyone has come to accept as unavoidable in a system of its nature.
However, all these years have shown that as much as engineers strive to define protocols in a precise manner, they will also push the existing protocols into new directions and uses, and in so doing, will ask questions that the standard specifications leave unanswered.
This is nothing but a testimony to the long life of a protocol that was defined in the very early days of the Internet and has evolved and adapted to a completely different landscape. The DNS protocol is now older than most Internet users. How much longer can the DNS continue to be part of the bedrock of the Internet? When will the time come to replace it with something else?