The Domain Name System (DNS) has intricate features that interact in subtle ways, making it a complex and challenging protocol to characterize. Such complexity can arise when practitioners design new protocol features or extensions targeted towards realistic use cases without considering the collective impact such extensions can have on the system through their mutual interactions.
Understanding the theoretical complexity of a networked system is important because it has broad ramifications related to the ease with which humans and machines can analyse the system. For example, even for finite network topologies, simply determining if Border Gateway Protocol (BGP) will converge is NP-complete.
In this post, I will give a brief overview of key findings from a paper that my colleagues and I from the University of California and Microsoft presented at the recent HotNets 2021 conference into the theoretical complexity of the DNS.
The DNS can recognize regular languages and generate strings in a context-free language
As one of the oldest and most widely used distributed networking protocols on the Internet, understanding the complexity of the DNS has implications for the cost of verification and for understanding potential attack vectors. Further, unlike BGP, the DNS’s power can be directly used by applications.
Figure 1 shows the different classes of automata and the languages they recognize, with the Turing machine being the most powerful and complex. From our investigation, we found that the DNS has surprising complexity to the point it can simulate:
1. Deterministic finite automaton (DFA) and consequently can recognize arbitrary regular languages.
2. Pushdown systems (PDSs) and consequently can generate strings of arbitrary context-free grammars.
The main reason behind this surprising complexity is the DNAME record type for domain redirection, which is a partial rewrite, unlike CNAME. With the help of DNAME, all the DFA transitions are encoded into a single zone file.
To test whether a given string is accepted by the DFA, the resolver encodes the string as the query domain name and sends it to the nameserver containing the encoded zone file. The text record response from the nameserver will contain ‘accept’ if, and only if, the string is in the language. For more details and an example, check out our paper: How Complex is DNS?
An application of this DFA simulation by the DNS is that one can use it to control access to domains in an organization or for parental control. If the allowed domains can be represented as a regular expression, then the validation for a domain can be done in the DNS as part of the DNS lookup for the domain. Regarding PDSs, the DNS can express nondeterministic PDSs and generate strings in arbitrary context-free languages by extension.
A PDS is a transition system whose states involve a stack of unbounded lengths. An important use of the PDS is in representing sequential programs with (possibly recursive) functions. A PDS can be encoded in the DNS, in general, with the help of multiple nameservers. Specifically, the combination of DNAME records and non-determinism due to nameserver delegation allows the DNS to encode both deterministic and nondeterministic PDSs. Refer to our paper or talk for details.
How can we use this information
Our work represents an initial investigation into the complexity of the DNS. There are several directions for future research, including understanding:
- The impact on DNS zone file verification: The most efficient algorithms known for PDS reachability — determining whether a given configuration can be reached in a given PDS — have near-cubic time complexity in the number of rules. Therefore, DNS zone-file verification, which requires reasoning about all possible query lookups, also has at least this complexity today. This has not only theoretical implications but is also a problem for real zone-file verifiers like GRᴏᴏᴛ. Even a simple four-record zone file with interacting DNAME loops can create close to a million query equivalence classes in GRᴏᴏᴛ, quickly blowing up its verification time. Interestingly, however, verification in GRᴏᴏᴛ is linear time in the absence of DNAME records. Can we design new verification algorithms that scale well with the number of DNAME records for real-world configurations?
- Tighter bounds: Is the DNS even more expressive than a PDS or not? We believe and are working to show that the DNS and PDS are equivalent.
- Applications: Can we build some real applications that take advantage of these results? The DNS already can do spam filtering and load balancing, among others, and we can get inspired by them.
- New record types: Contributors frequently add new drafts and RFCs to the DNS specification, with new record types intended to enable new use cases. Contributions should assess how these newly proposed types affect DNS complexity before making them a standard.
- Security implications: Also, a natural direction to explore is whether this complexity leads to new attack vectors.
Siva Kesava is a PhD candidate at the Department of Computer Science, UCLA.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.