In order to better understand how the DNS network is used, my colleague Jelena Mirkovic (USC Information Sciences Institute) and I (Northeastern University) analysed 28 billion DNS query traces at a root server, B-Root.
Bad queries tend to propagate to the root zone due to the hierarchical nature of DNS, so studying traffic at a root server can provide key insights into overall network usage. We sampled data from 10 years of B-Root’s datasets from CAIDA’s Day in the Life of the Internet (DITL) project, which coordinates simultaneous data collection events across all root servers on a single day per year.
General classification scheme
To classify DNS queries by validity, we referenced RFC 1034, the specification that describes the structure of domain names. We first sorted each query hitting B-Root into one of three categories: empty (“.”), has-TLD (for example, “foo.com.”), or one-word (such as “foobar.”). We then split has-TLD into two separate categories, valid-TLD and invalid-TLD, by comparing the query’s TLD with IANA’s valid TLD list. We consider a query malformed if it is not in valid-TLD and is not part of an explicit standard (priming queries, minimized queries).
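The scheme above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the paper: the function name `classify` and the three-entry `VALID_TLDS` set are assumptions, and a real analysis would load IANA’s full TLD list.

```python
# Hypothetical stand-in for IANA's TLD list; load the full list in practice.
VALID_TLDS = {"com", "net", "org"}


def classify(qname: str) -> str:
    """Return 'empty', 'one-word', 'valid-TLD', or 'invalid-TLD' for a query name."""
    # Drop the trailing root dot of a fully qualified name, if present.
    name = qname[:-1] if qname.endswith(".") else qname
    if name == "":
        return "empty"
    labels = name.lower().split(".")
    if len(labels) == 1:
        return "one-word"
    # has-TLD: compare the rightmost label against the valid TLD list.
    return "valid-TLD" if labels[-1] in VALID_TLDS else "invalid-TLD"


if __name__ == "__main__":
    for q in (".", "foo.com.", "foobar.", "host.novalocal."):
        print(q, classify(q))
```

Stripping the trailing dot first keeps the three top-level categories distinct: the root name becomes an empty string, and a name with no remaining dots is a single label.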
Additionally, within the categories one-word, valid-TLD, and invalid-TLD, we identified some interesting classes of queries, including:
- Queries originating from Chromium-based web browsers.
- Queries from new specifications, namely query name minimization and priming queries (as per RFC 7816 and RFC 8109 respectively).
- Queries from Apple’s proprietary networking stack, AppleTalk.
- Queries containing explicit hex encoding (for example, “\x0a\x0b\x0c”).
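As a rough illustration of the last class in the list, the check below flags names that contain a textual hex escape. It assumes the trace tooling renders such bytes as a literal backslash, `x`, and two hex digits, as in the example above; the name `has_hex_encoding` is hypothetical and this is not the method from the paper.

```python
import re

# Assumption: hex-encoded bytes appear in the trace text as a literal
# backslash, 'x', and two hex digits (for example \x0a).
_HEX_ESCAPE_RE = re.compile(r"\\x[0-9a-fA-F]{2}")


def has_hex_encoding(qname: str) -> bool:
    """Return True if the query name contains at least one \\xNN escape."""
    return _HEX_ESCAPE_RE.search(qname) is not None
```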
Chromium queries
As described in a previous APNIC blog, Chromium includes a critical feature, Omnibox, which allows users to enter a website name, URL, or search terms. To determine if a user is on the Internet or an intranet, while at the same time preventing NXDomain hijacking, Omnibox will send three random DNS queries 7 to 15 characters long. As Chromium’s market share grew, this feature caused a flood of malformed queries to reach DNS roots. In 2020, almost 50% of all queries hitting B-Root were from Omnibox, and only 8% of queries that year were valid under our criteria.
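A simple heuristic for spotting candidate Omnibox probes follows from the description above: a single label, 7 to 15 characters long. The sketch below additionally assumes the probes consist only of lowercase ASCII letters, which the text above does not state; the name `looks_like_chromium_probe` is hypothetical, and the paper’s actual detection method may differ.

```python
import re

# Assumption: a probe is one label of 7-15 lowercase ASCII letters.
_PROBE_RE = re.compile(r"[a-z]{7,15}")


def looks_like_chromium_probe(qname: str) -> bool:
    """Heuristically flag one-word names that match the probe shape."""
    # Drop the trailing root dot of a fully qualified name, if present.
    label = qname[:-1] if qname.endswith(".") else qname
    # fullmatch rejects multi-label names because '.' is not in [a-z].
    return _PROBE_RE.fullmatch(label) is not None
```

A shape-only test like this over-matches: an ordinary word such as “localhost.” also fits the pattern, so a real analysis would need further signals (for example, several such queries from one source in quick succession) to separate probes from other one-word queries.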
Top senders
Next, we measured queries from top senders to B-Root in 2022. As shown in Figure 6, Amazon Web Services (AWS) accounted for 14% of all queries to B-Root, and the large majority of those queries were malformed under our criteria. Microsoft Azure, while accounting for only 3% of all queries, showed a similar predominance of malformed queries.
Query minimization and priming queries
Because DNS is highly distributed, a change in the DNS protocol specification can take years to be adopted throughout the entire DNS ecosystem. We quantified this propagation by measuring the presence of minimized (QMIN) queries as per RFC 7816 and priming queries as per RFC 8109. Both QMIN and priming queries are clearly observable in B-Root data: QMIN queries appear as a single valid TLD (for example, “com.” or “net.”), and priming queries appear as empty “.” queries of type NS.
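The two signatures described above translate directly into a small check. This is a sketch under stated assumptions rather than the code from the paper: `standards_class` and the tiny `VALID_TLDS` set are hypothetical names, and the query type is assumed to be available as a string such as "NS" or "A".

```python
from typing import Optional

# Hypothetical stand-in for IANA's TLD list; load the full list in practice.
VALID_TLDS = {"com", "net", "org"}


def standards_class(qname: str, qtype: str) -> Optional[str]:
    """Return 'priming', 'qmin', or None for a (query name, query type) pair."""
    # Drop the trailing root dot of a fully qualified name, if present.
    name = qname[:-1] if qname.endswith(".") else qname
    # Priming signature: the root name itself, queried with type NS.
    if name == "" and qtype.upper() == "NS":
        return "priming"
    # QMIN signature: a single label that is itself a valid TLD.
    if name and "." not in name and name.lower() in VALID_TLDS:
        return "qmin"
    return None
```

Note that a single-TLD query name is only consistent with query name minimization; other software can send the same name, so counts produced this way are best read as an upper bound on QMIN traffic.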
As shown in Figure 7, QMIN queries have increased gradually since the introduction of RFC 7816 in 2016. In contrast, priming queries remained below 1% of overall queries from the introduction of their specification in 2017 through 2021. Then, in 2022, priming queries jumped to 35% of all queries, as shown in Figure 2.
Dataset quirks
Aside from our main results, we made some less critical yet still interesting discoveries:
- In 2020, 8.38% of queries hitting B-Root had a “.consul” TLD, which is invalid by IANA’s list of valid TLDs. This seems to be a large leak from HashiCorp’s networking platform, Consul. If this is the case, the B-Root data predated the discovery of this leak by two years!
- AppleTalk queries consistently accounted for about 1% of all queries sent to B-Root, possibly indicating legacy Apple product usage.
- In 2014, 1.22% of queries sent to B-Root had the invalid TLD “.com/wawa”. Was this due to a server misconfiguration at Wawa, the popular Philadelphian convenience store?
- In 2021, 0.66% of queries had the invalid TLD “.novalocal”. This seems to be a widespread misconfiguration of OpenStack software, as discussed on Stack Overflow and in the documentation.
- In 2022, 0.18% of queries had the invalid TLD “.rac2v1a”. Was this due to a misconfiguration of this router?
Takeaways
The evolving landscape of the DNS ecosystem, as observed from B-Root server analyses, reveals a dynamic interplay between new Internet technologies, like Chromium and cloud services, and the changing nature of DNS queries. The significant increase in malformed queries over the last decade underscores the need for continuous oversight and collaboration among developers, administrators, and researchers.
By leveraging comprehensive datasets, such as those from the DITL project, the Internet community can gain invaluable insights into how the DNS is being used, ensuring that the DNS remains robust, secure, and responsive to future challenges and innovations.
For more information, read our paper Understanding DNS Query Composition at B-Root.
Jacob Ginesin is an undergraduate studying computer science at Northeastern University and a researcher in the NDS2 Lab. His research focuses on formally verifying critical systems and infrastructure.