Understanding DNS query composition at B-Root

Published 1 Nov 2023

Category: Tech matters

In order to better understand how the DNS network is used, my colleague Jelena Mirkovic (USC Information Sciences Institute) and I (Northeastern University) analysed 28 billion DNS query traces at a root server, B-Root.

Bad queries tend to propagate to the root zone due to the hierarchical nature of DNS, so studying traffic at a root server can provide key insights into overall network usage. We sampled data from 10 years of B-Root’s datasets from CAIDA’s Day in the Life of the Internet (DITL) project, which coordinates simultaneous data collection events across all root servers on a single day per year. 

General classification scheme

To classify DNS queries by validity, we referenced RFC 1034, the specification that describes the structure of domain names. We first sorted queries hitting B-Root into one of three categories: empty (“.”), has-TLD (for example, “foo.com.”), or one-word (such as “foobar.”). We then split has-TLD into two categories, valid-TLD and invalid-TLD, by comparing each query’s TLD against IANA’s list of valid TLDs. Any query outside valid-TLD is considered malformed, unless it follows an explicit standard (as priming queries and minimized queries do).
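
The classification scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `VALID_TLDS` is a hypothetical stand-in for IANA's full TLD list, and the two exceptions in `is_malformed` only cover the priming and minimized queries named in the text.

```python
from enum import Enum

# Hypothetical stand-in for IANA's TLD list; a real analysis would load
# https://data.iana.org/TLD/tlds-alpha-by-domain.txt in full.
VALID_TLDS = {"com", "net", "org", "edu", "arpa", "uk"}


class Category(Enum):
    EMPTY = "empty"              # the root name "."
    ONE_WORD = "one-word"        # e.g. "foobar."
    VALID_TLD = "valid-TLD"      # e.g. "foo.com."
    INVALID_TLD = "invalid-TLD"  # e.g. "host.consul."


def categorize(qname: str) -> Category:
    """Place a query name into one of the four categories described above."""
    name = qname.rstrip(".")  # drop the trailing root dot
    if not name:
        return Category.EMPTY
    labels = name.split(".")
    if len(labels) == 1:
        return Category.ONE_WORD
    # DNS names are case-insensitive, so compare the TLD in lower case.
    if labels[-1].lower() in VALID_TLDS:
        return Category.VALID_TLD
    return Category.INVALID_TLD


def is_malformed(qname: str, qtype: str) -> bool:
    """Malformed means outside valid-TLD, unless the query follows an explicit standard."""
    category = categorize(qname)
    if category is Category.VALID_TLD:
        return False
    if category is Category.EMPTY and qtype.upper() == "NS":
        return False  # priming query (RFC 8109)
    if category is Category.ONE_WORD and qname.rstrip(".").lower() in VALID_TLDS:
        return False  # minimized query for a bare TLD (RFC 7816)
    return True
```

For example, `categorize("foo.com.")` returns `Category.VALID_TLD`, `is_malformed(".", "NS")` returns `False`, and `is_malformed("foobar.", "A")` returns `True`.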

Additionally, within the categories one-word, valid-TLD, and invalid-TLD, we identified some interesting classes of queries, including:

  • Queries originating from Chromium-based web browsers.
  • Queries from new specifications, namely query name minimization and priming queries (as per RFC 7816 and RFC 8109 respectively).
  • Queries from Apple’s proprietary networking stack, AppleTalk.
  • Queries containing explicit hex encoding (for example, “\x0a\x0b\x0c”).
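
Of these classes, explicit hex encoding is the simplest to flag mechanically. The sketch below assumes, hypothetically, that the trace tooling renders non-printable bytes in query names as literal `\xNN` escapes, as in the example in the list above; the pattern name `HEX_ESCAPE` is illustrative only.

```python
import re

# Assumption: non-printable bytes in query names appear in the traces as
# literal "\xNN" escapes (a backslash, an "x", then two hex digits).
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def has_explicit_hex(qname: str) -> bool:
    """Return True if the query name contains at least one \\xNN escape."""
    return HEX_ESCAPE.search(qname) is not None
```

With this sketch, `has_explicit_hex(r"\x0a\x0b\x0c.")` returns `True`, while an ordinary name such as `"example.com."` returns `False`.
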
Figure 1 — Classification of 4.11 billion DNS queries at B-Root in 2022.
Figure 2 — Classification of 1.00 billion DNS queries at B-Root in 2013.
Figure 3 — Categorization breakdown over 10 years of B-Root data. Under our definition of query validity, which accounts for recent additions to the DNS protocol such as query minimization, malformed query traffic increased from 39.57% in 2013 to 67.91% in 2022.

Chromium queries

As described in a previous APNIC blog post, Chromium includes a critical feature, the Omnibox, which lets users enter a website name, URL, or search terms in a single box. To determine whether a user is on the Internet or an intranet, while guarding against NXDOMAIN hijacking, the Omnibox sends three random single-label DNS queries, each 7 to 15 characters long. As Chromium’s market share grew, this feature sent a flood of malformed queries to the root servers. In 2020, almost 50% of all queries hitting B-Root came from the Omnibox, and only 8% of queries that year were valid under our criteria.
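
A simple way to flag candidate Omnibox probes is to match their shape: a one-word name of 7 to 15 characters that is not a real TLD. The sketch below is a hypothetical heuristic, not the paper's method. It assumes the probes consist only of ASCII letters, matches case-insensitively because some resolvers randomise letter case (so-called 0x20 encoding), and uses a tiny `VALID_TLDS` subset as a stand-in for IANA's list.

```python
import re

# Hypothetical heuristic: a single label of 7 to 15 ASCII letters.
CHROMIUM_PROBE = re.compile(r"[a-z]{7,15}")

# Stand-in for IANA's TLD list; real gTLDs of that length must be excluded.
VALID_TLDS = {"com", "net", "org", "network", "photography"}


def looks_like_chromium_probe(qname: str) -> bool:
    """Return True if qname has the shape of a random Omnibox probe."""
    label = qname.rstrip(".").lower()  # case-insensitive: undo 0x20 randomisation
    # fullmatch fails on any dot, so multi-label names are rejected automatically.
    return CHROMIUM_PROBE.fullmatch(label) is not None and label not in VALID_TLDS
```

Shape alone will also catch some legitimate one-word lookups, so a stricter version might additionally require three such queries from the same source within a short window.
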

Figure 4 — Chromium-initiated queries from 2013 through 2022 in DITL datasets at B-Root.
Figure 5 — Classification of 3.52 billion DNS traces at B-Root in 2020.

Top senders

Next, we measured queries from the top senders to B-Root in 2022. As shown in Figure 6, Amazon Web Services (AWS) accounts for 14% of all queries to B-Root, and the vast majority of those queries are malformed under our criteria. Microsoft Azure accounts for only 3% of all queries but shows a similar predominance of malformed queries.
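
The per-sender figures above combine two ratios: each sender's share of all queries and the malformed fraction within that sender's queries. A minimal sketch of that aggregation follows. It assumes, hypothetically, that each query has already been labelled with a sender organisation (for example by mapping source addresses to networks) and with a malformed flag; `sender_breakdown` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sender_breakdown(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, Tuple[float, float]]:
    """Map sender -> (share of all queries, fraction of its queries that are malformed).

    records: (sender, is_malformed) pairs; senders are ordered by query volume.
    """
    total: Counter = Counter()
    malformed: Counter = Counter()
    n = 0
    for sender, bad in records:
        total[sender] += 1
        if bad:
            malformed[sender] += 1
        n += 1
    # Counter returns 0 for senders that never sent a malformed query.
    return {s: (c / n, malformed[s] / c) for s, c in total.most_common()}
```

Ordering by `most_common()` keeps the largest senders first, which matches how a top-sender chart is usually read.
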

Figure 6 — Top query senders in the 2022 DITL dataset at B-Root.

Query minimization and priming queries

Because DNS is highly distributed, a change to the DNS protocol specification can take years to be adopted across the entire DNS ecosystem. We quantified this propagation by measuring the presence of minimized (QMIN) queries, as per RFC 7816, and priming queries, as per RFC 8109. Both are clearly observable in B-Root data: QMIN queries appear as a single valid TLD (for example, “com.” or “net.”), and priming queries appear as empty “.” queries of type NS.
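
The two signatures above are enough to track adoption year by year. The sketch below is a hypothetical illustration: it assumes records of the form `(year, qname, qtype)` and a small `VALID_TLDS` stand-in for IANA's list, and it returns each year's QMIN and priming shares of total traffic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Stand-in for IANA's TLD list (hypothetical subset).
VALID_TLDS = {"com", "net", "org", "arpa"}


def is_priming(qname: str, qtype: str) -> bool:
    # RFC 8109: a priming query asks for the NS records of the root, ".".
    return qname == "." and qtype.upper() == "NS"


def is_qmin(qname: str) -> bool:
    # A minimized query reaching the root carries a single valid TLD label.
    return qname.rstrip(".").lower() in VALID_TLDS


def yearly_shares(
    records: Iterable[Tuple[int, str, str]],
) -> Dict[int, Tuple[float, float]]:
    """Map year -> (QMIN share, priming share) of all queries seen that year."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0, 0])  # total, qmin, priming
    for year, qname, qtype in records:
        c = counts[year]
        c[0] += 1
        c[1] += int(is_qmin(qname))
        c[2] += int(is_priming(qname, qtype))
    return {y: (q / t, p / t) for y, (t, q, p) in sorted(counts.items())}
```

This signature-based count is an upper bound for QMIN: a non-minimizing resolver can also legitimately ask the root about a bare TLD, for example when fetching its DS records.
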

As shown in Figure 7, QMIN queries have increased gradually since RFC 7816 was published in 2016. By contrast, priming queries remained below 1% of all queries from the publication of their specification in 2017 through 2021. Then, in 2022, priming queries jumped to 35% of all queries, as shown in Figure 1.

Figure 7 — Breakdown of minimized DNS queries at B-Root from 2016 through 2022.

Dataset quirks

Aside from our main results, we made some less critical yet still interesting discoveries:

  • In 2020, 8.38% of queries hitting B-Root had a “.consul” TLD, which is invalid according to IANA’s list of valid TLDs. This appears to be a large leak from HashiCorp’s networking platform, Consul. If so, the B-Root data captured this leak two years before it was discovered!
  • AppleTalk queries consistently accounted for about 1% of all queries sent to B-Root, possibly indicating continued use of legacy Apple products.
  • In 2014, 1.22% of queries sent to B-Root had the invalid TLD “.com/wawa”. Was this due to a server misconfiguration at Wawa, the popular Philadelphian convenience store?
  • In 2021, 0.66% of queries had the invalid TLD “.novalocal”, which appears to stem from a widespread misconfiguration of OpenStack software, as discussed on Stack Overflow and in the OpenStack documentation.
  • 0.18% of queries in 2022 had the invalid TLD “.rac2v1a”. Was this due to a misconfiguration of a particular router?

Takeaways

The evolving landscape of the DNS ecosystem, as observed from B-Root server analyses, reveals a dynamic interplay between new Internet technologies, like Chromium and cloud services, and the changing nature of DNS queries. The significant increase in malformed queries over the last decade underscores the need for continuous oversight and collaboration among developers, administrators, and researchers.

By leveraging comprehensive datasets, such as those from the DITL project, the Internet community can gain invaluable insights into how the DNS is being used, ensuring that the DNS remains robust, secure, and responsive to future challenges and innovations.

For more information, read our paper Understanding DNS Query Composition at B-Root.

Jacob Ginesin is an undergraduate studying computer science at Northeastern University and a researcher in the NDS2 Lab. His research focuses on formally verifying critical systems and infrastructure.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
