Nearly all Internet-connected applications depend on the Domain Name System (DNS) to map human-readable domain names to their associated IP addresses. As such, its failure has the potential to create global Internet outages.
While large outages are reported in the news, the smaller DNS query failures that happen daily tend not to be apparent outside of the host and user. And yet, according to recent research by my colleagues and me at the Institute of Computing Technology, Chinese Academy of Sciences, these failures account for around 13.5% of all DNS queries in China, with two in every three AAAA queries failing!
A primer on DNS query failures
Our findings are based on the analysis of around three billion passive DNS logs that were obtained by parsing the DNS response traffic from recursive resolvers in China to end users. User-related information, such as IP addresses, was anonymized before the data was shared with us.
We defined a DNS query as failed if no resource record (RR) in the response matches the query type of the requested domain (note that a CNAME response matches any query type). Because we were interested in failures caused by DNS infrastructure (as opposed to typos in domain names), we filtered out log entries whose (query type, requested domain name) tuple never resolved successfully, that is, tuples for which every query failed. This left us with 2.8 billion queries for subsequent analysis.
We began by computing the number and types of failed queries. Table 1 shows the share of queries and the overall success rate for the four most popular query types.
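The failure definition and the filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log entry layout `(query type, domain, list of answer RR types)` and the function names are hypothetical and are not taken from our actual processing pipeline.

```python
def is_failure(qtype, answer_types):
    """Return True if no answer RR matches the query type.

    A CNAME in the answer is treated as matching any query type,
    following the definition given in the text.
    """
    return not any(t == qtype or t == "CNAME" for t in answer_types)


def drop_always_failing(logs):
    """Remove entries whose (qtype, domain) tuple never succeeds.

    `logs` is a list of (qtype, domain, answer_types) tuples; this layout
    is an assumption made for the example.
    """
    succeeded = set()
    for qtype, domain, answer_types in logs:
        if not is_failure(qtype, answer_types):
            succeeded.add((qtype, domain))
    # Keep only entries whose tuple resolved successfully at least once.
    return [entry for entry in logs if (entry[0], entry[1]) in succeeded]


logs = [
    ("A", "example.com", ["A"]),         # success
    ("A", "example.com", []),            # failure, but the tuple succeeds elsewhere
    ("AAAA", "example.com", ["CNAME"]),  # CNAME counts as a match
    ("A", "exmaple.cmo", []),            # likely typo: every query fails
    ("A", "exmaple.cmo", []),
]
kept = drop_always_failing(logs)
```

In this toy input the two queries for the mistyped name are removed, while the failed A query for `example.com` is kept because the same tuple succeeded at least once.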
|Number of queries||86.2%||10.4%||2.8%||0.1%||0.5%|
Overall, A and MX queries are resolved successfully most often, while AAAA and PTR queries show lower success rates. In particular, the failure rate of AAAA queries is surprisingly high, at over 64.2%: roughly two out of three AAAA queries failed.
The majority of failures are the responsibility of a small set of domains
Figure 1 presents the cumulative distribution functions (CDFs) of the success rates across domains for A (red solid line) and AAAA (blue dash-dot line) queries. For A queries, 93.7% of domains had a success rate exceeding 95%, suggesting high reliability.
To eliminate the impact of low-frequency domains on the results, we filtered out domains that were queried fewer than 100 times and plotted the success rate of the remaining domains (the red dash-dot line). Of these, 84.9% had a success rate exceeding 95%, and as many as 7% had a success rate lower than 50%. This suggests that a small set of domains is responsible for the majority of failures.
For AAAA queries, only 34.3% of domains had a success rate exceeding 95%. When limiting the analysis to domains whose query frequency exceeded 100, only 7.8% of domains had a success rate exceeding 95%, while around 60% of domains were almost never resolved successfully.
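The per-domain figures above are points read off a distribution of success rates. A minimal sketch of that computation follows, assuming each record is a hypothetical `(domain, succeeded)` pair for one query type; the function names are illustrative only.

```python
from collections import Counter


def success_rates(records, min_queries=1):
    """Return {domain: success rate} for domains with at least min_queries queries."""
    total, ok = Counter(), Counter()
    for domain, succeeded in records:
        total[domain] += 1
        ok[domain] += int(succeeded)
    return {d: ok[d] / n for d, n in total.items() if n >= min_queries}


def fraction_above(rates, threshold):
    """Share of domains whose success rate exceeds threshold, i.e. 1 - CDF(threshold)."""
    if not rates:
        return 0.0
    return sum(r > threshold for r in rates.values()) / len(rates)


# Toy data: domain "a" succeeds 3 times out of 4, domain "b" fails twice.
records = [("a", True)] * 3 + [("a", False)] + [("b", False)] * 2
rates = success_rates(records)
frequent = success_rates(records, min_queries=3)
```

With real logs, `success_rates(records, min_queries=100)` followed by `fraction_above(rates, 0.95)` would correspond to the kind of statement made above about domains exceeding a 95% success rate.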
Around 20% of the resolvers almost never succeed in resolving AAAA queries
Another possible explanation for failed queries is that some resolvers do not handle queries correctly. To explore this, we calculated the success rate of queries issued to each DNS resolver (identified by the recursive resolver's IP address). Figure 2 presents the CDF of the per-resolver success rate.
The majority of resolvers had very high success rates when serving A queries; around half experienced almost no failures. In contrast, 60% of resolvers serving AAAA queries could successfully resolve just 20% of the queries. Surprisingly, around 20% of the resolvers almost never succeeded in resolving AAAA queries.
How reliable are resolvers in China?
We also investigated the reliability of public resolvers that are popular in China. Table 2 shows the success rates of A and AAAA queries handled by each public DNS resolver. We conservatively labelled logs in which the resolver shared a /24 prefix with the end user as having been processed by a resolver automatically assigned by the ISP.
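The /24 heuristic can be illustrated with Python's standard `ipaddress` module. The sketch below assumes IPv4 addresses that remain comparable at the /24 level. The `PUBLIC_RESOLVERS` mapping is a small illustrative sample of well-known addresses, not the list we used, and checking the public list before the /24 rule is likewise an assumption of this example.

```python
import ipaddress

# Illustrative and incomplete; not the list used in the study.
PUBLIC_RESOLVERS = {
    "8.8.8.8": "Google DNS",
    "114.114.114.114": "114DNS",
}


def same_slash24(resolver_ip, client_ip):
    """Return True if both IPv4 addresses fall in the same /24 prefix."""
    net = ipaddress.ip_network(f"{resolver_ip}/24", strict=False)
    return ipaddress.ip_address(client_ip) in net


def label_resolver(resolver_ip, client_ip):
    """Label a log entry as a known public resolver, 'ISP', or 'Other'."""
    if resolver_ip in PUBLIC_RESOLVERS:
        return PUBLIC_RESOLVERS[resolver_ip]
    if same_slash24(resolver_ip, client_ip):
        return "ISP"
    return "Other"
```

For example, `label_resolver("192.0.2.53", "192.0.2.200")` yields `"ISP"` because both addresses sit in 192.0.2.0/24, while a resolver in an unrelated prefix that is not in the mapping falls into `"Other"`.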
We observed that DNSPod succeeded in almost all its A queries, while OpenDNS achieved just 86.3%. Again, we could see a notably lower success rate across all resolvers for AAAA queries.
|114DNS||DNS Pai||AliDNS||DNSPOD||Google DNS||Open DNS||ISP||Other|
We also found that the success rates for new generic Top-Level Domains (gTLDs) and Internationalized Domain Names (IDNs) were lower than those of well-established domains, primarily because of the prevalence of malicious domains.
Use larger negative caching time-to-lives for AAAA records
Our analysis revealed a higher failure rate (13.5%) for DNS queries than we expected. Specifically, around two-thirds of AAAA queries failed. Such a high failure rate undermines the quality of experience perceived by end users, especially when Happy Eyeballs is enabled, as failed AAAA queries delay DNS resolution.
We recommend a larger negative caching time-to-live for AAAA records associated with domains that reliably map only to IPv4 addresses.
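To make the recommendation concrete, here is a rough sketch of how a resolver-side negative cache might apply it. Under RFC 2308, the TTL of a negative answer is the smaller of the SOA record's own TTL and its MINIMUM field. The `aaaa_floor` parameter, the `cap` value, and the `NegativeCache` class are all hypothetical names introduced for this example; they represent one possible reading of the recommendation, not a description of any particular resolver's behaviour.

```python
import time


def negative_ttl(soa_ttl, soa_minimum, qtype, aaaa_floor=0, cap=10800):
    """Compute how long to cache a negative answer, in seconds.

    The base value follows RFC 2308: min(SOA TTL, SOA MINIMUM).
    aaaa_floor (assumption) raises the TTL for AAAA answers only.
    cap (assumption) bounds the result so stale negatives eventually expire.
    """
    ttl = min(soa_ttl, soa_minimum)
    if qtype == "AAAA":
        ttl = max(ttl, aaaa_floor)
    return min(ttl, cap)


class NegativeCache:
    """Tiny negative cache keyed by (name, qtype); the clock is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}

    def put(self, name, qtype, ttl):
        self._expiry[(name.lower(), qtype)] = self._clock() + ttl

    def is_cached_failure(self, name, qtype):
        key = (name.lower(), qtype)
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # entry has expired
            return False
        return True
```

With an SOA TTL of 300 seconds and a MINIMUM of 60 seconds, a failed A lookup would be cached for 60 seconds, whereas `negative_ttl(300, 60, "AAAA", aaaa_floor=900)` keeps a failed AAAA lookup for 900 seconds, sparing clients repeated AAAA queries for an IPv4-only name. A domain operator could achieve a similar effect by raising the SOA MINIMUM, although that applies to all query types rather than AAAA alone.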
Our analysis also implies that the resolvers are diverse in terms of query success rate, which can be used as a general guideline for choosing public resolvers.
For more information, please read our paper, A deep dive into DNS behavior and query failures.
Contributors: Donghui Yang, Institute of Computing Technology, Chinese Academy of Sciences.
Zhenyu Li is a Professor at the Institute of Computing Technology, Chinese Academy of Sciences.