NXNSAttack: upgrade resolvers to stop new kind of random subdomain attack

This article describes NXNSAttack, a newly discovered Domain Name System (DNS) protocol vulnerability that affects all recursive DNS resolvers. It allows random subdomain attacks to be executed using the DNS delegation mechanism, resulting in a large packet amplification factor.

First things first

If you operate your own DNS resolver, no matter what brand it is, please upgrade to the latest version now.

(Also, if you are disappointed you have to rush with the upgrade now, talk to your vendor and ask about early notification for security releases.)

If you want to know how the attack works and how you can protect your systems next time, read on.

NXNSAttack principle

The newly discovered vulnerability abuses the DNS delegation mechanism to force DNS resolvers to generate more DNS queries to authoritative servers of the attacker’s choice. How is that possible?

The whole DNS is built on the delegation principle, where authoritative DNS servers responsible for upper levels of DNS hierarchy delegate (we could also say redirect) questions for lower-level domains to different servers, thus preventing the need to maintain one huge database with DNS data for the entire Internet.

For example, this is how the authoritative DNS server named ‘a.gtld-servers.net.’, which is responsible for the ‘com.’ domain, delegates the question ‘example.com. A’ to a different set of servers:

$ kdig @a.gtld-servers.net example.com A
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10976
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;example.com. IN A

;; AUTHORITY SECTION:
example.com. 172800 IN NS a.iana-servers.net.
example.com. 172800 IN NS b.iana-servers.net.
;; From 2001:503:a83e::2:30@53(UDP) in 46.9 ms

Here we can see that even though we asked for the name ‘example.com.’ with type A, the server authoritative for the ‘com.’ domain sent us a delegation for ‘example.com.’, which contains the names of two other authoritative DNS servers but does not contain the answer to our original question.

This is a so-called ‘glueless’ delegation, that is, a delegation that contains only the names of authoritative DNS servers (a.iana-servers.net. and b.iana-servers.net.), but does not contain their IP addresses. Obviously, a DNS resolver cannot send a query to a name, so the resolver first needs to obtain an IPv4 or IPv6 address of the authoritative server ‘a.iana-servers.net.’ or ‘b.iana-servers.net.’, and only then can it continue resolving the original query ‘example.com. A’.
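To make the extra work explicit, here is a minimal Python sketch of the check a resolver has to perform on a referral. The Delegation class and the names_needing_resolution function are illustrative assumptions, not part of any real resolver; the glue mapping stands in for the addresses a server may place in the ADDITIONAL section.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Delegation:
    """Simplified referral: NS names plus any glue addresses that came with them."""
    zone: str
    ns_names: list[str]
    glue: dict[str, list[str]] = field(default_factory=dict)  # NS name -> IP addresses


def names_needing_resolution(delegation: Delegation) -> list[str]:
    """Return the NS names for which the referral carried no address (no glue)."""
    return [ns for ns in delegation.ns_names if not delegation.glue.get(ns)]


# The referral from the kdig output above: two NS names, empty ADDITIONAL section.
referral = Delegation(
    zone="example.com.",
    ns_names=["a.iana-servers.net.", "b.iana-servers.net."],
)
print(names_needing_resolution(referral))
```

Every name returned here costs the resolver at least one more lookup before it can get back to the client's original question.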

This glueless delegation is the basic principle of the NXNSAttack: the attacker simply sends back a delegation with fake (random) server names pointing to victim DNS domains, thus forcing the resolver to generate queries towards victim DNS servers (in a futile attempt to resolve fake authoritative server names).

Figure 1: How an NXNSAttack works.

Impact

The main discovery in the NXNSAttack paper is that attackers need only two packets, a single DNS query sent to a DNS resolver plus a single DNS answer carrying fake delegations, to make the resolver fire many random queries at victim authoritative servers. This effectively turns standards-compliant DNS resolvers into an amplifier for a random subdomain attack.
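The arithmetic can be illustrated with a small offline Python model. It is a sketch under simplifying assumptions: the zone victim.example. and both function names are hypothetical, the resolver is assumed to look up both A and AAAA for every NS name, and only resolver-to-victim queries are counted, which is not necessarily how the paper accounts for packets.

```python
import secrets

ATTACKER_PACKETS = 2  # one client query + one answer carrying the fake delegation


def fake_ns_names(victim_zone, count):
    """Build `count` random, non-existent server names under the victim zone."""
    return [f"{secrets.token_hex(8)}.{victim_zone}" for _ in range(count)]


def queries_triggered(ns_names, address_types=("A", "AAAA")):
    """Queries sent by a resolver that looks up every address type for every NS name."""
    return [(name, rtype) for name in ns_names for rtype in address_types]


ns_names = fake_ns_names("victim.example.", 10)
queries = queries_triggered(ns_names)
print(f"{len(queries)} queries towards the victim for {ATTACKER_PACKETS} attacker packets")
print(f"simplified amplification: {len(queries) / ATTACKER_PACKETS:.0f}x")
```

Because every name is random, none of these queries can be answered from the resolver's cache, which is exactly what makes this a random subdomain attack.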

In practice, the packet amplification factor (PAF) depends heavily on the strategy employed by the DNS resolver implementation involved in the attack. For example:

The BIND 9.12.3 resolver resolves IPv4 and IPv6 addresses for all NS names obtained from a delegation in parallel, leading to a packet amplification factor of 1000x.
Knot Resolver 5.1.0 resolves NS names one at a time and places other limits on the number of resolution steps generated by a single client query, limiting its PAF to the order of tens. In fact, half of its packet amplification factor of 48x is caused by workarounds for non-compliant authoritative servers. Without the workarounds for non-compliance with RFC 8020 and RFC 7816, the PAF of Knot Resolver would be only 24x. (Yet another example that workarounds are bad, but that’s another story.)

None of these strategies is inherently wrong; they just represent different trade-offs between the resources invested in a single client query and the processing of multiple client queries in parallel.

In the end, spare capacity on the resolver and authoritative servers determines which party will be the ‘victim’ of the NXNSAttack because one of them gets overloaded first. As long as the capacity is sufficient, servers will continue to operate just fine, possibly making one of the parties virtually unaffected, and in the absence of appropriate monitoring, oblivious to the attack.

Unfortunately, NXNSAttack abuses the very basic principle of the DNS protocol, which practically means there is no fix, only mitigation. Luckily, the researchers followed a responsible disclosure process and allowed vendors to implement and release mitigations before making the attack public.

NXNSAttack is a special case of the well-known random subdomain attack, so mitigation approaches fall into two categories: those specific to NXNSAttack and those that are generic for random subdomain attacks.

NXNSAttack mitigation

Unlike traditional random subdomain attacks, NXNSAttack queries are generated by the resolver itself. This difference allows vendors to implement simple mitigation techniques like limiting the number of names resolved when processing a single delegation.

The obvious advantage is that it is simple, at least in theory.

The disadvantage of mitigation based on counters is that it requires vendors to invent arbitrary limits that are not grounded in the DNS protocol specification and that basically determine the maximum packet amplification factor. At the same time, these arbitrary limits might break resolution for some domains because they put additional constraints on the resolution process.
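Both sides of that trade-off fit in a few lines of Python. This is only a sketch: MAX_NAMES is a made-up limit rather than a value taken from any resolver, the helper names are hypothetical, and real implementations may pick or randomize names differently instead of simply taking the first ones.

```python
MAX_NAMES = 5  # hypothetical per-delegation limit, chosen only for illustration


def select_ns_to_resolve(ns_names, max_names):
    """Keep only the first `max_names` NS names from a single delegation."""
    return ns_names[:max_names]


def can_resolve(ns_names, working_servers, max_names):
    """True if at least one NS name within the limit belongs to a working server."""
    return any(ns in working_servers for ns in select_ns_to_resolve(ns_names, max_names))


# Attack: a delegation with 100 fake names now triggers at most MAX_NAMES lookups.
attack = [f"fake{i}.victim.example." for i in range(100)]
print(len(select_ns_to_resolve(attack, MAX_NAMES)))

# Collateral damage: a badly maintained but legitimate zone whose only
# working server happens to be listed seventh falls outside the limit.
legit = [f"ns{i}.example.org." for i in range(1, 8)]
print(can_resolve(legit, {"ns7.example.org."}, MAX_NAMES))
print(can_resolve(legit, {"ns7.example.org."}, 10))
```

A lower limit means a lower worst-case amplification, but also more broken delegations that stop resolving.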

This is a very practical problem because recently published research estimates that 4% of second-level domains (example.com.) have a problem in their delegation from the top-level (com.), so any change that adds arbitrary limits to retries during the resolution process has to be weighed very carefully.

In the upcoming days, we will see how successful vendors were in choosing their magic numbers and whether they get away without breaking any major domains.

Generic random subdomain attack mitigation

Any random subdomain attack, NXNSAttack included, generates random query names to bypass the DNS cache. Generic mitigation has to prevent attackers from bypassing the cache — and luckily we already have the technology to do that!

Aggressive Use of DNSSEC-Validated Cache (RFC 8198) uses DNSSEC ‘metadata’ in the form of NSEC(3) and RRSIG records to generate negative answers without the need to contact authoritative servers. How does that work? First, let’s have a look at example NSEC records:

$ kdig +dnssec @l.root-servers.net example.
;; ->>HEADER<<- opcode: QUERY; status: NXDOMAIN; id: 55933
;; Flags: qr aa rd; QUERY: 1; ANSWER: 0; AUTHORITY: 6; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: do; UDP size: 4096 B; ext-rcode: NOERROR

;; QUESTION SECTION:
;; example. IN A

;; AUTHORITY SECTION:
events. 86400 IN NSEC exchange. NS DS RRSIG NSEC
events. 86400 IN RRSIG NSEC 8 1 86400 20200531170000 20200518160000 48903 . bWcSkQHURJGO...
. 86400 IN NSEC aaa. NS SOA RRSIG NSEC DNSKEY
. 86400 IN RRSIG NSEC 8 0 86400 20200531170000 20200518160000 48903 . Ru23msHh23...
. 86400 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2020051801 1800 900 604800 86400
. 86400 IN RRSIG SOA 8 0 86400 20200531170000 20200518160000 48903 . pIolh2KxjZbgtwuePLA4...

We sent the DNS query example. A to one of the root DNS servers, and it answered back with an NXDOMAIN answer, indicating the name does not exist. At the same time, we received two proofs-of-nonexistence in the form of NSEC records (and their DNSSEC signatures in the RRSIG records).

The first NSEC record, events. 86400 IN NSEC exchange. NS DS RRSIG NSEC, means that the root zone contains the domain events. with record types NS DS RRSIG NSEC and, more importantly, that there are no domains between the names events. and exchange.

The second NSEC record . 86400 IN NSEC aaa. NS SOA RRSIG NSEC DNSKEY means that the root zone contains DNS root . (surprise!) with record types NS SOA RRSIG NSEC DNSKEY, and also that there are no domains in between names . and aaa.. This proves there is no wildcard record *. and thus NXDOMAIN is really the correct answer to query example. A.

Each of the records has a time-to-live (TTL) of 86,400 seconds. This allows resolvers to synthesize NXDOMAIN answers for any queries falling into the indicated ranges (. – aaa., events. – exchange.) for one full day, effectively cutting traffic towards authoritative servers.
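The range check behind this can be sketched in Python. This is a heavily simplified model, not how any particular resolver implements RFC 8198: the NsecCache class and its methods are hypothetical, DNSSEC signature validation is skipped, only single-label names in the root zone are handled (so the wildcard to rule out is always *.), and the wrap-around NSEC record at the end of a zone is ignored. The sort key approximates the canonical name ordering from RFC 4034.

```python
def canonical_key(name):
    """Sort key approximating canonical DNS ordering: lowercase labels, right to left."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return [label.encode("ascii") for label in reversed(labels)]


class NsecCache:
    """Remembers NSEC ranges and answers 'does this name provably not exist?'."""

    def __init__(self):
        self._ranges = []  # tuples of (owner key, next-name key, expiry time)

    def add(self, owner, next_name, ttl, now):
        self._ranges.append((canonical_key(owner), canonical_key(next_name), now + ttl))

    def _covered(self, name, now):
        key = canonical_key(name)
        return any(low < key < high and now < expires
                   for low, high, expires in self._ranges)

    def can_synthesize_nxdomain(self, qname, now):
        # Both the name itself and the wildcard *. must be proven non-existent.
        return self._covered(qname, now) and self._covered("*.", now)


cache = NsecCache()
cache.add("events.", "exchange.", ttl=86400, now=0.0)  # from the answer above
cache.add(".", "aaa.", ttl=86400, now=0.0)

print(cache.can_synthesize_nxdomain("example.", now=10.0))     # inside events. - exchange.
print(cache.can_synthesize_nxdomain("zzz.", now=10.0))         # no cached range covers it
print(cache.can_synthesize_nxdomain("example.", now=90000.0))  # TTL has expired
```

Only the first query for a name inside a given range has to travel to the authoritative server; until the TTL runs out, every other name in that range is answered locally.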

As a consequence, querying DNS zones that contain N names at random will populate a resolver’s cache in roughly O(N) answers. In other words, the cost of eliminating random subdomain attacks between DNSSEC-validating resolvers and authoritative servers for the duration of TTL is linear with the number of names in the target DNS zone. It works surprisingly well even for large zones with one million domains in them — pretty charts about this setup can be found in my older presentation (from 2018).
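A toy simulation shows where the linear bound comes from. It is a sketch under loud assumptions: the zone is modelled as a sorted list of random single-label names, every upstream NXDOMAIN answer is assumed to carry the NSEC range around the queried name, TTL expiry is ignored (everything happens within one TTL), and the function names are made up for this example.

```python
import bisect
import random
import string


def random_label(rng, length=8):
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def simulate(zone_size, attack_queries, seed=1):
    """Return (names in zone, upstream queries) for a run of random-name queries."""
    rng = random.Random(seed)
    zone = sorted({random_label(rng) for _ in range(zone_size)})
    cached_gaps = set()  # indexes of gaps between zone names already proven empty
    upstream = 0
    for _ in range(attack_queries):
        qname = random_label(rng)
        gap = bisect.bisect_left(zone, qname)
        if gap < len(zone) and zone[gap] == qname:
            continue  # the name exists; ordinary positive caching, not modelled here
        if gap not in cached_gaps:
            upstream += 1  # ask the authoritative server and learn one NSEC range
            cached_gaps.add(gap)
        # otherwise the NXDOMAIN answer is synthesized from the cache
    return len(zone), upstream


names, upstream = simulate(zone_size=1000, attack_queries=50000)
print(f"zone names: {names}, attack queries: 50000, upstream queries: {upstream}")
```

However many random names the attacker throws at the resolver, the number of upstream queries in this model can never exceed the number of gaps between existing names, which is the number of names in the zone plus one.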

What next?

First of all, upgrade your DNS resolvers to get at least some NXNSAttack mitigation.

Once the dust settles, please consider deploying DNSSEC on authoritative servers, and also on DNS resolvers.

Aggressive Use of DNSSEC-Validated Cache (RFC 8198) limits the impact of random subdomain attacks. It is already implemented in Knot Resolver. Unbound also has partial support (NSEC only) and BIND has a prototype as well. If your DNS resolver vendor does not currently offer it, ask for the feature and stop random subdomain attacks once and for all!

If you are not used to speaking to your DNS software vendor, please fill in the cross-vendor survey.

Adapted from original post which appeared on the CZ.NIC Blog.

Petr Špaček devotes his professional life to DNS and leads the Knot Resolver project at CZ.NIC labs.
