In parts one and two of this series, we’ve seen that DNS fragmentation is real and can happen in the Internet. There are operating systems and DNS servers out there in the Internet that are vulnerable, and with the help of DNS fragmentation, malicious actors can make changes to DNS responses received by DNS resolvers. Without DNSSEC signing and validation, these changes will go unnoticed.
What can be done to mitigate these threats? The best solution is the deployment of DNSSEC: DNSSEC-signed zones and DNSSEC validation on DNS resolvers. For various reasons, DNSSEC deployment is slow and might not be possible in all zones. So the question is whether there are alternative mitigation strategies.
To answer that question, we tested several mitigation options in a lab environment. The lab environment was built as a scale model of Internet DNS deployment, with multiple root DNS servers, top-level domains (com., net., org., ccTLDs like fr, ch, at, de) and hundreds of second-level domains, all spread across multiple authoritative DNS servers (some with DNSSEC, most without, so that large DNS responses from DNSSEC-signed zones were also simulated).
On the other side, we had hundreds of simulated DNS clients hitting a large provider-scale DNS resolver. As we’ve used similar setups in the past to test commercial provider DNS resolver installations at scale, we knew this setup could be used to measure the performance impact of the mitigations in a way that would also be seen in the real Internet. On the resolvers, different flavours of open-source DNS server software (BIND 9, Knot Resolver, Unbound, and PowerDNS Recursor) and commercial DNS software (Windows Server DNS) were used.
The lab setup was also used to verify that the mitigation was indeed able to stop DNS fragmentation attacks. For that, we made sure that we could successfully perform a DNS fragmentation attack without the mitigation in place, and that the same attack was not possible anymore with the mitigation in place.
Unlike UDP, TCP hardly suffers from IP fragmentation. A TCP-only DNS stack would not easily fall prey to DNS cache poisoning via IP fragmentation. But this immunity against IP fragmentation attacks comes at a price: queries over TCP involve more work for both DNS resolvers and authoritative DNS servers, and that extra work may increase the latency of DNS resolution.
Three different versions of TCP-only DNS communication between the DNS resolvers and the authoritative DNS servers have been tested:
- Vanilla TCP: TCP DNS connections as implemented in today’s DNS server software, opening (and closing) a TCP connection for each query.
- TCP keep-open: Using a specially patched version of Unbound (not publicly available) that kept the TCP session open for the most-used upstream authoritative DNS servers, thereby skipping the TCP handshake on DNS queries to these servers.
- TCP with TLS 1.3: TLS 1.3 is currently the version of TLS most heavily optimized for speed. It has a lean handshake and requires zero round-trip times (0-RTT) when resuming a previous session. Although it was expected to perform worse than vanilla TCP, the question was how much slower it would be, because the security gain from an encrypted layer might justify the performance loss.
Our measurements of the authoritative DNS server deployments show that a substantial number of servers do not offer DNS-over-TCP (even though that is required nowadays by DNS protocol standards; see part 1 of this series). Switching to a ‘TCP-only’ operation will break these deployments.
The idea behind ‘opportunistic’ TCP is to first try all DNS queries from a DNS resolver to the authoritative server via TCP. If the TCP query fails, retry the same query over UDP (today’s standard) and mark this server in the cache as ‘UDP only’.
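The fallback logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of our Unbound patch: the class name, the `Transport` signature and the two transport functions are hypothetical and are passed in by the caller, so the sketch itself does no network I/O.

```python
from typing import Callable, Optional, Set

# Hypothetical transport signature: takes (server, query) and returns the
# response bytes, or None if the server did not answer over that transport.
Transport = Callable[[str, bytes], Optional[bytes]]


class OpportunisticTcpResolver:
    """Try TCP first; fall back to UDP and remember servers that need it."""

    def __init__(self, send_tcp: Transport, send_udp: Transport) -> None:
        self.send_tcp = send_tcp
        self.send_udp = send_udp
        self.udp_only: Set[str] = set()  # servers marked as 'UDP only'

    def query(self, server: str, message: bytes) -> Optional[bytes]:
        if server not in self.udp_only:
            response = self.send_tcp(server, message)
            if response is not None:
                return response
            # TCP failed: mark the server so later queries skip the TCP attempt.
            self.udp_only.add(server)
        # Retry (or go directly) over UDP, today's standard transport.
        return self.send_udp(server, message)
```

A production implementation would presumably also expire the 'UDP only' mark after some time, so that a server that starts offering TCP later is noticed; the sketch leaves that out.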
As opportunistic TCP has not been implemented in any production DNS resolver today, we’ve implemented a (non-optimized) version of opportunistic TCP into the Unbound resolver.
Use UDP for small responses only, use TCP for the rest
Fragmentation can only happen if the DNS response from the authoritative DNS server to the DNS resolver reaches a size that triggers fragmentation in the TCP/IP stack. The idea of this mitigation is to set the EDNS (extended DNS) signalling on both the authoritative DNS server and the DNS resolver to only allow UDP responses below a threshold that prevents fragmentation. This threshold is 1,232 bytes, as this DNS payload size prevents fragmentation in Ethernet networks for both IPv6 and IPv4.
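The 1,232-byte figure can be reproduced with simple arithmetic. The sketch below uses the derivation given for DNS Flag Day 2020 (the IPv6 minimum MTU of 1,280 bytes minus the IPv6 and UDP headers); the function and constant names are ours, and IP options and extension headers are assumed to be absent.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header without extension headers
UDP_HEADER = 8    # bytes

IPV6_MIN_MTU = 1280  # minimum MTU every IPv6 link must support
ETHERNET_MTU = 1500  # common Ethernet MTU


def max_udp_payload(mtu: int, ip_header: int) -> int:
    """Largest UDP payload that fits into one unfragmented IP packet."""
    return mtu - ip_header - UDP_HEADER


print(max_udp_payload(IPV6_MIN_MTU, IPV6_HEADER))  # 1232
print(max_udp_payload(ETHERNET_MTU, IPV6_HEADER))  # 1452
print(max_udp_payload(ETHERNET_MTU, IPV4_HEADER))  # 1472
```

Because 1,232 is below both Ethernet limits, a response of that size fits into a single packet on an Ethernet path for either IP version.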
Discard fragmented DNS responses
In this mitigation, we’ve used the host firewall nftables on Linux on the DNS resolver to throw away all DNS responses that came in as fragments. The main question of this test was how well the DNS resolver would recover from ‘lost’ DNS responses, as the fragmented responses would be thrown away and never reach the DNS resolver process.
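To make clear what ‘came in as fragments’ means at the packet level, here is a small Python sketch of the check for IPv4. It is not the nftables rule set we used, only an illustration of the header fields involved; the function names are ours, and `ipv4_header` is a helper that builds a minimal demo header with an unvalidated zero checksum.

```python
import struct

MORE_FRAGMENTS = 0x2000  # MF flag in the 16-bit flags/fragment-offset field
OFFSET_MASK = 0x1FFF     # 13-bit fragment offset, in units of 8 bytes


def is_ipv4_fragment(packet: bytes) -> bool:
    """True if the packet is the first, a middle, or the last fragment."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    (flags_and_offset,) = struct.unpack_from("!H", packet, 6)
    # MF set: more fragments follow. Offset non-zero: not the first fragment.
    return bool(flags_and_offset & (MORE_FRAGMENTS | OFFSET_MASK))


def ipv4_header(flags_and_offset: int) -> bytes:
    """Minimal 20-byte demo header (UDP, documentation addresses)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0x1234, flags_and_offset, 64, 17, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )


print(is_ipv4_fragment(ipv4_header(0x4000)))  # False: only DF is set
print(is_ipv4_fragment(ipv4_header(0x2000)))  # True: first fragment (MF set)
print(is_ipv4_fragment(ipv4_header(0x00B9)))  # True: last fragment (offset 185)
```

IPv6 signals fragmentation differently, through a Fragment extension header, so a complete filter needs a separate check for it.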
All tested DNS resolvers will lower the EDNS UDP response size signalled to the authoritative DNS server in case of missing responses. This will, after a timeout, have the same effect as the previous mitigation. But how large is the delay experienced by a DNS client?
Discarding small fragments only
For most DNS responses, the UDP data is not large enough to cause fragmentation in Ethernet networks. Malicious actors would need to artificially lower the MTU of the authoritative DNS server network to trigger fragmentation (see part 2 of this blog series for details). These attacks will create unusually small fragments, which could be detected and blocked in a firewall.
During our tests, we found that this mitigation has no real benefit over the previous one (discard all fragmented DNS responses), that configuring a firewall to throw away only small fragments is a burden for the firewall, and that such a rule is not (easily) configurable in (most) commercial firewall products.
Ignoring the additional section in DNS resolver
Malicious actors can alter the content of the second fragment of a fragmented DNS response. The first fragment will contain the DNS message header, the answer and (part of) the authority section. Spoofing will most likely happen in the ‘additional section’ of the DNS response. The idea of this mitigation is that if the DNS resolver could ignore the data in the additional section, the attacks would not be possible.
However, this is not how the DNS works. While most of the time, the data in the additional section is not necessary for the DNS protocol to work (it is delivered as a convenience and performance feature), there is one situation where the data inside the additional section cannot be ignored, and that is when the authoritative DNS server sends a DNS referral (telling the DNS resolver to find the data elsewhere in the DNS namespace tree). These referrals are crucial for the operation of the DNS and cannot be ignored. Unfortunately, DNS referrals are the one DNS response where an attack creates the most impact, as a spoofed referral can potentially redirect a whole delegated domain (including its children) onto an authoritative DNS server under the control of the attacker.
Because of this limitation, this mitigation has been ruled out and has not been tested. However, we kept it in the report (and in this post) as it is a mitigation often suggested as a countermeasure to fragmentation attacks.
Securing DNS communication with transaction signatures
Transaction signatures (TSIG) sign DNS messages with a symmetric key shared by both ends of a DNS communication. One interesting side effect of TSIG is that it adds an ever-changing bit of data to the end of a DNS message that an attacker cannot predict. The details of how TSIG secures against DNS fragmentation attacks have been documented by Mark Andrews of ISC in the Internet-Draft “Defeating DNS/UDP Fragmentation Attacks”.
This mitigation is currently not available in production-ready DNS software but can be simulated with BIND 9 using TSIG for all communication between two DNS servers (DNS resolver and authoritative DNS).
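The principle behind this mitigation can be shown with a keyed hash (HMAC) in Python. This is a deliberately simplified sketch: it is not the TSIG wire format, which also covers fields such as the signing time, and the key and record strings are made-up examples.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the whole message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, mac: bytes) -> bool:
    """Constant-time check that the MAC matches the received message."""
    return hmac.compare_digest(sign(key, message), mac)


shared_key = b"hypothetical-shared-secret"  # known to both DNS servers only
response = b"example.com. IN A 192.0.2.10"
mac = sign(shared_key, response)

# An attacker who replaces part of the message cannot produce a matching
# MAC without the key, so the altered response fails verification.
spoofed = b"example.com. IN A 203.0.113.66"
print(verify(shared_key, response, mac))  # True
print(verify(shared_key, spoofed, mac))   # False
```

Since the signature sits at the end of the message, it travels in the last fragment, which is exactly the part an attacker would have to forge.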
Results of the mitigation tests
All tested mitigations were able to stop a DNS fragmentation attack. But some mitigations had a negative impact on performance, as shown in Table 1.
|Mitigation|Performance impact|
|---|---|
|TCP-only DNS|-30% / -50%|
|UDP for small responses only|-3.2% / +5.2% (depending on DNS resolver product)|
|Discarding fragmented packets|-3.7% / +4.7% (depending on DNS resolver product)|
|DNS using transaction signatures (TSIG)|+0.3%|

Table 1: Performance impact of the tested mitigations.
The TCP-based mitigations resulted in a large drop in performance, which makes it unlikely that they will be implemented.
The DNS communication secured with TSIG showed almost the same performance as unsecured DNS. But the use of TSIG for regular DNS traffic (that is, traffic other than zone transfers and DNS updates) would need to be implemented in DNS software before it could be used in the Internet. TSIG also requires a configuration change on both sides, the DNS resolver and the authoritative server, making it harder to deploy.
Limiting UDP to small DNS responses can be done with today’s DNS software and is already widely used. Early results from our study reinforced the changes implemented at DNS Flag Day 2020, which changed the default maximum DNS message size from 4,096 bytes to 1,232 bytes.
Depending on the DNS resolver software used, we’ve even seen a slight performance increase in the cases where the UDP DNS message size was restricted via EDNS (which also happens when discarding small fragments). This is likely because smaller EDNS message sizes result in less data sent in the additional section (if at all), which leads to less data processing on the DNS resolver side.
A nice property of this mitigation is that it can be deployed unilaterally, meaning either on the DNS resolver side, on the authoritative DNS server side, or both. Even if it is only implemented on one side of the communication, it already prevents the attacks.
Based on our findings, we recommend the following changes to operators of DNS resolvers and authoritative DNS servers to lower the risk of fragmentation-based DNS attacks:
- Deploy DNSSEC: DNSSEC not only prevents DNS fragmentation attacks, but also other kinds of DNS spoofing, including attacks not known today.
- Restrict the response size via EDNS settings on both DNS resolvers and authoritative DNS servers. See the DNS Flag Day 2020 webpage for instructions.
- Enable DNS-over-TCP: With UDP DNS messages restricted to a size of 1,232 bytes, DNS will sometimes need to fall back to TCP. Make sure the authoritative DNS server is reachable over TCP, and that the DNS resolver can communicate over TCP port 53 with the outside world.
- Discard fragmented DNS responses: With the EDNS UDP size restriction in place, there should not be any fragmented DNS response seen on the DNS resolver. Or, the other way around, a fragmented DNS response that is seen must be an attack and can be rejected. The authors of the study ran a large provider infrastructure for many months in which all fragmented IP packets were dropped, without any operational issues.
- Evaluate the security risks of running long-term supported operating systems: As we’ve discussed in part 2 of this blog post series, Linux operating systems running an older version of the Linux kernel are more vulnerable to fragmentation attacks. Sometimes operating systems get code changes that increase the security of the whole infrastructure, but these changes are not fixes for security vulnerabilities and therefore not backported to older versions of these operating systems. Operators running DNS services on long-term supported operating system versions are missing out on these security enhancements.
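As an illustration of the EDNS size restriction recommended above, the following Python sketch builds a DNS query by hand and advertises a UDP payload size of 1,232 bytes in the OPT pseudo-record. In practice this is a configuration setting in the DNS software, not something operators code themselves; the function name, the default query ID and the example domain are our own choices, and the sketch handles plain ASCII names only.

```python
import struct

OPT_TYPE = 41  # EDNS OPT pseudo-record type


def build_query(name: str, qtype: int = 1, udp_size: int = 1232,
                query_id: int = 0x1234) -> bytes:
    """Build a wire-format DNS query that advertises an EDNS UDP size."""
    # Header: ID, flags (none set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0,
    # ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    # QNAME: length-prefixed labels, terminated by the empty root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    # OPT record: root name, TYPE 41, CLASS carries the UDP payload size,
    # TTL carries extended RCODE, version and flags (all zero), RDLENGTH 0.
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_size, 0, 0)
    return header + question + opt


query = build_query("example.com")
print(len(query))  # 40
```

An authoritative server that honours this value sends larger answers truncated, and the resolver then repeats the query over TCP, which is why TCP reachability matters.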
Read the full study, available for free in English.
Study contributors: Roland van Rijswijk-Deij (NLnet Labs), Patrick Koetter and myself (sys4), and Markus DeBrün and Anders Kölligan (BSI).
Carsten Strotmann is a DNS/DHCP/IPv6/Linux/Unix security trainer for Linuxhotel, Men & Mice, and Internet Systems Consortium (ISC).