It’s time to consider avoiding IP fragmentation in the DNS

By Kazunori Fujiwara on 12 Jul 2019

There have been a number of research papers that have described effective DNS cache poisoning attacks using IP fragmentation. The concept was first presented by Amir Herzberg and Haya Shulman in Fragmentation Considered Poisonous in 2012, and then by Tomas Hlavacek in IP fragmentation attacks on DNS, at the RIPE 67 meeting in 2013.

A new paper, published in 2018 by Markus Brandt et al., Domain Validation++ For MitM-Resilient PKI, adds to the growing body of literature by describing how the group poisoned the full-service resolvers of several Certificate Authorities and successfully had some certificates issued. The attack combined two separate attacks: one on path MTU discovery, and the other a DNS cache poisoning attack using IP fragmentation.

In this post, I will explain how these two attacks work and discuss the effectiveness of available countermeasures.

Details of the attack on path MTU discovery

Off-path attackers can set a path MTU value from authoritative servers to victim full-service resolvers.

The above papers all showed that some implementations accept ICMP ‘fragmentation needed and DF set’ messages with small MTU values (less than 576 octets) and record the specified values as path MTU values. The path MTU value can be decreased to 552 octets on Linux (3.13 or older) and may be decreased to 296 octets or lower on some servers (as described in ‘Domain Validation++ For MitM-Resilient PKI’).

To evaluate attacks on path MTU discovery, I wrote a small Perl program that generated crafted ICMP/ICMPv6 packets.

The crafted ICMP packets contained an IPv4 header, ICMP header (unreachable fragmentation needed and DF set with a small MTU value), inner IPv4 header (with large packet size) and an inner UDP header.

Figure 1 — How to generate crafted ICMP packets, DNS OARC 30.
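
As a rough illustration of the message layout described above, the following Python sketch parses an ICMPv4 ‘fragmentation needed and DF set’ message and flags next-hop MTU values below 576 octets. It is not the Perl program mentioned in this post. The function names, the sample message (zeroed addresses and checksums) and the use of 576 as a threshold are assumptions made for this example; the header layout follows RFC 1191.

```python
import struct

ICMP_DEST_UNREACH = 3     # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4      # ICMP code: fragmentation needed and DF set
SMALL_MTU_THRESHOLD = 576  # assumption: the "small MTU" boundary used in the text


def parse_frag_needed(icmp: bytes):
    """Return (next_hop_mtu, inner_total_length, inner_protocol), or None
    if the message is not 'fragmentation needed and DF set'."""
    if len(icmp) < 8 + 20:  # ICMP header plus a minimal inner IPv4 header
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED):
        return None
    inner = icmp[8:]
    inner_total_length = struct.unpack("!H", inner[2:4])[0]
    inner_protocol = inner[9]  # 17 means UDP
    return mtu, inner_total_length, inner_protocol


def is_suspiciously_small(mtu: int) -> bool:
    return mtu < SMALL_MTU_THRESHOLD


def sample_message(mtu: int, inner_length: int = 1500) -> bytes:
    """Hypothetical test input: ICMP header + inner IPv4 header + inner UDP header."""
    icmp_header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, mtu)
    inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, inner_length, 0, 0x4000,
                           64, 17, 0, bytes(4), bytes(4))
    inner_udp = struct.pack("!HHHH", 53, 12345, inner_length - 20, 0)
    return icmp_header + inner_ip + inner_udp


if __name__ == "__main__":
    parsed = parse_frag_needed(sample_message(552))
    print(parsed, is_suspiciously_small(parsed[0]))
```

A host that applied a check of this kind before updating its path MTU cache would reject the small values reported in the papers, although the right threshold is a policy decision.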

The crafted ICMPv6 packets contain an IPv6 header, an ICMPv6 header (packet too big with a small MTU value), an inner IPv6 header (with a large packet size value), an inner UDP header, and zero padding up to the end of the packet, the size of which is specified by the path MTU size.

Figure 2 — How to generate crafted ICMPv6 packets, DNS OARC 30.

To evaluate the effect of the ICMP attacks, the ‘attacker’ would send crafted ICMP/ICMPv6 packets to the ‘victim servers’, then verify the result on the victim servers by using the following commands:

On Linux, "ip route get <IPaddress>"
On FreeBSD, "sysctl -o net.inet.tcp.hostcache.list"
 
----------------------------------------------------------------
Result of Linux 2.6.32
 
% ip route get 2001:503:ba3e::2:30
2001:503:ba3e::2:30 via 2001:503:ba3e::2:30 dev venet0  src 2001:2e8:602:0:2:1:0:9e metric 0
cache  expires 583sec mtu 1280 advmss 1440 hoplimit 0 features 8
% ip route get 203.178.129.44
203.178.129.44 dev venet0  src 183.181.168.158
cache  expires 597sec mtu 552 advmss 1460 hoplimit 64

%% the cache entry for target IP address should exist before attack on IPv4.
----------------------------------------------------------------
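
When checking many victim servers, the cached value can be pulled out of the command output automatically. The following is a minimal sketch that extracts the `mtu` field from `ip route get` output like the sample above; the function name is an assumption, and the parsing relies only on the `mtu <number>` token shown in that output.

```python
import re


def cached_path_mtu(ip_route_output: str):
    """Return the cached path MTU from `ip route get` output, or None if absent."""
    match = re.search(r"\bmtu\s+(\d+)", ip_route_output)
    return int(match.group(1)) if match else None


# Output copied from the Linux 2.6.32 result above.
sample = (
    "203.178.129.44 dev venet0  src 183.181.168.158\n"
    "cache  expires 597sec mtu 552 advmss 1460 hoplimit 64\n"
)

if __name__ == "__main__":
    print(cached_path_mtu(sample))
```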

----------------------------------------------------------------
Result of FreeBSD 12
% sysctl -o net.inet.tcp.hostcache.list
net.inet.tcp.hostcache.list:
IP address        MTU SSTRESH RTT   RTTVAR CWND SENDPIPE …
2001:503:ba3e::2:30  1272 0 0ms      0ms 0 0 0   …
-----------------------------------------------------------------

The experiment showed that:

Linux 2.6 accepts crafted ICMPv4 ‘fragmentation needed and DF set’ for UDP, decreasing path MTU to 552 octets.
FreeBSD 12 and Linux 2.6 accept crafted ICMPv6 ‘packet too big’ for UDP, decreasing path MTU to 1280 octets (minimal IPv6 MTU value).

I also evaluated Linux 4.18.20, and checked the NetBSD source code — the results of the ICMP attack are as follows:

OS / source	crafted ICMPv4	minimal IPv4 MTU	crafted ICMPv6	minimal IPv6 MTU
[Brandt2018]	accept	552/296	unknown	unknown
Linux 2.6.32	accept	552	accept	1280
Linux 4.18.20	ignore?		accept	1280
FreeBSD 12	ignore		accept	1280
NetBSD	ignore		accept	1280

The experiment showed that:

BSD systems and newer Linux systems ignore ICMPv4 ‘fragmentation needed and DF set’ for UDP.
BSD and Linux systems accept ICMPv4 ‘fragmentation needed and DF set’ for TCP, and change path MTU for the matching TCP session.
Many BSD and Linux systems accept crafted ICMPv6 ‘packet too big’ for UDP, and decrease path MTU to 1280 octets.
Note, it is easy to set this remotely.

Details of DNS cache poisoning attacks using IP fragmentation

DNS cache poisoning attacks using IP fragmentation are performed by using a combination of path MTU attacks, which target authoritative DNS servers, and cache poisoning attacks, which target full-service resolvers.

An attack is performed as per the following steps — the path MTU attack is performed at step 3 and the cache poisoning attack is performed at steps 4 to 6.

1. Choose a victim full-service resolver and a target domain name.
2. Get the correct response from the authoritative servers of the target domain name.
3. Send a crafted ICMP/ICMPv6 packet to the authoritative servers of the target domain name. The crafted ICMP packet indicates a small path MTU size from the authoritative server to the victim full-service resolver. If control of the path MTU is successful, proceed to the next step.
4. Split the correct response retrieved at step 2 into fragments at the specified path MTU size, and calculate the partial checksum value of the second fragment. Generate a crafted second fragment that has the same partial checksum value. (If the partial checksum values of the correct second fragment and the crafted second fragment are the same, the UDP checksum value is the same.)
5. Send the trigger query (target domain name / type) to the victim full-service resolver.
6. Send the crafted second fragment to the victim full-service resolver with the assumed fragment ID (or all possible IDs, at most 65536 on IPv4).
7. If the victim full-service resolver accepts the crafted second fragment, the attack is successful.
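
The checksum remark in the steps above is the reason the UDP checksum offers no protection here. The checksum is built from a 16-bit ones' complement sum, so any two byte strings with the same partial sum are interchangeable as far as the checksum is concerned. The following sketch demonstrates that arithmetic on arbitrary bytes. The function names and the idea of reserving one 16-bit `slot` for balancing are assumptions made for this example, not a description of the tooling used in the papers.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum, the building block of the UDP checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def same_partial_sum(a: bytes, b: bytes) -> bool:
    # 0x0000 and 0xFFFF both represent zero, so compare modulo 0xFFFF.
    return ones_complement_sum(a) % 0xFFFF == ones_complement_sum(b) % 0xFFFF


def rebalance(original: bytes, modified: bytes, slot: int) -> bytes:
    """Rewrite the 16-bit word at even offset `slot` of `modified` so that
    its partial sum matches the partial sum of `original`."""
    if slot % 2 or slot + 2 > len(modified):
        raise ValueError("slot must be an even offset inside the data")
    zeroed = modified[:slot] + b"\x00\x00" + modified[slot + 2:]
    word = (ones_complement_sum(original) - ones_complement_sum(zeroed)) % 0xFFFF
    return zeroed[:slot] + word.to_bytes(2, "big") + zeroed[slot + 2:]


# Arbitrary example data: change eight octets, then balance with the last word.
original = bytes(range(32))
modified = original[:8] + b"\xAA" * 8 + original[16:]
balanced = rebalance(original, modified, slot=30)

if __name__ == "__main__":
    print(same_partial_sum(original, modified))
    print(same_partial_sum(original, balanced))
```

Because IPv4 fragment offsets are multiples of eight octets, a fragment always starts on a 16-bit boundary, which is why the partial sum of a fragment can be treated independently of the rest of the datagram.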

The keys of the attack are:

The attacker can control the fragmentation.
The attacker can generate a second fragment that generates the same UDP checksum value as the original response.
The query source port and DNS ID field exist in the first fragment.
The reassembly process holds the received second fragment until the arrival of the first fragment (timing is not strict).
The IPv4 fragmentation ID field has only 16 bits.
Some IPv6 implementations use predictable fragment identification values (RFC 7739).

The probability of spoofing a resolver is described in Section 7.2 of RFC 5452. The DNS cache poisoning attack using IP fragmentation changes to P=1 and I=1 (source port and ID are in the first fragment and need not be predicted), and adds the number of fragment IDs as a denominator.

On IPv6, the attack does not change the probability because the IPv6 fragment ID field has 32 bits.

On IPv4, the attack changes the probability from 1/2^32 to 1/2^16 because the IPv4 fragment ID field has only 16 bits.
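
The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes the single-packet form of the RFC 5452 formula, P_s = (D × F) / (N × P × I), with the symbols as defined in that RFC. The extra `fragment_ids` factor is a name introduced here for the additional denominator described above, and using 2^16 for the number of source ports is the same simplification as in the text.

```python
from fractions import Fraction


def spoof_probability(D=1, F=1, N=1, P=2**16, I=2**16, fragment_ids=1):
    """Probability that one batch of F forged packets is accepted.

    D: identical outstanding queries, F: forged packets sent,
    N: authoritative servers, P: source ports, I: DNS IDs (RFC 5452 symbols).
    fragment_ids: number of possible IP fragment IDs (assumption of this sketch).
    """
    return Fraction(D * F, N * P * I * fragment_ids)


classic = spoof_probability()                                   # port and ID must be guessed
ipv4_frag = spoof_probability(P=1, I=1, fragment_ids=2**16)     # 16-bit fragment ID
ipv6_frag = spoof_probability(P=1, I=1, fragment_ids=2**32)     # 32-bit fragment ID

if __name__ == "__main__":
    print(classic, ipv4_frag, ipv6_frag)
    print(ipv4_frag / classic)  # how much easier the IPv4 fragment attack is
```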

Countermeasures to cache poisoning attacks using IP fragmentation

There are a number of countermeasures that claim they are effective against cache poisoning attacks using IP fragmentation. These include:

Fragmentation Considered Poisonous [PDF 1MB] proposed limiting the EDNS requestor’s payload size to be smaller than the path MTU (1500) and reducing the maximal number of fragments cached. The successor paper, ‘Domain Validation++ For MitM-Resilient PKI’, however, made this obsolete by decreasing the MTU to 552/296 octets.
Domain Validation++ For MitM-Resilient PKI [PDF 3MB] also proposed sending multiple queries and choosing the majority. However, this idea is rather complex compared to querying over TCP.
IP fragmentation attacks on DNS [PDF 310KB] proposed using DNSSEC and a small EDNS requestor payload size (1220/1232 octets). However, again, the ‘Domain Validation++’ attack defeats this by decreasing the MTU to 552/296 octets.
T.Suzuki proposed [PDF 306KB] using an EDNS0 limit of 512 octets. However, the proposal decreases DNSSEC performance, and some authoritative servers ignore EDNS0 limits and send fragmented responses. This article therefore proposes avoiding IP fragmentation in the DNS.

To avoid cache poisoning attacks using IP fragmentation, full-service resolvers can set EDNS0 requestor’s UDP payload size to 1220 octets (minimal size defined by DNSSEC [RFC 4035]) and drop fragmented UDP responses related to DNS. When under attack, this results in a failure of name resolution.

There is an exception though: if authoritative servers are located on a network with a small MTU (smaller than 1280 octets), they need to set the EDNS0 responder’s maximum payload size to fit the MTU value; otherwise, name resolution sometimes fails.

Under normal conditions, the EDNS0 requestor’s payload size is decreased to 1220. Some of the queries may be truncated and need to be retried using TCP. Otherwise, there are no performance problems. Under path MTU attacks, responses are fragmented and name resolution fails.
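
To make the behaviour under normal conditions concrete, the following sketch builds a DNS query that advertises a 1220-octet EDNS0 requestor’s payload size and checks a response for the TC (truncated) bit, which is the signal to retry over TCP. It only constructs and inspects bytes; it does not send anything. The function names are assumptions made for this example, and the wire layout follows the standard DNS header and EDNS0 OPT record formats.

```python
import struct

EDNS_PAYLOAD_SIZE = 1220  # value proposed in this post
TYPE_OPT = 41             # EDNS0 OPT pseudo-record type
FLAG_TC = 0x0200          # truncation bit in the DNS header flags
DO_BIT = 0x00008000       # "DNSSEC OK" flag inside the OPT TTL field


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int, query_id: int,
                payload_size: int = EDNS_PAYLOAD_SIZE, dnssec_ok: bool = True) -> bytes:
    # Header: ID, flags (RD clear, as for resolver-to-authoritative queries),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root name, TYPE=41, CLASS carries the payload size, RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, payload_size,
                                DO_BIT if dnssec_ok else 0, 0)
    return header + question + opt


def needs_tcp_retry(response: bytes) -> bool:
    """True if the response has the TC bit set and should be retried over TCP."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & FLAG_TC)


if __name__ == "__main__":
    query = build_query("example.com", 1, 0x1234)
    print(len(query), query.hex())
```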

Other highly tolerant countermeasures include using TCP between full-service resolvers and authoritative servers because many cache poisoning attacks are based on UDP (there still may be performance issues); or, in the future, DNS over TLS/HTTPS between full-service resolvers and authoritative servers.

With regard to the former countermeasure, to drop fragmented UDP responses related to the DNS, drop the UDP fragments before they are reassembled by stateful inspection. For example, on Linux, drop the first fragment of any packet with UDP source port 53.

[ iptables -t raw -A PREROUTING -m u32 --u32 "6&0xFFFF00FF=0x20000011&&18&0xffff=53" -j DROP ]

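Or, drop the second and later fragments of UDP packets (the -f option matches only non-first fragments, which carry no UDP header, so the source port cannot be checked).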

[ iptables -t raw -A PREROUTING -p udp -f -j DROP ]

and, on IPv6, drop the first fragment, which is UDP source port 53.

[ ip6tables -A INPUT -p udp -m frag --fragfirst -m udp --sport 53 -j DROP ]

On FreeBSD, drop the second fragment, which is UDP.

[ ipfw deny log udp from any to me in frag ]
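
The u32 expression in the first Linux rule above is compact, so it may help to see the same test written out. The following sketch applies the equivalent comparison to a raw IPv4 packet: the four octets at offset 6 hold the flags and fragment offset, the TTL and the protocol, and the low 16 bits of the four octets at offset 18 are the UDP source port when the IP header has no options. The function names and the sample packets are assumptions made for this example.

```python
import struct


def matches_u32_rule(packet: bytes) -> bool:
    """Mirror of: 6&0xFFFF00FF=0x20000011 && 18&0xffff=53"""
    if len(packet) < 22:
        return False
    word_at_6 = struct.unpack("!I", packet[6:10])[0]
    word_at_18 = struct.unpack("!I", packet[18:22])[0]
    # Flags/offset == 0x2000 (MF set, offset 0), TTL masked out, protocol == 0x11 (UDP).
    first_udp_fragment = (word_at_6 & 0xFFFF00FF) == 0x20000011
    # Octets 20-21 are the UDP source port, assuming a 20-octet IPv4 header.
    source_port_53 = (word_at_18 & 0xFFFF) == 53
    return first_udp_fragment and source_port_53


def sample_packet(flags_and_offset: int, source_port: int) -> bytes:
    """Hypothetical IPv4 + UDP headers with zeroed addresses and checksums."""
    ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 1, flags_and_offset,
                            64, 17, 0, bytes(4), bytes(4))
    udp_header = struct.pack("!HHHH", source_port, 40000, 1480, 0)
    return ip_header + udp_header


if __name__ == "__main__":
    print(matches_u32_rule(sample_packet(0x2000, 53)))   # first fragment from port 53
    print(matches_u32_rule(sample_packet(0x0000, 53)))   # unfragmented response
```

Like the rule itself, this check assumes an IPv4 header without options; with options present, the source port is no longer at offset 20.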

Survey of current fragmentation status

To evaluate the current status of popular domain names, I sent DNS queries for Alexa’s top 1M names (the name itself and the name prepended with “www.”, using qtype A and AAAA) using Unbound 1.8.3 with edns-buffer-size and max-udp-size set to 4096 or 1220, IPv4 only, and DNSSEC validation enabled, and captured packets between the full-service resolver and authoritative servers.

QueryGenerator---Unbound--[capture]---Internet

(QueryGenerator retries queries once when there are errors)

Unbound, using an EDNS0 size of 4096, received 64,334 fragmented responses out of 16,736,365 total responses. Around 2,438 IPv4 addresses sent fragmented responses. I assumed that the maximum packet size from each address was the path MTU size.

In this test, 2,379 (97.5%) addresses sent 1,500 octet fragment packets, 50 addresses sent fragments larger than or equal to 1280, and the remaining 8 addresses sent fragments smaller than 1280 octets. However, they all sent TCP packets larger than 1280 octets. So, all of Alexa’s 1M domain names have name servers with MTU >= 1280 or small responses (< 1500), or no response (not checked).

Upon evaluation, I found a strange behaviour related to the EDNS0 payload size. Eleven addresses ignored the EDNS0 requestor’s UDP payload size of 1220 octets and sent fragmented 1500 octet packets. These addresses may have problems with my proposal (EDNS0 size 1220 octets and dropping fragments) because they ignore the EDNS0 size and generate fragments. This abnormality violates the EDNS protocol, and we need to fix it. Some of these strange addresses were fixed after the OARC 30 presentation (May 12, 2019).

Summary of measures against DNS cache poisoning attacks

Path MTU discovery is vulnerable and fragmentation may cause protocol weakness. DNS cache poisoning attacks using IPv4 fragmentation are possible if authoritative servers run on old Linux systems. However, avoiding IP fragmentation at full-service resolvers is possible and there are countermeasures against such attacks.

It is said that the DNS is the biggest user of IP fragmentation. However, it is possible to avoid IP fragmentation because truncation and TCP work well. As such, in light of this experiment, it’s recommended that users consider avoiding IP fragmentation in the DNS.

Kazunori Fujiwara is a Senior Researcher at the Japan Registry Services Co., Ltd.


One Comment

Jozef July 30, 2019 at 12:38 am

Hi,
shouldn’t that iptables/ipfw rule be with source port 53 filter option? ex.: `iptables -t raw -A PREROUTING -p udp --sport 53 -f -j DROP`
Cheers
Jozef
