The amount of activity in the DNS at the IETF seems to be growing at every meeting. I thought the best way to illustrate the considerable body of DNS work being undertaken at the IETF these days would be to take a snapshot of DNS activity that was reported to the DNS-related Working Group meetings at the recent IETF 110.
DNSOP — DNS Operations
DNSOP is the general Working Group for most DNS topics, and has been for some years. This group carries most of the ‘core’ DNS activity for the IETF. The topics considered at IETF 110 are as follows.
RFC 8976 — Zone Digest
It only took some 35 months and 22 revisions of the Internet draft to get this specification published. The concept is simple: It’s a resource record that contains the message digest of the DNS zone file. The specification has been crafted very carefully and reflects the extraordinary levels of attention given these days to changes in the DNS repertoire that impact the content of the root zone of the DNS.
In this case, it’s an adjunct to the DNSSEC security framework for DNS. DNSSEC allows individual entries in a DNS zone to be signed, allowing a receiver to verify that the response they received is authentic, but does not provide a similar level of protection for an entire zone file. Zone Digests fill this space. This is helpful in the work of smearing the root zone (and potentially other zones) over a broader surface, allowing the DNS to scale under increasing growth pressure by using the so-called ‘Hyperlocal’ approach to locally serving the root zone (RFC 7706).
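To make the idea concrete, here is a minimal sketch of a whole-zone digest. It is an illustration only: RFC 8976 computes its digest over records in canonical wire format and canonical order, whereas this sketch hashes presentation-format text after a plain sort. The function name and the tuple layout are my own assumptions.

```python
import hashlib


def zone_digest(records):
    """Return a SHA-384 hex digest over (owner, rrtype, rdata) text tuples.

    Simplified illustration: NOT the RFC 8976 wire-format procedure.
    """
    canonical = sorted(
        (owner.lower().rstrip(".") + ".", rrtype.upper(), rdata)
        for owner, rrtype, rdata in records
        if rrtype.upper() != "ZONEMD"  # the digest record cannot cover itself
    )
    h = hashlib.sha384()
    for owner, rrtype, rdata in canonical:
        h.update(f"{owner}\t{rrtype}\t{rdata}\n".encode("ascii"))
    return h.hexdigest()


zone = [
    ("example.", "SOA", "ns1.example. admin.example. 1 7200 3600 1209600 3600"),
    ("example.", "NS", "ns1.example."),
    ("ns1.example.", "A", "192.0.2.1"),
]
print(zone_digest(zone))
```

A receiver that recomputes the digest and compares it with the published value can detect any added, removed, or altered record in the zone, which is the property that per-record signatures alone do not give.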
DNS server cookies
The DNS has always represented as good an attack vector as is possible to construct. The DNS protocol is universally supported, it uses UDP and has the ability to penetrate most firewall filter configurations.
The conventional response to an attack in the DNS is to pull down the shutters and limit the UDP response rate. Unfortunately, this crude response penalizes legitimate DNS transactions as well as malicious ones. Cookies are one approach to mitigate this blanket response. A server can identify repeat clients and potentially give them preferred service. On the client, cookies can be used to distinguish between expected responses from a DNS server and spoofed responses.
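The server side of this can be sketched as a keyed hash over the client's cookie, its address, and a timestamp. The following is a rough illustration under stated assumptions: HMAC-SHA-256 truncated to 8 octets is my stand-in for whatever keyed hash the specification mandates, and the function names, the 12-octet layout, and the one-hour lifetime are all hypothetical.

```python
import hashlib
import hmac
import struct
import time


def _mac(secret, client_cookie, timestamp, client_ip):
    # Assumption: HMAC-SHA-256, truncated, stands in for the real keyed hash.
    data = client_cookie + timestamp + client_ip.encode("ascii")
    return hmac.new(secret, data, hashlib.sha256).digest()[:8]


def make_server_cookie(secret, client_cookie, client_ip, now=None):
    """Return a 4-octet timestamp followed by an 8-octet MAC."""
    ts = struct.pack("!I", int(time.time() if now is None else now))
    return ts + _mac(secret, client_cookie, ts, client_ip)


def check_server_cookie(secret, client_cookie, client_ip, server_cookie,
                        max_age=3600, now=None):
    """True if the cookie was issued by us, for this client, recently."""
    if len(server_cookie) != 12:
        return False
    ts = server_cookie[:4]
    issued = struct.unpack("!I", ts)[0]
    current = int(time.time() if now is None else now)
    if not 0 <= current - issued <= max_age:
        return False
    expected = _mac(secret, client_cookie, ts, client_ip)
    return hmac.compare_digest(expected, server_cookie[4:])
```

A server holding only the secret can recognize returning clients without keeping per-client state, and a spoofed source address will not produce a matching cookie.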
YANG types for DNS classes and resource record types
When SNMP gathered favour in the IETF it was the practice to define MIBs for just about everything. We’ve moved on and now YANG (RFC 7950) has become the de facto standard as a language for modelling configuration and state data, as well as specifying management operations and asynchronous notifications. It is reasonable to expect that the approach, based on utilizing such data models, can be effectively used in DNS operations using tools such as NETCONF or RESTCONF to manage servers, resolvers, and zone data.
Query Name Minimisation
RFC 7816 is an experimental specification that proposes a change to DNS resolvers to withhold extraneous information when performing a name resolution. Specifically, it uses a truncated query name at each point in the name resolution process, only exposing the next level label to each server. In response to DNS privacy concerns, this approach has been widely implemented, although there are some differences between observed resolver behaviour and the experimental specification in RFC 7816. The Working Group is working on a standard specification.
The changes in this specification include an upper bound on the number of successive additional single-label queries, and a more general way to deal with names with a large label count. The document also considers the query type to use in these queries and suggests that while any query type for which the authority always lies below the zone cut can be used, a good type to use is QTYPE=A, because of the potential to blend into existing query streams. This matches current resolver behaviour.
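The label-by-label exposure, and the effect of an upper bound, can be sketched as follows. This is a simplification under my own assumptions: it always starts at the top-level label (a real resolver starts from the closest zone cut it already knows), the limit of 10 is an illustrative default, and the 'skip to the full name' rule is a cruder policy than the draft's handling of long names.

```python
def minimised_qnames(qname, max_queries=10):
    """Return the successive query names a minimising resolver might send."""
    labels = [label for label in qname.rstrip(".").split(".") if label]
    names = []
    for depth in range(1, len(labels) + 1):
        if len(names) == max_queries - 1 and depth < len(labels):
            # Budget nearly spent: stop walking and send the full name.
            names.append(".".join(labels) + ".")
            break
        names.append(".".join(labels[-depth:]) + ".")
    return names


print(minimised_qnames("www.example.com"))
print(minimised_qnames("a.b.c.d.e.f.example.com", max_queries=3))
```

Each name in the list would be sent to the servers for the zone found in the previous step, so the root servers only ever see the top-level label.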
NSEC and NSEC3 TTLs
The DNS is a surprisingly complex specification these days. There is an attempt to illustrate this complexity, exposing the document dependency state in November 2017. Obviously, more have been added since then!
It’s no surprise that at times the documents contain misleading or contradictory guidance, and one area where this is problematic is in the caching of denial of existence records (NSEC and NSEC3). As the working draft states:
“Due to a combination of unfortunate wording in earlier documents, aggressive use of NSEC and NSEC3 records may deny names far beyond the intended lifetime of a denial. This document changes the definition of the NSEC and NSEC3 TTL to correct that situation. This document updates RFC 4034, RFC 4035, RFC 5155, and RFC 8198.”
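As I read the draft, the corrected rule is small enough to state in one line: the TTL on an NSEC or NSEC3 record is the lesser of the SOA record's own TTL and the SOA MINIMUM field. A sketch, with a hypothetical function name:

```python
def nsec_ttl(soa_ttl, soa_minimum):
    """TTL for NSEC/NSEC3 records: the lesser of the SOA TTL and SOA MINIMUM."""
    return min(soa_ttl, soa_minimum)


# A zone with a one-day SOA MINIMUM but a 15-minute SOA TTL:
print(nsec_ttl(soa_ttl=900, soa_minimum=86400))
```

Under the older wording, the denial records in a zone like this could be cached, and used to synthesize denials, for a day rather than 15 minutes.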
This is not the first such clarification update, and definitely won’t be the last in the DNS.
Service binding in the DNS
Work continues to change the DNS from a simple name resolution mechanism to a generalized service rendezvous tool.
The Service Binding Record (SVCB) provides both a name alias facility and connection parameters in the DNS. SVCB records allow a service to be provided from multiple alternative endpoints, each with associated parameters (such as transport protocol configuration and keys for encrypting the TLS ClientHello in TLS 1.3). Much of the motivation here lies in the world of hosted content and the requirements of Content Distribution Networks (CDN).
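A client's selection logic over a set of SVCB records might look something like the sketch below. The record model is deliberately cut down and the names are hypothetical. What it does take from the specification is that a priority of zero marks an alias and that lower non-zero priorities are preferred.

```python
from dataclasses import dataclass, field


@dataclass
class Svcb:
    priority: int          # 0 means alias form; otherwise lower is preferred
    target: str
    params: dict = field(default_factory=dict)


def choose_endpoint(records, supported_alpn):
    """Pick the record a client would try first, or None if none fits."""
    aliases = [r for r in records if r.priority == 0]
    if aliases:
        # Alias form: the caller is expected to re-query at the alias target.
        return aliases[0]
    for rec in sorted(records, key=lambda r: r.priority):
        alpns = rec.params.get("alpn", [])
        # Simplification: a record with no alpn parameter is accepted as-is.
        if not alpns or set(alpns) & set(supported_alpn):
            return rec
    return None


records = [
    Svcb(2, "backup.example.", {"alpn": ["h2"], "port": 8443}),
    Svcb(1, "primary.example.", {"alpn": ["h3"]}),
]
print(choose_endpoint(records, ["h2"]).target)
```

The point is that the client learns the alternatives and their transport parameters from the DNS before it opens any connection.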
It’s still early days, so it’s not clear if this service record will be widely adopted, but it has the potential to significantly improve the connection experience for clients and make improvements in the performance of hosted content for CDNs, allowing greater diversity of hosting solutions for content publishers. It allows a richer set of control parameters to be used between content and service providers and content and service publishers, and equally allows the client to be provided with a set of connection parameters prior to the connection.
So, if common sense and a shared motivation to improve the speed and versatility of connection establishment drive adoption, then this particular refinement of the DNS deserves to be adopted universally. However, it is a complex record, and our experience in populating the DNS with simple address records has not been all that inspiring.
The DNSOP Working Group will be doing a Last Call review of this document, and there is some expectation that it will head to IETF review in the coming months.
DNSSEC and delegation-only zones
There are some zones in the DNS that do not contain terminal labels but have delegated zones. These ‘delegation only’ zones have presented some ambiguities for implementation of QName Minimisation, where NXDOMAIN responses were being generated in error.
The more subtle issue here relates to DNSSEC, as delegation records in the parent zone are not DNSSEC-signed and the possibility exists for unauthorized manipulation of these unsigned records. While the root and most TLD zones are assumed to be exclusively delegation-only zones, there is currently no interoperable and automatable mechanism for auditing these zones to ensure they behave as a delegation-only zone. This specification defines a mechanism for delegation-only zone owners to create a DNSKEY that indicates it will only delegate the remainder of the DNS tree to lower-level zones. This allows for easier delegation policy verification and logging and auditing of DNS responses served by their infrastructure.
DNS and TCP
The DNS has a love/hate relationship with its transport protocols.
UDP is incredibly efficient and cheap, but at the same time admits a large set of issues with abuse as well as clunky adaptation mechanisms to cope with large data bundles. TCP can cope with large transactions with ease and can act as a counter to abuse, based on IP address manipulation in queries. So why doesn’t the DNS shift to TCP?
The web seems to cope with massive volumes of data transactions and it’s all TCP (or QUIC). So why not shift the DNS to use TCP all the time? The major issue is cost. TCP is a higher-overhead protocol, and while it is possible to think about a DNS infrastructure based entirely on TCP, there is a cost to be paid in terms of infrastructure and in terms of performance times. So, we use this rather clunky approach where the current default mode is to use UDP for DNS queries and responses and then shift to use TCP only when the response has the truncation flag set, triggering the client to re-query using TCP.
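The 'UDP first, TCP on truncation' behaviour can be sketched with nothing more than the standard library. This is a minimal illustration, not a production stub resolver: it assumes an IPv4 server address, does no retries or response validation, and the function names are mine.

```python
import socket
import struct


def build_query(qname, qtype=1, msg_id=0x1234):
    """Build a one-question DNS query in wire format (class IN, RD set)."""
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".") if label
    ) + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def is_truncated(response):
    """True if the TC bit is set in the response header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("TCP connection closed early")
        buf += chunk
    return buf


def query(server, qname, qtype=1, timeout=3.0):
    """Query over UDP, then re-query over TCP if the answer is truncated."""
    msg = build_query(qname, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(msg, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return response
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        # DNS over TCP prefixes each message with a two-octet length.
        tcp.sendall(struct.pack("!H", len(msg)) + msg)
        (length,) = struct.unpack("!H", _recv_exact(tcp, 2))
        return _recv_exact(tcp, length)
```

The cost referred to above is visible in the second half of `query`: a truncated answer means a wasted UDP round trip, then a TCP handshake, before the client sees any usable data.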
There has been a lot of mythology about the true capabilities of UDP in the public Internet. The introduction of EDNS(0) and a suggested default UDP buffer size setting of 4,096 octets gave the impression that UDP could be used for almost everything in the DNS, and TCP was unnecessary. This was an act of bravado unsupported by data, and the issue of fragmentation and packet loss has been very prominent in our thinking on this topic, in recent years.
The issue with UDP is twofold:
- UDP admits a family of amplification and reflection attacks because of the very nature of UDP. Like it or not, RFC 2827, published over 20 years ago, is simply not widely deployed and source address spoofing is still possible in many, if not most, parts of the public Internet.
- UDP uses IP fragmentation to carry large payloads and IP fragmentation is filtered in many parts of the Internet. Delivery failure of fragmented IP packets appears to impact around one quarter of all Internet users.
If we want to continue to have the efficiency dividend of UDP in the DNS, then we must do something about both these issues. There has been much work on DNS attack mitigation, while the work on fragmentation has only gathered pace more recently.
DNS Flag Day 2020 is one possible approach, and it relies on the use of the EDNS(0) buffer size setting to limit the maximal response size that will be used by UDP in DNS servers. A more general set of measures that can avoid UDP fragmentation, including reducing the number of name servers, changing the DNSSEC signing algorithm to an Elliptic Curve algorithm, and using the IP DONT FRAGMENT flag, is considered in a current Working Group document, titled Fragmentation Avoidance in DNS.
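The buffer size in question is carried in the CLASS field of the EDNS(0) OPT pseudo-record that the client appends to its query. A minimal sketch of building that record follows. The function name is mine, and the 1,232-octet default is the figure the Flag Day 2020 material recommended, so treat it as a tunable rather than a constant.

```python
import struct


def opt_rr(udp_payload_size=1232, dnssec_ok=False):
    """Build an EDNS(0) OPT pseudo-record advertising a UDP buffer size."""
    flags = 0x8000 if dnssec_ok else 0           # DO bit
    return (
        b"\x00"                                   # owner name: root
        + struct.pack("!HH", 41, udp_payload_size)  # TYPE=OPT, CLASS=size
        + struct.pack("!BBH", 0, 0, flags)        # ext-rcode, version, flags
        + struct.pack("!H", 0)                    # RDLENGTH: no options
    )


print(opt_rr().hex())
```

To use it, append the record to a query and set the additional-record count in the header to 1. A server honouring the value will truncate rather than send a UDP response larger than the advertised size.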
We’ve possibly overreacted with this work, shifting the DNS to use UDP in a ‘fragmentation-safe’ manner by avoiding sending fragmented UDP responses at all. This measure imposes a cost on all to improve the service for a minority of end points. Admittedly, it’s a big minority of around a quarter of users but nevertheless, the solution of cutting over to TCP before the onset of UDP fragmentation has its own associated inefficiencies.
Obviously, this issue is not solved, and the conversation will continue. This document has rather modest goals, namely, to remind everyone that TCP is a part of the DNS and setting up firewall rules to block the use of port 53 over TCP is ill-advised.
DNS delegation records
Whenever information is duplicated in distributed systems there is always the problem of which source takes precedence when the sources vary from each other. Both the parent and the child list the delegation records for the child zone. RFC 1034 provides the direction that: “The administrators of both zones should ensure that the NS and glue RRs that mark both sides of the cut are consistent and remain so” but fails to provide clear guidance on what to do when they differ.
In the DNS, the child NS records are authoritative. On the other hand, the parent has the ability to re-delegate the zone to a different child server, or even remove the delegation completely. So, what NS set should a recursive resolver cache?
Evidently, there is wide variability in the way that DNS resolvers handle delegation records. Some resolvers prefer to cache the parent NS set, and some prefer the child set. For others, what they preferentially cache depends on the dynamic state of queries and responses they have processed.
This document aims to bring more commonality and predictability by standardizing resolver behaviour with respect to delegation records:
“When following a referral response from an authoritative server to a child zone, DNS resolvers should explicitly query the authoritative NS RRset at the apex of the child zone and cache this in preference to the NS RRset on the parent side of the zone cut. Resolvers should also periodically revalidate the child delegation by re-querying the parent zone at the expiration of the TTL of the parent side NS RRset.”
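The caching rule in that quote can be sketched as a small data structure. Everything here is hypothetical (class name, method names, the injected clock). It only shows the two behaviours the draft asks for: child-side NS data displaces the parent-side referral, and the revalidation deadline follows the parent-side TTL.

```python
import time


class DelegationCache:
    """Toy NS cache that prefers child-side data over parent-side referrals."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._zones = {}

    def store_referral(self, zone, ns_names, parent_ttl):
        # Parent-side NS set: usable straight away, but only as a fallback.
        self._zones[zone] = {
            "ns": frozenset(ns_names),
            "source": "parent",
            "revalidate_at": self._clock() + parent_ttl,
        }

    def store_child_ns(self, zone, ns_names):
        # The answer from the child apex replaces the referral data, while
        # the revalidation deadline stays tied to the parent-side TTL.
        entry = self._zones[zone]
        entry["ns"] = frozenset(ns_names)
        entry["source"] = "child"

    def lookup(self, zone):
        entry = self._zones[zone]
        return entry["ns"], entry["source"]

    def needs_revalidation(self, zone):
        # True once the parent-side TTL has run out: time to ask the parent.
        return self._clock() >= self._zones[zone]["revalidate_at"]
```

Tying revalidation to the parent-side TTL is what lets the parent's ability to re-delegate or withdraw a zone take effect, even when the child keeps serving a long-lived NS set.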
What is the difference between a ‘Domain Name’ and a ‘Label’? Or the difference between a ‘Stub Resolver’ and a ‘Recursive Resolver’? The DNS has invented its own terminology, and it’s reached the point where a list of DNS terms and their definition is very useful. This is an ongoing effort and is intended to replace RFC 8499 (which, in turn, replaced RFC 7719) as we invent new terms and subtly redefine existing terms. The current working draft of DNS terminology can be found here.
With so many documents being considered by this Working Group, it’s not surprising that a few seem to have expired, presumably temporarily.
Recommendations for DNSSEC resolvers operators
This is an interesting informational document that attempts to clarify the responsibilities of DNS resolver operators when the resolver performs DNSSEC validation, as well as recommending some operational practices. The document appears to reflect upon some of the issues that were found in the 2018 root key roll, in relation to the management of the DNSSEC trust anchor material, which is helpful.
Glue in DNS referral responses is not optional
This document asserts that there is a widespread misbelief that all additional section records in a DNS response are optional. This document states that NS RRs are to be placed in the authority section of a DNS referral response, and glue RRs added to the additional section if the addresses are not available from the authoritative data or the cache. The subtle change added to the specification in this draft is to require setting the truncation bit if the glue RRs cannot fit into a UDP response.
This is the latest contribution in a long and messy story. I’ll try to give an all-too brief personal synopsis here. A long time ago (late ’90s), the IETF at the time came to the realization that it was singularly ill-equipped to deal with the myriad of complex issues with determining the policies associated with the distribution of names, after a bruising episode with an IAB-sponsored International Ad Hoc Committee, which was supposed to deal with the issues of access to generic top-level domains.
The mess was passed over to the fledgling ICANN body, and the IETF promised to take a step back from the issues of names and name policies and strictly deal in technical matters thereafter. ICANN headed down an economic rationalist path, and the generic name space was effectively monetized. Some parties felt excluded by this shift but there was little they could do, other than simply use as-yet unassigned DNS labels.
Over time, memories and commitments fade, and the IETF published RFC 6762 in 2013, unilaterally assigning the top-level domain (TLD) ‘.local’ for a ‘special use’, based on Apple’s multicast DNS technology. Soon after, the TOR folk submitted to the DNSOP Working Group a similar case to have ‘.onion’ declared a ‘special use’ TLD, and, predictably, a whole line of hopefuls formed behind them. The IETF subsequently backed away from this approach and, once more, the only alternative to the ICANN process was simply self-allocation of a TLD.
One line of thought in response was that rather than a ‘wild west’ of ranchers and squatters clashing over names, it might be helpful to follow the lead of the address system and declare a TLD label as a distinguished ‘uber’ TLD where self-assignment of second-level domains (2LDs) in the name space was not only permitted but encouraged.
But where to make such a case? ICANN? Or the IETF? Or both? As it has turned out ‘both’ has been the response, but with quite different approaches.
In the ICANN world, the Security and Stability Advisory Committee (SSAC) came out with SAC113, recommending that: “ICANN reserves a string specifically for use as a private-use TLD for namespaces that are not part of the global DNS, but are meant to be resolved using the DNS protocol.”
At the same time, a proposal was submitted to the DNSOP Working Group suggesting the use of the ‘available’ code points (technically, those classified as ‘user-assigned’) of the two-letter country code registry, maintained by ISO in the ISO 3166-1 registry. In other words, proposing to squat on available space in the two-letter code space rather than reserving a particular TLD for private use.
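For reference, the user-assigned range in the ISO 3166-1 two-letter space is AA, QM to QZ, XA to XZ, and ZZ. A sketch of checking whether a candidate label falls in that range follows. The function name is hypothetical, and nothing here implies that any of these codes has actually been set aside for private DNS use.

```python
import string

# ISO 3166-1 alpha-2 user-assigned code elements: AA, QM..QZ, XA..XZ, ZZ.
USER_ASSIGNED = (
    {"AA", "ZZ"}
    | {"Q" + c for c in string.ascii_uppercase[12:]}   # QM..QZ
    | {"X" + c for c in string.ascii_uppercase}        # XA..XZ
)


def is_user_assigned(label):
    """True if a top-level label is a user-assigned two-letter code."""
    return label.rstrip(".").upper() in USER_ASSIGNED


print(len(USER_ASSIGNED), is_user_assigned("zz"), is_user_assigned("au"))
```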
Where to from here is a confusing question. It would seem that if the DNSOP Working Group were to submit this as an RFC then it would appear to be counter to previous IETF commitments to stay out of this area of name allocation policy and weaken the IETF’s commitment to allow ICANN a free hand in the task of such name policies. On the other hand, the proposal for ‘free’ TLDs in the ICANN space appears to be counter to the entire stance of ICANN in imposing a uniform policy framework on the assignment of top-level domain names, undermining its own policy process.
At least one cynic has observed: Watch this space. Bring popcorn!
NSEC3 records have always been one of the more challenging areas of the DNSSEC secure zone framework. The problem was to create verifiable assertions of non-existence in a zone, and the NSEC approach was to order the labels in a zone and then sign across the ‘strides’, asserting that no names between two adjacent labels existed.
The domain name business is odd, and it turns out that the names that don’t exist are probably more valuable commercially than those that do for many zones. If you can use NSEC to enumerate all the names that exist in a zone, then it’s clear that all the other names don’t!
How to prevent this form of zone enumeration? The approach used by NSEC3 is to hash all the names in a zone using a SHA-1 hash and create striding non-existence records using the order of the hash of the names. Zone enumeration is harder.
Now, SHA-1 is not the most robust of hash algorithms and there has been some concern that this was just making life harder for both zone signing and response validation while not imposing a major barrier to efforts to enumerate the zone. For this reason, NSEC3 has added a NSEC3PARAM to the zone file, specifying the number of hash iterations to apply, and a salt value, with the intention of making reverse engineering the hashed name set more challenging. While these attributes are commonly used, do they make it harder to reverse the hash, or do they simply add cost to the signer and the validator without any appreciable benefit?
The Guidance for NSEC3 parameter settings document suggests that both measures are ineffectual in most cases and that the iteration value should be zero, and no salt value should be used.
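The hash itself is short enough to write out, which also makes the cost of the iteration count obvious: each extra iteration is one more SHA-1 pass for the signer and for every validator. This sketch follows my reading of the RFC 5155 procedure (lower-cased wire-format name, salt appended at every pass, base32hex output). The function names are mine.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto the 'extended hex' alphabet.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def wire_name(name):
    """Lower-cased, uncompressed wire format of a domain name."""
    labels = [l for l in name.lower().rstrip(".").split(".") if l]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def nsec3_hash(name, salt_hex="", iterations=0):
    """Hash an owner name: one SHA-1 pass plus `iterations` extra passes."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire_name(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii").lower()


# The recommended settings: no salt, zero additional iterations.
print(nsec3_hash("www.example.com"))
```

With the recommended settings an attacker guessing names still pays one SHA-1 per guess, and that cost is small whether or not the zone adds iterations. This is the draft's argument for not bothering.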
Clarification of clarifications
With so many specification documents on the DNS you’d think that it’s all nailed down and there is little latitude for creative interpretation of the specification. This is not the case, and many aspects of the DNS would benefit from some further clarification. One of those topics is the ‘priming query’ used by resolvers when booting to obtain the current root zone servers. This was already the subject of RFC 8109, published in 2017, but it seems that there is more to say about the priming query, and this draft proposes some further clarification on the priming query.
These days, the default mode of operation is to pass your domain to somebody to ‘host’ it for you. You can take advantage of a DNS provider to create a server network for your domain across a distributed cloud of servers, in anycast configuration, with aggregate capacity in the service platform to withstand many forms of hostile attack.
But how can you support multiple platform operators without going into contortions over key management?
RFC 8901 discussed several ways multiple operators can provide service to a signed zone. But there was at least one question that was not addressed in this document, namely the way to change providers in a secure manner. This document outlines a number of scenarios for operators being added, or leaving, a zone and the way CDS/CDNSKEY and CSYNC records can be managed to effect a secure change in the operators of a zone.
The repertoire of DNS error codes is rather sparse, and the codes are relatively unhelpful in terms of understanding the basic cause of the issue. This draft proposes adding a URI to the EDNS(0) option to further elaborate on the reasons for generating a DNS error code.
Frankly, I’m unconvinced about the value of this approach. Despite domain names being constructed on strings of human-friendly labels (except for IDNs, of course) the DNS is largely a machine-to-machine protocol and exposing a URI with potential explanations of the error condition strikes me as a fundamentally poor idea!
Error reporting in reverse
Normally, resolvers receive error codes from servers and are not expected to report errors back in the other direction. However, there are some circumstances where this is the best option, such as when an authoritative server is serving stale DNSSEC records. This document describes a facility that can be used by validating recursive resolvers to report errors in an automated way.
To report an error, the reporting resolver encodes the error report in the QNAME of the reporting query. The reporting resolver builds this QNAME by concatenating the extended error code, the QTYPE and QNAME that resulted in failure, the label “_er”, and the reporting agent domain. All this sounds like a re-run of RFC 8145, an approach that was intended to permit a recursive resolver to signal its trust anchors to a root server. This approach was not all that useful for the KSK roll in 2018 and it appears to me that this is not sufficiently different to make the approach any more useful in this context.
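Taking the concatenation order exactly as described above, the construction might be sketched like this. The function name and the example agent domain are hypothetical, and the label order follows this description rather than any later revision of the draft.

```python
def report_qname(ede_code, qtype, failed_qname, agent_domain):
    """Build the QNAME of an error-reporting query."""
    parts = [
        str(ede_code),                 # extended error code
        str(qtype),                    # QTYPE of the failed query
        failed_qname.rstrip("."),      # QNAME of the failed query
        "_er",                         # fixed separator label
        agent_domain.rstrip("."),      # where reports are collected
    ]
    name = ".".join(parts) + "."
    # Wire format is one octet longer than the dotted text with trailing dot.
    if len(name) + 1 > 255:
        raise ValueError("report name exceeds the 255-octet name limit")
    return name


print(report_qname(7, 1, "broken.test.", "a01.reporting-agent.example"))
```

The length check hints at one practical limit of the scheme: a failure on an already long name may simply not fit into a report.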
DPRIVE — DNS privacy
The DPRIVE Working Group is working on the specification of technologies to improve the privacy of the DNS. Its work to date has included specifications of DNS over TLS (DoT), and DNS over DTLS. DNS over HTTPS (DoH) was completed in the now concluded DoH Working Group.
DNS over QUIC
The distinction between DoT and DoH can be a subtle one. In both cases, there is a TCP transport state between client and server and upon this TCP state is laid a TLS session. Within this session, DNS packets (queries and responses) are passed. The distinction is that in the case of DoH, an HTTP session is established, and the DNS query and response are framed as an HTTP GET or POST method (assuming that we are using HTTP/2 and using a TCP/TLS substrate). There is a similarly subtle distinction between DNS over QUIC (DoQ) and DoH where the version of HTTP is HTTP/3 and the transport used is QUIC.
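The HTTP framing is easy to see in the GET form, where the wire-format query is base64url-encoded without padding and carried in a `dns` parameter. A minimal sketch, assuming a hypothetical resolver at `https://dns.example/dns-query`:

```python
import base64
import struct


def build_query(qname, qtype=1):
    """Wire-format query with ID 0, which keeps HTTP responses cacheable."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".") if label
    ) + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(resolver_url, qname, qtype=1):
    """Return the URL a DoH client would fetch with an HTTP GET."""
    wire = build_query(qname, qtype)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
```

The client would send this with an `Accept: application/dns-message` header, and the response body is an ordinary wire-format DNS message. The POST form carries the same octets in the request body instead.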
This work looks at the base transport case, layering DNS packets directly into a QUIC transport socket. As the DoQ document points out, the DoQ mapping goals are to provide the same DNS privacy channel protection as DoT, including the option for the client to authenticate the server, to provide a superior level of source address validation for DNS servers as compared to UDP, avoiding IP fragmentation and UDP issues. In this case, the comparison is also DoH, but it’s a comparison with HTTP/3 with QUIC selected.
As with the existing DoT and DoH work, it’s looking at the stub-to-recursive environment. In most other respects, this is a conventional use of QUIC as an encrypted transport channel, passing DNS messages. The comparison between DoQ and DoT is pretty much the same as the comparison of TLS over TCP with TLS over QUIC over UDP. There are opportunities to bypass head-of-line blocking when using QUIC and the relatively conservative setting of the QUIC packet size is intended to bypass path MTU issues.
In the same way that DoT uses a reserved TCP port 853, DoQ is intended to reserve UDP port 8853.
Oblivious DNS over HTTPS (ODoH)
Channel encryption still admits one other party into the privacy circle: the recursive resolver at the other end of the query. While no others are able to eavesdrop on the conversation between a stub and recursive resolver, the recursive resolver knows the identity of the end system hosting the stub resolver and the queries coming from that end system. In terms of privacy, that eliminates one class of risks but still admits a major weakness of being forced to include one other into the query stream.
There have been several approaches to improve this situation. A general approach is the onion network (TOR), which attempts to obscure the link between end point identity and the content payload. A rather devious approach was written up as Oblivious DNS (the now expired draft), where the DNS query was itself encrypted and then passed through the normal DNS infrastructure to a target helper who deciphered the original query name, performed a recursive resolution for the name, and then passed the result back using a session key provided by the original querier. The limitation was in the query name length.
This approach uses a conventional DNS query but uses two-layer encryption. The DNS query and the session key are encrypted using the public key of the target ODoH server. This encrypted payload is then passed as an opaque payload using HTTPS to an ODoH proxy. The proxy’s task is to pass the encrypted query to the target, and similarly, pass encrypted responses back to the client. As long as the proxy and target cannot conspire and share information then no party has knowledge of the end entity and the DNS queries (and responses) being made.
There is no clear consensus in the DPRIVE Working Group to adopt this document at the moment. However, it is an interesting experiment and there may be sufficient interest to pursue a path of an Experimental RFC.
Encryption between recursives and authoritatives
Encrypting the DNS query traffic between stub and recursive resolvers can address a large proportion of the DNS privacy agenda, because without encryption an onlooker can associate the identity of the end point with the DNS queries they make.
The case to encrypt the cache-miss queries between a recursive resolver and authoritative servers has a different privacy ‘value’. It is not possible to identify the end point of the query (unless Client Subnet has been turned on, which itself is a massive privacy leak). The costs are different in the recursive-to-authoritative scenario, as compared to stub-to-recursive. There is limited session reuse in so far as a recursive will query many authoritative servers so the overheads of setting an encrypted session may need to be amortized over a single query.
A draft has been authored on this scenario. There are some dubious assumptions that probably would not stand further scrutiny at present, such as: “Both parties understand that using encryption costs something but are willing to absorb the costs for the benefit of more Internet traffic being encrypted.”
The remainder of the document describes the process a recursive resolver may use to discover if an authoritative server supports queries over DoT and cache this capability for subsequent use. It is certainly a good demonstration that it ‘can be done’, but that’s not really the question for this particular technology. The question is more about incremental costs and benefits, and on this crucial topic, the draft is somewhat silent.
The opportunistic discovery method described in this draft is a ‘try it and see’ that records the success, or otherwise, of an attempt to set up a DoT session with an authoritative server. Another draft advocated use of the SVCB method, using the DNS to convey the capability (and willingness) to support TLS connections.
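The ‘try it and see’ method amounts to a probe plus a cache of outcomes. The sketch below is my own illustration, not the draft's procedure: the class and function names, the one-hour retention, and the decision to skip certificate verification (in keeping with the opportunistic model) are all assumptions.

```python
import socket
import ssl
import time


def tls_probe(server_ip, port=853, timeout=2.0):
    """Try to complete a TLS handshake on the DoT port; True on success."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # opportunistic: no authentication
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((server_ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw):
                return True
    except OSError:                   # refused, timed out, or handshake failed
        return False


class DotCapabilityCache:
    """Remember, per server, whether the last DoT probe succeeded."""

    def __init__(self, probe=tls_probe, ttl=3600.0, clock=time.monotonic):
        self._probe, self._ttl, self._clock = probe, ttl, clock
        self._results = {}

    def supports_dot(self, server_ip):
        now = self._clock()
        cached = self._results.get(server_ip)
        if cached and now < cached[1]:
            return cached[0]
        ok = self._probe(server_ip)
        self._results[server_ip] = (ok, now + self._ttl)
        return ok
```

Caching failures as well as successes matters here. Without it, every query to a server that does not offer DoT would pay for a doomed connection attempt, which goes straight to the cost question that the draft leaves open.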
Other DNS activity at IETF 110
That’s not quite the full picture as there is further work in the Adaptive DNS Discovery Working Group (ADD), addressing issues of discovery and selection of DNS resolvers by DNS clients in a variety of networking environments, supporting both encrypted and unencrypted resolvers.
There is also the Extensions for Scalable DNS Service Discovery Working Group (DNSSD), looking at automation of service provisioning by using the DNS.
Further afield, there is the Home Networking Working Group (homenet), looking at name resolution and service discovery in edge (home) networks.
That’s a lot of DNS, and there is every expectation that the level of activity will continue at this level for IETF 111.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.