Increasingly, many organizations are employing multiple distinct providers to cooperatively operate their Domain Name System (DNS) service. This allows them to survive the complete and catastrophic failure of any single provider, with no downtime. If these organizations are additionally deploying DNS Security Extensions (DNSSEC — a system to verify the authenticity of DNS data), this can pose some challenges depending on the specific features in use.
The traditional method of multi-provider operation is the zone transfer model, where the zone owner runs a backend master (primary) server that holds the authoritative copy of the zone data and pushes out the zone data to each provider.
Zone transfer model limitations
The zone transfer model can support DNSSEC just fine, has been deployed in the field successfully, and is well understood. DNSSEC signing happens on the backend primary server and the signed zones are then transferred to providers via the DNS zone transfer protocol, typically authenticated via transaction signatures (TSIG — a channel protection mechanism).
However, one notable limitation of the zone transfer model is that it can only support ‘standardized’ DNS features that can be represented as resource record types in a DNS zone and can be predetermined at the primary server. Non-standard and often dynamic features that are now widely used in the industry cannot be supported by this model. These non-standard mechanisms go by many names but are commonly referred to by the term ‘traffic management’.
This traffic management includes things like Global Server Load Balancing (GSLB), failover pools with health checks, weighted answers, and even potentially arbitrarily programmed responses. Often the responses are querier specific or dependent on inspecting some sort of dynamic state in the network. Typically, the response and associated DNSSEC signature must be determined at the authoritative servers themselves, at query time.
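To make the query-time nature of these features concrete, here is a minimal Python sketch of a weighted answer pool with health checks. The `Endpoint` type, the `pick_answer` function, and the documentation-range addresses are all hypothetical; real providers implement this logic in their own proprietary ways.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    address: str   # address returned in the DNS answer
    weight: int    # relative share of answers this endpoint should receive
    healthy: bool  # assumed to be updated by an out-of-band health checker


def pick_answer(pool: List[Endpoint], rng: random.Random) -> Optional[str]:
    """Pick one address among the healthy endpoints, proportionally to weight."""
    candidates = [e for e in pool if e.healthy and e.weight > 0]
    if not candidates:
        return None  # nothing healthy; a real service would fall back somewhere
    weights = [e.weight for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0].address


pool = [
    Endpoint("192.0.2.10", weight=3, healthy=True),
    Endpoint("192.0.2.20", weight=1, healthy=True),
    Endpoint("192.0.2.30", weight=5, healthy=False),  # skipped while unhealthy
]
answer = pick_answer(pool, random.Random())
```

Because the answer depends on health state and a weighted draw made when the query arrives, it cannot be fixed in advance in a zone file on the primary server.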
DNS tricks or evolution?
More than a decade ago, Paul Vixie famously critiqued these features in his 2009 article, What DNS Is Not. He argued that ‘DNS tricks’ make the DNS more complex, and therefore harder to debug and troubleshoot. He also argued that these features should have been implemented in a different layer, and that they represent a form of (unfair) cost shifting from certain application providers to DNS infrastructure operators.
I agree with many of these arguments, yet we cannot ignore how pervasively these mechanisms are already deployed and used today. Most organizations that run a CDN edge network gravitate toward them. So, this is a long-standing tussle that is here to stay, and DNSSEC deployment needs to grapple with this reality too.
Furthermore, at the 2018 ICANN DNS Symposium, none other than Paul Mockapetris — the inventor of the DNS — remarked that ultimately the DNS should evolve to store programs, not just data. To do this properly, significant revisions to the DNS protocol specifications are needed. To some extent, such evolution is already happening with many of these dynamic traffic management features, albeit in an unstandardized and haphazard manner.
An alternative model that can support dynamic traffic management features is one where the zone owner updates DNS data at the DNS provider not through zone transfer, but through some other provider-specific API (often REST/HTTPS) or portal. The provider then signs the zone data themselves. This model can support non-standardized DNS features, provided the provider is able to sign the response data generated by these features. The easiest way to do this is ‘online signing’ or ‘on-the-fly signing’, where the signature is generated when the response is produced. Alternatively, if the provider enumerates all the possible response sets in advance, they can pre-compute all their signatures and algorithmically return the correct data set and associated signature at response time.
This model may be attractive to organizations even if they aren’t using non-standard features, because they can forgo running their own backend hidden primary DNS server infrastructure if they lack the expertise or simply prefer not to.
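The difference between the two signing approaches can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_sign` uses an HMAC as a stand-in for RRSIG generation (HMAC is not a DNSSEC algorithm), and the response sets, region names, and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Tuple

RRset = Tuple[str, ...]

SECRET = b"placeholder-key"  # stand-in for the provider's private signing key


def toy_sign(rrset: RRset) -> str:
    """Stand-in for RRSIG generation. NOT a real DNSSEC signature."""
    data = "|".join(sorted(rrset)).encode()
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()


# Every response set the traffic management logic could ever return.
RESPONSE_SETS: Dict[str, RRset] = {
    "eu": ("192.0.2.10", "192.0.2.11"),
    "us": ("198.51.100.10",),
    "failover": ("203.0.113.10",),
}

# Pre-computed approach: sign each possible response set once, ahead of time.
PRESIGNED = {name: (rrset, toy_sign(rrset)) for name, rrset in RESPONSE_SETS.items()}


def answer_presigned(region: str, region_healthy: bool) -> Tuple[RRset, str]:
    """Select an already signed response set at query time; no signing happens here."""
    name = region if region_healthy and region in PRESIGNED else "failover"
    return PRESIGNED[name]


def answer_online(rrset: RRset) -> Tuple[RRset, str]:
    """Online signing: sign whatever the decision logic produced, per response."""
    return rrset, toy_sign(rrset)
```

Pre-computation keeps the private key away from the query path but only works when the set of possible answers is finite and known. Online signing handles arbitrary answers at the cost of signing work (and key exposure) on the authoritative servers.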
Extending this scheme to multiple providers involves some challenges. Here, multiple providers still cooperatively serve the same DNS zone data, but each provider independently signs that same zone with their own DNSSEC private keys. To make such a configuration work, some new key management mechanisms are needed. The main requirement here is that we need to configure and maintain the contents of the DNSKEY and Delegation Signer (DS) record sets in such a way that validation of responses is always possible, no matter which provider’s servers you query and get the response from.
Details of this scheme are published by the Internet Engineering Task Force (IETF) in a relatively new specification, RFC 8901. The main idea is that each provider must import the zone signing (public) keys (ZSK) of the other providers into their respective DNSKEY record sets. Why? So that problems won’t arise when a resolver caches the DNSKEY RRset for a zone from one provider, but subsequently gets a DNS answer for another query from another provider. The latter answer will be signed by a zone signing key that was not present in the previously cached DNSKEY RRset, and thus can’t be authenticated. Section 3 of the Multi-Signer specification elaborates on these potential problems.
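The caching problem described above can be modeled with a toy Python sketch. The provider names and key labels are hypothetical placeholders (real DNSKEY RRsets carry public key records, not strings), and a single shared `ksk` label is assumed for simplicity.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical providers and the ZSK each one signs zone data with.
ZSK_OF: Dict[str, str] = {"provider-a": "zsk-a", "provider-b": "zsk-b"}


def dnskey_sets(cross_import: bool) -> Dict[str, FrozenSet[str]]:
    """DNSKEY RRset served by each provider, with or without cross-imported ZSKs."""
    all_zsks = frozenset(ZSK_OF.values())
    sets = {}
    for provider, own_zsk in ZSK_OF.items():
        zsks = all_zsks if cross_import else frozenset({own_zsk})
        sets[provider] = frozenset({"ksk"}) | zsks
    return sets


def failing_paths(sets: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """(cached_from, answered_by) pairs where the answering provider's ZSK
    is missing from the DNSKEY RRset the resolver already has cached."""
    return [
        (cached_from, answered_by)
        for cached_from in sets
        for answered_by in ZSK_OF
        if ZSK_OF[answered_by] not in sets[cached_from]
    ]


print(failing_paths(dnskey_sets(cross_import=False)))
print(failing_paths(dnskey_sets(cross_import=True)))
```

Without cross-import, every path that mixes two providers fails validation; with cross-import, the list of failing paths is empty.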
DNS provider APIs need to be extended to support the new key management functions, for example, to cross-import ZSKs to and from external providers.
Two models are described. They differ in how the Key Signing Keys (KSK) and Zone Signing Keys (ZSK) are managed and coordinated. The descriptions below are slightly simplified for ease of understanding — they talk about single KSKs and ZSKs. In theory, the zone owner and providers could have KSK and ZSK ‘sets’ if they are pre-publishing keys or rolling them over, for example.
Model 1: Common KSK, unique ZSK per provider
In Model 1, the zone owner solely holds the KSK, and each provider maintains their own ZSK, as shown in Figure 1. Alternatively, the zone owner could designate one of the providers as the holder of the KSK.
The zone owner retrieves the ZSKs from the providers, builds a common DNSKEY set, signs it with the KSK, and distributes it back to the providers. Each provider signs all other zone data (apart from the DNSKEY set) with their own ZSK private key. ZSKs are thus cross-imported across the providers. The zone owner publishes the DS record set that references the KSK in the parent zone.
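The Model 1 workflow can be sketched roughly as follows in Python. All names here (`SignedDnskeySet`, `build_common_dnskey_set`, `distribute`, and the key labels) are hypothetical, and the "signature" is represented only by recording which key signed the set; no real provider API or cryptography is shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class SignedDnskeySet:
    keys: FrozenSet[str]  # public keys in the common DNSKEY RRset
    signed_by: str        # label of the key whose signature covers the RRset


def build_common_dnskey_set(owner_ksk: str, provider_zsks: Dict[str, str]) -> SignedDnskeySet:
    """Zone owner step: merge its KSK with every provider's public ZSK, then sign."""
    keys = frozenset({owner_ksk, *provider_zsks.values()})
    return SignedDnskeySet(keys=keys, signed_by=owner_ksk)


def distribute(signed_set: SignedDnskeySet, providers: Iterable[str]) -> Dict[str, SignedDnskeySet]:
    """Every provider receives, and serves, the identical signed DNSKEY RRset."""
    return {provider: signed_set for provider in providers}


# ZSKs as retrieved from each provider (placeholder labels).
zsks = {"provider-a": "zsk-a", "provider-b": "zsk-b"}
common = build_common_dnskey_set("owner-ksk", zsks)
served = distribute(common, zsks)
```

Since each provider serves the same set containing every ZSK, a resolver can validate an answer from any provider regardless of where it fetched the DNSKEY RRset.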
Model 2: Unique KSK and ZSK per provider
In Model 2, each provider has their own unique KSK and ZSK. The DNSKEY set for the zone is signed independently by each provider with their respective KSKs. Zone data is also signed independently (as in Model 1) by each provider’s respective ZSK.
As in Model 1, ZSKs are cross-imported in the DNSKEY RRset of each provider. The zone owner manages the common DS record set in the parent zone, which in this case, references each of the provider’s KSKs.
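As an illustration of the common DS record set in Model 2, the sketch below derives one DS record per provider KSK, using the key tag algorithm from RFC 4034 (Appendix B) and a SHA-256 digest (digest type 2). The zone name `example.com.`, the provider names, and the public key bytes are placeholder assumptions; the key bytes are not valid keys.

```python
import hashlib
import struct
from typing import Tuple


def name_to_wire(name: str) -> bytes:
    """Canonical (lowercase) wire format of a domain name."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (2 bytes), protocol (always 3), algorithm, public key."""
    return struct.pack("!HBB", flags, 3, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag calculation as specified in RFC 4034, Appendix B."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(owner: str, rdata: bytes) -> Tuple[int, int, int, str]:
    """DS fields: key tag, algorithm, digest type 2 (SHA-256), hex digest."""
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()
    return (key_tag(rdata), rdata[3], 2, digest)


# Placeholder KSKs: flags 257 (SEP bit set), algorithm 13, dummy 64-byte keys.
PROVIDER_KSKS = {
    "provider-a": dnskey_rdata(257, 13, bytes(range(64))),
    "provider-b": dnskey_rdata(257, 13, bytes(range(64, 128))),
}

# The zone owner publishes one DS per provider KSK in the parent zone.
DS_SET = sorted(ds_record("example.com.", rdata) for rdata in PROVIDER_KSKS.values())
```

With both DS records in the parent, a validator can build a chain of trust through whichever provider's KSK signed the DNSKEY RRset it happened to fetch.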
Note that Model 2 can also support Combined Signing Keys (CSK). In such a configuration, any or all providers could employ a CSK. The DS record would reference the provider’s CSK instead of KSK, and the public CSK (rather than the ZSK) would need to be imported into the other providers. Model 1 is not compatible with CSKs because the zone owner would then hold the sole signing key, and providers would not be able to sign their own zone data.
DNSSEC key rollovers are a bit more complex in Multi-Signer models, since they require coordinated action by the providers and the zone owner. For a detailed treatment of this topic, please consult the Multi-Signer specification.
Compatibility of traffic management features
Can proprietary traffic management features really be deployed in a compatible manner across distinct DNS providers? After all, these features are often touted as a ‘secret sauce’ or a differentiating feature of a specific provider. This may be true in a general sense, but there is a set of baseline capabilities, such as Global Server Load Balancing (GSLB) and failover pools with health checks, that most providers implement and that can be successfully deployed across them. For fine-tuned (as opposed to coarse-grained) load balancing across providers, augmented systems may be needed in some cases. But many commercial providers already offer generic APIs that can talk to a variety of external systems and data feeds and integrate that information into their response decisions.
A related question is whether we should attempt to standardize these traffic management features, so they can work with the zone transfer model, or at least be implemented across providers in a uniform way. This would require encoding them in newly defined DNS resource records that also include instructions for programmatically generating responses from the record data. Some discussions with commercial DNS vendors on this topic have indeed happened in the past, but currently this idea does not seem to have much support.
The Multi-Signer protocol is quite new but has generated some interest and early implementations. We’ve deployed successful lab and prototype configurations several times, including at an IETF Hackathon. NS1 has implemented the first model and is working on implementing the second, while Neustar, Cloudflare, and GoDaddy have all expressed interest. A variety of open source DNS software implementations already support some of the API functions needed to enable Multi-Signer DNSSEC, and we are working with many of them to fill functionality gaps where they still exist.
Transfer of signed zones across operators
These models have applicability beyond large organizations deploying DNSSEC across multiple providers (the most expected case). Importantly, non-disruptively transferring a DNSSEC signed zone from one DNS operator to another operator can be accommodated by Model 2 (Unique KSK and ZSK per provider). In fact, operator handoff is just a transitional state of a Model 2 Multi-Signer configuration. More details of this use case, along with mechanisms to fully automate a Multi-Signer configuration — including automation of DS updates — will be discussed in an upcoming blog article.
Shumon Huque is a software engineer and technologist at a large American cloud computing company.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.