In June, I participated in a workshop organized by the Internet Architecture Board on the topic of protocol design and effect, looking at the differences between initial design expectations and deployment realities. These are my impressions from the discussions that took place at this workshop.
In this first of four posts, I’ll report on case studies of two protocol efforts and their expectations and deployment experience. These are the Border Gateway Protocol (BGP) and the security extensions to the DNS (DNSSEC).
Routing protocols have been a constant in the Internet, and BGP is one of the oldest protocols still in use. Some aspects of the original design appear to be ill-suited to today’s environment, including the general approach of session restart when unexpected events occur, but this is a minor quibble. The protocol’s major achievement has been its inherent scalability.
BGP is a protocol designed in the late 1980s, using a routing technology described in the mid-1950s, and first deployed when the Internet had fewer than 500 component networks (Autonomous Systems) and fewer than 10,000 address prefixes to carry. Today, BGP supports a network that is approaching one million prefixes and heading towards 100,000 ASNs.
There were a number of factors in this longevity, including: the choice of a reliable stream transport in TCP — instead of inventing its own message transport scheme; the distance vector’s use of hop-by-hop information flow — allowing various forms of partial adoption of new capabilities without needing all-of-network flag days; and a protocol model that suited the business model of the way that networks interconnected. These days, BGP also enjoys a position of entrenched incumbent, which itself is a major impediment to change in this area, and the protocol’s behaviour now determines the business models of network interaction rather than the reverse.
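This hop-by-hop information flow can be illustrated with a toy sketch. The function below conflates receiving and re-advertising into one step, and the ASNs are purely illustrative (drawn from the documentation range): each speaker prepends its own AS number to the path it received and discards any update whose AS path already contains it, which is the loop-prevention mechanism that lets each hop act independently of the rest of the network.

```python
from typing import List, Optional

def receive_update(my_asn: int, as_path: List[int]) -> Optional[List[int]]:
    """Toy model of BGP path-vector propagation at one speaker.

    A real BGP speaker rejects any update whose AS_PATH already
    contains its own ASN (loop detection), and prepends its ASN
    when re-advertising the route to its neighbours.
    """
    if my_asn in as_path:
        return None                # loop detected: discard the update
    return [my_asn] + as_path      # prepend self and propagate onward

# A clean path is extended hop by hop...
print(receive_update(64500, [64496, 64511]))  # [64500, 64496, 64511]
# ...while a path that already contains this speaker's ASN is dropped.
print(receive_update(64496, [64500, 64496, 64511]))  # None
```

Because each speaker makes this decision locally, new capabilities can be deployed one network at a time, which is part of why partial adoption works without flag days.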
This is despite the obvious weaknesses in BGP today, including aspects of insecurity and the resultant issues of route hijacks and route leakage, selective instability, and the bloating effects of the costless advertisement of more specific address prefixes.
Read: Happy Birthday BGP
Various efforts over the thirty years of BGP’s lifetime to address these issues have been ineffectual. In each instance, we have entertained design changes to the protocol to mitigate or even eliminate these weaknesses. However, the consequent changes to the underlying cost allocation model, to the business model, or to the protocol’s performance are such that change is resisted. Even the exhortation for network operators to apply filters that discard outbound packets with spoofed source addresses (known as BCP 38) is now twenty years old. And it is ignored by the collection of network operators today to much the same extent that it was ignored twenty years ago, despite the massive damage inflicted by a continuous stream of UDP denial-of-service attacks that leverage source address spoofing.
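The ingress filtering idea behind BCP 38 is simple to sketch. The prefix list and function below are hypothetical, a minimal illustration of the check rather than a real router configuration:

```python
import ipaddress

# Hypothetical list of prefixes legitimately originated by a customer network.
# BCP 38 says: at the network edge, drop packets whose source address
# falls outside the prefixes that network is known to hold.
CUSTOMER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def permit_source(src: str) -> bool:
    """Return True if the packet's source address matches an expected prefix."""
    addr = ipaddress.ip_address(src)
    return any(addr in prefix for prefix in CUSTOMER_PREFIXES)

# A packet sourced from inside the customer's allocation is forwarded...
print(permit_source("192.0.2.17"))   # True
# ...while a packet with a spoofed source address is dropped.
print(permit_source("203.0.113.5"))  # False
```

The check itself is trivial; the point of the twenty-year story above is that the cost of applying it falls on the filtering network while the benefit accrues to everyone else.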
The efforts to secure the protocol are almost as old as the protocol itself, and all have failed. Adding cryptographic extensions to BGP speakers and to the protocol, in order to support verifiable attestations that the data contained in BGP protocol packets is, in some sense, ‘authentic’ rather than synthetic, imposes a level of additional cost that network operators appear to be unwilling to bear. Security of this kind can only add credentials to ‘good’ information; if we want to assume that everything not provably ‘good’ is necessarily ‘bad’, then universal adoption is required. This adds the formidable barrier of universal adoption and the accompanying requirement of lowest bearable cost, as every BGP speaker must be in a position to accept these additional costs.
We have not seen the end of proposals to improve the properties of BGP, both in the area of security and in areas such as route pruning, update damping and convergence tuning. Even without knowledge of the specific protocol mechanisms proposed in each case, it appears that such proposals are doomed to the same fate as their predecessors. In this common routing space, cost and benefit are badly aligned, and network operators appear to have little in the way of true incentive to address these issues in the BGP space. The economics of routing is a harsh taskmaster, and it exercises complete control over the protocols of routing.
If BGP is a mixed story of long-term success in scaling with the Internet and at the same time a story of structural inability to fix some major shortcomings in the routing environment, it is interesting to compare this outcome with that of DNSSEC.
DNSSEC was intended to address a critical shortcoming of the DNS model, namely through the introduction of a mechanism that allows a client of the DNS to validate that the response provided by the DNS resolution system is authentic and current. This applies to both positive and negative responses: when a positive response is provided, it can be verified as a faithful copy of the data served by the relevant zone’s authoritative name servers, and when a negative response is provided, it can be verified that the name really does not exist in the zone.
We have all heard of the transition of the Internet from an environment of overly credulous mutual trust, with little scepticism over the authenticity of the data we receive from protocol transactions over the Internet, to one of suspicion and disbelief, based largely on the continual abuse of this original mutual trust model. A protocol that could clearly identify when DNS responses are being altered in various ways by third parties would have an obvious role and would be valued by users. Or so we thought. DNSSEC was a protocol extension to the DNS intended to provide exactly that level of assurance and yet, so far, it has been a failure.
In terms of protocol design stories, failures are as informative as stories of success, or even more so. In the case of DNSSEC, the stories of its failure stretch across its twenty years of progressive refinement.
The initial approach, described in RFC 2535, had an unrealistic level of inter-dependency such that a change in the apex root key required a complete rekeying of all parts of the signed hierarchy. Subsequent efforts were directed to fix this ‘rekeying’ problem. What we have today is more robust, and within the signed hierarchy rekeying can be performed safely. However, the root key roll still presents major challenges.
Every endpoint in the DNS resolution environment that performs validation needs to synchronize itself with the root key state as its single ‘trust anchor’. This use of a single trust point is both a feature and a burden on the protocol. It eliminates many of the issues we observe in the Web PKI, where multiple trusted Certificate Authorities (CAs) create an environment that is only as good as the poorest quality CA, which in turn destroys any incentive for quality in this space. Every certificate is equally trusted in that space. In a rooted hierarchy of trust, all trust derives from a single trust entity, which creates a single point of vulnerability and also creates a natural point of monopoly. It is a deliberate outcome that the root key of the DNS is managed by IANA in its role of trustee representing the public interest.
Yet even with this care and attention to a trusted and secure root, DNSSEC is still largely a failure, particularly in the browser space. The number of domains that use DNSSEC to sign their zone is not high, and the uptake rate is not a hopeful one. From the perspective of a zone operator, the risks of signing a zone are clearly evident, whereas the incremental benefits are far less tangible. From the perspective of the DNS client, a similar proposition is also the case. Validation imposes additional costs, both in time to resolve and in the reliability of the response, and the benefits are again less tangible.
Perhaps two additional comments are useful here to illustrate this point. When a major US network operator first switched on DNSSEC in their resolvers, the domain name nasa.gov had a key problem and its responses could not be validated. The DNSSEC model is to treat validation failure as grounds to withhold the response. So nasa.gov would not be resolved by these resolvers. At the time, there was NASA activity that had generated significant levels of public interest, and the DNS operator was faced with either turning DNSSEC off again or adding the additional measure of manually maintained ‘white lists’ of names where validation failure would be ignored, adding further costs to this decision to support DNSSEC validation in their resolution environment.
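That operational workaround, ignoring validation failure for specific names, can be sketched as a toy decision function. The names and the function are illustrative only; real resolvers implement this mechanism as negative trust anchors (RFC 7646):

```python
# Toy model of a validating resolver's decision, with a manually
# maintained 'white list' (in today's terms, negative trust anchors).
NEGATIVE_TRUST_ANCHORS = {"nasa.gov"}  # names whose validation failure is ignored

def resolver_decision(name: str, validation_ok: bool) -> str:
    if validation_ok:
        return "answer"              # validated response is returned
    if name in NEGATIVE_TRUST_ANCHORS:
        return "answer-unvalidated"  # failure ignored for white-listed names
    return "SERVFAIL"                # DNSSEC model: withhold the response

print(resolver_decision("nasa.gov", False))     # answer-unvalidated
print(resolver_decision("example.com", False))  # SERVFAIL
```

The white list trades away the security guarantee for availability, name by name, and someone has to maintain it by hand: exactly the kind of operational cost that makes resolver operators hesitate.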
The second issue is where validation takes place. So far, the role of validating DNS responses has been placed on the recursive resolver, not the user. If a resolver has successfully validated a DNS response, it sets the AD bit in the response to the stub resolver. Any man-in-the-middle that sits between the stub resolver and the recursive resolver can manipulate this response if the interaction uses unencrypted UDP for the DNS. If the zone is signed and validation fails, then the recursive resolver reports a server failure (SERVFAIL), not a validation failure. In many cases (more than a third of the time) the stub resolver interprets this as a signal to re-query using a different recursive resolver, and the critical information of a validation failure, with its implicit signal of DNS meddling, is simply lost.
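Part of the fragility here is that the entire validation signal reaching the stub is a single bit in the DNS message header. A minimal sketch of reading it from a raw response, using a hand-built, purely illustrative header:

```python
import struct

def ad_bit_set(dns_response: bytes) -> bool:
    """Check the Authentic Data (AD) bit in a raw DNS response header.

    The DNS header is 12 bytes; the flags field is the second 16-bit
    word, and the AD bit is 0x0020 within it (RFC 4035, Section 3.2.3).
    """
    if len(dns_response) < 12:
        raise ValueError("truncated DNS header")
    _msg_id, flags = struct.unpack("!HH", dns_response[:4])
    return bool(flags & 0x0020)

# An illustrative 12-byte header with QR, RD, RA and AD set:
hdr = struct.pack("!HHHHHH", 0x1234, 0x8000 | 0x0100 | 0x0080 | 0x0020, 1, 1, 0, 0)
print(ad_bit_set(hdr))  # True
# Clearing that one bit, trivially done by an on-path party over plain UDP,
# removes the only validation signal the stub ever sees:
hdr_stripped = struct.pack("!HHHHHH", 0x1234, 0x8000 | 0x0100 | 0x0080, 1, 1, 0, 0)
print(ad_bit_set(hdr_stripped))  # False
```

Unless the stub validates for itself, or the stub-to-resolver hop is protected, this one unauthenticated bit is all the assurance DNSSEC delivers to the end system.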
Surely there is a market for authenticity in the namespace? The commercial success of the Web PKI, which was an alternative approach to DNSSEC, appears to support this proposition. For many years, while name registration was a low-value transaction, the provision of a domain name certificate was a far more expensive proposition, and domain holders paid. The entrance of free certificates into the CA market was not a signal of the decline in value of this mechanism of domain name authentication but an admission of the critical importance of such certificates in the overall security stance of the Internet, and a practical response to the proposition that security should not be a luxury good but be accessible to all.
Protocol failure or market failure?
Why has DNSSEC evidently failed? Was this a protocol failure or a failure of the business model of name resolution? The IETF’s engagement with security has been variable to poor, and the failure to take a consistent stance on the architectural issues of security has been a key shortcoming here. But perhaps this is asking too much of the IETF.
The IETF is a standardization body, like many others. Producers of technology bring their efforts to the standards body, composed of peers and stakeholders within the industry, and the outcome is intended to be a specification that serves two purposes. The first is to produce a generic specification that allows competitive producers to make equivalent products, and the second is to produce a generic behaviour model that allows others to build products that interact with this standard product in predictable ways. In both cases, the outcome is one that supports a competitive marketplace, and the benefit to the consumer is one based on the discipline of competitive markets.
But it is a stretch to add ‘architecture’ to this role, and standards bodies tend to get into difficulties when they attempt to take a discretionary view of the technologies that they standardize according to some abstract architectural vision. Two cases illustrate this issue for the IETF.
When Network Address Translators (NATs) appeared in the early 1990s as a means of forestalling address exhaustion, the IETF deliberately did not standardize this technology, on the basis that it did not sit within the IETF’s view of the Internet’s architecture. Whatever the merits or otherwise of this position, the outcome was far worse than many had anticipated. NATs are everywhere these days, but they exhibit all kinds of varying behaviour because NAT developers had no standard IETF specification to refer to. The burden has been passed to the application space, because applications that require an understanding of the exact nature of the NAT (or NATs) that they are behind also have to use a set of discovery mechanisms to reveal the nature of the address translation being applied in each individual circumstance.
The other case I’ll use is that of Client Subnet in the DNS. Despite a lengthy prologue to the standard specification, where the IETF indicated it did not believe that this was a technology that sat comfortably in the IETF’s overall view of a user privacy architecture (and therefore should not be deployed), Client Subnet has been widely deployed, and in too many cases has been deployed carrying a complete client identity. For the IETF, refusing to standardize a technology on architectural grounds has negative consequences if deployment occurs in any case, while reluctant standardization despite such architectural concerns has its own negative consequences, in that deployers are not necessarily sensitive to such reluctance.
Even if the IETF is unable to carry through with a consistent architectural model, why is DNSSEC a failure and why has the Web PKI model — the incumbent model for web security, despite its obvious shortcomings — succeeded? One answer to this question is the first adopter advantage. The Web PKI was an ad hoc response by browsers in the mid-1990s to add a greater level of confidence in the web. If domain name certificates generated sufficient levels of trust in the DNS (and routing for that matter) that the user could be confident the site on their screen was the site that they intended to visit, then this was a sufficient and adequate answer.
Why change it? What could DNSSEC use add to this picture?
Not enough to motivate adoption, it would seem. In other words, the inertia of the deployed infrastructure leads to a first adopter advantage. An installed base of a protocol that is good enough for most uses is often enough to resist the adoption of a better protocol. And when the newcomer is not clearly better but just a different protocol, the resistance to change is even greater.
Another potential answer lies in centralization and cartel behaviours. The journey to get your CA into the trusted set of the few remaining significant browsers is not easy. The Certificate Authority Browser (CAB) Forum can be seen both as a body that attempts to safeguard the end user’s interest by stipulating CA behaviours that are an essential set of preconditions to being accepted as a trusted CA, and a body that imposes barriers to entry by potential competitive CAs. From this perspective, DNSSEC and DANE can be viewed as an existential threat to the CA model and resistance to this threat from the CAB Forum is entirely predictable and expected. Any cartel would behave in the same manner.
A third answer lies in the business model of outsourcing. The DNS is often seen as a low maintenance function. A zone publisher has an initial workload of setting up the zone and its authoritative servers. However, after that initial setup, the function is essentially static. A DNS server needs no continual operational attention to keep it responding to queries. Adding DNSSEC keys changes this model and places a higher operational burden on the operator of the zone.
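Part of that extra burden is that DNSSEC signatures (RRSIG records) carry expiration times, so a signed zone must be re-signed on a continuing schedule rather than left static. The scheduling decision can be sketched as follows; the validity and margin values are an illustrative policy, not a standard:

```python
from datetime import datetime, timedelta, timezone

# Illustrative signing policy: signatures valid for 30 days, re-sign
# once less than a 7-day safety margin remains before expiration.
VALIDITY = timedelta(days=30)
RESIGN_MARGIN = timedelta(days=7)

def needs_resign(inception: datetime, now: datetime) -> bool:
    """An unsigned zone can sit untouched for years; a signed zone needs
    this check to run continuously, or its signatures expire and
    validating resolvers start to fail the zone's responses."""
    expiration = inception + VALIDITY
    return now >= expiration - RESIGN_MARGIN

signed_at = datetime(2019, 6, 1, tzinfo=timezone.utc)
print(needs_resign(signed_at, datetime(2019, 6, 10, tzinfo=timezone.utc)))  # False
print(needs_resign(signed_at, datetime(2019, 6, 25, tzinfo=timezone.utc)))  # True
```

Miss the re-signing window and the zone goes dark for every validating resolver, which is precisely the operational risk that makes 'set and forget' zone publishers reluctant to sign.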
CAs can be seen as a means of outsourcing this operational overhead. It is a useful question to ask why the CA market still exists, and why there are still service operators who pay CAs for their service while free CAs exist. Let’s Encrypt uses a 90-day certificate model, so the degree to which the name security function is effectively outsourced is limited. There remains a market for longer-term certificates as a more effective way of outsourcing this function, which explains the continuing existence of CAs that charge for their certificates.
Even though DNSSEC has largely failed in this space so far, should the IETF have avoided the effort and not embarked on DNSSEC in the first place? I would argue against such a proposition.
In attempting to facilitate competition in the Internet’s essential infrastructure, the IETF is essentially an advocate for competitive entrants. Dominant incumbents have no essential need to conform to open standards, and in many situations they use their dominant position to deploy services based on technologies that are solely under their control, working to entrench their current position into the future. Most enterprises that obtain a position allowing the extraction of monopoly rentals from a market will conventionally seek to use the current revenue stream to further secure their future position of monopoly.
In the IT sector, such dominant actors, when pressed, have been known to use crippling Intellectual Property Rights conditions to prevent competitors from reverse engineering their products to gain entry to the market. In light of such behaviour, the IETF acts in ways similar to a venture capital fund, facilitating the entrance of competitive providers of goods and services through open standards. Like any venture capital fund, there are risks of failure as much as there are benefits of success, and the failures should not prevent the continual seeking of instances of success.
While I’m personally not ready to write DNSSEC off as a complete failure just yet, there is still much the IETF can learn from the many years it has spent on this effort. The larger benefits of such activities to the overall health of a diverse and competitive marketplace of goods and services on the Internet are far more important than the success or otherwise of individual protocol standardization efforts.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.