More notes from IETF 111

By Geoff Huston on 6 Aug 2021

Tags: BBR, BGP, HTTP, IETF, IPv6, security, TCP

I’ve already scribed my thoughts on the DNS-related topics discussed at the July 2021 IETF 111 meeting. It may be surprising to the DNSphiles out there but there really are other topics that are discussed at IETF meetings not directly related to the DNS! These are some notes I took on the topic of current activities in some of the active IETF areas that are not DNS topics.

Transport and L4S

Transport has been undergoing a revival of study in recent years. The original Additive Increase Multiplicative Decrease (AIMD) flow control models were revised because of the emergence of very high capacity extended length network paths (so-called Long Fat Network (LFN) paths), and there was a set of proposals in the area of high-speed TCP that looked at how to tune the TCP flow control protocol to quickly adjust to very high speeds, while still remaining responsive to the so-called ‘fairness’ considerations of sharing a common transmission resource with concurrent conventional AIMD TCP sessions. The CUBIC flow control protocol gained popular acceptance, probably based on its use as the default TCP control protocol in several widely used Linux distributions.

In this family of AIMD control protocols there is a direct relationship between the performance of the protocol and the size of the buffers in the active switching elements of the networks. If the buffers are too small, the protocols fail to increase their speed to take advantage of available path capacity, while buffers that are too large simply act as a delay element. The situation has been exacerbated in recent years when network equipment has been deployed with, at times, quite extravagant buffers, and the combination of large buffers and AIMD-based control flows can induce a condition known as ‘bufferbloat’.

Read: Bufferbloat may be solved, but it’s not over yet

There have been two somewhat different forms of response to this issue. One response is to change the flow control protocol. The BBR flow control protocol is a good example of a flow control protocol that is not reliant on getting the network buffer provisioning to be exactly right. The BBR protocol makes use of continuing round-trip delay measurements to inform a sender as to the onset of network queuing, distinct from the AIMD-related signal of packet drop, which is a signal that network queues have not only formed but have become saturated.

Read: When to use and not use BBR

The other form of response is to enlist the network elements to inform the end host of the network’s state, and in particular, the onset of congested-related queue formation. A good example here is the Explicit Congestion Notification (ECN) experimental protocol, described in an experimental specification, RFC 3168. Network switching elements marked ECN-aware IP packets were processed when the switching element was experiencing a congestion load.

In TCP, this ECN marking is passed back to the sender in the TCP ACK packets, which then treats the received ECN congestion experienced signal in a manner like conventional packet drop. Ongoing experience with an increasing level of adoption of ECN has shown that ECN is most effective in conjunction with Active Queue Management (AQM). This approach that combines ECN with AQM has led to the so-called ‘L4S’ effort, intended to deliver Low Latency, Low Loss, Scalable Throughput service. And it’s this L4S effort that was reviewed in the Transport Area Working Group session at IETF 111.

Read: Control congestion, lower latency with new L4S AQM scheduler

Things are moving along and following the publication of RFC 8311, which lifts some of the earlier restrictions on the use of the ECN mechanism, there are four L4S working drafts nearing completion. There is an architecture and problem statement, an L4S identifier, a Diffserv Code Point, a Dual Queue AQM framework, and operational guidance to coexist with ‘classic’ ECN.

There are some issues with this effort to reinterpret the meaning of ECN signalling, as these days there are already hosts and network devices using ECN marking along what could be called ‘classic’ lines, and the introduction of L4S has issues with L4S hosts misinterpreting ECN signals as L4S marking, and L4S-aware network marking being misinterpreted as ECN signals. Any effort to make such changes on an in-use system is always going to have these issues, and L4S is no exception.

These days of course, transport has some further complexities, including consideration of multi-path transports (MPTCP), distributed congestion control transports (DCCP), and various forms of QoS to contend with. There is also the issue of both inadvertent and hostile signal manipulation, replay attacks and re-sequencing attacks, all of which require some robust response for an L4S system.

As the bufferbloat exercise has pointed out, queue delay is now a major impediment to service performance, and it seems somewhat crazy to spend considerable effort to reduce the user experienced delays in areas such as DNS performance, TLS session setup, and content delivery when the entire effort is negated by seconds-long network queue delays.

We must move beyond a relatively crude common flow control mechanism based on congestion collapse and packet loss to control traffic flows, and efforts such as L4S appear to offer some major promise. However, I cannot help being cynical about the prospects for this L4S network signalling approach in the context of the public Internet, and for the further prospects of ECN, for that matter. It requires deployment across all networks to perform marking, and deployment across clients to echo ECN markings for the actions by the server to use this marketing to control the end-to-end flow.

It’s a replay of a now quite old network QoS story that is only truly effective if all network elements and all hosts perform the combination of marking, recognition, and response. The more deployment is piecemeal then the more the service outcomes cannot be assured. In other words, such coupled combinations of network and host actions work as intended if everyone plays from the same book. It’s possible to orchestrate such collective actions in closed environments, but not so easy in the large and loosely coupled environment that we see in the public Internet.

When you compare this approach to BBR, which is a unilaterally applied server-side protocol that does not require any matching operation on the part of networks or even remote clients, then BBR looks like a far more viable approach if you want your TCP flows to avoid encountering the delays associated with the formation of standing queues in the network.

IPv6 maintenance — 6Man

The IPv6 effort split into two streams in the IETF — V6Ops Working Group and the 6Man Working Group. The first was to consider operational issues associated with the deployment of IPv6, and the second was to coordinate minor tweaks to the v6 protocol.

One of the topics considered at the IETF 6Man Working Group was the perennial topic of Path MTU management. As the working draft of a new MTU option points out, “The current state of Path MTU Discovery on the Internet is problematic. The mechanisms defined in [RFC 8201] are known to not work well in all environments.”

I would claim that as an understatement! This proposal is intended to replace reliance on the ICMPv6 Packet Too Big messages and replace them with a signalling protocol that is intended to discover the Path MTU. Presumably this applies to TCP and TCP-like protocols and would not be relevant for UDP transactions. Its operation is simple; when a router processes an IPv6 packet with this hop-by-hop extension header, it compares the MTU of the next hop interface with the minimum MTU value contained in the packet header. If the local value is smaller, then it rewrites the option value with this local value. Upon receipt at the intended destination, the receiver is meant to pass this value back to the sender in a return signal.

This approach appears to have exactly the same issues of host and network orchestration as ECN and L4S deployment have. IPv6 routers need to recognize this hop-by-hop extension header, perform MTU processing, and potentially rewrite the IPv6 packet extension header. Likewise, hosts need to recognize this header within a TCP or TCP-like flow and return the signal back to the sender. Perhaps if the original design process of IPv6 had spent some time considering the issues with ICMPv6 blocking in networks, they might have come up with this extension header approach. Or not.

Read: IPv6 fragmentation loss in 2021

In the timeline of a deployed technology, the deployed environment exercises a considerable inertial force, and the greater the deployment then the greater it has resistance to incremental change. This was a reasonable idea to have in 1993 when it might have been incorporated into the IPv6 design specification, but while the concept is still an interesting use of IPv6 extension headers today, its chances of adoption at a useful scale in today’s public IPv6 Internet look to be remote. Hence the draft’s caveat that this mechanism is intended to be used in closed environments such as within data centres.

The whole issue of the viability of IPv6 extension headers in the public Internet has been discussed for several years, in the light of periodically published measurement data showing considerable levels of packet drop when the packet contains an extension header. This draft attempts to address some aspects of the issue by further defining expectations of node behaviour in processing these extension headers. The observation here is that while Hop-By-Hop (HBH) processing in the first IPv6 specification was required for all nodes, there were obvious issues with the ability to process such packets at wire speed in hardware. The common design approach was to push packets with HBH options to the general processor engine (usually referred to as the ‘Slow Path’), as a means of avoiding degraded router performance for all packets. The implication was that packets with HBH extension headers experienced higher latency and higher loss probability. Packets with multiple HBH options exacerbated this problem.

The current IPv6 specification has changed this HBH processing requirement to only require such processing if the router is configured to do so. This essentially documented current operational behaviour but did not improve the situation of the viability of using HBH extension headers in the public IPv6 Internet, and possibly was one further step towards the complete elimination of such extension headers in the IPv6 Internet. The goal of this HBH-processing draft is to reverse the situation and specify ways to make HBH processing viable. It advocates that where HBH headers are present then nodes must process the first of such HBH headers and may process more if permitted by the local configuration. The expressed hope was that such a specification would allow an HBH header to be processed in the conventional processing path on a node and not shunt the packet to the Slow Path. It is unclear whether this is viable advice to designers of packet switching hardware, and the inexorable demands of faster and cheaper equipment would appear to preclude even this limited use of HBH headers as a common practice. The most viable current advice for using HBH headers in IPv6 packets in the public Internet is — don’t!

In a similar vein, the proposal this draft advocates is imposing limits on the number of extension headers, and limits on the number of options and padding in Destination Options and HBH options. The issue here is that such limits are probably contextual, and practical limits for the public Internet would be different from what is supportable in more restricted local contexts. I’m not sure that this approach changes the overall poor prognosis for extension headers in IPv6 any more than the one and only one processed HBH extension header approach advocated in the other draft proposal.

Configuring a new host in IPv6 quickly became a confusing story. While IPv4 quickly adopted the DHCP approach where a server loaded the host with configuration parameters during the boot process, IPv6 was keen to adopt a serverless approach.

So IPv6 adopted an approach using ICMPv6 Router Advertisements (RAs), but to provide other configuration parameters it also adopted both stateful DHCPv6 (RFC 3315) and stateless DHCPv6 (RFC 3736). And then the IPv6 working groups spent a large amount of time debating all kinds of proposed new options for these configuration methods.

Part of the reason behind this debate is that any adopted change to these configuration options requires updates to both IPv6 hosts and the network’s host management regime. A proposal is being considered for a universal configuration option, where the options are contained in a self-describing JSON object, modelled on CDDL, and encoded using CBOR. The RA or DHCPv6 functions contain an opaque configuration payload, whose configuration object contents are defined in an IANA registry. This seems like a useful step forward, although the discussion of this proposal was more along the lines of a ‘path not followed’ discussion.

Unless we are willing to replace RA and DHCP options with this then all we would be doing is adding a third configuration mechanism to an already confusing mélange of choices.

In any case, I was particularly struck by the Security Consideration section in this universal configuration draft that asserts: “Unless there is a security relationship between the host and the router (e.g. SEND), and even then, the consumer of configuration information can put no trust in the information received.”

True, but like many parts of the Internet, nothing would work at all if we believed everything picked up in the random noise from these networks!

BGP operations – GROW

A problem shared is a problem solved. Yes? Probably not in the general case, but that doesn’t act as a deterrent for many folks to present their problem and its proposed solution in the hope that everyone else will sign up to the solution.

One of those local problems was presented at the GROW Working Group as a proposal to add a local AS Looking Glass endpoint to the capability negotiation in an eBGP peer session. The idea is that a prefix advertiser would learn, from the local BGP session, a means to overview the BGP session from the ‘other side’ and confirm that the advertisement had been accepted by the peer.

While there is a common need to determine how a prefix is propagated in the inter-domain routing space, and there is useful work that could be done here, I’m afraid that this proposal doesn’t appear to me to be part of any generally useful solution. The real objectives appear to be the ability to track and trace the propagation of a prefix advertisement in the inter-domain routing space, then to detect what forms of alternation were made to the route object after it has been propagated, and then some hints as to why the remote BGP speaker made a local decision to further propagate or drop the route object.

This kind of diagnostic information could, in part, be made available using BMP BGP traces, but the larger issue of exposing local routing policy decisions relating to the local handling of route objects may not have any direct solution.

BGP update messages contain a structured field called a ‘community’ to express some form of policy preference or directive to be applied to an update. Some are ‘well-known’ communities, which are supposedly uniformly supported such as ‘No-Export’. Others are locally defined by a particular Autonomous System (AS).

The initial definition of a BGP community was a 4-octet field, which was large enough to encode a 16-bit AS number and a 16-bit value that is associated with some route policy or action by that AS. This was extended in the 8-octet extended community construct that used a 2-octet type field and a 6-octet value. With the introduction of 4-octet AS numbers, the extended community construct was too small so RFC 8092 extended this with large communities to use a 12-octet value.

How widely are these more recent (Extended and Large) community types supported on the Internet? How widely are they used? How many ASes are seen to propagate route objects that have attached community attributes?

The study undertaken by NIST and presented at the GROW session was interesting. Using the snapshots provided by RouteViews and RIS they observed some 3,000 ASes that propagate regular communities, but the number drops to under 500 for both Extended and Large communities. These latter two community types appear to be used for peer signalling and not broader policy attribute announcements. If we wanted to use Extended and Large communities as a transitive attribute, then what additional work and recommendations would be needed to facilitate this?

Oblivious HTTP BOF

The privacy program in the IETF continues, and there has been recent work in the concept of ‘Oblivious’ services. The overall intent is to encapsulate service transactions in multiple wrappers so that no remote party, even the ultimate service delivery point, has knowledge of the requestor and the request.

In the case of Oblivious DNS, the stub resolver encrypts the query using the public key of the recursive resolver, and then encrypts this message using the public key of a proxy agent. A proxy agent, acting in a role similar to that of a conventional forwarder, can decode the outer wrapper and pass the inner encrypted payload to the recursive resolver.

Read: Improving the privacy of DNS and DoH with oblivion

Oblivious HTTP is proposed to operate in a similar manner. The Oblivious HTTP proposal describes a method of encapsulation for binary HTTP messages using Hybrid Public Key Encryption. Like Oblivious DNS, this protects the content of both requests and responses and enables a deployment architecture that can separate the identity of a requester from the request, if the proxy and the server do not collude in any way.

This is not the first introduction of this work into the IETF, and the proposal was considered previously in the SECDISPATCH Working Group. There are several questions unique to this proposal that probably influenced the decision to discuss this further in a BOF rather than progress the work as a working draft in the HTTPBIS effort.

When the IETF considers a system for forming untraceable HTTP requests, where the information is spread in a way that no single entity can rejoin the requestor and the request, then this has some deep implications in a number of areas, including the business models of content provision that often rely on assembling an intimate profile of the end user in order to maximize the indirect advertising revenue, various public security agencies and the gathering of background intelligence data to name just two. In some ways, it’s not a change to the HTTP protocol as such, but a form of handling of HTTP requests.

The questions raised at the BOF, including the consideration of encrypted requests, may have implications that would require every content server in the Internet to be modified and equipped with OHTP handling functionality.

On the other hand, there are so few actual server and client platforms out there in this age of CDNs that this claim that ‘modifying everything would be impossible’ is not as impossible as it might sound. For example, if you modify Chrome’s WebKit engine then what’s left that needs changing on the client side?

This leads to a second concern that this actually fosters further pressure for centrality in the market by raising the barrier for entry to smaller providers, in the name of increased user privacy. Another comment questioned why this is even necessary given that the IETF could achieve a more generic outcome by standardizing TOR.

There was also the comment that the Masque proposal seems to have passed through some form of IETF evaluation without significant comment, and if Masque has not raised eyebrows, then why should OHTTP be singled out?

Given that much of today’s Internet is the supporting pedestal for the highly lucrative surveillance economy, there is a real question about the willingness of the various large actors in this space to voluntarily close down these observational windows. There is little doubt that we can do it, but questions remain as to whether we should do it and who would be motivated to do it at all!

Securing BGP – SIDROPS

In hierarchical PKI, it is quite easy to change keys. It’s a dance you perform with your parents to inform them in a robust manner of the new key value and then perform the cutover. But when you want to change the apex key in the hierarchy, or the Trust Anchor key, then the story is far more challenging. Conceivably, you could get everyone who uses the PKI to load a new key, but that’s typically infeasible.

The more conventional way is to get the outgoing, but still trusted, key to sign the incoming key. This is what we did with the recent KSK roll in the root zone of the DNS, and it’s what’s being proposed for the RPKI. This procedure is documented in RFC 5011 in the context of DNSSEC, but its guidance is general for Trust Anchor key rolls.

Read: Analyzing the KSK roll

One feature is prominent in RFC 5011 but missing in this draft, namely the use of times and timers. Without timers a hostile party who has compromised the Trust Anchor could use this key to sign a new key under the control of the attacker and then complete the roll process. At this point, there is no way back other than the hard path of getting every user of the PKI to reload Trust Anchor keys from scratch, and without any in-band assurance.

Key rolls are a challenging subject. The procedure works if the key is not compromised but fails in horrible ways if the key is ever compromised. So, if the roll procedure can’t get you out of trouble when you have fallen into a troubled spot, then why bother having a key roll at all? There are some who might claim that a key sitting in an HSM suffers from bit rot over time!

Read: What is ‘M of N’ in public/private key signing?

A less fanciful notion is that the more times a key is used, the more information about the key value is inadvertently exposed. I have arguments in favour of this notion and arguments that discount it when the key is used less frequently, for some indeterminate value of ‘less’. If the key is not compromised through use, and the key roll procedure cannot handle the case of a key compromise, then what is the goal here?

It might just be that this is something you want to exercise very infrequently and do so as a counter to the increasing risks of algorithm compromise. It’s not that the challenge to find the private key value, given the public key and samples that encrypted output, is impossible. It is theoretically possible to do so, but the idea is to devise an algorithm that is computationally infeasible. As computers get faster, or change (as in quantum computing), then previously infeasible problems may become feasible. It’s this consideration that drives the need to define a key process procedure. You might use it infrequently but when it’s time to use it, you really need to have one at hand that you prepared earlier!

Rate this article

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

2 Comments

Ole Trøan August 7, 2021 at 5:16 pm

The proposed MTU record mechanism for IPv6 was first described in RFC1063 from 1988.
Then I think you make good arguments as if this idea’s time has come. Or not.

Reply ↓
Geoff Huston August 9, 2021 at 9:24 am

I had forgotten about this 1988 work Ole. I don’t recall if it was ever implemented in hosts or routers at the time. I certainly don’t think I ever saw it being used in IP networks at the time, and the only work on MTU round then was Doug Comer’s work that found extensive use of 576 octets as a prevalent host MTU setting. I also recall that the Mogul and Kent SIGCOMM paper cited as a reference for this RFC made a big impact a few years later when the IPv6 design effort decided to drop fragmentation-on-the-fly and make the fragmentation fields an extension header.

Reply ↓