The fragility of transient identifiers

By on 1 Apr 2022

Category: Tech matters

During IETF 113, my eye was drawn to a review draft exploring a problem space in the Internet Research Task Force (IRTF): the ‘Unfortunate History of Transient Numeric Identifiers’ (draft-irtf-pearg-numeric-ids-history). The IRTF tends to take a long-term outlook, exploring the edge cases of the Internet that lead to work inside the IETF. It has research groups for different areas of research, and this draft is a product of the Privacy Enhancements and Assessments Research Group (PEARG).

What are ‘identifiers’?

Normally, when discussing Internet things, an identifier is a unique identity in the global network. Sometimes the discussions are about the lack of ‘Locator-Identity Separation’ (the fact that IPv4 and IPv6 addresses are used as both locator and identity) and the problems this causes. The Locator/Identifier Separation Protocol (LISP) was designed specifically to address this and is an instance of an overlay network on the basic Internet, exploring the properties of a network with clean separation of these two roles.

In this case, we’re actually talking about a different kind of identifier — one that lies inside the IP packets, up a layer in the transport header. This IRTF draft is about the use and abuse of identifiers ‘inside’ a packet, designed for use in-flow, in end-to-end delivery. These numbers exist to help the protocol do its job.

Typically, identifiers exist to do one simple thing: to identify a stream of data between two (or more) parties against all the other streams of data either side might have, including streams going to exactly the same hosts. The ‘5-Tuple’ (a database concept that, in this case, refers to the five related values that define a TCP session: source address, source port, destination address, destination port, and protocol) identifies a unique end-to-end flow to the hosts and routers that process it. Within each host, though, there is the additional problem of disambiguating sessions whose 5-Tuples may be identical, for example, a new connection that reuses the values of an old one. This is where some of the additional inside-TCP identifiers come into play; they are meant to be sufficiently unique to do this job. Some of the 5-Tuple elements (the IP addresses) are globally assigned by the RIRs (and downstream by the LIRs) but others are not, and are essentially ‘ephemeral’ values. Outside of the 5-Tuple, there are other protocol-specific fields that are meant to be temporary unique values that identify something about this protocol, in this specific instance of use.
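To make the 5-Tuple idea concrete, here is a minimal Python sketch of how a host might use it as the lookup key for per-flow state. The `FiveTuple` type, the `flows` table and the `deliver` function are hypothetical names invented for this illustration, and the addresses come from the documentation ranges; a real network stack does this inside the kernel with far more state.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The five values that together identify one end-to-end flow."""
    protocol: str   # for example "tcp" or "udp"
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


# Hypothetical flow table: one list of received payloads per 5-Tuple.
flows: dict[FiveTuple, list[bytes]] = {}


def deliver(key: FiveTuple, payload: bytes) -> None:
    """Append a payload to the flow identified by its 5-Tuple."""
    flows.setdefault(key, []).append(payload)


# Two connections between the same pair of hosts, to the same server port.
# Only the client's ephemeral source port tells them apart.
a = FiveTuple("tcp", "192.0.2.10", 50000, "198.51.100.5", 443)
b = FiveTuple("tcp", "192.0.2.10", 50001, "198.51.100.5", 443)

deliver(a, b"hello")
deliver(b, b"world")
deliver(a, b"again")
```

Note what the table cannot do: if a later connection reuses exactly the same five values, its packets land in the same entry as the old one. That is the gap the additional in-protocol identifiers are meant to close.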

Ephemeral ports are also transient identifiers

There is a well-understood set of ‘well-known’ ports in TCP and UDP that are subject to assignment by IANA. A protocol is assigned one of these numbers through a request to IANA under the IETF RFC process. So, port 25 in TCP is for SMTP mail daemons to listen on, and port 443 is used by web servers to listen for HTTPS-protected web queries. There are many of these registered assignments. They were previously kept to a specific region of values below 1,024, but in practice now occupy far more of the 16-bit number field that the TCP and UDP protocols both define for source and destination ports.

These ‘assigned’ ports (such as 25 for SMTP) are only a convention. In fact, they are free to be used in any way. Other people might be surprised, but the owner of a host can use port 25 to serve HTTPS. The convenience of using ‘standards assigned’ ports is that it avoids having to communicate which things are used where.

However, there’s another way these ports are used: the sender picks a ‘random’ one when they emit their UDP or TCP packet, and this value can be reused later by another connection. It’s not a listening port for unscheduled incoming packets; it’s a temporary local assignment for the specific job of receiving the return packets (if any) from the outbound message. These randomly chosen, reusable port numbers are ‘ephemeral’ ports. The intention is to try to avoid reusing their numbers too soon, so that packets left undelivered from a prior connection (possibly even to the same host) are not delivered unexpectedly to this program or service.
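You can watch an operating system hand out an ephemeral port with a few lines of Python. This is a small sketch using the standard `socket` module: binding to port 0 asks the OS to choose a free port on our behalf. The `ephemeral_port` helper is a name made up for this example, and which port you get, and how it was selected, depends entirely on the host's implementation.

```python
import socket


def ephemeral_port() -> int:
    """Ask the OS for an ephemeral UDP port on loopback and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Port 0 means "pick any free port for me".
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


port = ephemeral_port()
print(port)
```

Running this several times and comparing the results gives a rough feel for whether the host assigns these ports sequentially or with some randomization.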

This is only one kind of ephemeral local identifier. Aside from ports in the 5-Tuple there are also other identifiers, such as the TCP ‘initial sequence number’ of packets in the flow, or NTP sequence numbers, or parts of the IPv6 address that are meant to identify local interfaces on a specific host. 
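As an illustration of how one of these values can be chosen, here is a sketch in the style of the scheme described in RFC 6528 for TCP initial sequence numbers: a clock that ticks every four microseconds, plus a keyed hash of the connection's addresses and ports. This is not taken from the draft discussed here. The function name, the separator used to join the fields, and the choice of HMAC-SHA256 as the hash are all assumptions made for this example.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical per-boot secret; a real stack would manage this key itself.
SECRET_KEY = secrets.token_bytes(32)


def initial_sequence_number(local_ip, local_port, remote_ip, remote_port,
                            key=SECRET_KEY, now_us=None):
    """Return a 32-bit ISN: a 4-microsecond clock plus a keyed per-connection offset."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000   # current time in microseconds
    clock = now_us // 4                        # one tick every 4 microseconds
    conn_id = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    # Keyed hash of the connection identifiers (HMAC-SHA256 is our assumption).
    digest = hmac.new(key, conn_id, hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2**32            # sequence numbers are 32 bits wide


isn = initial_sequence_number("192.0.2.10", 50000, "198.51.100.5", 443)
print(isn)
```

The design intent is that values still increase over time within one connection's 5-Tuple, while an observer of one connection learns nothing useful about the numbers used on another, because they cannot compute the offset without the key.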

Abusing identifiers for fun and profit

Unless a packet cryptography layer is present (one can exist in principle, as defined in IPsec), players along the path, or a bad actor anywhere, can at any time send packets that spoof these values. So, exactly how these ephemeral ports or other values are chosen matters, because a poor choice lets unexpected packets be accepted and thus interfere with the operation of the specific end-to-end conversation at play. If the approach is to ‘just add one’, then the next value is entirely predictable: from seeing only one or two packets in flight, the bad actor can predict what numbers to use next and perform an attack. If the approach is to use a simple Random Number Generator (RNG), its output can be replayed or predicted.
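The difference between these approaches is easy to demonstrate. The Python sketch below compares three hypothetical port selectors: a ‘just add one’ counter, a simple seeded RNG, and the operating system's cryptographic random source via the standard `secrets` module. The function names and the seed are invented for this example, and the range used is the IANA dynamic port range (49152 to 65535), chosen here only for illustration.

```python
import random
import secrets

PORT_MIN, PORT_MAX = 49152, 65535   # IANA dynamic range, used for illustration


def next_port_incremental(last: int) -> int:
    """'Just add one': wrap around at the top of the range."""
    return PORT_MIN if last >= PORT_MAX else last + 1


def port_from_prng(rng: random.Random) -> int:
    """Simple RNG: fully determined by its seed and internal state."""
    return rng.randint(PORT_MIN, PORT_MAX)


def port_from_csprng() -> int:
    """Draw from the OS cryptographic random source instead."""
    return PORT_MIN + secrets.randbelow(PORT_MAX - PORT_MIN + 1)


# An attacker who sees one port from the counter knows the next one.
observed = 50123
guess = observed + 1
actual = next_port_incremental(observed)

# An attacker who knows (or recovers) the seed replays the whole sequence.
victim = random.Random(1234)
attacker = random.Random(1234)
victim_ports = [port_from_prng(victim) for _ in range(5)]
attacker_ports = [port_from_prng(attacker) for _ in range(5)]

print(guess == actual)                 # the counter was predicted
print(victim_ports == attacker_ports)  # the simple RNG was replayed
print(port_from_csprng())              # no shortcut for the observer here
```

Even the third selector only makes guessing harder; with a 16-bit field the search space stays small, which is one reason protocols layer several such identifiers together.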

Another problem is the implicit ‘information leakage’ aspect of how the implementation behaves. For example, if a Windows PC doesn’t use the right form of randomized port or sequence number assignment, then simply seeing packets in flight identifies the end host as a Windows PC, and an attack can be crafted based on that knowledge. This kind of host identification from packets can get quite detailed, down to specific versions of the operating system. This is a huge tactical advantage for bad actors looking to exploit weaknesses in the host.

Making the same mistakes

The draft notes that despite over 30 years’ experience in designing protocols, it looks like IETF engineers are fond of repeating the same mistakes, over and over. As the draft says (with examples of each type of mistake):

While assessing protocol specifications regarding the use of transient numeric identifiers, we have found that most of the issues discussed in this document arise as a result of one of the following conditions:

  • Protocol specifications that under-specify the requirements for their transient numeric identifiers
  • Protocol specifications that over-specify their transient numeric identifiers
  • Protocol implementations that simply fail to comply with the specified requirements

In the draft, there is a large and probably non-exhaustive list of 67 references to instances of failure in the specification or implementation of a protocol, dating back to the 1980s. The problem is that many of them are far more contemporary.

This IRTF PEARG draft will lead to more future work for protocol designers in the IETF. Expect to see stronger tests of behaviour and design to get ‘over the bar’ for publication of new work.
