The following was written by Joao Luis Silva Damas and Geoff Huston.
In the original framework of the IP architecture, hosts had network interfaces, and network interfaces had single IP addresses. The list of active network interfaces, and the manner in which they acquire IP addresses (either by a static configuration or by some dynamic mechanism via a query to the local network) is specified in a boot-time configuration file.
iface eth0 inet static
iface eth0 inet6 static
These days, many operating systems allow a configuration to add further addresses to a network interface. This ‘aliasing’ of an interface with additional IP addresses allows a host to treat packets addressed to any of these addresses as locally addressed packets, and allows applications on the host to bind to all of these addresses simultaneously.
iface eth0 inet6 static
iface eth0 inet6 static
Multiple interface addresses have also been adopted in the IPv6 architecture, where network interfaces can have multiple addresses including scoped addresses (IPv6 link local addresses); dynamic addresses (IPv6 privacy addresses); and multiple globally scoped addresses (multi-homed site prefix addresses).
However, in all these cases we are talking about binding an interface to an enumerated list of individual host addresses.
This enumeration approach has its limitations in terms of the number of addresses that can be managed in this fashion. What if, for example, in an IPv6 context you wanted the host to treat an entire /64 subnet as a collection of local addresses that are all equally and concurrently available to the host? In other words, what if you wanted an application to be able to receive packets that were addressed to any 128-bit address within this /64 subnet? Obviously, enumeration is not viable with such a large set of addresses (2^64 of them), so we have to look elsewhere for a solution.
To achieve this outcome, we could resort to installing some form of local address translation functionality, and use a translation rule that transforms all the addresses in the subnet to a single local address. By placing this rule into a host’s internal firewall/packet filter engine, all packets addressed to the subnet could be mapped into a single host address on ingress, and presumably, also mapped back on egress.
However, this approach has its limitations. Address translation, even if performed within the host, will not expose the original destination address to the application. So if we would like the application not only to respond to packets addressed to any host address within a subnet, but also to be aware of which of its addresses each packet was sent to, then we need a method that directs the host to accept all packets addressed to the subnet as ‘locally addressed’ packets, and to pass them to applications as necessary without changing the address values in the packet.
Another way of phrasing this question is: How can we bind a network interface to an entire subnet of IP addresses without having to enumerate each and every individual address?
The Linux local routing table
In the same way that host operating systems have evolved to manage interfaces with multiple IP addresses, we have also evolved our thinking about the internal packet forwarding structure in hosts. The original model of a host was a single routing table that specified how to handle packets. In its simplest form, each entry in the routing table specified an address prefix and an interface so that a packet that is processed by the routing subsystem is directed to the nominated host interface according to the lookup of the packet’s destination address into the routing table.
However, this view of the local routing structure has been revised by more recent versions of common operating systems. For example, Linux kernel versions 2.2 and 2.4 support up to 254 routing tables. The original intent of this innovation was to allow flexibility in packet handling, so that a local route policy setting could direct the host to use a routing table other than the main routing table, which defines the default packet handling rules.
In these Linux systems, two of these routing tables always exist: the main routing table (id 254) and the local routing table (id 255). The main routing table is the conventional one that describes the default local packet handling regime. The local routing table is normally maintained by the kernel to keep track of local (to the machine) IP addresses. For example, when an additional IP address is added to a local interface, a host route for that address is automatically added to the local routing table. In this way, the operating system can track which addresses are associated with which network interface.
As well as mapping interface alias IP addresses into the local routing table, it is possible to enter a route into this table without explicitly configuring a local interface address. If a route is entered into the local routing table pointing to the local loopback interface, it directs the kernel to regard any matching packet as one that is addressed to the host itself, rather than one to be forwarded onward. Interestingly, this functionality in the local routing table applies both to single-address host routes and to prefix routes, so an entire subnet can be entered:
> ip route add local 2001:db8:1:1::/64 dev lo
We can also list the contents of this local routing table:
> ip -6 route list table local
local ::1 dev lo proto kernel metric 256
local ::1 dev lo proto none metric 0
local 2001:db8::2 dev lo proto none metric 0
local 2001:db8:1:1::/64 dev lo metric 1024
local fe80::f03c:91ff:feb0:ffff dev lo proto none metric 0
ff00::/8 dev eth0 metric 256
What we see is the /128 address binding from an explicitly configured interface address (the entry for the host address 2001:db8::2) and the explicitly locally routed /64 subnet 2001:db8:1:1::/64.
This implies that to enable binding a local process to a full prefix of addresses without explicitly configuring each of them on an interface, we can inject a route for the prefix pointing to loopback into the local routing table and from then on the kernel will behave as if all those addresses are configured on the machine.
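To make the contrast with enumeration concrete, here is a small C sketch of the test that a single prefix route encodes: an address is treated as local if its leading plen bits match the prefix. The function names are hypothetical and this is not how the kernel implements its lookup; it only illustrates that one /64 entry covers all 2^64 addresses with a single comparison.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Illustration only: true if addr lies inside prefix/plen, i.e. its
 * leading plen bits equal those of prefix. One such comparison stands in
 * for what would otherwise be one table entry per host address. */
static bool in6_prefix_match(const struct in6_addr *addr,
                             const struct in6_addr *prefix, unsigned plen)
{
    if (plen > 128)
        return false;
    unsigned full = plen / 8;   /* bytes fully covered by the prefix */
    unsigned rem = plen % 8;    /* remaining prefix bits in the next byte */
    if (memcmp(addr->s6_addr, prefix->s6_addr, full) != 0)
        return false;
    if (rem == 0)
        return true;
    unsigned char mask = (unsigned char)(0xFFu << (8 - rem));
    return (addr->s6_addr[full] & mask) == (prefix->s6_addr[full] & mask);
}

/* Hypothetical convenience wrapper taking textual addresses; returns
 * false if either string is not a valid IPv6 address. */
static bool in6_str_in_prefix(const char *addr, const char *prefix, unsigned plen)
{
    struct in6_addr a, p;
    if (inet_pton(AF_INET6, addr, &a) != 1 || inet_pton(AF_INET6, prefix, &p) != 1)
        return false;
    return in6_prefix_match(&a, &p, plen);
}
```

For example, in6_str_in_prefix("2001:db8:1:1::53", "2001:db8:1:1::", 64) is true, while the interface address 2001:db8::2 from the listing above falls outside that /64.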
Of course, this is not all that needs to be done. This configuration step just allows a host system to classify incoming packets addressed to the subnet as ‘locally addressed’ packets. The local router also needs to have an explicit route entry for the subnet that directs all packets with destination addresses in this subnet to this host.
Using a subnet binding
For TCP-based services, this local route table subnet entry is sufficient to allow a local application to listen and respond to incoming connections on an entire subnet.
The component of the TCP socket API that can bind to locally configured subnets is the bind() socket call, which binds the socket to a local address. To perform the subnet binding, we pass bind() the generic wildcard address (equivalent to binding the socket to the IPv4 address 0.0.0.0 or to the IPv6 address ::).
Because each TCP session keeps session state, including the local address the client connected to, the replies that the TCP server generates use a source address that matches the destination address of the incoming packets. In this manner, the TCP server’s socket can be bound to all local addresses, including all addresses within these locally routed subnets, simply by binding the listening socket to the wildcard IP address, and the expected thing will happen: namely, the TCP server will respond to connection requests addressed to any host address in the locally defined subnet range.
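A minimal Linux/C sketch of the TCP side, assuming an IPv6-capable kernel: the listener is bound to in6addr_any, and a server can call getsockname() on each socket returned by accept() to learn which address in the subnet the client used. The helper names are hypothetical and error handling is reduced to returning -1.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: IPv6 TCP listener bound to the wildcard address
 * (::). With a local route for the subnet in place, connections to any
 * address in that subnet reach this socket. Returns the fd or -1. */
static int open_wildcard_listener(unsigned short port)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on); /* best effort */

    struct sockaddr_in6 sa;
    memset(&sa, 0, sizeof sa);
    sa.sin6_family = AF_INET6;
    sa.sin6_addr = in6addr_any;      /* wildcard: every local address */
    sa.sin6_port = htons(port);      /* 0 lets the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: format the local address of a socket. On a socket
 * returned by accept() this is the subnet address the client connected
 * to, which the kernel also uses as the source of the replies. */
static int socket_local_addr(int fd, char *buf, socklen_t buflen)
{
    struct sockaddr_in6 local;
    socklen_t len = sizeof local;
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return inet_ntop(AF_INET6, &local.sin6_addr, buf, buflen) != NULL ? 0 : -1;
}
```

On the listening socket itself socket_local_addr() reports "::"; on an accepted connection it reports the specific subnet address, with no extra work needed to make the replies carry it.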
This is not sufficient for UDP-based servers wanting to bind to a subnet configured by this local routing table approach. As with a TCP server, a UDP server must bind to the wildcard address so that it will receive packets sent to any address that the host accepts as local.
However, UDP keeps no session state. So while the wildcard binding is sufficient to allow a UDP server to receive all incoming packets directed to its service at any address within the entire subnet, any response that the UDP server generates will not use the original destination address as its source address. Instead, it will use the IP address of the outbound network interface determined by routing.
Obviously, this can present problems to a UDP client, particularly one that uses the UDP 4-tuple (source and destination addresses and port numbers) to match responses to queries.
What we would strongly prefer is that the address pairing is preserved, such that the source address of a UDP server’s response is the same as the destination address used to contact the server in the first place. To achieve this, the UDP server application needs to set the source address of outgoing responses explicitly, using the destination address gathered from the incoming UDP packet it is responding to.
Picking up the destination address from the socket API is a case of supplying a message control information (ancillary data) block to the recvmsg() call that collects an incoming UDP message and passes it to the UDP server. There are a number of ways to do this in IPv4, depending on the operating system, but for IPv6 this has been standardized in RFC3542. We need to set the IPV6_RECVPKTINFO socket option on the socket. We can then call recvmsg() with an associated message control information block. The control block with the type value IPV6_PKTINFO carries a data block that contains the destination IPv6 address of the UDP packet and the index of the local interface that received the packet.
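The receive side of RFC3542 can be sketched in C as follows, assuming Linux with glibc (which needs _GNU_SOURCE to expose struct in6_pktinfo). The helper names enable_recvpktinfo(), extract_pktinfo() and recv_with_dst() are hypothetical; the socket option, the CMSG_* macros and struct in6_pktinfo are the standard API.

```c
#define _GNU_SOURCE            /* glibc: needed for struct in6_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the kernel to attach an IPV6_PKTINFO block to each received datagram. */
static int enable_recvpktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on);
}

/* Scan recvmsg() ancillary data for IPV6_PKTINFO and copy it out:
 * ipi6_addr is the packet's destination address, ipi6_ifindex the
 * arrival interface. Returns 0 if found, -1 if absent. */
static int extract_pktinfo(struct msghdr *msg, struct in6_pktinfo *out)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_PKTINFO) {
            memcpy(out, CMSG_DATA(c), sizeof *out);
            return 0;
        }
    }
    return -1;
}

/* Receive one datagram, reporting the sender in *peer and the address it
 * was sent to in *dst. Returns the payload length, or -1 on error. */
static ssize_t recv_with_dst(int fd, void *buf, size_t len,
                             struct sockaddr_in6 *peer, struct in6_pktinfo *dst)
{
    union {                    /* correctly aligned control buffer */
        char bytes[CMSG_SPACE(sizeof(struct in6_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name = peer, .msg_namelen = sizeof *peer,
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.bytes, .msg_controllen = sizeof ctrl.bytes,
    };
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0 || extract_pktinfo(&msg, dst) < 0)
        return -1;             /* error, or IPV6_RECVPKTINFO not enabled */
    return n;
}
```

A server would call enable_recvpktinfo() once after creating the socket, then loop on recv_with_dst().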
So now that we have the original destination address from the incoming packet we’d like to set it as the source address of the outgoing packet. RFC3542 specifies the same API approach to perform this, where the control block with type IPV6_PKTINFO has a data block that contains the source IPv6 address for the outgoing packet, in the same format as the control block used to receive the packet. This control block is used with the sendmsg() socket call.
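The matching send side, again a sketch assuming Linux/glibc with hypothetical helper names: the reply's source address travels in an IPV6_PKTINFO control block attached to sendmsg(). An interface index of 0 is assumed here so that the routing table chooses the outgoing interface.

```c
#define _GNU_SOURCE            /* glibc: needed for struct in6_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Caller-provided, suitably aligned storage for one IPV6_PKTINFO block. */
union pktinfo_ctrl {
    char bytes[CMSG_SPACE(sizeof(struct in6_pktinfo))];
    struct cmsghdr align;
};

/* Attach an IPV6_PKTINFO control block to msg asking the kernel to use
 * src as the source address; ifindex 0 leaves interface choice to routing. */
static void set_source_pktinfo(struct msghdr *msg, union pktinfo_ctrl *ctrl,
                               const struct in6_addr *src, unsigned ifindex)
{
    memset(ctrl, 0, sizeof *ctrl);
    msg->msg_control = ctrl->bytes;
    msg->msg_controllen = sizeof ctrl->bytes;

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));

    struct in6_pktinfo pi;
    memset(&pi, 0, sizeof pi);
    pi.ipi6_addr = *src;
    pi.ipi6_ifindex = ifindex;
    memcpy(CMSG_DATA(c), &pi, sizeof pi);
}

/* Send a reply to peer with source address src (typically the destination
 * address taken from the request's IPV6_PKTINFO). */
static ssize_t send_from(int fd, const void *buf, size_t len,
                         const struct sockaddr_in6 *peer,
                         const struct in6_addr *src)
{
    union pktinfo_ctrl ctrl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)peer;
    msg.msg_namelen = sizeof *peer;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    set_source_pktinfo(&msg, &ctrl, src, 0);
    return sendmsg(fd, &msg, 0);
}
```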
Now, we are almost there. If the source address we set is a host address configured on a network interface, then that’s all we need to do. But our subnet binding is not a host address: an address drawn from this subnet is not configured on any interface, so attempts to use it as a source address will normally fail a check in the kernel’s socket layer.
We need to direct the socket to operate in a more promiscuous mode and accept any of these subnet addresses as source addresses for outgoing packets. On Debian systems, the Linux socket option IP_FREEBIND is required to allow the UDP server to respond using a source address that matches the incoming UDP packet’s destination address when that address is drawn from the bound subnet.
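Putting the socket options together, a Linux-only sketch of the UDP server socket setup might look like this. open_subnet_udp_socket() is a hypothetical name; IP_FREEBIND is set at the IPPROTO_IP level as the text describes, which, as we understand the Linux implementation, also relaxes the source-address check for IPv6 sockets.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): UDP server socket for a locally
 * routed subnet.
 *  - IPV6_RECVPKTINFO: recvmsg() reports each datagram's destination;
 *  - IP_FREEBIND: sendmsg() accepts a source address that is routed to
 *    this host but not configured on any interface;
 *  - wildcard bind: datagrams to any address in the subnet arrive here.
 * Returns the fd, or -1 on error. */
static int open_subnet_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof on) < 0 ||
        setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    struct sockaddr_in6 sa;
    memset(&sa, 0, sizeof sa);
    sa.sin6_family = AF_INET6;
    sa.sin6_addr = in6addr_any;      /* wildcard: every local address */
    sa.sin6_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```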
At this point, we are done (at least for Debian)! We can associate an entire subnet with the host, and as long as the external network directs packets addressed to any address within the subnet to this host, we can respond in a conventional manner in both TCP and UDP.
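As an end-to-end illustration, here is a hedged sketch of a single-request UDP echo handler that combines the receive and send steps: it reads the destination address from IPV6_PKTINFO and reuses it as the reply's source address. echo_one() is a hypothetical name; it assumes a Linux/glibc socket already set up with IPV6_RECVPKTINFO (and IP_FREEBIND for routed-subnet addresses), and error handling is minimal.

```c
#define _GNU_SOURCE            /* glibc: needed for struct in6_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical handler: read one datagram and echo it back so that the
 * reply's source address equals the request's destination address.
 * Returns bytes sent, or -1 on error. */
static ssize_t echo_one(int fd)
{
    char payload[1500];                    /* arbitrary buffer size */
    struct sockaddr_in6 peer;
    union {
        char bytes[CMSG_SPACE(sizeof(struct in6_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof payload };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = &peer;
    msg.msg_namelen = sizeof peer;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    /* Find the destination address the client used. */
    struct cmsghdr *c;
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c))
        if (c->cmsg_level == IPPROTO_IPV6 && c->cmsg_type == IPV6_PKTINFO)
            break;
    if (c == NULL)
        return -1;                         /* IPV6_RECVPKTINFO not enabled? */
    struct in6_pktinfo pi;
    memcpy(&pi, CMSG_DATA(c), sizeof pi);
    pi.ipi6_ifindex = 0;                   /* let routing pick the interface */

    /* Rebuild the control block for sendmsg(): same format, now naming
     * the reply's source address. msg_name still holds the peer. */
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;
    msg.msg_flags = 0;
    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IPV6;
    c->cmsg_type = IPV6_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof pi);
    memcpy(CMSG_DATA(c), &pi, sizeof pi);

    iov.iov_len = (size_t)n;               /* echo exactly what arrived */
    return sendmsg(fd, &msg, 0);
}
```

Clearing ipi6_ifindex is a design choice in this sketch: it lets routing pick the outbound interface rather than pinning the reply to the arrival interface.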
So, in conclusion, yes, it is possible to construct a host system that, for both TCP and UDP:
- allows applications to bind to entire subnets
- responds to incoming traffic directed to any address drawn from the subnet
- does so efficiently without performing address translation or any other form of packet transformation
- does so without any overhead of extensive local table lookups that would otherwise be the case with large enumerated address lists
Where is this subnet binding technique useful? We will describe one application of this technique in the next article.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.