The internet is full!
It’s a great headline, but really, it’s not quite true. There is a problem, and it’s not going to go away, but what happened this week was a symptom of ageing equipment and a failure to keep systems configured, more than a Y2K-style end-of-the-world crisis. Actually, now I come to think of it, since Y2K turned out to be a bit of a non-event… maybe the comparison is valid!
What happened and why?
Internet routing is performed by the Border Gateway Protocol, or BGP. Every ISP that independently routes packets into the rest of the public Internet gets a unique identity, called an ASN for Autonomous System Number (autonomous means, well, independent). Using this ASN, they send each other lists of ‘announcements’ which detail which networks they can reach, and when they stop being able to reach them, they send ‘withdraws’.
These lists of networks are more specifically called prefixes. A prefix is a block of Internet addresses, and consists of a base address and a length, written address/len. So if you have the block of Internet addresses 192.168.0.0 through to 192.168.255.255, you have 65,536 individual Internet addresses and your prefix is 192.168.0.0/16, because the first 16 bits are fixed across the whole range and the remaining 16 bits are free to vary.
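The arithmetic above can be checked with Python's standard `ipaddress` module. This is only a small sketch; the variable names are made up for illustration.

```python
import ipaddress

# The example block from the text: 192.168.0.0 through 192.168.255.255.
block = ipaddress.ip_network("192.168.0.0/16")

first_address = block.network_address            # 192.168.0.0
last_address = block.broadcast_address           # 192.168.255.255
prefix_length = block.prefixlen                  # 16 bits of prefix
host_bits = block.max_prefixlen - prefix_length  # 32 - 16 = 16 bits left to vary
address_count = block.num_addresses              # 2 ** 16 = 65,536

print(block, first_address, last_address, address_count)
```

The shorter the prefix length, the more bits are left over and the larger the block.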
The thing is that prefixes are like a set of Russian dolls, or a stacking tower: bigger ones can fit lots of smaller ones inside. You can break a big one up into lots of smaller ones: in the above example, the 192.168.0.0/16 prefix can be broken up into two halves, 192.168.0.0/17 and 192.168.128.0/17, or it can be broken up into four /18s or eight /19s or sixteen /20s and so on, right the way down to 65,536 /32s, each one an individual, fully specified, unique host.
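The Russian-doll splitting can be sketched with the same standard `ipaddress` module. The helper `count_subnets` is a hypothetical name used only for this illustration.

```python
import ipaddress

block = ipaddress.ip_network("192.168.0.0/16")

# Splitting one bit further gives the two /17 halves named in the text.
halves = list(block.subnets(prefixlen_diff=1))


def count_subnets(network, new_prefix):
    """How many prefixes of length new_prefix fit inside network."""
    return 2 ** (new_prefix - network.prefixlen)


for new_prefix in (17, 18, 19, 20, 32):
    print(f"/{new_prefix}: {count_subnets(block, new_prefix)} prefixes")
```

Each extra bit of prefix length doubles the number of pieces, which is why one /16 can turn into 65,536 separate /32 announcements.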
If you don’t like your neighbor spitting out that list of 65,000-odd prefixes, you can choose to ‘filter’ it down to the one encompassing /16 prefix, but if they have actually told you “I can’t reach this one over here quite the same way, it’s a bit slower”, you won’t detect that difference.
This is used by ISPs to perform what is called Traffic Engineering, or TE. They might use this to send traffic down one fiber to the USA from Melbourne, and another from Sydney, or to balance out a set of smaller, thinner links and mix their customers over all of them to share the available bandwidth. If you don’t accept their ‘more specific’ prefixes, you can’t see their attempts at traffic engineering, and you might make their customers’ data travel down less efficient paths (equally, this might suit you better).
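To see what filtering costs, here is a minimal sketch of longest-prefix matching, the rule routers use to pick the most specific prefix covering a destination. The path labels and the `best_route` helper are hypothetical, and a real router does this lookup in hardware rather than by scanning a dictionary.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, path) entry with the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = [(net, path) for net, path in table.items() if address in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical announcements: a covering /16 plus a more specific /17
# that the ISP uses to steer half of its addresses down another fiber.
full_table = {
    ipaddress.ip_network("192.168.0.0/16"): "fiber-from-sydney",
    ipaddress.ip_network("192.168.128.0/17"): "fiber-from-melbourne",
}

# A neighbor that filters out anything more specific than the /16.
filtered_table = {net: path for net, path in full_table.items() if net.prefixlen <= 16}

print(best_route(full_table, "192.168.200.1"))      # matches the /17
print(best_route(filtered_table, "192.168.200.1"))  # only the /16 is left
```

With the full table, the destination follows the more specific /17. With the filtered table it is still reachable through the /16, but the traffic engineering hint is gone.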
These prefixes are held in a special kind of memory in the routers running BGP. It’s called Ternary Content-Addressable Memory, or TCAM, and the amount of TCAM you have limits how many prefixes you can store, and therefore how much of the complete list of prefixes carried in BGP you can hold and make routing decisions on.
For some time now, Geoff Huston has been tracking the growth of BGP daily, and recently the number of prefixes has approached, and then exceeded, the magic 512k number. This is the default amount of TCAM set aside by older routers from some vendors for representing the entire IPv4 routing table. At this point, if you don’t (or can’t) re-tune your router configuration, you have no choice but to drop some prefixes. And… boom. Parts of the Internet are no longer reachable, because you can’t see them.
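As a toy model of that failure, the sketch below treats the table as a fixed number of slots and drops any announcement that arrives once it is full. The class name, the tiny capacity and the drop-on-overflow policy are all assumptions for illustration; what a real router does on overflow depends on the vendor and platform.

```python
import ipaddress


class FixedSizeTable:
    """A prefix table with a hard slot limit (toy stand-in for TCAM)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.installed = set()
        self.dropped = []

    def announce(self, prefix):
        """Install a prefix if there is room; return False if it was dropped."""
        network = ipaddress.ip_network(prefix)
        if network in self.installed:
            return True
        if len(self.installed) >= self.capacity:
            self.dropped.append(network)
            return False
        self.installed.add(network)
        return True

    def withdraw(self, prefix):
        """Remove a prefix, freeing its slot."""
        self.installed.discard(ipaddress.ip_network(prefix))

    def can_reach(self, destination):
        """True if any installed prefix covers the destination address."""
        address = ipaddress.ip_address(destination)
        return any(address in network for network in self.installed)


# A capacity of 2 stands in for the 512k default.
table = FixedSizeTable(capacity=2)
for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
    table.announce(prefix)

print(table.can_reach("10.1.2.3"))     # True: installed before the table filled
print(table.can_reach("192.168.1.1"))  # False: its prefix arrived too late
```

The third announcement is perfectly valid, but this router simply never learns it, so that part of the address space goes dark from its point of view.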
Oh no, oh no, woe is me. Hang on. You mean I can re-tune TCAM? Well, for some older-than-old routers, this isn’t going to work: they can’t handle more than this magic limit anyway. On those, all you can do is filter, and so limit how many prefixes you have to hold. But for the ‘newer’ older routers, it turns out that you can in fact tune the TCAM and tell it to use more or less than the default. And that’s what the ISPs of the world have been doing since the 512k limit was reached. The NANOG operations list has a lively discussion going on as ISPs discuss who was hit, which devices can be re-tuned, how, and by how much.
End of the Internet at 11? No. But it’s a signal that the good times are over.
- This problem isn’t going to go away. Tuning TCAM in an older router only delays the point where it has to deal with a larger routing table, and then you either have to start filtering more aggressively or upgrade the hardware.
- At any point, an ISP on the other side of the world using BGP can decide to de-aggregate a prefix. This isn’t a ‘rules based’ world where somebody doles out the right to assert a prefix: the routing community decides for itself what to accept and reject, using mechanisms like the Routing Policy Specification Language (RPSL) and the IRRtoolset.
- IPv6 is now a significant deployment worldwide. The current world IPv6 routing table fits in under 15,000 prefixes (compared to over 500,000 in IPv4) because it’s much more efficiently allocated: each ISP has a significantly larger footprint of addresses and so needs to announce fewer prefixes. But there are some catches:
  - ISPs can’t stop announcing IPv4 prefixes while we still run a mix of IPv4 and IPv6.
  - IPv6 is capable of being de-aggregated just like IPv4, and potentially represents an even larger prefix table.
  - You have to balance how much TCAM is made available for IPv4 and for IPv6.
  - IPv6 addresses and prefixes consume more TCAM per prefix than IPv4.
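The balancing act in the last two catches is simple arithmetic, sketched below. The numbers are assumptions chosen for illustration, not figures for any particular router: a pool of 1,000,000 slots, one slot per IPv4 prefix, and two slots per IPv6 prefix. The function name `ipv6_capacity` is hypothetical too.

```python
def ipv6_capacity(total_slots, ipv4_prefixes, ipv4_cost=1, ipv6_cost=2):
    """IPv6 prefixes that still fit after reserving room for IPv4.

    The per-prefix slot costs are illustrative assumptions, not vendor figures.
    """
    remaining = total_slots - ipv4_prefixes * ipv4_cost
    if remaining < 0:
        raise ValueError("IPv4 reservation alone exceeds the available slots")
    return remaining // ipv6_cost


TOTAL_SLOTS = 1_000_000  # hypothetical pool shared by IPv4 and IPv6

# Reserving 512,000 slots for IPv4 leaves room for 244,000 IPv6 prefixes.
print(ipv6_capacity(TOTAL_SLOTS, 512_000))

# Growing the IPv4 reservation to 800,000 shrinks that to 100,000.
print(ipv6_capacity(TOTAL_SLOTS, 800_000))
```

Every slot handed to IPv4 to get past the 512k mark comes out of the room left for IPv6, and under these assumptions each IPv6 prefix uses that room up twice as fast.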
Part of this problem is that the occupancy of a routing slot in the global BGP system is an externality: you announce as many prefixes as you like and everyone else has to handle it.
This has the makings of a classic tragedy of the commons: the global BGP routing system is something which lets all of us benefit from a remarkably flat information sharing system that depends on mutual good behavior by the participants. If somebody decides to play bad and emits too many prefixes, it isn’t necessarily them who suffer the consequences.
The way we decide how best to manage these problems isn’t going to change soon. As it stands, we depend on the goodwill of technical operations folk to absorb these localized costs within their companies and fund upgrades, and on the opportunities for policy development in the various regional network operations meetings. Mostly, the operational behavior is informally defined. If problems like this keep happening, pressure to formalize the behavior may get stronger.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.