War story: RPKI is working as intended

Published on 18 Nov 2024

Category: Tech matters

To be upfront, this is a story about something that turned out to be no problem at all. But sometimes boring stories deserve to be told. To provide context for this one, we have to go back to February 2008.

Back then, through no fault of its own, one of the world’s most popular video-sharing platforms suffered a disastrous multi-hour outage, interrupting millions of video viewings. The impact was so significant that even mainstream media reported extensively on what was essentially an arcane routing incident. Nowadays, however, we hear less and less about incidents like these, even though the Internet is bigger than ever.

Recently, Fastly was the target of a BGP hijack, similar to what happened in 2008, but this time barely anyone noticed. Why is that? Something has changed. In this article, I’ll delve into one of the Internet’s most remarkable, yet untold, success stories.

A crash course on how Internet routing works

At its core, the Internet is a backbone spanning hundreds of thousands of interconnected routers managed by roughly 85,000 organizations to deliver data to millions of digital destinations. To establish what part of the Internet is attached where (that is, in what direction to send data packets to reach a given Internet destination, an IP address), all these routers exchange messages with each other using an industry-standard protocol called the Border Gateway Protocol (BGP). The totality of this whooshing exchange of routing information is often referred to as the global Internet routing system.

Figure 1 — Internet Map by The Opte Project. Originally from the English Wikipedia, CC BY 2.5.

One of the key factors for routers to decide which of many paths to use for sending data is the Longest Prefix Match (LPM) algorithm. In a nutshell, more detailed information about a destination is preferred over less granular information. Think of entering your destination’s street and city into your car’s navigation system versus entering only the city name. Both approaches will bring you closer to your destination, but of course, being more specific is likely to result in a better route. LPM is so fundamental that the Internet would not work without it.
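To make the LPM idea concrete, here is a minimal sketch in Python. It is not how a real router implements forwarding (routers use specialized data structures and hardware); the routing table, the next-hop names, and the function name `longest_prefix_match` are all hypothetical, and the prefixes come from address ranges reserved for documentation.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return (prefix, next_hop) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        # Only compare IPv4 with IPv4 and IPv6 with IPv6.
        if network.version != addr.version or addr not in network:
            continue
        # A longer prefix length means more specific information.
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, next_hop)
    return None if best is None else (str(best[0]), best[1])

# Hypothetical routing table (illustrative names, documentation prefixes).
ROUTES = {
    "0.0.0.0/0": "default-upstream",   # "only the city name"
    "203.0.113.0/24": "router-a",      # "city and street"
    "203.0.113.128/25": "router-b",    # "city, street, and house number"
}

for dest in ("203.0.113.200", "203.0.113.5", "192.0.2.1"):
    print(dest, "->", longest_prefix_match(ROUTES, dest))
```

The address 203.0.113.200 falls inside all three routes, yet the /25 wins because it is the most specific. That same preference is what makes an unauthorized, more specific route so disruptive.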

A major contributor to the Internet’s amazing year-to-year growth is that basically anyone can easily connect to it and almost immediately start sending and receiving data. You hook your router up to neighbouring routers from other organizations and then use BGP to send a message into the routing system. In doing so, you tell the Internet that your IP addresses are now reachable via a specified ‘next-hop’. The corollary is that the most obvious vulnerability in the routing system is the unauthorized origination of routes to IP addresses. More on that thorny aspect in the next section!

What happened in 2008?

A large economy’s incumbent telecommunications operator was instructed to censor a popular video-sharing platform within its national borders. Of the various mechanisms to block access to a particular Internet service, BGP is one of the simpler (albeit blunter) ways to blackhole undesired traffic. In the course of normal network operations, not every BGP message is intended or expected to be distributed into the global system. A network operator might intend for some BGP messages to only be distributed to its own routers for its own private purposes, constraining the scope to its own administrative domain.

Unfortunately, due to a configuration mistake, the BGP messages intended to comply with the economy’s censorship order were also passed on to adjacent networks outside of the economy, who, in turn, distributed them to their adjacent networks, and so on. In the blink of an eye, routers around the world received BGP messages that a specific set of the video platform’s IP addresses (remember the LPM algorithm!) were now being served from infrastructure in Pakistan. As that wasn’t at all where the video platform was actually attached, Internet data packets ended up being dropped on the floor, globally disrupting this video platform’s online presence. RIPE NCC did a good write-up on the technical details, and NY Times, CNET, Ars Technica, and NBC News also covered the incident.

Fast forward to 2024

A very similar routing incident happened to Fastly just last week, but this time around no headlines were made. While this incident would’ve severely affected Fastly a few years ago, this time the impact was negligible. What gives? While the specific players and motivations differ from the famous 2008 incident, at its heart, the technical details were the same. In this more recent case, the state incumbent of another large economy generated BGP messages hijacking some of Fastly’s IP address space for the purpose of disrupting Internet traffic. What makes now different from then?

RPKI improves the routing system’s reliability

The big difference between 2008 and 2024 is that nowadays the Internet industry uses a cryptographically verifiable mechanism called RPKI to assess the plausibility of BGP messages in a fully automated fashion. The RPKI is a distributed database through which networks can publish their routing intentions in Route Origin Authorizations (ROAs), which in turn enables other networks to validate BGP messages against this database using a procedure called Route Origin Validation (ROV). By rejecting messages that fail this validation, the RPKI-invalid routes can be kept out of circulation, limiting their ability to cause disruption.
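The validation decision itself is compact enough to sketch. The Python below is a simplified illustration modelled on the origin validation procedure described in RFC 6811; it is not the code of any particular validator or router. The `VRP` class, the function name `validate_origin`, and the sample data are hypothetical, using prefixes and AS numbers reserved for documentation. It assumes the ROAs have already been cryptographically verified and flattened into (prefix, maximum length, origin AS) entries.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class VRP:
    """One validated ROA entry: who may originate what, and how specific."""
    prefix: str
    max_length: int
    asn: int

def validate_origin(vrps, route_prefix, origin_asn):
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # A VRP covers the route if the route's prefix lies inside the VRP prefix.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A match also needs the right origin AS and an allowed prefix length.
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered but never matched: someone published intent and this route violates it.
    return "invalid" if covered else "not-found"

# Hypothetical published intent: AS64500 originates 203.0.113.0/24, nothing more specific.
VRPS = [VRP("203.0.113.0/24", 24, 64500)]

print(validate_origin(VRPS, "203.0.113.0/24", 64500))    # legitimate announcement
print(validate_origin(VRPS, "203.0.113.128/25", 64511))  # more specific, wrong origin
print(validate_origin(VRPS, "198.51.100.0/24", 64511))   # no ROA covers this prefix
```

In this sketch, a more specific announcement from an unauthorized origin is classified as invalid, so a network that rejects invalid routes never installs it, and LPM never gets the chance to prefer it. Routes without any covering ROA come back as not-found and are typically still accepted, which is why publishing ROAs matters.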

Publishing ROAs is easy! All five Regional Internet Registries (RIRs) offer RPKI certification services as part of their standard membership services. Since Fastly publishes ROAs for all of its IP addresses, Internet Exchange Points (IXPs) and major carriers like NTT, Comcast, AT&T, Cogent, Arelion, and Lumen can automatically ignore problematic BGP messages (like the ones that were hijacking Fastly’s IP space in this incident!). Because the industry at large is using RPKI, the only measurable impact on our traffic delivery was towards the disruptor itself; the rest of the world remained oblivious to this incident. A very serious BGP hijack happened and Fastly came out just fine. RPKI works as intended.

RPKI is a mature technology

RPKI’s story started two decades ago when X.509 certificate syntax was extended to support encoding IP addresses and Autonomous System Numbers (ASNs) via RFC 3779. (X.509 is the underpinning of web security mechanisms like “https://” that we’re all familiar with.)

In the following years, a design for an architecture materialized, imposing order on the unwieldy, ever-growing global routing system (RFC 6480). Then the five RIRs (APNIC, ARIN, RIPE NCC, LACNIC, and AFRINIC) got to work building user-facing systems through which operators can configure ROAs. In 2018 and 2019, open-source projects like rpki-client and Routinator were kicked off to securely bridge the gap between the RIR systems and BGP routers. Finally, in 2020, there was a sharp increase in the adoption of RPKI by the largest ISPs, IXPs, and cloud providers, enabling the RPKI system to be more effective in providing broad benefits to the Internet.

Conclusion

Realizing fundamental changes, like what RPKI did for the Internet, is a matter of extreme patience and perseverance. This is because the Internet, by design, has no centralized or top-down administration. The Internet’s routing system is a voluntary collaboration between close to 100,000 organizations. Change comes from leading by example, educational outreach to peers and business partners, and an iterative engineering approach to resolve any obstacles discovered along the way. Hundreds of engineers and scientists cumulatively dedicated hundreds of years to meticulously performing heart surgery on a running system, embracing RPKI and improving Internet reliability. Even now, work continues in the IETF to further improve the dependability and performance of the RPKI.

It came as no surprise to me when the Executive Branch of the US government recognized the societal benefits of using RPKI and endorsed the technology to begin to address the vulnerabilities inherent in BGP. Ultimately, RPKI is a system that helps networks stay in their own lane, allowing everyone to safely zip along the digital highway.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
