Leaping through RPKI history with Ziggy

By Roland van Rijswijk-Deij on 30 Aug 2019

RPKI, the Resource Public Key Infrastructure, is an important cornerstone in securing the BGP routing system on the Internet. In its current form, RPKI enables resource owners (for example, holders of IP prefixes) to issue digitally signed statements about which Autonomous Systems (ASes) may originate routes for those prefixes, using so-called Route Origin Authorizations (ROAs). These ROAs can then be validated independently and used by routers to check if the BGP updates that they receive from peers are legitimate. By checking if BGP announcements are valid, routers can prevent prefix hijacks from propagating, protecting Internet traffic from honest misconfigurations, but also from malicious announcements.

Key points:

We developed a new tool called Ziggy to run historic RPKI data through Routinator to validate it, and to take a peek into RPKI history at the same time.
We discovered that data before 2014 was difficult to validate and investigated the cause of some large disaggregation events.
The use of RPKI is growing rapidly, especially in the last two years. In fact, we have just passed 100,000 VRPs!
The use of MaxLength is slightly more common for IPv6 than for IPv4. We also noticed a trend towards smaller prefix sizes.
We also analysed global coverage and accuracy, noticing that growth rate differs by region, and accuracy is already high and still improving.

Figure 1 — Routinator logo.

Since 2018, NLnet Labs has had a new line of open source projects focusing on RPKI. Our first project in this space is Routinator, which is so-called Relying Party (RP) software. It plays the important role of validating ROAs and outputting so-called Validated ROA Payloads (or ‘VRPs’ for short) that routers can use to filter out invalid BGP updates. Of course, given the vital role it plays, we thoroughly test every release of Routinator with current RPKI data before releasing it. But then we got our hands on some great data archived by our good friends at the RIPE NCC: all historic RPKI data going back to the very origins of the protocol in 2011. This was, of course, an opportunity we could not resist: we wanted to run all of that data through Routinator to test it, and to take a peek into RPKI history at the same time. This long-read blog tells the story of how we did this with a tool called ‘Ziggy’ and what we found.

Figure 2 — Al & Sam in Quantum Leap episode ‘Ghost Ship’ (S4, E16).

Al: “Ziggy says the odds are real good.”
Sam: “How good?”
Al: “Oh, you know. They’re way up there.”

To use Routinator on the archived RPKI data, we need a couple of things to come together. We need to grab data for the right date from the RIPE archive, we need to figure out which objects are in the archived data, we need to recreate the RPKI trust roots (so-called ‘TALs’) for that date, and finally, we need to run Routinator as if it were that day. Being the sci-fi geeks that we are, we of course had to pick an appropriate name for the tool that would perform this task. Given that we were essentially making Routinator travel back in time, we decided to call the tool ‘Ziggy’ after the eponymous computer in 1990s sci-fi classic ‘Quantum Leap’. And, of course, we open-sourced Ziggy so you can play around with the data too.

First things first: what could we validate?

This weekend, in just over 15 hours, @routinator3000 sequentially validated over 8 years of RPKI data. And the slowest step in that exercise was my script that extracted the datasets from .tar.gz files. Awesome! 🚀🚀🚀

— Roland van Rijswijk-Deij 🇳🇱🇪🇺🏳️‍🌈 (@reseauxsansfil) May 20, 2019

The first thing we wanted to find out is what we could validate. So we used Ziggy to run all the data we got from RIPE through Routinator. This took over 15 hours, but considering that this was over 8 years of data (to be precise: 3,109 days of data) that was pretty impressively fast!

Cool, so apparently we could validate something. What does this data look like? The first thing we did was plot the number of VRPs that came out of the validation, giving us the following two plots (IPv4 on the left, IPv6 on the right):
Figure 3 — Development in the number of VRPs over time (IPv4 left, IPv6 right).

There are a couple of takeaways from these graphs. First: before early 2014, very little data could actually be validated (first arrow in the plot). The reason for this is simple: the data before then didn’t quite comply with the RPKI standards, and Routinator cannot validate it because of that. We are working on a way to process this data in any case, but note that a modern validator would have ignored the data, so in essence it is ‘RPKI unknown’ and should not be used for filtering.

The second thing to note is a significant spike in both plots for APNIC around the middle of 2017. This is due to massive deaggregation of the ROAs for three ASes. Effectively, these ASes had only a few ROAs each for a large prefix with the MaxLength attribute set to /24 before the event, and during the event these ROAs were replaced by a very large number of ROAs each covering a single /24 from the larger prefix. We asked APNIC about this, and they commented that this was a mistake during the introduction of a new generic route management system that covers both IRR and RPKI. When internal monitoring noticed the mistake, they stopped the process, fixed the mistake and restarted the migration. There is another deaggregation event for a single AS in the IPv4 data (marked by an arrow); this appears to be a purposeful deaggregation of a large prefix into ROAs for single /24 prefixes. There has been some debate about whether MaxLength should be used if not all of the more specific prefixes allowed by the MaxLength setting are actually announced. We speculate (but cannot confirm) that this has led to at least some ASes deaggregating their ROAs.
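To make the difference between the two ROA styles concrete, here is a minimal sketch using Python's standard `ipaddress` module. The prefix `10.0.0.0/16` and the helper `covers` are illustrative assumptions, not values or code from the APNIC event or from Ziggy.

```python
import ipaddress

def covers(roa_prefix, max_length, announced):
    """True if the announced prefix lies inside the ROA prefix and is no
    more specific than MaxLength (the origin AS is ignored in this sketch)."""
    roa_net = ipaddress.ip_network(roa_prefix)
    ann_net = ipaddress.ip_network(announced)
    return (ann_net.version == roa_net.version
            and ann_net.subnet_of(roa_net)
            and ann_net.prefixlen <= max_length)

# Before: a single ROA for the large prefix, with MaxLength set to /24.
aggregated = [("10.0.0.0/16", 24)]

# After: one ROA per /24 carved out of the same large prefix.
deaggregated = [(str(net), 24)
                for net in ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=24)]

print(len(aggregated), len(deaggregated))  # 1 256

# A /24 announcement is covered by both sets of ROAs ...
print(any(covers(p, m, "10.0.5.0/24") for p, m in aggregated))    # True
print(any(covers(p, m, "10.0.5.0/24") for p, m in deaggregated))  # True

# ... but a /20 announcement is only covered by the aggregated ROA.
print(any(covers(p, m, "10.0.16.0/20") for p, m in aggregated))    # True
print(any(covers(p, m, "10.0.16.0/20") for p, m in deaggregated))  # False
```

A single /16 turns into 256 entries when deaggregated this way, which suggests how a handful of ASes could produce a visible spike in the VRP count.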

We just passed 100,000 Validated ROA Prefixes, globally. Awesome effort everyone! 🚀 #RPKI pic.twitter.com/a0qMzsMdmM

— Routinator 3000 (@routinator3000) August 23, 2019

Finally, and most importantly: the use of RPKI is growing rapidly, especially in the last two years. In fact, we have just passed 100,000 VRPs! 🎉

Zooming in: deployment of RPKI in real-world practice

Now that we have a general overview that shows that RPKI use grew over time, let’s zoom in on some details that show how RPKI use has changed over time. In particular, we will zoom in on two things: the use of the MaxLength attribute and the average size of prefixes and MaxLength attributes in ROAs over time. We start with a big plot:

Figure 4 — ECDFs of prefix sizes (top) and MaxLength (bottom) in VRPs (IPv4 left, IPv6 right).

This plot shows a so-called Empirical Cumulative Distribution Function (ECDF) of the prefix size encountered in ROAs (top: IPv4 left, IPv6 right) and the MaxLength value encountered in ROAs (bottom: IPv4 left, IPv6 right). Each plot shows the distribution on 1 August in five different years (2015–2019). What each plot clearly shows is a trend toward ever smaller prefix sizes, both in the prefix covered by the ROA and in the MaxLength set in the ROA. In 2015, 42.4% of ROAs for IPv4 had a prefix size of /24; by 2019 this had grown to 53.6%. For MaxLength, we see a similar development, with 56.9% of ROAs having a MaxLength of /24 in 2015, growing to 65.6% in 2019. Given that the IPv4 address pool is severely exhausted, this is unsurprising. Routing data also reflects this trend.
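For readers unfamiliar with ECDFs, the following is a minimal sketch of how one can be computed from a list of prefix lengths. The function name `ecdf` and the sample values are made up for illustration; this is not the plotting code behind the figure.

```python
from collections import Counter

def ecdf(prefix_lengths):
    """Return sorted (length, cumulative fraction) pairs: for each prefix
    length, the fraction of observations with that length or a shorter one."""
    counts = Counter(prefix_lengths)
    total = len(prefix_lengths)
    points, running = [], 0
    for length in sorted(counts):
        running += counts[length]
        points.append((length, running / total))
    return points

# Hypothetical prefix lengths taken from eight VRPs.
sample = [16, 20, 22, 22, 24, 24, 24, 24]
for length, fraction in ecdf(sample):
    print(f"/{length}: {fraction:.3f}")
# /16: 0.125
# /20: 0.250
# /22: 0.500
# /24: 1.000
```

In this toy sample half of the VRPs have a /24 prefix, which shows up as the jump from 0.5 to 1.0 at /24.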

Interestingly, the trend of decreasing prefix sizes in ROAs is also clearly evident in the IPv6 graphs. In 2015, a staggering 60.4% of IPv6 ROAs had a prefix size of /32 or bigger. By 2019, this had dramatically decreased, with almost half of IPv6 ROAs having a prefix size of /40 or smaller. We speculate that this is due to the increased production use of IPv6. The use of MaxLength in IPv6 reflects much the same thing: in 2015, about half of ROAs had a MaxLength set to /36 or bigger, whereas in 2019 the halfway point had shifted towards a /48.

And what about the use of MaxLength in ROAs? As we mentioned earlier, there is some discussion about the use of MaxLength. So we plotted the use of MaxLength over time in the graph below:

Figure 5 — Use of MaxLength.

The graph shows the fraction of VRPs whose MaxLength allows the announcement of subprefixes smaller (more specific) than the prefix covered by the VRP, that is, VRPs in which the MaxLength value is numerically greater than the prefix length. The graph has two takeaways. First, the use of MaxLength is slightly more common for IPv6 than for IPv4 (not surprising given the larger address space; creating ROAs for every smaller subprefix would escalate very quickly for IPv6).
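As a rough sketch of how such a fraction can be computed, the snippet below counts VRPs whose MaxLength is greater than the prefix length, split by IP version. The tuple layout `(asn, prefix, max_length)`, the function name `maxlength_usage` and the sample VRPs are assumptions for illustration, not Routinator's output format.

```python
import ipaddress

def maxlength_usage(vrps):
    """Per IP version, return the fraction of VRPs whose MaxLength permits
    prefixes more specific than the VRP's own prefix."""
    totals = {4: 0, 6: 0}
    using = {4: 0, 6: 0}
    for _asn, prefix, max_length in vrps:
        net = ipaddress.ip_network(prefix)
        totals[net.version] += 1
        if max_length > net.prefixlen:
            using[net.version] += 1
    # Skip versions without any VRPs to avoid dividing by zero.
    return {v: using[v] / totals[v] for v in totals if totals[v]}

# Hypothetical VRPs: (origin AS, prefix, MaxLength).
vrps = [
    (64496, "10.0.0.0/16", 24),      # uses MaxLength
    (64496, "192.0.2.0/24", 24),     # MaxLength equals the prefix length
    (64497, "198.51.100.0/24", 24),  # MaxLength equals the prefix length
    (64511, "2001:db8::/32", 48),    # uses MaxLength
]
usage = maxlength_usage(vrps)
print(usage)  # one in three IPv4 VRPs and the single IPv6 VRP use MaxLength
```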

Second, while the use of MaxLength was declining until early 2018, the trend appears to change after that, with the fraction of VRPs that use MaxLength growing again from about March 2018. We speculate that this is a side effect of operators actually starting to filter routes based on RPKI validation results. Consider: if you have a ROA for a /16, but you announce smaller subprefixes (for example, multiple /22 prefixes), and you do not have ROAs for these smaller subprefixes, then any operator that performs RPKI-based filtering will reject the announcements for the smaller prefixes. There are two ways in which you can fix this: you can create ROAs for all of the smaller prefixes, or you can set the MaxLength correctly in the ROA for the /16. We speculate that the slight but persistent rise in the use of MaxLength from 2018 onward can at least partly be explained by operators choosing the second solution when their announcements were rejected by operators that filter based on RPKI.
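The /16 example can be traced with a small sketch of route origin validation, loosely following the procedure described in RFC 6811. The function `validate`, the tuple layout `(asn, prefix, max_length)`, the AS numbers and the prefixes are all illustrative assumptions; this is not how Routinator or any particular router implements the check.

```python
import ipaddress

def validate(announced_prefix, origin_asn, vrps):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(announced_prefix)
    covered = False
    for asn, prefix, max_length in vrps:
        vrp_net = ipaddress.ip_network(prefix)
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the announced prefix
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"

# Starting point: a ROA for the /16 only, MaxLength equal to the prefix length.
original = [(64496, "10.0.0.0/16", 16)]
print(validate("10.0.0.0/22", 64496, original))       # invalid

# Fix 1: add a ROA for every announced /22.
fix_per_prefix = original + [(64496, "10.0.0.0/22", 22),
                             (64496, "10.0.4.0/22", 22)]
print(validate("10.0.4.0/22", 64496, fix_per_prefix)) # valid

# Fix 2: raise MaxLength on the ROA for the /16.
fix_maxlength = [(64496, "10.0.0.0/16", 22)]
print(validate("10.0.4.0/22", 64496, fix_maxlength))  # valid

# Prefixes without any covering VRP are unaffected by filtering.
print(validate("192.0.2.0/24", 64496, original))      # not-found
```

The second fix needs a single ROA no matter how many /22 prefixes are announced, which may explain its appeal.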

To make the trend towards smaller prefix sizes even more visible, we end with two plots showing the average covered prefix size in VRPs and the average MaxLength in VRPs over time:

Figure 6 — Average covered prefix size and MaxLength in VRPs (IPv4 left, IPv6 right).

We leave the explanation of the spikes in 2017 as an exercise to the reader 😜 (hint: scroll back to the part about the growth of RPKI use at the top of the post).

A different look at growth: coverage and accuracy

That RPKI is growing is good news for the Internet. It significantly reduces the risk of route hijacks. What we have shown, however, is just that RPKI is growing, but not what that really means. To explain this better, we end this blog by looking at two things: coverage and accuracy.

To study these two aspects, we use a third open source tool from NLnet Labs: the Secure Routing Stats tool. This tool takes a set of VRPs that come out of Routinator and combines it with two other sources: statistics from the Number Resource Organization about which prefixes have been assigned to which economies, and up-to-date information about which prefixes have been announced in BGP, for example from RIPE RIS or from Routeviews (the source we used). It can then output two things:

Coverage: The fraction of announced prefixes in an economy that is covered by a VRP.
Accuracy: The fraction of announced prefixes in an economy that is covered by a VRP and considered valid.
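The two metrics can be sketched as a simple aggregation over announcements that have already been validated. Everything here is an assumption for illustration: the input layout `(economy, state)`, the function name `coverage_and_accuracy`, and in particular the choice to compute accuracy over covered announcements only (valid divided by covered), which is one way to read the definition above. It is not taken from the Secure Routing Stats tool.

```python
from collections import defaultdict

def coverage_and_accuracy(announcements):
    """announcements: iterable of (economy, state) pairs, where state is
    'valid', 'invalid' or 'not-found'. Returns {economy: (coverage, accuracy)}."""
    counts = defaultdict(lambda: {"announced": 0, "covered": 0, "valid": 0})
    for economy, state in announcements:
        c = counts[economy]
        c["announced"] += 1
        if state in ("valid", "invalid"):  # covered by at least one VRP
            c["covered"] += 1
        if state == "valid":
            c["valid"] += 1
    result = {}
    for economy, c in counts.items():
        coverage = c["covered"] / c["announced"]
        # Assumption: accuracy is measured over covered announcements only;
        # it is undefined (None) when nothing is covered.
        accuracy = c["valid"] / c["covered"] if c["covered"] else None
        result[economy] = (coverage, accuracy)
    return result

# Hypothetical validated announcements for two economies.
sample = [("NL", "valid"), ("NL", "valid"), ("NL", "invalid"),
          ("NL", "not-found"), ("BR", "not-found")]
stats = coverage_and_accuracy(sample)
print(stats["NL"])  # coverage 3/4, accuracy 2/3
print(stats["BR"])  # coverage 0.0, accuracy undefined (None)
```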

So, using a combination of Ziggy, Routinator and the Secure Routing Stats tool, together with the archived RPKI data from RIPE, NRO data and full table dumps from Routeviews roughly covering the past two years, we analysed coverage and accuracy.

Let’s start with coverage. The animation below shows the coverage per economy over the period we analysed (click ‘play’ to view the animation, date displayed at the top):

What the animation makes very clear is that coverage is growing; it’s growing rapidly, and growth is accelerating as we get closer to the present day. The fact that the entire map is showing increasingly darker colours means that coverage is increasing all over the world. And that is good news; it means an increasing number of Internet route announcements are protected by RPKI. What is also clear from the animation is that the growth rate differs by region. Latin America shows a high coverage from the get go, with a notable absence of Brazil, which is working hard on deploying RPKI in its National Internet Registry. Europe and the Asia Pacific region show increasing coverage over the entire period.

(although I might borrow that particular one for ARIN’s present RPKI user-interface…😜)

— John Curran (@jcurranarin) July 3, 2019

The lowest growth rate is observed in the ARIN region. We believe there are two reasons for this. First, the number of prefixes assigned to this region is historically very high (a leftover from the early days of the Internet), meaning there is also a lot of ground to cover in terms of RPKI deployment. Second, the interface that ARIN provides for deploying ROAs is somewhat harder to use than those offered by APNIC and RIPE, for example, which may impact deployment (the tweet shown alludes to calling the ARIN RPKI UI ‘Magnificent Desolation’).

And we end with what we believe to be the most exciting animation: the one showing accuracy. This animation shows what fraction of announced prefixes are valid announcements, that is, they are covered by a ROA and validate correctly. We view accuracy as a measure of the quality of RPKI data; if the accuracy is high, it is safe to use RPKI data for filtering, as you are unlikely to filter legitimate announcements. Conversely, if accuracy were low, filtering would probably be unwise, as you might drop legitimate but poorly managed routes.

The animation below shows the development in accuracy for economies that have an accuracy of 90% or over. We specifically picked this cutoff, as accuracy is generally already quite high, and we wanted to show that it is still improving. So play the video below and see for yourself:

We think the animation is pretty self-explanatory: the quality of RPKI data has grown from ‘already quite good’ to ‘very high’ over the past two years. And that brings us to our main conclusion: what this data first and foremost shows is that RPKI is ready for the big screen. The quality is now so good that there is really no reason not to protect your network from accepting route hijacks, so we urge operators to go ahead, and start filtering RPKI invalids!

Running Ziggy on your own system

As we said earlier, we released Ziggy as an open source tool for researchers and other curious folk. Our Medium post explains how you can run Ziggy yourself to get the full set of verified prefixes for a date of your choice.

This article is adapted from the original post on the NLnet Labs Blog.

Roland van Rijswijk-Deij is Principal Scientist at NLnet Labs and Assistant Professor of Computer Network Security at the University of Twente.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.