2021 was disappointing on many fronts. Pandemic still here? Yes. Global travel restrictions in place? Yes. Still working from home? Yes. There was a bright spot, though: pandemic or no pandemic, network operators continued to deploy Resource Public Key Infrastructure (RPKI).
If the pandemic has taught us anything, it is that the resiliency of our networks — which enabled widespread work from home — is still paramount in keeping the world as a whole moving. The resiliency we relied upon so much, in part, comes down to the stability of global routing; ensuring that our packets reach their intended destination without being rerouted (or hijacked).
As APNIC was wrapping up training and remote technical assistance at the end of last year, we reviewed the previous 12 months to see if there were any major changes in the RPKI framework and uptake within the Asia Pacific region. What did we find? Well, the news was a bit mixed.
First, we need to remember why we are doing this whole RPKI thing. As a network operator, I want to make sure that my routing intentions are laid out for all the world to see, and I also want those intentions to be able to be validated. Enter the RPKI framework.
RPKI in a nutshell
As an operator of a network, I can attest via a cryptographic certificate that I want my prefix X.X.X.X to be originated from Autonomous System Number (ASN) YYYY, and any other appearance of my prefixes on the Internet should be treated as invalid. I can also authorize more-specific announcements of a prefix by specifying a maximum length, such as 188.8.131.52/21 Max Length /24. This would cover the /21 itself, 2 x /22, 4 x /23 and 8 x /24, or 15 possible announcements in total. That is the Route Origin Authorization (ROA) process.
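To make the Max Length arithmetic concrete, here is a minimal sketch using Python's standard `ipaddress` module. The prefix 10.10.0.0/21 and the function name `prefixes_covered` are placeholders chosen for illustration; they do not come from any real ROA.

```python
import ipaddress

def prefixes_covered(prefix, max_length):
    """List every announcement a ROA for `prefix` with this Max Length would authorize."""
    net = ipaddress.ip_network(prefix)
    covered = []
    # Walk each prefix length from the ROA prefix itself down to Max Length.
    for length in range(net.prefixlen, max_length + 1):
        covered.extend(net.subnets(new_prefix=length))
    return covered

covered = prefixes_covered("10.10.0.0/21", 24)  # hypothetical prefix
print(len(covered))  # 15: one /21, two /22s, four /23s and eight /24s
```

The count grows quickly as Max Length gets longer, which is one reason to keep it no looser than what is actually announced in BGP.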
The second part of the process is for other operators to act on my intentions. If they see my prefixes being used in a manner not specified in my ROA, then they should not accept those prefixes into their router’s forwarding table. This makes up the second part of the framework known as Route Origin Validation (ROV).
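The decision a router makes for each route can be sketched in a few lines. This follows the three outcomes (Valid, Invalid, Not Found) as I understand them from RFC 6811; the `VRP` tuple, the function name, the prefixes, and the ASNs (taken from the documentation range) are all illustrative assumptions rather than any vendor's implementation.

```python
import ipaddress
from typing import NamedTuple

class VRP(NamedTuple):
    """One validated ROA payload: prefix, Max Length, authorized origin ASN."""
    prefix: str
    max_length: int
    asn: int

def validate_origin(route_prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'not-found' for a single BGP route."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # Only VRPs whose prefix covers the route are relevant.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"

vrps = [VRP("10.10.0.0/21", 24, 64496)]  # hypothetical ROA
print(validate_origin("10.10.4.0/24", 64496, vrps))  # valid
print(validate_origin("10.10.4.0/24", 64497, vrps))  # invalid (wrong origin)
print(validate_origin("10.10.4.0/25", 64496, vrps))  # invalid (longer than Max Length)
print(validate_origin("10.20.0.0/24", 64496, vrps))  # not-found
```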
The third piece to this puzzle is validators. These are standalone pieces of software that handle the cryptography parts of validating resource certificates and produce output that can be interpreted by my routers, thus removing workload from my routers, and letting them do the job they are best at — sending packets from one place to the other.
Yes, this is a simplified take on the framework, but rather than rehashing what others have written about in detail, let’s look at what we saw in 2021.
Compared to recent years, the volume and wider impact of routing incidents (either malicious or accidental) have reduced. If you have been to any APNIC Academy RPKI training sessions, you will know that we start with a review of routing incidents going all the way back to 2008. Last year, we only observed four hijacks of note. All of them appeared to be operator error, and due to the default free zone dropping invalid routes, the spread of these events tended to be limited to direct peers not implementing ROV (or any other good filtering practices), or to peers on Internet Exchange (IX) fabrics. While these events are annoying due to the unintended routing paths, the frequency and impact are improving.
The numbers — ROA
Over the last couple of years, we have seen a rapid increase in the size of the global routing table.
Globally, in 2021 we saw the IPv4 table increase by approximately 6.3% to 922k routes. But on the RPKI side, we observed a 9.9% increase in valid routes. This tells us that more networks out there are signing their resources with ROAs, which is good news!
On the downside, we also saw an increase in the number of invalid ROAs up to 0.2% from 0.07% at the end of 2020. Most of these invalids are due to violations on Max Length or Invalid ASes.
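When chasing invalids, it helps to know which of the two causes applies. The sketch below is one rough way to tell them apart, assuming ROAs are available as simple `(prefix, max_length, asn)` tuples; the function name, labels, and sample data are hypothetical.

```python
import ipaddress

def invalid_reason(route_prefix, origin_asn, roas):
    """Return 'max-length', 'origin-as', or None if the route is not invalid."""
    route = ipaddress.ip_network(route_prefix)
    covering = []
    for prefix, max_length, asn in roas:
        net = ipaddress.ip_network(prefix)
        if net.version == route.version and route.subnet_of(net):
            covering.append((max_length, asn))
    if not covering:
        return None  # Not Found: no ROA covers this route
    if any(route.prefixlen <= m and origin_asn == a for m, a in covering):
        return None  # Valid: at least one ROA matches
    if any(origin_asn == a for _, a in covering):
        return "max-length"  # right origin, announcement too specific
    return "origin-as"       # no covering ROA names this origin

roas = [("10.10.0.0/21", 21, 64496)]  # hypothetical ROA, Max Length left at /21
print(invalid_reason("10.10.4.0/24", 64496, roas))  # max-length
print(invalid_reason("10.10.0.0/21", 64497, roas))  # origin-as
```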
The other state, Not Found, where no ROA exists for a resource, has pleasingly reduced by 5.26%, but why hasn’t this decreased in proportion to the increase in valid prefixes? There could be multiple factors, but the primary theories are new allocations not being signed by the holders, or possibly de-aggregation of the current IPv4 space.
If we focus on the Asia Pacific (AP) region, we see a net increase of delegated space of around 0.16% (around 2,782 /23 prefixes). Valid ROAs in the region have increased from 40.97% to 50.55%, which is comparable with global trends. Invalids in the AP region are up by only 0.11% (compared to the global average), but Not Found ROAs have decreased by around 10%. Some of this can be attributed to training and reinforcement by APNIC and others in the industry, some large-scale aggregation within some subregions, as well as upstream providers encouraging their downstreams to sign their prefixes before they implemented ROV themselves.
If we look at the subregions within the AP region, the strong performers have been South East Asia and Oceania. The increase of valid ROAs within those subregions helped lift the average.
Looking at the invalids per subregion, three subregions showed increases; South East Asia was the exception.
The Not Founds showed a downward trend across the region, with the key stand out being South East Asia.
If we look at the top 20 subregions globally for valid routes, nine of them are from the AP region.
So, who is doing the heavy lifting in our sector of the globe? The results may surprise you, but it’s important to remember that percentage isn’t everything. 100% coverage is much more attainable when there’s a /23 or /22 serving the entire economy, but this doesn’t discount the efforts of the smaller networks.
As you can see from the chart below, the big performer in ROA validity in the region has been Taiwan. We have also seen significant gains through community-based efforts in Pakistan, the Philippines, and Malaysia.
Smaller economies have also been doing their bit: size isn’t everything!
The numbers – ROV
Uptake of ROV is difficult to measure because several factors affect the measurement, the main one being upstream providers and their filtering of invalids.
What do we mean by this? Let’s look at some images. Figure 7 somewhat depicts the state of ROV on a global scale in June 2020.
Figure 8 shows the previously mentioned African areas have changed back to red, and there is an interesting outlier in Papua New Guinea (PNG). Digging into PNG’s numbers tells more of the story. PNG is seen as ~80% filtering. That 80% appears to transit through AS17828 (Dataco), who in turn upstream to AS4826 (Vocus). Vocus is filtering the invalids.
So, an end network appearing to filter invalids is purely due to their upstream’s routing preferences. As an aside, I had another look at PNG in mid-February 2022, and their ROV dropped to around 50%, and this appears to be due to AS17828 preferencing their traffic towards another upstream who is not filtering invalids.
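A toy model shows why the measurement moves when nothing changes at the end networks. The AS paths and ASNs below are invented for illustration (documentation-range ASNs, not the PNG networks), and the only assumption is that an invalid route is dropped if any AS on the path performs ROV.

```python
def appears_to_filter(path, rov_asns):
    """An invalid never arrives if any AS along the path drops it."""
    return any(asn in rov_asns for asn in path)

def apparent_rov_share(paths, rov_asns):
    """Fraction of end networks that look like they filter invalids."""
    filtered = sum(appears_to_filter(p, rov_asns) for p in paths)
    return filtered / len(paths)

TRANSIT, FILTERING_UP, OTHER_UP = 64496, 64497, 64498  # hypothetical ASNs
rov_asns = {FILTERING_UP}  # only one upstream actually runs ROV

# Ten end networks (numbered 0-9 here); eight reach the world via the filtering upstream.
before = [[n, TRANSIT, FILTERING_UP] for n in range(8)] + [[n, OTHER_UP] for n in (8, 9)]
# Later the transit provider prefers the non-filtering upstream for three of them.
after = [[n, TRANSIT, FILTERING_UP] for n in range(5)] + \
        [[n, TRANSIT, OTHER_UP] for n in range(5, 8)] + [[n, OTHER_UP] for n in (8, 9)]

print(apparent_rov_share(before, rov_asns))  # 0.8
print(apparent_rov_share(after, rov_asns))   # 0.5
```

None of the end networks changed their own configuration between the two measurements; only the transit preference did.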
The lesson here is — don’t outsource routing security to your upstream!
Lessons learned in 2021
What did we learn in 2021?
- Adjacency Routing Information Bases (Adj RIBs) are important
- NCSC-NL CVE disclosures
- APNIC portal issues
1. Adj RIB
Around mid-April, we were assisting a Member, let’s call them ‘Party A’, with their RPKI deployment. We had finally reached the stage where ROV was turned on, and everything was going just fine.
Jump forward to August and one of their upstream providers (Party B) reached out with an interesting problem. Party B was seeing a BGP route refresh every 10 minutes from Party A that was causing issues with CPU spiking on Party B’s router. While troubleshooting, the problem was isolated to the Cisco device on Party A’s end. But why?
It looked like a bug in IOS-XE, but we were unable to isolate it. We knew that if we stopped RPKI to Router protocol (RTR) sessions, the problem went away, and we couldn’t replicate the problem in our environment with different IOS versions. Cisco TAC advised turning on soft-reconfig. Party B was running Juniper, so we discounted it as a Cisco bug.
Later, Randy Bush presented a Lightning Talk on RPKI RTR causing route refresh at RIPE 83, which sounded familiar. Randy Bush, Mark Tinka, Philip Smith, and Kayur Patel had discovered the same issue, and had taken it a step further. Their findings (detailed in this Internet draft) made sense.
When the Validated ROA Payload (VRP) updates on the router, this causes the router to process its best path policy selection (a change in route policy). On many platforms that have route refresh capability, the ADJ-RIB-IN is disabled by default (for memory conservation purposes), causing a route refresh to be triggered with peers. Turning on soft-reconfig enables the ADJ-RIB-IN and disables the route refresh process.
The findings from Randy and others, and the fix can be summarized as:
- Keep a full ADJ-RIB-IN, or
- If there is no ADJ-RIB-IN, then when BGP drops an invalid, keep the path, but mark it as dead, a minimal ADJ-RIB-DROPPED, or
- Do not run RPKI policy on any router that cannot do either of the above
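The behaviour behind the first recommendation can be illustrated with a toy model. This is not how any router is implemented; the `Router` class, its attribute names, and the routes are assumptions made purely to show why a router without a stored ADJ-RIB-IN has to ask its peer to resend routes whenever the VRP set changes.

```python
class Router:
    """Toy BGP speaker: just enough state to show when a route refresh is needed."""

    def __init__(self, keep_adj_rib_in):
        self.keep_adj_rib_in = keep_adj_rib_in
        self.adj_rib_in = {}            # everything the peer sent, before policy
        self.loc_rib = {}               # routes that survived ROV policy
        self.route_refreshes_sent = 0

    def receive(self, routes, invalid):
        """Process announcements from the peer under the current set of invalids."""
        if self.keep_adj_rib_in:
            self.adj_rib_in = dict(routes)
        self.loc_rib = {p: a for p, a in routes.items() if (p, a) not in invalid}

    def on_vrp_update(self, invalid, peer_routes):
        """Re-run policy after the validator pushes new VRPs over RTR."""
        if self.keep_adj_rib_in:
            source = self.adj_rib_in        # dropped routes are still held locally
        else:
            self.route_refreshes_sent += 1  # must ask the peer to resend everything
            source = peer_routes
        self.loc_rib = {p: a for p, a in source.items() if (p, a) not in invalid}

peer_routes = {"10.10.0.0/21": 64496, "10.20.0.0/24": 64497}  # hypothetical routes
invalid_before = {("10.20.0.0/24", 64497)}  # dropped at first
invalid_after = set()                       # a new ROA makes it valid

r_with, r_without = Router(True), Router(False)
for r in (r_with, r_without):
    r.receive(peer_routes, invalid_before)
    r.on_vrp_update(invalid_after, peer_routes)

print(r_with.route_refreshes_sent, r_without.route_refreshes_sent)  # 0 1
```

Both routers end up with the same routes, but only the one without an ADJ-RIB-IN had to bother its peer to get there, and it will do so again on every VRP change.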
2. NCSC-NL CVE disclosures
On 9 November 2021, several vulnerabilities relating to the RPKI framework and validators were released. Some sectors of the community were not happy about the short notice period given for the embargo, so they delayed the release until the serious vulnerabilities could be rectified. So, what did they find?
- Some validators ran as root
- Some crashed when presented with invalid ROA data
- Some crashed when the repository contained too many bits for the IP address
- Some had no bounds when processing infinite lengths of certificate chains
- Some had strange processing of time-out values
- Some were vulnerable to gzip-white-space attacks (causing out-of-memory crashes)
However, the attacker has to be somewhere in the chain of trust, and identifying attackers is straightforward. The major validator vendors released updated versions to fix the vulnerabilities identified, so make sure your validator code is up to date! Not all of the issues can be fixed at the validator level, so a new Internet draft has been proposed to add some additional extensions to the resource certificates to define a maximum number of child objects and VRPs, creating some bounding limits.
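The idea of bounding limits can be sketched as follows. This is only an illustration of the principle: the dictionary-based repository tree, the function name, and the limit values are hypothetical and are not taken from the Internet draft or from any validator.

```python
class LimitExceeded(Exception):
    """Raised when a repository tree is deeper or larger than we allow."""

def count_objects(tree, root, max_depth=32, max_objects=100_000):
    """Walk a certificate tree, refusing to go past fixed depth and size bounds."""
    seen = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            raise LimitExceeded(f"chain deeper than {max_depth}")
        seen += 1
        if seen > max_objects:
            raise LimitExceeded(f"more than {max_objects} objects")
        for child in tree.get(node, []):
            stack.append((child, depth + 1))
    return seen

honest = {"ta": ["ca1", "ca2"], "ca1": ["ca3"]}
print(count_objects(honest, "ta"))  # 4

# A hostile repository whose CA keeps delegating to itself would never finish
# without a bound; with one, the walk stops and reports the problem.
hostile = {"ta": ["evil"], "evil": ["evil"]}
try:
    count_objects(hostile, "ta")
except LimitExceeded as exc:
    print("rejected:", exc)
```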
3. APNIC portal – MyAPNIC
If we can’t hold a mirror up to ourselves, then we are not being honest.
Referring to the AP-invalid ROAs, and their increasing nature, APNIC’s own portal, MyAPNIC, could well have been contributing to this issue, specifically the Max Length attribute. When creating ROAs, users were presented with a pre-filled Max Length that matched the prefix, shown in Figure 9.
After spending some time working with networks to fix their invalids, one of the common responses received was ‘Oh, I thought that was just a recommendation, so I clicked next’, which always leads to a long conversation with the Member about ROAs reflecting BGP announcements.
In November 2021, APNIC conducted a face-to-face RPKI training session in Perth. During the training, we asked participants to log into MyAPNIC to inspect their routes, make sure their route objects are correct, and that all ROAs are signed. One of the participants advised us that the import routes feature was showing a recommended route not originating from the participant’s network. Some quick research showed there was an active hijack occurring from AS25478 in Russia.
How are we fixing this? In the short term, check the recommended routes before clicking the import button and validate that data against routes you are originating. In the long term, APNIC will be implementing some backend validation to warn about the routes or filter them from view.
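The short-term check is simple enough to script. The sketch below assumes you have the recommended routes as `(prefix, origin ASN)` pairs and know which ASNs you originate from; the function name and the sample data are hypothetical and do not reflect MyAPNIC's actual interface or export format.

```python
def split_recommended(recommended, my_asns):
    """Separate recommended routes into those we originate and those to investigate."""
    mine, suspicious = [], []
    for prefix, origin in recommended:
        if origin in my_asns:
            mine.append((prefix, origin))
        else:
            suspicious.append((prefix, origin))
    return mine, suspicious

my_asns = {64496}  # hypothetical: the ASNs we actually announce from
recommended = [
    ("10.10.0.0/21", 64496),
    ("10.10.4.0/24", 64511),  # seen in BGP, but not from our network
]

mine, suspicious = split_recommended(recommended, my_asns)
print(mine)        # [('10.10.0.0/21', 64496)]
print(suspicious)  # [('10.10.4.0/24', 64511)]
```

Anything in the second list deserves a closer look before a ROA is created for it, since signing it would authorize someone else's announcement of your space.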
2021 was a big year. The Internet world is still ticking along, and we all need to do our respective parts to make sure it continues. Sign your resources, filter your BGP, patch your software, and be a good neighbour!