Route Origin Validation (ROV), based on Route Origin Authorizations (ROAs), is increasingly being deployed by registries, organizations and users worldwide in an effort to reduce the risk of problems associated with network misconfigurations and mistakes.
The Japan Network Information Center (JPNIC) has been providing ROA services to its members since 2015. After JPNIC connected with APNIC’s Resource Public Key Infrastructure Certificate Authority (RPKI CA) in 2017, BGP routes from Japanese ASes could be validated worldwide.
Figure 1 — CNNIC, JPNIC and TWNIC all operate their RPKI CA under APNIC’s RPKI CA.
A few months after connecting with the APNIC RPKI CA, we were made aware of a reachability issue by someone in our community who was operating a home network connecting to an ISP in Japan. They were not able to access a website in Europe from their network via IPv4 but were able to via IPv6, or by using a mobile router. Being technically savvy, the user conducted a traceroute that showed the issue was related to the AS accommodating the website (depicted in Figure 2 as ‘Website AS’).
Figure 2 — AS hops from the ISP to the website.
Upon contacting their ISP about the issue, they were first advised to reboot their router, a usual first step when troubleshooting similar issues with customer-premises equipment (CPE). Needless to say, the issue was not resolved. So the ISP (surprisingly) investigated further by asking the website owners if they were experiencing reachability issues.
The ISP’s technical assumption, based on the customer’s traceroute test, was that either there was a reachability issue between AS#5 and Website AS, or one of the two was deploying a BGP prefix filter. Because the ISP had no direct relationship with the owners of AS#5 and Website AS, before contacting them it asked AS#1 to AS#4 whether they had any BGP peering issues with either of them. All of them responded, “No problem found for the prefix, so no action required.” As you may guess, they had checked reachability using their own IP addresses, which were different from the source address of the Japanese ISP.
So the ISP reached out to the owners of AS#5 and Website AS via email (which they found in PeeringDB) and quickly discovered the cause of the issue: the prefix the ISP was announcing in BGP had a different prefix length from the one in the ROA it had created several years before, so the route was being dropped by Website AS, which was implementing ROV. The prefix length had been changed for operational reasons, but it was so long ago that no one at the ISP could remember it being done, so they didn’t consider it a probable cause.
Once the ISP fixed the maximum prefix length in the ROA, reachability was restored.
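To make the failure mode concrete, here is a minimal sketch of the origin validation decision in the style of RFC 6811, written in Python. The `Roa` class, the `validate_origin` function, the documentation prefix 203.0.113.0/24 and the private-use AS64500 are all illustrative assumptions, not details from the incident. The sketch also skips signature checking and special cases such as AS0, which a real validator handles.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified view of a validated ROA payload."""
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(route_prefix, origin_asn, roas):
    """Return 'valid', 'invalid' or 'not-found' for a BGP route."""
    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA covers the route if the route lies inside the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # It matches only if the origin AS agrees and the announced
        # prefix is no longer than the ROA's maximum length.
        if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


# The ROA was created for a /24, but the announcement later became a /25.
stale_roas = [Roa("203.0.113.0/24", 24, 64500)]
fixed_roas = [Roa("203.0.113.0/24", 25, 64500)]

print(validate_origin("203.0.113.0/25", 64500, stale_roas))
print(validate_origin("203.0.113.0/25", 64500, fixed_roas))
```

With the stale ROA the /25 is covered but not matched, so a network that drops invalid routes would discard it. Raising the maximum length to 25 is enough to make the same announcement valid.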
Lessons when it comes to ROA
The lesson of this story is that humans and organizations often don’t have the best memory, particularly when it comes to things that are set up once and then forgotten about. It wasn’t until BGP routes from Japanese ASes became validated worldwide that the ‘misconfiguration’ became apparent, which raises the question: how many other instances are yet to be diagnosed or fixed?
Moreover, this story highlights the care needed when deploying and maintaining ROAs. Taking the story above as an example:
- The IP address holder may configure ROAs differently from actual BGP routes.
- The end user will face unreachability issues without any sign or alert.
- Different players are required to solve the problem — only BGP operators can know the reason and only IP address holders can fix the problem.
To avoid this situation, it’s important that those wanting to deploy ROAs know how to create and maintain them correctly. And for network engineers working at ISPs, it’s important to understand the signs of misconfigured ROAs: when troubleshooting a prefix that is unreachable over some specific routes, remember to investigate the origin validation state.
One final lesson also worth noting, one that relates to the title of this post (see below), is how important it is for us to work as a community when it comes to diagnosing and responding to these routing issues. This seems as important as ever, if not more so, for global certification frameworks such as this one, which ultimately rely on the majority of us deploying them (correctly).
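As a rough illustration of what investigating the origin validation state can involve, the following Python sketch reports why a route fails validation, so that a wrong origin AS can be told apart from a prefix that exceeds the ROA's maximum length. The function name `explain_state`, the tuple layout for ROAs and the example prefix and AS number are assumptions made for this sketch, and the ROA data is assumed to have been obtained and verified elsewhere.

```python
from ipaddress import ip_network


def explain_state(route_prefix, origin_asn, roas):
    """Return (state, reasons) for a route.

    roas is an iterable of (prefix, max_length, origin_asn) tuples,
    a hypothetical simplified form of validated ROA data.
    """
    route = ip_network(route_prefix)
    reasons = []
    for prefix, max_length, roa_asn in roas:
        roa_net = ip_network(prefix)
        # Ignore ROAs that do not cover the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        if roa_asn != origin_asn:
            reasons.append(
                f"{prefix}: origin AS{origin_asn} does not match ROA AS{roa_asn}"
            )
        elif route.prefixlen > max_length:
            reasons.append(
                f"{prefix}: /{route.prefixlen} exceeds maxLength {max_length}"
            )
        else:
            # One matching ROA is enough for the route to be valid.
            return "valid", []
    return ("invalid", reasons) if reasons else ("not-found", [])


state, reasons = explain_state(
    "203.0.113.0/25", 64500, [("203.0.113.0/24", 24, 64500)]
)
print(state)
for reason in reasons:
    print(" ", reason)
```

A report like this points the BGP operator at the cause, and it tells the IP address holder which ROA needs updating.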
Taiji Kimura is a researcher at JPNIC and Keio University, Japan.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.