APRICOT 2015: RPKI deployment session

Yesterday at the APRICOT meeting, we had a session on real-world experience of deploying RPKI. During this session, JPNIC launched its RPKI service as a pilot, integrated into the APNIC RPKI system as a parent.

Fakrul Alam from bdHUB presented one week’s analysis of RPKI status, based on experience during BDNOG 2 in Bangladesh in November 2014. Fakrul was interested in the adoption rate, comparisons with other economies, and the problems they found while working with RPKI.

Most of the RPKI statistics being presented are aggregated to the RIR level and compare one region with another. Fakrul was interested in per-country information, and used the SURFnet tool to obtain data for his economy.

Compared with the overall APNIC adoption rate (0.93%), BD is currently at 4.67%, well above the average for our region. However, around 1.5% of the Route Origin Authorizations (ROAs) are incorrect, which Fakrul feels is a high error rate. This appears to reflect more-specific prefixes being announced in BGP beyond the ROA’s maximum length, which seems to stem from fundamental routing mismatches inside the Bangladeshi routing economy. A mix of more specifics (all the way down to a single-host /32 route!) is being announced by the address holders, but the more interesting question is who is announcing these more specifics.
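
To make the maximum-length point concrete, here is a minimal sketch of route origin validation along the lines of RFC 6811, in Python. The `ROA` class and `validate()` function are hypothetical names, and the prefixes and AS numbers come from the documentation ranges, not from Fakrul’s data. A route is valid only if a covering ROA names its origin AS and the route is no longer than the ROA’s maximum length, so a /25 or /32 carved out of a /24 ROA with a maximum length of 24 comes out invalid even when the origin AS is correct.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ROA:
    prefix: str      # authorized prefix, e.g. "203.0.113.0/24"
    max_length: int  # longest prefix length this ROA authorizes
    asn: int         # AS number authorized to originate the prefix


def validate(prefix: str, origin_asn: int, roas: Iterable[ROA]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for one route, RFC 6811 style."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs from the other address family, or that do not cover the route.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # Valid only if the origin matches AND the route is no longer than max_length.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered but never matched -> invalid; no covering ROA at all -> not-found.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    roas = [ROA("203.0.113.0/24", 24, 64500)]
    for p in ("203.0.113.0/24", "203.0.113.128/25", "203.0.113.7/32", "198.51.100.0/24"):
        print(p, validate(p, 64500, roas))
```

A route with no covering ROA at all is "not-found" rather than invalid, and an empty ROA set never makes anything valid.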

Fakrul used ExaBGP (http://github.com/Exa-Networks/exabgp) and GIXLG to collect peering data and store it in an SQL database.

Holding the BGP data in SQL meant Fakrul could use ordinary SQL queries to select AS paths carrying prefixes more specific than a /24, and find which AS paths were announcing a prefix outside the origination the ROA allowed.
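
A rough sketch of that kind of query, using Python’s built-in sqlite3 module. The table layout is an assumption for illustration only (it is not Fakrul’s schema, and ExaBGP and GIXLG do not produce it directly): IPv4 prefixes are stored as integer start and end addresses so that "is covered by" becomes a range comparison in SQL, and the origin AS is taken to be the last AS in a space-separated path. The query keeps routes longer than /24 that some ROA covers but no ROA authorizes.

```python
import ipaddress
import sqlite3

# Hypothetical schema: IPv4 only, with addresses stored as integers.
SCHEMA = """
CREATE TABLE routes (prefix TEXT, prefix_len INTEGER, net_start INTEGER,
                     net_end INTEGER, as_path TEXT, origin_as INTEGER);
CREATE TABLE roas   (prefix TEXT, max_length INTEGER, net_start INTEGER,
                     net_end INTEGER, asn INTEGER);
"""

# More specifics than /24 that a ROA covers, but that no ROA authorizes
# (wrong origin AS, or longer than the ROA's maximum length).
QUERY = """
SELECT r.prefix, r.as_path
FROM routes AS r
WHERE r.prefix_len > 24
  AND EXISTS (SELECT 1 FROM roas AS a
              WHERE r.net_start >= a.net_start AND r.net_end <= a.net_end)
  AND NOT EXISTS (SELECT 1 FROM roas AS a
                  WHERE r.net_start >= a.net_start AND r.net_end <= a.net_end
                    AND a.asn = r.origin_as AND r.prefix_len <= a.max_length)
ORDER BY r.prefix
"""


def span(prefix):
    """Return (prefix length, first address, last address) as integers."""
    net = ipaddress.IPv4Network(prefix)
    return net.prefixlen, int(net.network_address), int(net.broadcast_address)


def invalid_more_specifics(routes, roas):
    """routes: (prefix, 'AS AS ... origin') pairs; roas: (prefix, max_len, asn)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for prefix, as_path in routes:
        plen, start, end = span(prefix)
        origin = int(as_path.split()[-1])  # assumes the path ends in a plain ASN
        db.execute("INSERT INTO routes VALUES (?, ?, ?, ?, ?, ?)",
                   (prefix, plen, start, end, as_path, origin))
    for prefix, max_len, asn in roas:
        _, start, end = span(prefix)
        db.execute("INSERT INTO roas VALUES (?, ?, ?, ?, ?)",
                   (prefix, max_len, start, end, asn))
    rows = db.execute(QUERY).fetchall()
    db.close()
    return rows
```

AS paths ending in an AS_SET, and IPv6 routes, would need extra handling that this sketch leaves out.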

Fakrul found that some of these routes were attempts at DDoS mitigation, applied in BGP in ways that leaked into the public routing domain. This is a bad approach (reminiscent of the Pakistan YouTube incident), and he worked to get these resolved within the BD routing community.

Tomoya Yoshida from JPNAP/Internet Multifeed spoke about RPKI deployment in Japan. The Japanese routing community has noted the low rate of RPKI uptake in the AP region, and in JP specifically, compared with the RIPE NCC region. They felt a strong motivation to increase adoption, and explored systems and operational issues in a July 2014 workshop. JPNAP and JPNIC decided to operate a ROA cache and provide a feed via the rpki-rtr protocol. This raises some questions in my mind about the ‘outsourcing’ of validation, and about trust in the integrity of the feed.
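
To see what such a feed actually carries, here is a sketch that decodes a single rpki-rtr IPv4 Prefix PDU (the layout is the same in RFC 6810 and RFC 8210). The function name is hypothetical. The PDU holds only a prefix, two lengths and an AS number; there is no signature in it, so a router consuming the feed is trusting the cache and the channel rather than checking the RPKI signatures itself.

```python
import ipaddress
import struct

IPV4_PREFIX_PDU = 4                     # PDU type for an IPv4 prefix
_HEADER = struct.Struct("!BBHI")        # version, type, zero, total length
_IPV4_BODY = struct.Struct("!BBBB4sI")  # flags, prefix len, max len, zero, prefix, ASN


def parse_ipv4_prefix_pdu(data: bytes) -> dict:
    """Decode one IPv4 Prefix PDU (20 bytes) into its fields."""
    if len(data) < _HEADER.size + _IPV4_BODY.size:
        raise ValueError("too short for an IPv4 Prefix PDU")
    version, pdu_type, _zero, length = _HEADER.unpack_from(data, 0)
    if pdu_type != IPV4_PREFIX_PDU or length != 20:
        raise ValueError("not an IPv4 Prefix PDU")
    flags, plen, max_len, _zero2, raw_prefix, asn = _IPV4_BODY.unpack_from(data, _HEADER.size)
    return {
        "version": version,
        "announce": bool(flags & 0x01),  # low flag bit: 1 = announce, 0 = withdraw
        "prefix": ipaddress.IPv4Network((raw_prefix, plen)),
        "max_length": max_len,
        "asn": asn,
    }
```

Nothing in these 20 bytes lets the router tell a genuine entry from one injected or altered in transit, which is why the channel’s security matters so much once validation is outsourced.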

Some visualizations of the distribution of ROA prefix sizes were shown.

Operational experience with RPKI-capable BGP implementations shows the software is still immature. Juniper, for instance, enables a debug port on TCP 2222 which can crash the router if exposed to the public network, so it needs to be restricted by an ACL: a rather unexpected and unfortunate outcome.

Current implementations of the RPKI-RTR protocol do not always support TLS channel security, so outsourcing ROA validation means fetching the data over an insecure channel. There are also issues with the initial validation state in Cisco BGP: all routes are initially marked VALID until the first ROA is received, and only then become UNKNOWN, instead of starting as UNKNOWN (and therefore not implicitly trusted). These bugs present significant roadblocks to deployment.
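
Where a cache does offer TLS, opening the session is not complicated. A minimal sketch using Python’s standard ssl module, assuming a cache that speaks rpki-rtr over TLS. The function names are hypothetical, and the default port of 324 (the rpki-rtr-tls registration, as I understand it) is an assumption to check against your cache’s configuration. The only PDU sent is a Reset Query, which asks the cache for its complete data set; reading and decoding the response is left out.

```python
import socket
import ssl
import struct

RESET_QUERY = 2  # rpki-rtr PDU type: ask the cache for its complete data set


def reset_query(version: int = 1) -> bytes:
    """Build a Reset Query PDU: version, type, 16 zero bits, total length (8)."""
    return struct.pack("!BBHI", version, RESET_QUERY, 0, 8)


def open_tls_session(host: str, port: int = 324, version: int = 1) -> ssl.SSLSocket:
    """Connect to an rpki-rtr cache over TLS and send a Reset Query.

    The default port is an assumption; use whatever your cache listens on.
    """
    # The default context verifies the cache's certificate chain and host name,
    # which is the point of using TLS rather than plain TCP.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise
    tls.sendall(reset_query(version))
    return tls  # the caller reads the Cache Response and prefix PDUs from here
```

Version 1 is the RFC 8210 protocol; routers that only speak the original RFC 6810 protocol would need `version=0`.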

Matsuzaki Yoshinobu (‘Maz’) presented some thoughts on RPKI. He discussed the ‘heuristic’ approach we currently use to understand misconfiguration, and how RPKI can help prevent this kind of systematic fat-finger problem, though only on the assumption of wide deployment: once it’s widely deployed, there is a realistic chance that mistakes will be caught and dealt with properly, rather than depending on a sequence of conversations with upstream providers and combing through WHOIS data.

Maz next discussed the provisioning problem. Quite apart from issues with BGP security, it’s worth remembering that RPKI can help identify who has the right to authorize the announcement of customer networks, and that provisioning routers depends on trusting the people who claim to control address resources and ask a provider to route them.

The first step an ISP acting as an RPKI validator should most likely take is to look at the information available in the RPKI repositories and reflect on it: understand the behaviour it implies on a router, without (yet) embedding dependencies into the router, and look at the invalid outcomes.
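
One way to take that first step is a dry run: classify a copy of the routing table against the validated ROA payloads offline, and list what a "reject invalid" policy would drop, without configuring anything on a router. A minimal sketch, assuming the ROA data has already been validated and exported as (prefix, max length, AS) tuples and the RIB as (prefix, origin AS) pairs; both formats and the function names are hypothetical.

```python
import ipaddress
from collections import Counter


def origin_state(prefix, origin, vrps):
    """RFC 6811-style state of one route against (prefix, max_length, asn) tuples."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_length, asn in vrps:
        net = ipaddress.ip_network(vrp_prefix)
        if net.version == route.version and route.subnet_of(net):
            covered = True
            if asn == origin and route.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"


def dry_run(rib, vrps):
    """Count states and list the routes a 'reject invalid' policy would drop.

    Nothing is pushed to a router; this only reports.
    """
    states = Counter()
    would_drop = []
    for prefix, origin in rib:
        state = origin_state(prefix, origin, vrps)
        states[state] += 1
        if state == "invalid":
            would_drop.append((prefix, origin))
    return states, would_drop


if __name__ == "__main__":
    vrps = [("203.0.113.0/24", 24, 64500), ("2001:db8::/32", 48, 64501)]
    rib = [("203.0.113.0/24", 64500), ("203.0.113.0/25", 64500),
           ("2001:db8:1::/48", 64501), ("2001:db8::/32", 64666),
           ("192.0.2.0/24", 64502)]
    states, would_drop = dry_run(rib, vrps)
    print(dict(states))
    for prefix, origin in would_drop:
        print(f"would drop {prefix} from AS{origin}")
```

Reviewing the "would drop" list with the address holders before turning on any policy is exactly the kind of conversation the invalid outcomes should prompt.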

Maz noted that current filter-based methods can sometimes lead to unexpected outcomes when filter rules are badly encoded: you can end up accepting unexpected things. ROA mechanisms appear less prone to this risk, accepting only the specific origin AS and prefixes, rather than becoming the Boolean ‘any rule’ permit system that filters tend to become.
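
A small illustration of the difference, with hypothetical function names and documentation prefixes. A prefix-list entry such as "203.0.113.0/24 le 32" permits any more specific of the customer’s block, whoever originates it. A policy that accepts a customer route only when it is ROA-valid (a stricter choice than the common practice of dropping only invalids) also ties the route to the authorized origin AS and maximum length.

```python
import ipaddress


def prefix_list_permits(prefix, entries):
    """Prefix-list style check: permit if ANY (base, le) entry covers the route.

    Only the prefix is examined; the origin AS plays no part.
    """
    route = ipaddress.ip_network(prefix)
    for base, le in entries:
        net = ipaddress.ip_network(base)
        if net.version == route.version and route.subnet_of(net) and route.prefixlen <= le:
            return True
    return False


def roa_permits(prefix, origin, vrps):
    """Accept only if a covering (prefix, max_length, asn) entry names this origin
    and allows this prefix length, i.e. the route is ROA-valid."""
    route = ipaddress.ip_network(prefix)
    for base, max_length, asn in vrps:
        net = ipaddress.ip_network(base)
        if (net.version == route.version and route.subnet_of(net)
                and asn == origin and route.prefixlen <= max_length):
            return True
    return False
```

With a ROA for 203.0.113.0/24, maximum length 24, AS 64500, a /32 from AS 64666 passes the loose prefix-list check but fails the ROA check, while the customer’s own /24 passes both.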

Before the panel session started, Taiji Kimura from JPNIC discussed deployment models. Public cache servers and locally constructed information bases offer different benefits and outcomes. RPKI is a digital signature system, so users should check the signatures themselves. A local cache is therefore clearly the preferred deployment model.

But the community appears to expect these services to be provided as a facility. All the panelists agreed that a public cache contributes little or nothing to actual route security, but can help with understanding the ecology of RPKI. Geoff Huston observed that ‘you shouldn’t outsource validation’: doing so puts a path between your trust and the data, and that path can be subverted. The aim is to minimize trust, to avoid misplaced assumptions about it. To the extent a shared cache helps, it should hold the ROAs and related data for a single fetch, rather than serve a pre-validated feed, and that data should be checked locally before a validation model is built from it.

As Geoff said, RPKI is not simple, easy or cheap. But if you believe there are risks in sending information over insecure networks, and that sending customer data over them creates potential liabilities, you need to look into these systems and explore how to run them yourself.

Taiji demonstrated JPNIC’s investment in internationalizing its RPKI web interfaces to provide in-language support.

Taiji was also proud to announce that JPNIC had launched its RPKI service at 3:00pm local time, during the panel session, and is now offering RPKI services through the JPNIC portal to address holders in Japan. This should lead to a step-function increase in ROA deployment in the Asia Pacific region, with more ISPs now able to implement RPKI for their originated routes.

The panel discussed the relationship between RPKI and IRR. Although many people see a conflict between the two information spaces, it was observed that RPKI provides trust, irrespective of how it’s expressed, and there is no reason not to explore bringing cryptographic trust into the WHOIS/IRR system. RPKI does not replace IRR; they fulfil different needs. But adding more complexity to the system does not entirely help either, so we need to understand what we want to do: Do we want a cohesive system? What tools do we want?

The panel finished with a call for improved staff training, to ensure routing engineers understand the issues. This will be vital as a precursor to deployment, so that staff understand the technologies and can measure system behaviour before configuring high-risk changes into the global routing framework.

Configuring router dependencies on RPKI makes you a consumer of the system, even if you do not produce ROAs, and being really secure demands a local cache. On the other side, publishing a ROA means investing in being a Certification Authority. This is a large overhead and demands care: leaking the private key has consequences and risks, and running the system imposes constraints and obligations. Smaller businesses may prefer to outsource, and the RIRs are willing to act as the front end and offer a service, as JPNIC can now do: an outsource with a trusted agency. Other organizations will decide that address management is their core business, intrinsic to what they do, and will invest in the staff and technology they need. It’s a local decision, and a classic business risk decision.

You can watch the full session below.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
