Improving the resiliency of RPKI Relying Party software

By on 9 Dec 2021

Category: Tech matters

The Resource Public Key Infrastructure (RPKI) allows holders of Internet number resources (INRs) to make verifiable statements about their resources. In its current form, it allows the holder of a prefix to create Route Origin Authorizations (ROAs), which tell others which Autonomous System Number (ASN) is authorized to originate that prefix. The RPKI can be roughly divided into three parts:

  1. The Certification Authority (CA) and repository, which create and host the RPKI information.
  2. The Relying Party (RP), which collects and parses the hosted information.
  3. The router, which uses the parsed information to label BGP advertisements as valid (when they match the information in a ROA), invalid (when a ROA exists for the prefix, but the advertisement does not match it), or unknown (when no ROA exists for the prefix).

The security benefits the RPKI provides are clear, but for the last couple of months I have been looking into the security of the infrastructure itself. Specifically, I asked myself a very simple question: can I disrupt RPKI RP software by introducing a malicious CA/repository into the tree?

The short answer to this question is — yes. For the long answer, a bit more context is needed.

Let us first examine what the RPKI is conceptually. The RPKI is a certificate tree, with the Regional Internet Registries (RIRs), AFRINIC, APNIC, ARIN, LACNIC, and RIPE NCC, as the five common roots. The exact structure below each RIR depends on the RIR, but in most cases, the RIR offers two options:

  1. It hosts ROAs for its customers directly in its own repository (hosted RPKI).
  2. It delegates the customer’s INRs to a CA run by the customer (delegated RPKI). The customer can then host their own repository, sign their own objects, and also delegate (part of) their resources to others. For some RIRs, a National Internet Registry (NIR) may act as an intermediary.

How does this work in practice? All objects in the RPKI are signed. Certificates follow the X.509v3 standard, and signed objects such as ROAs are wrapped in a Cryptographic Message Syntax (CMS) signature that carries an X.509v3 end-entity certificate. This makes any tampering with the files detectable. A manifest, itself a signed object, lists the files in a repository along with their hashes, which makes it possible to check that all files are present and unmodified. The files themselves are served over either rsync or the RPKI Repository Delta Protocol (RRDP).
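To make the manifest check concrete, here is a minimal Python sketch. It assumes the manifest has already been decoded and its signature verified, leaving a plain mapping from file name to SHA-256 digest (the hash algorithm RPKI manifests use); `check_against_manifest` is a hypothetical helper, not code from any RP implementation.

```python
import hashlib
from typing import Dict, List


def check_against_manifest(manifest: Dict[str, str], files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare fetched files with an already signature-checked manifest.

    manifest: file name -> expected SHA-256 digest (lowercase hex)
    files:    file name -> raw bytes as retrieved from the repository
    """
    missing = sorted(name for name in manifest if name not in files)
    mismatched = sorted(
        name for name, data in files.items()
        if name in manifest and hashlib.sha256(data).hexdigest() != manifest[name]
    )
    # Files the manifest does not list are not covered by its signature.
    unlisted = sorted(name for name in files if name not in manifest)
    return {"missing": missing, "mismatched": mismatched, "unlisted": unlisted}


roa = b"example ROA bytes"
manifest = {"a.roa": hashlib.sha256(roa).hexdigest(), "b.roa": "00" * 32}
print(check_against_manifest(manifest, {"a.roa": roa, "c.crl": b"..."}))
# {'missing': ['b.roa'], 'mismatched': [], 'unlisted': ['c.crl']}
```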

The latter is a protocol specific to RPKI. RP software retrieves the data from the repository of the root and validates the content. Based on the certificates it finds in that repository, it retrieves the repositories mentioned in those certificates, and repeats this process until all the data has been collected. This creates the tree structure. The collected data is then processed and sent to the router.
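The top-down retrieval can be sketched as a breadth-first walk. This toy version assumes a hypothetical `fetch_children(uri)` callback that downloads and validates one repository and returns the repository URIs named in the CA certificates found there; real RP software does far more work per step.

```python
from collections import deque
from typing import Callable, Iterable, List


def walk_repositories(root: str, fetch_children: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit repositories breadth-first, starting from a trust anchor's repository."""
    visited: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        uri = queue.popleft()
        visited.append(uri)
        # Each CA certificate in this repository points to a child repository.
        for child in fetch_children(uri):
            if child not in seen:  # never fetch the same repository twice
                seen.add(child)
                queue.append(child)
    return visited


tree = {
    "rsync://ta.example/repo/": ["rsync://ca1.example/repo/", "rsync://ca2.example/repo/"],
    "rsync://ca1.example/repo/": ["rsync://ca3.example/repo/"],
}
print(walk_repositories("rsync://ta.example/repo/", lambda uri: tree.get(uri, [])))
```

Note that nothing in this loop bounds how many repositories it visits, which is exactly what tests A and H below exploit.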

Becoming a rogue CA

Before looking into what you can do once you are part of the tree, it is worth looking into how difficult it is to become part of it. To join the tree, a parent CA needs to sign a certificate that delegates some resources to us. This is rather simple: a provider-independent IPv6 prefix from an RIR like APNIC, ARIN, or the RIPE NCC already suffices, and, at least at the RIPE NCC, setting up delegated RPKI is fully automated. In some cases, merely controlling the repository (without even being able to sign objects) suffices.

To disrupt the RPKI, you have to break an assumption that RP software makes but does not enforce. These include assumptions about RRDP, which is based on XML and HTTPS and therefore inherits certain characteristics of HTTP and TCP, but also assumptions about the filesystem and the tree structure. To that end, the Relying Party Resiliency Platform (RPRP) was created. The RPRP is a testbed that allows for quick prototyping of potential vulnerabilities. CAs and repositories are created on demand, and a Trust Anchor Locator (TAL) pointing to our own evil root, which runs alongside the roots provided by the RIRs, allows testing real-life scenarios without disrupting the RPKI for anyone else.

But how do you go about determining what to test? I based my tests on OWASP material for XML, HTTP, and TCP, as well as on similar vulnerabilities found in other protocols and technologies. For instance, what can I do when I only control the repository? I looked at what could be done using RRDP without needing any signed objects: what happens when the server responds very slowly, or when the connection is exceptionally unreliable? RP software tends to rely on existing libraries, and the default library behaviour may not be suitable for the RPKI.

I tested how RP software handles different HTTP status codes, such as a 307 redirect or the 429 ‘Too Many Requests’ rate-limiting response with a Retry-After header, how it handles different XML documents, and whether it can cope with large or malformed files. I also looked at what happens with multiple CAs, repositories, and ROAs. Not every test uncovered a problem, but vulnerabilities were found in every RP implementation, and some of these issues lie in the protocol itself.

The tests are labeled A to O, and I will quickly go into what I did for each test:

Test A

Here I created a certificate chain without end. The root contains a certificate for a child, which contains a certificate for its own child, and so on. As our repositories are created on demand, this chain is actually infinite, not just very long. Some RP software caps the chain at a fixed maximum length, such as 100, 32, or 12, but other implementations effectively stall, endlessly retrieving new data.
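A depth cap is what stops the endless chain. In the hypothetical sketch below, `endless_chain` stands in for the on-demand evil repository (every CA delegates to exactly one new child), and `count_reachable` refuses to descend past `max_depth`; the limit of 32 is just one of the values mentioned above.

```python
from typing import Callable, Iterable


def count_reachable(uri: str, fetch_children: Callable[[str], Iterable[str]],
                    max_depth: int, depth: int = 0) -> int:
    """Count repositories reachable from uri, ignoring anything deeper than max_depth."""
    if depth > max_depth:
        # A real validator would log this and treat the branch as invalid.
        return 0
    return 1 + sum(count_reachable(child, fetch_children, max_depth, depth + 1)
                   for child in fetch_children(uri))


def endless_chain(uri: str) -> list:
    """Simulates test A: every repository contains one certificate for a brand-new child."""
    return [uri + "child/"]


print(count_reachable("rsync://evil.example/", endless_chain, max_depth=32))  # 33
```

Without the depth check, the same call would recurse until Python raises `RecursionError`, the in-language equivalent of stalling forever.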

Test B

I wanted to test what RP software would do when presented with a 429 status code and a Retry-After header. This status code is relatively new, and tells the client to retry the request at a later time, or after the time specified in the Retry-After header. I wanted to see whether any RP software uses this header, and whether they blindly follow the value from the Retry-After header. So far no RP software supports this header.
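Honouring Retry-After safely means parsing both of its forms (a number of seconds, or an HTTP date) and never letting the repository dictate an unbounded wait. A minimal sketch, assuming a hypothetical policy cap `MAX_RETRY_DELAY` of ten minutes and a fallback delay of 60 seconds; the clock is passed in so the function is easy to test.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_RETRY_DELAY = 600.0  # hypothetical policy: never wait longer than 10 minutes


def retry_delay(header: Optional[str], now: datetime, default: float = 60.0) -> float:
    """Turn a Retry-After header value into a bounded delay in seconds."""
    if header is None:
        return default
    value = header.strip()
    if value.isascii() and value.isdigit():
        delay = float(value)  # delay-seconds form, e.g. "120"
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable: fall back instead of trusting it
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        delay = (when - now).total_seconds()
    # Clamp: no negative waits, and no waits longer than local policy allows.
    return min(max(delay, 0.0), MAX_RETRY_DELAY)
```

The key design choice is the clamp: a repository that answers `Retry-After: 999999999` should not be able to park the validator indefinitely.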

Test C

HTTP also specifies the 3xx range of status codes, which tell the client that the resource is at some other location. I wanted to see whether I could make the client redirect, and redirect again, without end. Not all RP software supports redirects, but those implementations that do enforce a maximum number of redirects.
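A redirect cap can be sketched as a bounded loop. Here `head` is a hypothetical callback returning the status code and Location header for a URL, so the logic can be shown without real network access; the cap of 5 is an arbitrary assumption.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

MAX_REDIRECTS = 5  # hypothetical cap
REDIRECT_CODES = {301, 302, 303, 307, 308}


class TooManyRedirects(Exception):
    pass


def resolve(url: str, head: Callable[[str], Tuple[int, Optional[str]]],
            max_redirects: int = MAX_REDIRECTS) -> str:
    """Follow 3xx responses until a non-redirect, allowing at most max_redirects hops."""
    for _ in range(max_redirects + 1):
        status, location = head(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url
    raise TooManyRedirects(url)


# A test C style server: every response redirects somewhere new.
endless = lambda u: (307, u + "x")
try:
    resolve("https://evil.example/notification.xml", endless)
except TooManyRedirects as exc:
    print("gave up at", exc)
```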

Test D

HTTP lets a client advertise which compression schemes it supports with the Accept-Encoding header, and the server marks a compressed response with the Content-Encoding header. Some RP software supports this, as RRDP data can be compressed quite efficiently, greatly reducing the amount of data transferred. The most popular compression algorithm supported is gzip. A repository can abuse this by sending what is known as a gzip bomb: a file that is small on the wire, but grows by many orders of magnitude when decompressed. Several RP implementations run out of memory when encountering a gzip bomb.
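The usual defence is to bound the decompressed size rather than the compressed size. A minimal sketch using Python's standard `zlib` module, whose `decompress` method accepts a `max_length` argument; the 64 MiB cap `MAX_DECOMPRESSED` is an assumed policy value.

```python
import gzip
import zlib

MAX_DECOMPRESSED = 64 * 1024 * 1024  # hypothetical cap per RRDP file (64 MiB)


class DecompressedTooLarge(Exception):
    pass


def gunzip_capped(compressed: bytes, limit: int = MAX_DECOMPRESSED) -> bytes:
    """Decompress a gzip body, aborting as soon as the output would exceed `limit` bytes."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect gzip header and trailer
    out = d.decompress(compressed, limit + 1)    # produce at most limit + 1 bytes, never the whole bomb
    if len(out) > limit:
        raise DecompressedTooLarge(f"more than {limit} bytes after decompression")
    if not d.eof:
        raise ValueError("truncated gzip stream")
    return out


bomb = gzip.compress(b"\0" * 10_000_000)  # 10 MB of zeros, tiny once compressed
print(len(bomb))
try:
    gunzip_capped(bomb, limit=1_000_000)
except DecompressedTooLarge as exc:
    print("rejected:", exc)
```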

Test E

The data in a repository has a certain size, and the repository and client can each handle a certain bandwidth, so download speeds can vary greatly. A repository can abuse this by hosting a lot of data and serving it at, for example, three bytes per second. Some RP software requires a minimum bandwidth or enforces a maximum transfer time, but some implementations would wait for all data to be transmitted, even if that took several weeks.
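Both protections described above, an overall deadline and a minimum average rate, fit in a few lines. This is a sketch under stated assumptions: the defaults (300 seconds, 10,000 bytes per second, a 10 second grace period) are invented for illustration, and the clock is injectable for testing.

```python
import time
from typing import Callable, Iterable


class TransferTooSlow(Exception):
    pass


def read_with_deadline(chunks: Iterable[bytes], max_seconds: float = 300.0,
                       min_rate: float = 10_000.0, grace: float = 10.0,
                       clock: Callable[[], float] = time.monotonic) -> bytes:
    """Collect chunks, aborting on an overall deadline or a too-low average rate (bytes/s)."""
    start = clock()
    received = bytearray()
    for chunk in chunks:
        received += chunk
        elapsed = clock() - start
        if elapsed > max_seconds:
            raise TransferTooSlow(f"exceeded {max_seconds}s after {len(received)} bytes")
        # Only judge the rate after a grace period, so slow starts are tolerated.
        if elapsed > grace and len(received) / elapsed < min_rate:
            raise TransferTooSlow(f"average rate below {min_rate} bytes/s")
    return bytes(received)
```

These checks only run when a chunk arrives, so a server that sends nothing at all must additionally be caught by a socket read timeout.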

Test F

Whenever RP software encounters a ROA — a file with a .roa extension — it expects the structure that belongs to a ROA. However, I can simply encode any data I like, and give it a .roa extension. For this test, I encoded an ASCII NUL character as a ROA. Most implementations ignore this, or mark it as broken in the logging, but some implementations also crash when encountering this malformed object.
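Whatever the decoder, a malformed object should only cause that one object to be rejected, never take down the validator. In the hypothetical sketch below, `parse_roa_stub` stands in for a real ROA decoder and merely checks that the bytes begin with the DER SEQUENCE tag 0x30, as a CMS-wrapped ROA does; `process_objects` isolates each parse.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("rp")


def parse_roa_stub(data: bytes) -> dict:
    """Hypothetical stand-in for a ROA decoder: require a DER SEQUENCE header."""
    if len(data) < 2 or data[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")
    return {"size": len(data)}


def process_objects(objects: Dict[str, bytes], parse: Callable[[bytes], dict]) -> List[str]:
    """Parse each object independently; a broken object is logged and skipped."""
    accepted = []
    for name, data in objects.items():
        try:
            parse(data)
        except Exception as exc:  # any decoder failure rejects only this object
            log.warning("rejecting %s: %s", name, exc)
            continue
        accepted.append(name)
    return accepted


# Test F in miniature: a single NUL byte with a .roa extension.
print(process_objects({"nul.roa": b"\x00", "ok.roa": b"\x30\x03\x02\x01\x00"}, parse_roa_stub))
```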

Test G

This test is the Billion Laughs Attack. It abuses XML entities, and as RRDP is based on XML, it applies to RRDP as well. We define an entity ‘lol0’ as ‘lol’. We then define an entity ‘lol1’ as 10 times the entity ‘lol0’, an entity ‘lol2’ as 10 times the entity ‘lol1’, and so on. This then expands into billions of ‘lols’, hence the name. The textual form is small, but actually expanding these entities makes it massive. Several RP implementations run out of memory and crash.
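RRDP files have no legitimate need for a DTD, so one defensive option, assumed here rather than taken from any RP implementation, is to refuse any document that declares one. The sketch uses Python's standard `xml.parsers.expat`, raising from the `StartDoctypeDeclHandler` before any entity is defined or expanded; `parse_rrdp` and `ForbiddenDTD` are hypothetical names.

```python
import xml.parsers.expat
from typing import List


class ForbiddenDTD(Exception):
    pass


def parse_rrdp(data: bytes) -> List[str]:
    """Parse an RRDP document, refusing any DOCTYPE (and therefore any entity definitions)."""
    elements: List[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def refuse_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed in RRDP")

    parser.StartDoctypeDeclHandler = refuse_doctype
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.Parse(data, True)  # exceptions raised in handlers propagate out of Parse
    return elements


# An abbreviated notification file (not a complete, valid RRDP document).
doc = (b'<notification xmlns="http://www.ripe.net/rpki/rrdp" version="1">'
       b'<snapshot uri="https://rrdp.example/snapshot.xml"/></notification>')
print(parse_rrdp(doc))
```

Rejecting the DOCTYPE outright also rules out external entity declarations, the subject of test M.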

Test H

Here I expand on A: rather than having one child, each repository has 10 children. This means that the total number of repositories starts at 1, then 11, then 111, and so on. This blows up quickly: even within the depth limits currently set by RP software, there are so many repositories that evaluating them all would take years. This affects all current RP software.
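The growth follows a geometric series: with a fan-out of 10, a tree of depth d holds 1 + 10 + ... + 10^d = (10^(d+1) - 1) / 9 repositories. The short Python helper below (a hypothetical name) evaluates it, including for the smallest depth cap mentioned in test A.

```python
def repositories_up_to(depth: int, fanout: int = 10) -> int:
    """Total repositories from depth 0 to `depth` when every CA has `fanout` children."""
    if fanout == 1:
        return depth + 1  # a plain chain, as in test A
    # Geometric series: 1 + f + f^2 + ... + f^depth = (f^(depth+1) - 1) / (f - 1)
    return (fanout ** (depth + 1) - 1) // (fanout - 1)


for d in (0, 1, 2, 12):
    print(d, repositories_up_to(d))
# 0 1
# 1 11
# 2 111
# 12 1111111111111
```

Even with a depth cap of 12, that is more than 1.1 trillion repositories; at a hypothetical 1,000 repositories per second, visiting them all would take roughly 35 years. A depth limit alone therefore does not help.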

Test I

Like F, this is an invalid ROA, but this time the structure is correct and the values within that structure exceed their bounds. For example, an IPv4 prefix is at most 32 bits long. Here we create an IPv4 prefix of 60 bits, which is nonsensical. Some RP software crashes when encountering this.
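A decoder has to check such bounds explicitly after parsing. A minimal sketch with a hypothetical `RoaPrefix` type: the address family values 1 (IPv4) and 2 (IPv6) follow the ROA encoding, and the maxLength rule (at least the prefix length, at most the address length) follows the ROA profile in RFC 6482.

```python
from dataclasses import dataclass

ADDRESS_BITS = {1: 32, 2: 128}  # address family 1 = IPv4, 2 = IPv6


@dataclass
class RoaPrefix:
    afi: int          # address family as encoded in the ROA
    length: int       # prefix length in bits, as decoded from the BIT STRING
    max_length: int   # maxLength; equal to length when absent


def check_bounds(p: RoaPrefix) -> None:
    """Raise ValueError unless the prefix fits its address family."""
    bits = ADDRESS_BITS.get(p.afi)
    if bits is None:
        raise ValueError(f"unknown address family {p.afi}")
    if not 0 <= p.length <= bits:
        raise ValueError(f"prefix length {p.length} out of range for a {bits}-bit family")
    if not p.length <= p.max_length <= bits:
        raise ValueError(f"maxLength {p.max_length} out of range")


check_bounds(RoaPrefix(afi=1, length=24, max_length=24))  # fine
try:
    check_bounds(RoaPrefix(afi=1, length=60, max_length=60))  # the test I case
except ValueError as exc:
    print("rejected:", exc)
```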

Test J

This repository contains several ROAs, all for the same prefix but each with a different ASN. The aim here is not so much to sabotage the RP software as the router. The router stores the entries in a table, and that table lives in memory, which is finite. Here we attempt to exhaust that memory by creating many entries: theoretically, up to 2^32 entries (one per possible 32-bit ASN) are possible. Currently, all RP software accepts all of this data and passes it to the router as is.
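To see why this lands on the router, model what the RP hands over: validated ROA payloads (VRPs), each a (prefix, maxLength, origin ASN) tuple, and each distinct tuple a separate router entry. The helper below is a hypothetical illustration of that counting, not a proposed fix.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Vrp = Tuple[str, int, int]  # (prefix, maxLength, origin ASN)


def origins_per_prefix(vrps: Iterable[Vrp]) -> Dict[Tuple[str, int], int]:
    """Count distinct origin ASNs per (prefix, maxLength); each one is a separate router entry."""
    origins: Dict[Tuple[str, int], Set[int]] = defaultdict(set)
    for prefix, max_len, asn in vrps:
        origins[(prefix, max_len)].add(asn)
    return {key: len(asns) for key, asns in origins.items()}


# Test J in miniature: one prefix, 100,000 origins. The full attack scales this towards 2**32.
flood = [("192.0.2.0/24", 24, asn) for asn in range(1, 100_001)]
print(origins_per_prefix(flood))  # {('192.0.2.0/24', 24): 100000}
```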

Test K

Much like J, we can do the same trick with prefixes. A /48 prefix can be split into 2^80 /128 prefixes, and unlike in J, all of them fit in a single ROA. All RP software accepts all of this data; one implementation even gets stuck during the processing phase, whereas all the other software passes it through to the router.
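The arithmetic is simply 2^(128 - 48). Python's standard `ipaddress` module confirms it, using the documentation prefix 2001:db8::/48 as a stand-in for a real allocation.

```python
import ipaddress
from itertools import islice

net = ipaddress.ip_network("2001:db8::/48")  # documentation prefix standing in for a real /48

# Every /128 inside the /48 is a distinct prefix that a ROA could list.
count = 2 ** (128 - net.prefixlen)
assert count == net.num_addresses == 2 ** 80
print(f"{count:,}")  # 1,208,925,819,614,629,174,706,176

# The first few of them, generated lazily; enumerating all of them must never happen.
for sub in islice(net.subnets(new_prefix=128), 3):
    print(sub)
```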

Test L

Here I did something quite simple: what if I just serve a lot of data? Conveniently for an attacker, RRDP allows absolute URIs, so I pointed one at a large, externally hosted file normally used for speed tests. The contents of the file are random and contain nothing useful. Some implementations ran out of memory; others ignored the file.
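A hard cap on how much of a response body is buffered prevents this. The sketch below reads any file-like stream in chunks; `read_capped` and the 256 MiB `MAX_BODY` are assumptions for illustration. A Content-Length check can reject oversized bodies earlier, but the header may be missing or wrong, so the cap during reading is still needed.

```python
import io
from typing import BinaryIO

MAX_BODY = 256 * 1024 * 1024  # hypothetical per-file cap (256 MiB)


class BodyTooLarge(Exception):
    pass


def read_capped(stream: BinaryIO, limit: int = MAX_BODY, chunk_size: int = 64 * 1024) -> bytes:
    """Read a response body in chunks, refusing to buffer more than `limit` bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf += chunk
        if len(buf) > limit:
            raise BodyTooLarge(f"body exceeds {limit} bytes")


print(read_capped(io.BytesIO(b"<snapshot/>")))
```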

Test M

XML offers some other interesting possibilities, such as external entities. I wanted to see whether I could make the RP software request a URI built from information on the system it runs on, thereby exfiltrating data. According to the XML specification, attribute values cannot reference external entities, and all implementations adhere to that or disable external entities entirely.

Test N

I can choose what my files are called and where they are located, so I can make a path far longer than is realistic, trying to exhaust memory and/or the filesystem. Some implementations did run out of memory during this test, but most were able to process it without issue.
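Rejecting absurd paths early, before they are stored or used, avoids both problems. The limits below are assumptions: 255 bytes per component and 4,096 bytes in total are common Linux values, and on Unix `os.pathconf(directory, "PC_NAME_MAX")` can report the real per-component limit. The check includes the local cache directory, since that is where the path ends up.

```python
MAX_COMPONENT = 255  # common NAME_MAX on Linux filesystems (assumption)
MAX_PATH = 4096      # common PATH_MAX on Linux (assumption)


def check_path_length(cache_dir: str, relative_path: str) -> None:
    """Reject repository paths that could not be stored under cache_dir on a typical filesystem."""
    full = f"{cache_dir}/{relative_path}".encode("utf-8")
    if len(full) > MAX_PATH:
        raise ValueError(f"path is {len(full)} bytes, limit is {MAX_PATH}")
    for part in relative_path.split("/"):
        size = len(part.encode("utf-8"))
        if size > MAX_COMPONENT:
            raise ValueError(f"path component of {size} bytes exceeds {MAX_COMPONENT}")


check_path_length("/var/cache/rpki", "rsync/example.org/repo/a.roa")  # accepted
try:
    check_path_length("/var/cache/rpki", "a" * 300 + ".roa")
except ValueError as exc:
    print("rejected:", exc)
```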

Test O

The last test I created again concerned paths and filesystems. On most UNIX-based systems, every folder contains two special entries: ‘.’ and ‘..’. The former refers to the current folder, the latter to its parent. I wondered: can I make the RP software write files outside the directories they are supposed to end up in, by using a path such as rsync://example.org/../../etc/not-a-virus? Most RP software rejected the path, but some did write files to arbitrary folders.
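A defensive mapping from rsync URI to local file rejects dot segments outright and then double-checks that the normalised result stays under the cache root. A minimal sketch with a hypothetical `local_path_for` helper and an assumed cache layout of `<root>/<host>/<path>`:

```python
import posixpath
from urllib.parse import urlsplit


def local_path_for(uri: str, cache_root: str) -> str:
    """Map an rsync URI to a file under cache_root, rejecting anything that escapes it."""
    root = posixpath.normpath(cache_root)
    parts = urlsplit(uri)
    if parts.scheme != "rsync" or not parts.hostname:
        raise ValueError(f"unexpected URI {uri!r}")
    # Reject '.' and '..' outright instead of trying to normalise them away.
    if any(seg in (".", "..") for seg in parts.path.split("/")):
        raise ValueError(f"relative path segment in {uri!r}")
    candidate = posixpath.normpath(posixpath.join(root, parts.hostname, parts.path.lstrip("/")))
    # Belt and braces: the final path must still lie inside the cache root.
    if posixpath.commonpath([candidate, root]) != root:
        raise ValueError(f"{uri!r} escapes {root!r}")
    return candidate


print(local_path_for("rsync://rpki.example.net/repo/a.roa", "/var/cache/rpki"))
try:
    local_path_for("rsync://example.org/../../etc/not-a-virus", "/var/cache/rpki")
except ValueError as exc:
    print("rejected:", exc)
```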

Resolving these issues

To resolve these issues, the National Cyber Security Centre of the Netherlands (NCSC-NL) ran a Coordinated Vulnerability Disclosure process with the RP software developers. Their advisory is available (currently in Dutch only).

The problems that could be fixed without making amendments to the protocol have been fixed in Routinator by NLnet Labs, OctoRPKI by Cloudflare, FORT Validator by NIC México, RPKI-Prover by Mikhail Puzanov, and rpki-client by the OpenBSD team. I want to thank the developers, as well as the NCSC, very much for their work on keeping RPKI secure. The RIPE NCC RPKI Validator 3 was already deprecated at the time of notification, and as a consequence, will likely not receive any updates regarding these issues. Should you still be using it, consider using something else, as you are currently vulnerable.

Unfortunately, not all of the issues discussed above can readily be resolved by changes to the RP software. Specifically, issues H, J, and K should be addressed by other means. To this end, I have also written an Internet draft that proposes a possible solution to these issues.

Takeaways

One of the most important takeaways from this is that a repository and CA can be a malicious party, and cannot necessarily be trusted, just like BGP advertisements cannot be blindly trusted. If a party has a reason to intentionally send out a malicious BGP advertisement, then they might as well have a reason to disrupt RPKI.

If your organization uses RPKI RP software, then please update your software, and keep it updated.

Koen van Hove is a researcher at the University of Twente on the topic of network security, with a focus on DoS-risks.

This post was originally published on RIPE Labs.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
