It is very tempting to look at the RPKI environment — which needs so-called ‘relying parties’ (the ones who consume the PKI products and have to make decisions based on validating the cryptography) to collect, process and collate a huge amount of information — and say, “Why don’t I run this in the cloud?”.
It can be done. For example, this RPKI-RTR server is being run for 12 cents a year on the cloud.
Well, I am here to say it’s fine to do this for experiments, to find out how things look, but please don’t do this for your BGP route filtering! Let me explain.
Where does truth come from?
Relying parties are people who take assertions made by private key holders, and perform the test ‘Is this cryptographically valid?’
Additionally, they have to check that the policy constraints defined for validation in the PKI system have been met. In APNIC’s case, that means all the Internet number resources and the signed objects are consistent and conform to the information model. Only then can they perform the ‘purposeful’ part of the system as a whole.
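The two-step test a relying party performs can be sketched as follows. This is a deliberately simplified model, not a real validator: actual tools (such as rpki-client or Routinator) verify X.509/CMS signature chains and RFC 3779 resource extensions, which are reduced here to toy fields, and the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class SignedObject:
    # Hypothetical, simplified stand-ins for real RPKI machinery.
    crypto_valid: bool      # did the signature chain verify?
    resources: set          # number resources the object claims
    parent_resources: set   # resources the issuing certificate holds

def relying_party_accepts(obj: SignedObject) -> bool:
    # Step 1: the cryptography must check out.
    if not obj.crypto_valid:
        return False
    # Step 2: policy - the claimed resources must be consistent with
    # (a subset of) what the issuing certificate actually holds.
    return obj.resources <= obj.parent_resources

# An object claiming resources its issuer does not hold must be
# rejected, even though its signature verifies.
good = SignedObject(True, {"192.0.2.0/24"},
                    {"192.0.2.0/24", "198.51.100.0/24"})
bad = SignedObject(True, {"203.0.113.0/24"}, {"192.0.2.0/24"})
```

The point of the second step is exactly the consistency check described above: cryptographic validity alone is not enough.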
What does this mean for BGP? Well, there are two essential considerations here:
- You have to check the PKI data yourself. Nobody is meant to tell you “this is valid”, because that outsources your trust to an external agency, and the basic model here is that you shouldn’t do that.
- What ‘valid’ means is locally defined: it’s your BGP and your routing that are under consideration. What you see, and what you do with it, differ from anyone else because IT’S YOURS.
What could possibly go wrong by doing this in the cloud?
Tempting though it is to do this in the cloud, the problem here is two-fold:
- Do you really know that nobody else has altered the state of validation in this external hosting? If you run a private cloud node, you have at least some assurance with the cloud service provider that your data is yours. Nonetheless, you do need to demonstrate you have unique control of the data you are now assuming is ‘valid’.
- What happens if you can’t reach this service? This is one of the real problems: If you accidentally lose connectivity to this service, what does it mean for your routing? If, for instance, it just ‘goes away’, then your RPKI information base is no longer visible. What does your local route filtering do? How are you told this? (operational alarms?) And does it break routing?
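One way to reason about the failure mode is through the origin validation states defined in RFC 6811. The sketch below assumes the covering VRPs (validated ROA payloads) have already been selected for the prefix; the function name and the flattened `(max_length, authorized_as)` tuples are simplifications for illustration. The key behavior: if the RTR feed disappears and the local cache expires, the VRP set is effectively empty, every route falls back to ‘NotFound’, and a ‘drop invalid’ policy silently stops filtering anything.

```python
def origin_state(prefix_len: int, origin_as: int, covering_vrps) -> str:
    """Classify a route per RFC 6811, given the VRPs covering its prefix.

    covering_vrps: list of (max_length, authorized_as) tuples that the
    RPKI-RTR feed delivered for this prefix (simplified representation).
    """
    if not covering_vrps:
        # No covering ROA data at all - e.g. the feed went away and the
        # cache expired. The route is neither valid nor invalid.
        return "NotFound"
    for max_len, asn in covering_vrps:
        if prefix_len <= max_len and origin_as == asn:
            return "Valid"
    # Covered by ROA data, but no matching (origin, length) pair.
    return "Invalid"

# With a live feed, a mis-originated route is Invalid and gets dropped;
# with an empty VRP set, the same route quietly becomes NotFound.
with_feed = origin_state(24, 64501, [(24, 64500)])
without_feed = origin_state(24, 64501, [])
```

This is why “the feed went away” is not a safe-failure mode: your filtering degrades to no filtering, and unless you alarm on the RTR session itself, nothing tells you.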
If you have a NOC, and the core assets inside your routing systems are under 24/7 monitoring, you reduce your risk surface here IF you operate at least one instance of the RPKI-RTR feed inside your core routing architecture, because then you cannot be denied the information when the external service fails.
Specifically though, what happens if the cloud host has a bad RPKI state? Imagine for a minute that you have decided to implement ‘drop invalids’ in your RPKI decision logic, and, for some reason, the cloud host itself becomes ‘invalid’. What is going to happen to your RPKI-RTR feed? You’re going to drop it, because you can ‘prove’ its invalidity.
At least one cloud provider did configure their RPKI state with external dependencies, and incurred routing loss to their systems when the external RPKI-RTR feed did not work properly. They have since reconfigured to internal dependency, but it shows that even at scale, you cannot always rely on the goodwill of others to compute your own routing fate.
Don’t depend on anyone else for RPKI validity
So, in summary: although putting RPKI information into the cloud to see what other people see is a useful learning exercise, you really do need to think about your dependencies on external parties when you construct your own routing intent.
Don’t depend on anyone else for RPKI validity — run your own validator, and run it inside your own routing control.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.