In Part II of this post, Cengiz Alaettinoglu shares insights that may help increase adoption of the SIDR model.
In Part I of this blog entry, I gave an overview of the IRR and the SIDR models for securing the Internet’s routing. In this part, I would like to share our insights on why deployment of the IRR model has been low, in the hope that this helps increase adoption of the SIDR model. As I mentioned in Part I, some of these challenges cannot be addressed using technology alone and need economic and social engineering as well.
Tragedy of the Commons
First, let’s look at the economics of deploying one of these solutions. Early deployments incur most of the cost, and benefits are not achieved until a critical mass is reached. For example, YouTube may register its RPSL route objects or SIDR ROA objects, but the necessary filters may not be in place at the upstream provider of the Pakistani service providers, and hence YouTube routes may still be hijacked.
Unfortunately, both the IRR and the SIDR models suffer from this phenomenon. Geoff Huston, in his APRICOT 2015 presentation, identified this as the “tragedy of the commons.” That is, if every service provider optimizes the outcome for itself, we cannot reach the globally optimum outcome. Typical solutions to tragedy of the commons involve “regulation.” Since regulation is often undesired, what can we do to reach critical mass without it?
In IRR model deployments, we wanted an evolutionary path, not a revolutionary path to reach critical mass. On this front, we wanted to take advantage of RPSL’s heritage. RPSL is based on earlier work known as RIPE-81. RIPE-81 had route objects; however, it lacked a security model and expressive policy representation. RIPE, as one of the early IRRs, had many of these objects already registered. In the United States, the Routing Arbiter team (a collaboration between University of Michigan’s Merit Networks and University of Southern California’s Information Sciences Institute), which I was a part of, converted similar policy objects found in the NSFNet backbone network’s policy database into route objects and stored them in a new IRR called RADB.
Meanwhile, the Internet was going through a major commercialization: from NSFNet being the single Internet backbone network, we moved to multiple commercial backbones and regional networks. As a result, many network operators changed their upstream service providers. Since these new service providers did not use or require registration of these objects, the objects in the IRR became out-of-date very quickly.
Ultimately, the data became stale because it was not used operationally. And conversely, it was kept up-to-date where it was used operationally. I was disappointed to learn that the SIDR model is already suffering from this stale data phenomenon. In his APRICOT 2015 talk, Fakrul Alam reported that more than 50% of new ROA objects registered in the APNIC database are already invalid. Some regions are doing better than others, but invalid data is present in all regions. We have to find a way to reverse this trend.
I think the only way out of this is the operational use of the data. If we turned on BGPSec today, we would break the reachability of the invalid prefixes. I am not advocating breaking anybody’s reachability, especially not that of the early adopters. This can be avoided by monitoring these invalid announcements and warning their originators for long enough before turning on such a switch.
If we don’t turn the switch on, the amount of stale data will increase. If we are ever going to turn it on, it is best to do it while the amount of stale data is still small.
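To make the idea of monitoring before enforcing concrete, here is a minimal Python sketch of route origin validation against ROA data, run in a warn-only mode. It follows the commonly described valid / invalid / not-found classification; the `Roa` record, the function names, and the documentation prefixes and AS numbers are all assumptions made for illustration, not part of any registry's tooling.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """Hypothetical, simplified ROA record: prefix, max length, origin AS."""
    prefix: str
    max_length: int
    origin_as: int


def validate_origin(route_prefix, origin_as, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of another address family or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


def monitor(announcements, roas):
    """Warn-only mode: report invalid announcements instead of dropping them."""
    return [
        f"WARNING: {prefix} from AS{asn} is invalid"
        for prefix, asn in announcements
        if validate_origin(prefix, asn, roas) == "invalid"
    ]


# Illustrative data only (documentation prefixes and AS numbers).
roas = [Roa("198.51.100.0/24", 24, 64496)]
announcements = [("198.51.100.0/24", 64496), ("198.51.100.0/24", 64511)]
for line in monitor(announcements, roas):
    print(line)
```

Running a check like this for a while, and contacting the originators of the flagged prefixes, is one way to shrink the invalid set before any filter actually drops routes.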
Weak Security Model
RIPE-81 used two weak authentication methods: mail-from and unix-crypt. One could register objects by sending an email to a well-known registry mailbox. Mail-from, which is now deprecated, simply checked the sender’s email address against an allowed list of email addresses, and unix-crypt required sending the user’s password in the clear in the body of the email. We strengthened the security model by adding a method based on a public and private key pair. However, we did not deprecate the old methods; we simply discouraged them and provided a transition path to the more secure method. After all, if an operator did not care to protect himself, that was his prerogative. Mail-from and unix-crypt were still useful against accidental misconfigurations. There was a social aspect of this choice that we did not anticipate: it gave the IRR model a bad security reputation and was used as an excuse against updating the stale data.
The SIDR effort definitely comes down on the security side of this balance. However, as a result, it needs a database that starts from scratch. Fakrul also reported that ROA adoption had been less than 1% in most regions. LACNIC is an exception, with an adoption rate of almost 25% and less than 4% invalid data.
The IRR model uses Pretty Good Privacy, which is based on cryptographic signatures. These are based on a web of trust among service providers (where the public key of a service provider is signed by other service providers). The SIDR model uses X.509 based certificates, which are hierarchically assigned. In the SIDR model, it is possible to shut down a misbehaving service provider by revoking its certificates. However, some service providers worry that this feature might be abused. Use of certificates makes registries a relying party, which is an uncomfortable change to some registries.
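The effect of hierarchical revocation can be illustrated with a toy Python model. This is only a sketch under strong simplifying assumptions: real X.509 validation involves signature checks, validity periods, and revocation lists, none of which appear here, and the names `issuer_of`, `revoked`, and the example holders are hypothetical.

```python
def chain_is_valid(holder, issuer_of, revoked, trust_anchor):
    """Walk from a certificate holder up to the trust anchor.

    The chain is rejected if any certificate on the way is revoked,
    if an issuer is unknown, or if the issuer map contains a loop.
    """
    seen = set()
    current = holder
    while current != trust_anchor:
        if current in revoked or current in seen or current not in issuer_of:
            return False
        seen.add(current)
        current = issuer_of[current]
    return trust_anchor not in revoked


# Hypothetical hierarchy: root -> regional registry -> ISP -> customer.
issuer_of = {
    "regional-registry": "root",
    "isp-a": "regional-registry",
    "customer-1": "isp-a",
}

print(chain_is_valid("customer-1", issuer_of, set(), "root"))
print(chain_is_valid("customer-1", issuer_of, {"isp-a"}, "root"))
```

Revoking the certificate of `isp-a` invalidates everything issued beneath it in this model, which shows both the appeal of the feature and the reason some service providers worry about its abuse.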
Out-of-Band Verification and the Need for Publishing Policies
The IRR model uses out-of-band verification. That is, it relies on the IRR containing route and aut-num objects with accurate policies. This data is then analyzed and compiled into router configurations. All of this happens before any BGP message is received. When BGP messages are received, appropriate filters are in place to accept only valid announcements. That is, the announcements that would cause prefix hijacking or man-in-the-middle attacks can be filtered out. The system however requires registering accurate policies such as who the peers of each AS are and what routes are being exported and imported from them. Some service providers have privacy concerns for revealing this information. In reality, most of this information is already in the BGP routing tables.
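The out-of-band flow described above can be sketched in a few lines of Python: registry data is compiled into a filter ahead of time, and announcements are later checked against it. This is a simplified illustration, not how any particular IRR toolset works; route objects are reduced to hypothetical (prefix, origin AS) pairs, the peer's policy is reduced to a set of origin ASes it may announce, and matching is exact on the prefix.

```python
import ipaddress


def build_prefix_filter(route_objects, allowed_origins):
    """Compile route objects into the set of prefixes a peer may announce.

    route_objects: iterable of (prefix, origin_as) pairs from the registry.
    allowed_origins: origin ASes the peer's registered policy permits.
    """
    return {
        ipaddress.ip_network(prefix)
        for prefix, origin in route_objects
        if origin in allowed_origins
    }


def accept(announced_prefix, prefix_filter):
    """Accept an announcement only if its exact prefix is in the filter."""
    return ipaddress.ip_network(announced_prefix) in prefix_filter


# Illustrative registry contents (documentation prefixes and AS numbers).
route_objects = [
    ("192.0.2.0/24", 64496),
    ("198.51.100.0/24", 64497),
    ("203.0.113.0/24", 64511),
]

# Built before any BGP message arrives, from the peer's registered policy.
peer_filter = build_prefix_filter(route_objects, {64496, 64497})

print(accept("192.0.2.0/24", peer_filter))     # registered for an allowed origin
print(accept("203.0.113.0/24", peer_filter))   # origin not in the peer's policy
print(accept("192.0.2.128/25", peer_filter))   # more specific, never registered
```

Because the filter is only as good as the registry data behind it, stale or missing route objects translate directly into wrongly rejected or wrongly accepted announcements.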
The SIDR model, on the other hand, uses a hybrid of out-of-band and in-band verification. For ROA objects, it can use either in-band or out-of-band validation. For verifying which BGP AS paths are valid, it uses in-band validation using BGPSec. This replaces the need for registering policies with new validation machinery that is now part of exchanging BGP routes. This is a great benefit. However, it has a serious drawback: this in-band machinery needs to be updated each time a new kind of attack is discovered. For example, when man-in-the-middle attacks surfaced, it was realized that BGPSec did not protect against them while the IRR model did. BGPSec is now being further extended to protect against some classes of man-in-the-middle attacks. We are looking at a standardization-implementation-deployment cycle of roughly two or more years. We will pay this penalty each time we face an attack we have not dealt with before.
When we worked on the IRR model, most invalid announcements, despite the great harm they caused, were accidental mistakes. Security incidents are becoming more frequent and definitely much more malicious. However, it is easy to look at these incidents as “somebody else’s problem.” It is easy to do that until you are the one being attacked, and it is too late at that moment to secure it. Hence, we must act and secure the Internet’s routing now.
I am disappointed to say I don’t have a recipe for success. I can only provide our insight and several trade-offs. As I said in the beginning of my entry, we are not simply dealing with a technical challenge but with economic and social challenges as well. I hope this entry helps address some of the social challenges by raising awareness of the importance of securing the Internet’s routing.
Andrei Robachevsky at ISOC and others are taking on the social challenge big time. He is bringing operators together and asking them to sign a routing manifesto known as The Mutually Agreed Norms for Routing Security (MANRS). Andrei, in his APRICOT 2015 talk, states that by participating in MANRS, a service provider commits to best practices such as preventing propagation of incorrect BGP routing information, preventing traffic with spoofed source IP addresses, and agreeing to coordination and collaboration among participants by keeping their contact information and policy objects accurate in registries. He has already signed up many service providers around the world to participate and is looking for more service providers. Both he and I hope that the more service providers sign up, the more adoption will accelerate.
Regional and local registries have been big advocates of deploying security models as well. They provide tutorials on the subject during network operators meetings such as APRICOT, APNIC, RIPE and NANOG. These materials are also available online.
On the economic front, if we are dealing with the tragedy of the commons phenomenon, do we need regulation? Or can we reach critical mass with social advocacy and arm-twisting? The Internet does not have a central governing body that regulates (though some international organizations are seeking this authority). Regulation often slows down innovation, and because of this I’d rather avoid it. However, either we reach critical deployment of a security model, or we reach a critical number of malicious attacks. If we reach the latter first, I suspect regulation might be on our horizon!
Personally, I would like to see the SIDR model succeed. The IRR model is 20 years old now. It is older than the World Wide Web. It has not been adopted well and is full of stale data. However, the SIDR model’s success relies on the feasibility of running BGPSec in routers. Some worry about the cryptographic computational needs of running BGPSec and still consider the IRR model the viable alternative. I would like to see if the issues around BGPSec can be fixed before we do that. If we cannot fix them, we need to see whether we can build a hybrid model, or whether we need to enhance the IRR model and bring it into the 21st century.
End of Part II
Cengiz Alaettinoglu is currently the CTO at Packet Design and is working on SDN analytics and developing a prototype of a Network Access Broker. He is a widely published author and popular lecturer and was co-chair of the Internet Engineering Task Force (IETF) Routing Policy System Working Group.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.