The Border Gateway Protocol (BGP) has played a key role in sustaining Internet growth since the Internet's commercialization in the early 1990s. Although BGP turned 30 last June, its current version (RFC 4271) is still the de facto standard inter-domain routing protocol that Autonomous Systems (ASes) use to exchange routing information with each other.
The key feature of BGP is its flexibility: it lets network administrators implement a wide variety of import/export policies that reflect traffic engineering decisions or economic agreements between BGP neighbours. This flexibility, however, comes at a cost.
It is a well-known fact that BGP is not immune to routing incidents. BGPStream estimated that hundreds of incidents happened in each month of 2019 (Figure 1), with a peak during June when the famous leak involving the Cloudflare network occurred.
The main problem is that BGP was admittedly not designed with security in mind. It lacks an intrinsic mechanism to secure routing — that is, to authenticate the content of BGP updates — and therefore, it is prone to attacks and misconfigurations such as hijacks and route leaks.
In this post, I will share a real-world route leak from 2019 that we at Catchpoint analysed to understand how to prevent similar incidents. The incident involved SafeHost, a Swiss data centre co-location company, and made headlines mainly because it involved China Telecom, even though China Telecom was not the root cause of the leak.
A more complete analysis of the security problems of BGP in 2019 can be found in the Catchpoint blog series.
Big trouble in little Switzerland
On the morning of 6 June 2019, SafeHost (AS21217) announced more than 40,000 IPv4 routes, learned from its other peers and providers, to its provider China Telecom (AS4134), creating a textbook example of a route leak. The leaked routes included both prefixes already present in the global BGP table and prefixes that were not.
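A route leak is easiest to see against the customer/peer/provider export rules that most ASes follow. The sketch below encodes the common "valley-free" (Gao-Rexford) export rule. It is a simplified model and assumes only the relationships described in this post, namely that China Telecom and Level3 are providers of SafeHost. It does not reflect SafeHost's actual router configuration, and the names `Rel` and `may_export` are hypothetical.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a BGP neighbour, seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Customer/peer/provider export rule (often called the Gao-Rexford model).

    Routes learned from a customer may be announced to every neighbour;
    routes learned from a peer or provider may only be announced to customers.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


# SafeHost (AS21217) re-announcing a route learned from its provider
# Level3 (AS3356) to its provider China Telecom (AS4134):
print(may_export(Rel.PROVIDER, Rel.PROVIDER))  # False: this export is a leak
# The same route passed down to one of SafeHost's customers would be fine:
print(may_export(Rel.PROVIDER, Rel.CUSTOMER))  # True
```

Under this model, sending peer- or provider-learned routes upstream is exactly the export that turns a multihomed customer into an unintended transit path between its providers.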
In turn, China Telecom accepted these routes and propagated them to its neighbours, spreading the leak on a global scale. Leaked routes thus reached ASes connected to public route collectors (collector peers), deployed by the University of Oregon (Route Views project) and by the RIPE NCC (Routing Information Service), which dumped the routes in MRT format and made them publicly available on their respective websites.
| Prefix | AS_PATH |
|---|---|
| 184.108.40.206/19 | 61832 2914 4134 21217 21217 21217 21217 21217 21217 25091 5568 |
| 220.127.116.11/24 | 37468 6453 4134 21217 21217 21217 21217 21217 21217 6830 2603 11164 11995 |
| 18.104.22.168/24 | 7660 2516 4134 21217 21217 21217 21217 21217 21217 3356 15085 |
Table 1 — Examples of routes involved in the SafeHost leak.
Table 1 shows three examples of leaked routes, each announced to the collectors by a different collector peer. In each AS path you can see China Telecom (AS4134), followed by SafeHost (AS21217), followed by the neighbour from which SafeHost learned the route, such as its provider Level3 (AS3356).
The AS_PATH attribute of the leaked routes also shows that SafeHost prepended its own AS number several times, as if SafeHost's administrator wanted to discourage China Telecom from using that BGP link at all. It is also worth pointing out that public BGP data collected before the leak shows no evidence of a BGP connection between AS4134 and AS21217.
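The observations above can be checked mechanically on the Table 1 paths. The minimal sketch below assumes AS paths are given as space-separated strings, with the collector peer on the left and the origin on the right. It looks for the China Telecom to SafeHost adjacency, counts SafeHost's extra prepends, and extracts the neighbour SafeHost learned each route from. The helper names are illustrative, not part of any collector tooling.

```python
from __future__ import annotations

# AS paths from Table 1: leftmost AS is the collector peer, rightmost is the origin.
TABLE_1_PATHS = [
    "61832 2914 4134 21217 21217 21217 21217 21217 21217 25091 5568",
    "37468 6453 4134 21217 21217 21217 21217 21217 21217 6830 2603 11164 11995",
    "7660 2516 4134 21217 21217 21217 21217 21217 21217 3356 15085",
]
CHINA_TELECOM, SAFEHOST = 4134, 21217


def parse_path(raw: str) -> list[int]:
    return [int(asn) for asn in raw.split()]


def has_link(path: list[int], left: int, right: int) -> bool:
    """True if `left` appears immediately before `right` somewhere in the path."""
    return any(a == left and b == right for a, b in zip(path, path[1:]))


def prepend_count(path: list[int], asn: int) -> int:
    """Number of extra copies of `asn`, i.e. prepends beyond the first occurrence."""
    return max(path.count(asn) - 1, 0)


def learned_from(path: list[int], asn: int) -> int | None:
    """Neighbour right after the last copy of `asn`: where `asn` learned the route."""
    last = len(path) - 1 - path[::-1].index(asn)
    return path[last + 1] if last + 1 < len(path) else None


for raw in TABLE_1_PATHS:
    p = parse_path(raw)
    # link present?, number of prepends, neighbour SafeHost learned it from, origin AS
    print(has_link(p, CHINA_TELECOM, SAFEHOST), prepend_count(p, SAFEHOST),
          learned_from(p, SAFEHOST), p[-1])
```

For all three paths, the sketch finds the 4134-21217 link and five extra copies of AS21217. The neighbours SafeHost learned the routes from are AS25091, AS6830 and AS3356, respectively.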
To get an idea of how far the leak spread across the Internet, we can count how many collector peers announced at least one leaked route to the collectors. Since not all peers announce the same amount of routing data to a collector, we considered only the peers announcing a full routing table (FRT peers).
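The counting step can be sketched as follows. This is a minimal illustration, assuming each collector peer is summarised by the number of IPv4 prefixes it announces and the set of leaked prefixes it announced. The `FRT_THRESHOLD` value, the `CollectorPeer` type and the demo peers are all hypothetical; real analyses may define "full table" differently.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off: a peer is treated as FRT if it announces at least this
# many IPv4 prefixes. Real analyses may pick the threshold differently.
FRT_THRESHOLD = 700_000


@dataclass
class CollectorPeer:
    name: str
    rib_size: int                                      # IPv4 prefixes announced to the collector
    leaked_prefixes: set = field(default_factory=set)  # leaked prefixes it announced


def frt_peers(peers):
    """Keep only peers that announce a full routing table."""
    return [p for p in peers if p.rib_size >= FRT_THRESHOLD]


def share_affected(peers) -> float:
    """Fraction of FRT peers that announced at least one leaked route."""
    full = frt_peers(peers)
    if not full:
        return 0.0
    return sum(1 for p in full if p.leaked_prefixes) / len(full)


demo = [
    CollectorPeer("peer-1", 780_000, {"192.0.2.0/24"}),
    CollectorPeer("peer-2", 775_000, {"198.51.100.0/24", "203.0.113.0/24"}),
    CollectorPeer("peer-3", 779_000),
    CollectorPeer("partial-peer", 12_000, {"192.0.2.0/24"}),  # excluded: not FRT
]
print(share_affected(demo))  # two of the three FRT peers saw the leak
```

Filtering before counting keeps partial-feed peers from skewing the ratio. A peer that exports only a few thousand prefixes cannot meaningfully show whether it would have carried the leak.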
As Figure 3 shows, almost every FRT peer announced at least one leaked route to the collectors. Breaking the impact down by Regional Internet Registry (RIR) (Figure 4) shows that the leak spared no region: FRT peers were affected regardless of the RIR region they belong to.
Note that Figures 3 and 4 do not show how many leaked routes each peer announced to a route collector. A deeper understanding comes from looking at how many leaked routes each FRT peer announced to its collector, which lets us distinguish peers whose routing was heavily affected from peers affected by just a few routes.
Figure 5 shows the distribution of the number of origins, while Figure 6 shows the distribution of the number of subnets that each peer perceived as involved in a leaked route.
As these distributions show, only 20% of FRT peers announced leaked routes involving more than 100 different origin ASes and more than 2,000 different subnets. The FRT peers closest to China Telecom, or even directly connected to it, announced the highest number of leaked routes to the collectors. Among them was a customer of China Telecom that announced leaked routes involving more than 3,000 different origins and more than 25,000 different subnets.
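The per-peer numbers behind such distributions can be computed roughly as shown below. This is a sketch, assuming leaked routes are available as `(peer, prefix, as_path)` tuples and that the origin is the last AS in the path (AS_SET aggregates are ignored). The data is illustrative, using documentation prefixes and ASNs, and the thresholds are lowered so the tiny example exercises them.

```python
from collections import defaultdict


def per_peer_impact(leaked_routes):
    """Map each peer to (#distinct origin ASes, #distinct subnets) in its leaked routes.

    leaked_routes: iterable of (peer, prefix, as_path) with as_path as a space-separated
    string; the origin is taken to be the last AS in the path (AS_SETs are ignored).
    """
    origins, subnets = defaultdict(set), defaultdict(set)
    for peer, prefix, as_path in leaked_routes:
        origins[peer].add(int(as_path.split()[-1]))
        subnets[peer].add(prefix)
    return {peer: (len(origins[peer]), len(subnets[peer])) for peer in origins}


def share_heavily_affected(impact, n_frt_peers, min_origins=100, min_subnets=2000):
    """Fraction of all FRT peers above both thresholds (unaffected peers count as zero)."""
    heavy = sum(1 for o, s in impact.values() if o > min_origins and s > min_subnets)
    return heavy / n_frt_peers


routes = [
    ("peer-1", "192.0.2.0/24", "64500 4134 21217 3356 64496"),
    ("peer-1", "198.51.100.0/24", "64500 4134 21217 3356 64497"),
    ("peer-2", "192.0.2.0/24", "64501 2914 4134 21217 3356 64496"),
]
impact = per_peer_impact(routes)
print(impact)  # {'peer-1': (2, 2), 'peer-2': (1, 1)}
print(share_heavily_affected(impact, n_frt_peers=4, min_origins=1, min_subnets=1))  # 0.25
```

The denominator is the number of FRT peers, not the number of affected ones, so peers that saw no leaked routes still pull the share down, as in the 20% figure above.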
Overall, the leak lasted almost four hours and affected more than 6,000 different origin ASes in total.
Figure 7 shows the evolution of the number of affected origins during this time.
There are several peaks, for example one around 10:40 UTC, when more than 2,500 different origins were affected at the same time. The affected origins included ASes hosting well-known services such as WhatsApp and Microsoft, as well as Internet Service Providers, banks, and content delivery networks.
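A time series like the one in Figure 7 can be built by replaying update events and tracking which origins currently have at least one leaked route in use. The sketch below assumes the events have already been reduced to `(timestamp, peer, prefix, origin)` tuples, where `origin` is `None` when a peer stops using the leaked route. This event format and the demo data are hypothetical, not the format of MRT dumps.

```python
from collections import Counter


def affected_origins_over_time(events):
    """Count distinct origin ASes with at least one active leaked route after each event.

    events: (timestamp, peer, prefix, origin) tuples, where origin is the origin AS of a
    newly observed leaked route, or None when the peer withdrew it or switched back to a
    non-leaked route.
    """
    active = {}            # (peer, prefix) -> origin of the leaked route in use
    refcount = Counter()   # origin AS -> number of active leaked (peer, prefix) pairs
    series = []
    for ts, peer, prefix, origin in sorted(events, key=lambda e: e[0]):
        old = active.pop((peer, prefix), None)
        if old is not None:
            refcount[old] -= 1
            if refcount[old] == 0:
                del refcount[old]
        if origin is not None:
            active[(peer, prefix)] = origin
            refcount[origin] += 1
        series.append((ts, len(refcount)))
    return series


events = [
    ("09:45", "peer-1", "192.0.2.0/24", 64496),
    ("09:50", "peer-2", "198.51.100.0/24", 64497),
    ("10:05", "peer-1", "203.0.113.0/24", 64496),
    ("10:20", "peer-1", "192.0.2.0/24", None),
    ("11:10", "peer-2", "198.51.100.0/24", None),
]
print(affected_origins_over_time(events))
```

Reference counting per origin matters here. An origin stays "affected" as long as any peer still uses any leaked route towards it, which is why the count at 10:20 does not drop even though one leaked route for AS64496 was withdrawn.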
Hijacks and leaks: no AS is safe
Looking again at BGPStream's numbers for 2019, more than 600 ASes were victims of a hijack and more than 800 were victims of a leak.
These numbers are very likely to be an underestimation for various reasons.
First, only a few hundred ASes currently share a full routing table with public route collectors, so the data is far from complete.
Second, a peer connected to a route collector may not announce to the collector everything it receives. An AS could receive an invalid route (for example, a leak) but not advertise it to the collector simply because it is not selected as the best route by the decision process, or because it is filtered by some mechanism, for example, Resource Public Key Infrastructure (RPKI) validation, peer-lock, or import policies.
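As one example of the filtering mentioned above, a Peerlock-style import check can be sketched as follows. The assumed idea is that a route mentioning a protected network should arrive directly from a protected network, never via a customer. The protected set below simply reuses Tier-1 ASNs that appear in Table 1 and is purely illustrative. The scenario of China Telecom applying this filter is hypothetical.

```python
# Illustrative protected set (Level3, NTT and Tata, which appear in Table 1);
# a real Peerlock list is agreed between the networks involved.
PROTECTED = {3356, 2914, 6453}


def peerlock_accept(neighbour_as, as_path, protected=PROTECTED):
    """Peerlock-style import check.

    A route that mentions a protected AS should arrive directly from a protected
    network; if a non-protected neighbour (e.g. a customer) sends it, reject it.
    """
    if neighbour_as in protected:
        return True  # sessions with protected networks are left to other policies
    return not any(asn in protected for asn in as_path)


# Routes as China Telecom (AS4134) received them from SafeHost (AS21217), per Table 1.
via_level3 = [21217] * 6 + [3356, 15085]
via_other = [21217] * 6 + [25091, 5568]
print(peerlock_accept(21217, via_level3))  # False: contains protected AS3356
print(peerlock_accept(21217, via_other))   # True: no protected AS in the path
```

The second result illustrates a limitation. A filter like this only catches leaked routes that traverse a protected network, so part of a leak can still get through. A route that is filtered this way is also one the peer will never show to a collector.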
Even worse, a routing anomaly could be confined to a given routing region (for example, a single economy) without being noticed by any route collector, because of the scarcity of data sources in that region and the import/export routing policies in place.
For example, in Figure 8, assume that E is the only AS that applies tight import policies. If F launches a malicious hijack, the bogus announcement will most likely affect A, B, C and D because of their loose import policies, while E will most likely drop it.
Now assume that a route collector R is connected to E, and that E is an FRT peer of R. In this case, R will never see the hijack attempt. The hijack will still cause headaches for the legitimate prefix owner in that region, in the form of complaints from its customers. This is why every BGP monitoring infrastructure should collect BGP data from as many diverse sources as possible.
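The Figure 8 scenario can be mimicked with a toy propagation model. The adjacency below is a hypothetical stand-in for the figure, since its exact links are not given in the text. The model also ignores export policies: every AS that accepts the announcement re-announces it to all neighbours, while ASes with tight import policies drop it.

```python
from collections import deque

# Hypothetical topology for the Figure 8 scenario (the real figure may differ).
TOPOLOGY = {
    "A": ["F", "C"],
    "B": ["F", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
    "F": ["A", "B"],
}
TIGHT_IMPORT = {"E"}  # ASes whose import policies drop the bogus announcement


def propagate(origin, topology, tight):
    """Breadth-first spread of an announcement; ASes in `tight` reject and do not re-announce it.

    Export policies are ignored for simplicity: every accepting AS re-announces to all neighbours.
    """
    accepted = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbour in topology[current]:
            if neighbour in accepted or neighbour in tight:
                continue
            accepted.add(neighbour)
            queue.append(neighbour)
    return accepted


polluted = propagate("F", TOPOLOGY, TIGHT_IMPORT) - {"F"}
print(sorted(polluted))             # ['A', 'B', 'C', 'D']
print(bool(polluted & {"E"}))       # False: a collector fed only by E sees nothing
print(bool(polluted & {"C", "E"}))  # True: adding a feed from C exposes the hijack
```

The last two lines show the point about data-source diversity. Adding a second feed from an AS with loose import policies is enough for the collector to observe the hijack.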
What can be done (and what has been done) about it
Since BGP lacks an intrinsic mechanism to secure routing, several mechanisms have been adopted to overcome this limitation.
RPKI is probably the most popular of these mechanisms. However, it is still not widely adopted, and it does not protect against every kind of routing attack; for example, it cannot prevent carefully tailored sub-prefix hijacks.
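RPKI-based route origin validation (ROV) only checks whether a (prefix, origin AS) pair is authorised by a Route Origin Authorization (ROA). The sketch below follows the valid/invalid/not-found logic of RFC 6811 in simplified form. It shows one way a tailored sub-prefix hijack could pass: an attacker announces a more-specific prefix still within a permissive maxLength and forges the legitimate origin at the end of the path. The ROA and ASNs are hypothetical documentation values. Leaked routes such as SafeHost's keep their legitimate origins, so ROV would not flag them either.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: str
    max_length: int
    origin: int


def rov_state(prefix: str, origin: int, roas) -> str:
    """Route origin validation in the style of RFC 6811: 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covering = [r for r in roas
                if route.version == ipaddress.ip_network(r.prefix).version
                and route.subnet_of(ipaddress.ip_network(r.prefix))]
    if not covering:
        return "not-found"
    for r in covering:
        # A ROA matches if origin and prefix length fit; AS 0 ROAs never match anything.
        if r.origin != 0 and r.origin == origin and route.prefixlen <= r.max_length:
            return "valid"
    return "invalid"


ROAS = [ROA("192.0.2.0/24", 25, 64496)]
print(rov_state("192.0.2.0/24", 64496, ROAS))    # valid: legitimate announcement
print(rov_state("192.0.2.0/24", 64511, ROAS))    # invalid: wrong origin
print(rov_state("192.0.2.0/26", 64511, ROAS))    # invalid: too specific and wrong origin
print(rov_state("192.0.2.128/25", 64496, ROAS))  # valid: forged-origin sub-prefix slips through
print(rov_state("203.0.113.0/24", 64511, ROAS))  # not-found: no covering ROA
```

Setting maxLength equal to the announced prefix length (24 here) would turn the fourth case into "invalid". This is why tight ROAs are generally recommended.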
Another interesting mechanism is BGPsec, which is designed as an extension of BGP. In BGPsec, each AS cryptographically signs the BGP updates it sends to its neighbours, creating a chain of trust meant to “provide confidence that every AS on the path of ASes listed in the UPDATE message has explicitly authorized the advertisement of the route” (RFC 8205). This chain of trust would provide a strong defence against forged BGP announcements, but it would still not be enough to fix every BGP routing vulnerability.
In addition, BGPsec introduces a major challenge that has slowed its adoption: each router must cryptographically verify every BGPsec update it receives and sign every update it sends. This places a computational overhead on routers that can be addressed only by upgrading them, for example with cryptographic hardware accelerators.
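The chain-of-trust idea can be illustrated with a toy model. Each AS signs the prefix, its own AS and the next-hop target AS, chained to the previous signature, and the receiver verifies every signature in turn. This is a conceptual sketch only: real BGPsec uses ECDSA P-256 router keys certified in the RPKI and a different wire format. HMAC with hypothetical per-AS secrets is swapped in so the code runs with the standard library, which means that, unlike in BGPsec, the verifier here needs the secrets.

```python
import hashlib
import hmac

# Toy stand-in for BGPsec signing: real BGPsec (RFC 8205) uses ECDSA P-256 router keys
# certified in the RPKI. Per-AS HMAC secrets are used here only to keep the sketch
# runnable with the standard library; unlike real BGPsec, verifying needs the secrets.
KEYS = {asn: hashlib.sha256(f"demo-key-{asn}".encode()).digest()
        for asn in (64496, 64497, 64498)}


def sign_hop(signer, target, prefix, prev_sig):
    """Signature by `signer` authorising the route towards `target`, chained to prev_sig."""
    msg = f"{prefix}|{signer}->{target}|".encode() + prev_sig
    return hmac.new(KEYS[signer], msg, hashlib.sha256).digest()


def build_update(prefix, path):
    """path lists ASes in propagation order: origin first, final receiver last."""
    segments, prev = [], b""
    for signer, target in zip(path, path[1:]):
        prev = sign_hop(signer, target, prefix, prev)
        segments.append((signer, target, prev))
    return segments


def verify_update(prefix, segments):
    """Check every signature and that each signer was the previous hop's target."""
    prev, expected_signer = b"", None
    for signer, target, sig in segments:
        if expected_signer is not None and signer != expected_signer:
            return False
        if signer not in KEYS:
            return False
        if not hmac.compare_digest(sign_hop(signer, target, prefix, prev), sig):
            return False
        prev, expected_signer = sig, target
    return True


update = build_update("192.0.2.0/24", [64496, 64497, 64498])
print(verify_update("192.0.2.0/24", update))      # True
print(verify_update("198.51.100.0/24", update))   # False: prefix changed
print(verify_update("192.0.2.0/24", update[1:]))  # False: origin's signature removed
```

The receiver performs one verification per signed hop, so the cost grows with path length and with every update received, which is the overhead described above. A chain like this would not by itself have stopped the SafeHost leak, because every signature along the leaked path could be genuine.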
While waiting for a widely adopted solution to most of the routing problems described in this post, AS administrators are advised to adopt the Mutually Agreed Norms for Routing Security (MANRS). They should also rely on BGP monitoring and alerting platforms to reduce the mean time to repair (MTTR) of these incidents as much as possible. Most of these platforms let users set up alarms for routing anomalies such as hijacks and leaks, helping AS administrators identify the root cause of a problem and take the necessary countermeasures.
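Two simple alerting heuristics of the kind such platforms offer can be sketched as follows. The first flags an unexpected origin AS for a monitored prefix (possible hijack). The second flags an AS adjacency never seen in a baseline period (a hint of a leak, like the 4134-21217 link in this incident). All names, prefixes and baseline paths are hypothetical. Real platforms combine many more signals and much larger baselines.

```python
import ipaddress


def as_links(path):
    """Undirected AS adjacencies in a path, with prepending collapsed."""
    hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
    return {frozenset(pair) for pair in zip(hops, hops[1:])}


def check_route(prefix, path, monitored, known_links):
    """Return alert strings for one observed route.

    monitored: {prefix string: set of expected origin ASes}
    known_links: set of frozenset AS pairs seen in a baseline period
    """
    route = ipaddress.ip_network(prefix)
    expected = set()
    for owned, origins in monitored.items():
        net = ipaddress.ip_network(owned)
        if route.version == net.version and route.subnet_of(net):
            expected |= origins
    if not expected:
        return []  # not one of our prefixes
    alerts = []
    if path[-1] not in expected:
        alerts.append(f"possible hijack: {prefix} originated by AS{path[-1]}")
    for link in sorted(as_links(path) - known_links, key=sorted):
        a, b = sorted(link)
        alerts.append(f"new AS link: AS{a}-AS{b} on path to {prefix}")
    return alerts


MONITORED = {"192.0.2.0/24": {64496}}
BASELINE = [[64500, 3356, 64496], [64501, 4134, 3356, 64496], [3356, 21217]]
KNOWN = set().union(*(as_links(p) for p in BASELINE))

leak_path = [64501, 4134, 21217, 21217, 21217, 3356, 64496]
print(check_route("192.0.2.0/24", leak_path, MONITORED, KNOWN))
print(check_route("192.0.2.128/25", [64510, 64511], MONITORED, KNOWN))
```

The leaked path keeps the legitimate origin, so only the new-link heuristic fires. The second route triggers both alerts. The quality of the baseline, and therefore of the alerts, depends directly on how many diverse vantage points feed it.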
Again, the more diverse your BGP data sources are, the more effective your monitoring will be at detecting and alerting for such anomalies.
Watch Luca Sani present this case study at APRICOT 2020.
Contributors: Alessandro Improta (Catchpoint)
Luca Sani is a BGP expert at Catchpoint.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.