BGP route leak prevention and detection with the help of RFC 9234

By Eugene Bogomazov on 10 May 2023

Category: Tech matters


Route leaks can be accidental or malicious, but most often they arise from misconfiguration. By creating a route leak, the leaker becomes a transit link between different regions without being paid for it, but lost profit is not even the main problem.

First, when a route leak happens in a third economy, packets must travel a longer path, which results in much higher latency. Second, packets can be lost before they reach the receiving party if the leaking network is not set up to carry them.

How often do they occur and what are the risks?

According to Qrator Labs’ BGP incidents report for Q3 2022, there were 12,103,554 BGP route leaks, originated by 3,030 unique route leakers during the period.

Of course, not all of them propagated far enough to be visible globally. According to Qrator Radar data, there were six global route leaks in Q3 2022, five in Q2 2022, and four in Q1 2022. Global route leaks have also occurred much more frequently than global BGP hijacks, at least since 2021.

The common effects of an active BGP route leak may vary from increased network delays for the victim (originator of a prefix) to Denial-of-Service (DoS) for both the victim and the leaker. The exact scope and the scale of consequences are impossible to confirm without being a party involved in the leaked BGP session, but some clues can be gathered by monitoring traffic for the affected Autonomous Systems (ASes) and their resources.

Figure 1 — An incident from August 2022.

Figure 1 shows an example from the beginning of August 2022: a classic situation in which traffic between two Tier-1 ISPs was rerouted, and the leak lasted for several hours. Had the leaking ISP been a small network, the volume of traffic would have overwhelmed it and could have caused a DoS on a global scale. We have seen this type of situation many times before. Luckily, in this incident the ISP was big enough to handle all the traffic routed its way.

Options for preventing and remediating route leaks

Existing leak prevention techniques rely on marking routes by operator configuration, with no check that the configuration corresponds to that of the eBGP neighbor, or enforcement of the two eBGP speakers agreeing on their peering relationship. 

Right now, apart from AS-SETs, there is virtually no way to deal with route leaks. With BGP Communities we can currently only try to prevent them. This is why BGP Roles have three ‘dimensions’: preventing leaks, detecting leaks, and checking the third-party configuration.

Figure 2 — Example of route leak consequence.

Can we measure the effects of route leaks? If you have multiple monitoring tools, you can correlate changes in your traffic data with ongoing BGP incidents.

On the left of Figure 2, you can see how a spike in traffic volume coincides with the route leak. On the right, you can see that the traced paths during and after the incident differ; the path taken during the leak may include additional economies and other changed parameters. You can also see the increased RTT during the incident and monitor the number of dropped packets.

How route leak problems change with RFC 9234 adoption

In the old world of route leaks, prevention relied on communities. They were set on ingress and checked on egress, which is relatively simple. The problem was that this solution was always one mistake away from failure. From an ISP’s point of view, a route leak occurs if your customer forgets to configure the ingress marking or forgets to configure the egress filter. Forgetting both has the same result: a route leak still occurs.
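The 'one mistake away' property can be illustrated with a minimal Python sketch. The community value `65000:100` and the function names are hypothetical and chosen only for illustration; real deployments express this logic in router policy, not in Python.

```python
# Hypothetical community meaning "learned from a provider or peer".
LEARNED_UPSTREAM = "65000:100"


def ingress(route, neighbor_type, tagging_configured=True):
    """Tag routes learned from providers or peers, if the operator configured it."""
    communities = set(route["communities"])
    if tagging_configured and neighbor_type in ("provider", "peer"):
        communities.add(LEARNED_UPSTREAM)
    return {**route, "communities": communities}


def egress_allowed(route, neighbor_type, filter_configured=True):
    """Return True if the route would be advertised to this neighbor."""
    if neighbor_type == "customer":
        return True  # everything may go downstream
    if not filter_configured:
        return True  # forgotten egress filter: nothing stops the leak
    return LEARNED_UPSTREAM not in route["communities"]


route = {"prefix": "192.0.2.0/24", "communities": set()}

# Both halves configured: the route is held back from the peer.
tagged = ingress(route, "provider")
print(egress_allowed(tagged, "peer"))

# Ingress tagging forgotten: the egress filter has nothing to match.
untagged = ingress(route, "provider", tagging_configured=False)
print(egress_allowed(untagged, "peer"))

# Egress filter forgotten: the tag is present but never checked.
print(egress_allowed(tagged, "peer", filter_configured=False))
```

Only the first call reports that the route is blocked. Either single omission is enough to let the provider-learned route reach a peer, and nothing on the neighbor's side notices.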

Figure 3 — Route leak prevention with communities.

RFC 9234 provides a meaningful tool to prevent and detect BGP route leaks. It extends the BGP OPEN message so that the two speakers on each eBGP session agree on their peering relationship, which enforces consistent configuration on both sides. The relationship is expressed through a new configuration parameter, the BGP Role, which is negotiated in-band using a BGP Role Capability in the OPEN message. Propagated routes are then marked according to the agreed relationship. An eBGP speaker may require the use of this capability and confirmation of the BGP Role with a neighbor for the BGP OPEN to succeed.
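As a rough illustration of what is carried in the OPEN message, here is a minimal Python sketch of the BGP Role Capability encoding. The capability code (9), the one-octet length, and the role values follow RFC 9234; the function and class names are made up for this example, and the surrounding OPEN message and optional-parameter framing are left out.

```python
from enum import IntEnum


class Role(IntEnum):
    """Role values as assigned in RFC 9234."""
    PROVIDER = 0
    RS = 1
    RS_CLIENT = 2
    CUSTOMER = 3
    PEER = 4


BGP_ROLE_CAPABILITY_CODE = 9  # capability code from RFC 9234


def encode_role_capability(role: Role) -> bytes:
    """Encode the capability as code, length (always 1), and role value."""
    return bytes([BGP_ROLE_CAPABILITY_CODE, 1, int(role)])


def decode_role_capability(data: bytes) -> Role:
    """Decode a single BGP Role Capability; reject anything malformed."""
    if len(data) != 3 or data[0] != BGP_ROLE_CAPABILITY_CODE or data[1] != 1:
        raise ValueError("not a well-formed BGP Role Capability")
    return Role(data[2])  # raises ValueError for unassigned role values


wire = encode_role_capability(Role.CUSTOMER)
print(wire.hex(), decode_role_capability(wire).name)
```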

There is also an optional, transitive BGP Path attribute, called ‘Only-to-Customer’ (OTC), which prevents ASes from creating leaks and detects leaks created by the ASes in the middle of an AS-PATH.
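For readers who want to see the attribute on the wire, the following Python sketch encodes and decodes OTC as a path attribute. The type code (35), the four-octet length, and the optional and transitive flag bits follow RFC 9234 and the base BGP specification; the function names are illustrative only.

```python
import struct

OTC_TYPE_CODE = 35      # Only-to-Customer, per RFC 9234
FLAG_OPTIONAL = 0x80    # attribute is optional
FLAG_TRANSITIVE = 0x40  # attribute is transitive


def encode_otc(asn: int) -> bytes:
    """Encode OTC as flags, type code, length, and a 4-octet AS number."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in four octets")
    flags = FLAG_OPTIONAL | FLAG_TRANSITIVE
    return struct.pack("!BBBI", flags, OTC_TYPE_CODE, 4, asn)


def decode_otc(attr: bytes) -> int:
    """Return the AS number carried in an encoded OTC attribute."""
    flags, type_code, length, asn = struct.unpack("!BBBI", attr)
    if type_code != OTC_TYPE_CODE or length != 4:
        raise ValueError("not an OTC attribute")
    return asn


attr = encode_otc(64500)
print(attr.hex(), decode_otc(attr))
```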

Figure 4 — The OTC attribute.

What is a BGP Role?

A BGP Role describes your peering relationship with your neighbor. There are only a few possible relationships: provider, customer, peer, route server, and route server client. You can easily mark all your neighbors with one of these. This configuration parameter is carried in a BGP capability, which is negotiated during BGP session establishment.

During the OPEN exchange, each side checks that the two roles form a valid pair, such as provider and customer. But what happens if one side configures the provider role and its counterpart configures the peer role? If someone mis-clicks like this, the BGP session won’t establish.
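The pair check can be sketched in a few lines of Python. The valid pairs and the strict-mode behavior follow RFC 9234; the role strings and the function name `session_may_establish` are assumptions made for this example, not any implementation's API.

```python
from typing import Optional

# (local role, remote role) pairs that RFC 9234 allows.
VALID_PAIRS = {
    ("provider", "customer"),
    ("customer", "provider"),
    ("rs", "rs-client"),
    ("rs-client", "rs"),
    ("peer", "peer"),
}


def session_may_establish(local_role: str,
                          remote_role: Optional[str],
                          strict: bool = False) -> bool:
    """Decide whether the OPEN exchange may succeed.

    remote_role is None when the neighbor did not send the capability.
    """
    if remote_role is None:
        # Without strict mode the session proceeds without a role check.
        return not strict
    return (local_role, remote_role) in VALID_PAIRS


print(session_may_establish("provider", "customer"))         # matching pair
print(session_may_establish("provider", "peer"))             # the mis-click case
print(session_may_establish("customer", None, strict=True))  # capability required
```

In RFC 9234 a failed check is signaled with a Role Mismatch NOTIFICATION (OPEN Message Error, subcode 11), so the operator on either side can see why the session stayed down.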

The definition of a route leak is straightforward: it happens when a prefix received from one provider or peer is advertised to another provider or peer. Once a prefix is advertised to a customer, it should only go downstream to the customer or to an indirect customer, and so on. To guarantee that this rule is not violated, we added a new BGP attribute called OTC. How does it work?
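The rule stated above reduces to a simple predicate. This Python sketch uses hypothetical names and covers only the provider and peer relationships mentioned in the text.

```python
# Relationships from which a route must only be passed on downstream.
NOT_DOWNSTREAM = {"provider", "peer"}


def is_route_leak(learned_from: str, advertised_to: str) -> bool:
    """True if a route learned from a provider or peer is advertised
    to another provider or peer instead of going downstream."""
    return learned_from in NOT_DOWNSTREAM and advertised_to in NOT_DOWNSTREAM


for src, dst in [("provider", "peer"), ("peer", "provider"),
                 ("provider", "customer"), ("customer", "provider")]:
    print(f"{src} -> {dst}: leak={is_route_leak(src, dst)}")
```

Routes learned from customers can go anywhere, and anything can be sent to customers; only the provider or peer to provider or peer combinations are leaks.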

Figure 5 — How OTC works.

When a provider sends a prefix to a customer, they set the OTC attribute with the value of their own AS. If this attribute is not set, the customer adds it with the value of its neighboring AS. Note, it doesn’t matter who sets the attribute — the value is the same.
 
The OTC attribute does not change during its lifetime and, as shown on the right of Figure 6, it is double-checked. First, the customer checks on egress that a route with OTC set is not sent to its other providers or peers. The provider on the other side then performs the matching check on ingress: a route that arrives from a customer with OTC already set is a leak.

Figure 6 — OTC is double set, double-checked.

So, OTC is double-set and double-checked. And if the customer fails to configure their filters, the provider will be able to detect the route leak instantly.
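The double-set, double-checked behavior can be sketched in Python as two small functions modeled on the ingress and egress procedures in RFC 9234. The names (`on_ingress`, `on_egress`, `RouteLeak`) and the AS numbers are assumptions for illustration; `neighbor_is` means the role the neighbor plays relative to the local AS.

```python
from typing import Optional, Tuple


class RouteLeak(Exception):
    """Raised when the OTC checks identify a received route as leaked."""


def on_ingress(otc: Optional[int], neighbor_is: str,
               neighbor_as: int) -> Optional[int]:
    """Check a received route and return its OTC value after ingress."""
    if otc is not None and neighbor_is in ("customer", "rs-client"):
        raise RouteLeak("OTC present on a route from a downstream neighbor")
    if otc is not None and neighbor_is == "peer" and otc != neighbor_as:
        raise RouteLeak("peer forwarded a route it should not have")
    if otc is None and neighbor_is in ("provider", "peer", "rs"):
        return neighbor_as  # receiver sets what the sender did not
    return otc


def on_egress(otc: Optional[int], neighbor_is: str,
              local_as: int) -> Tuple[bool, Optional[int]]:
    """Return (may_advertise, otc_value_to_send)."""
    if otc is not None and neighbor_is in ("provider", "peer", "rs"):
        return False, otc  # already marked Only-to-Customer
    if otc is None and neighbor_is in ("customer", "peer", "rs-client"):
        return True, local_as  # sender sets OTC to its own AS
    return True, otc


PROVIDER_A, CUSTOMER, PROVIDER_B = 64500, 64501, 64502  # example ASNs

# Provider A advertises a route to its customer and sets OTC.
_, otc = on_egress(None, "customer", PROVIDER_A)
# The customer receives it; OTC is already present and stays unchanged.
otc = on_ingress(otc, "provider", PROVIDER_A)
# First check: a compliant customer refuses to send it to Provider B.
print(on_egress(otc, "provider", CUSTOMER))
# Second check: if the customer leaks it anyway, Provider B detects it.
try:
    on_ingress(otc, "customer", CUSTOMER)
except RouteLeak as exc:
    print("leak detected:", exc)
```

The sketch also shows why it doesn't matter who sets the attribute: whether Provider A sets it on egress or the customer sets it on ingress, the value is Provider A's AS number.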

Dealing with route leaks

RFC 9234 is quite precise about what to do when you detect a route leak: reject the route. Figure 7 shows how simple it is to configure BGP Roles in some open source software. The yellow text shows how to configure BGP Roles in BIRD and FRR, and OTC will do the rest. As shown in Figure 7, when the roles configured on the two sides do not match, the BGP session won’t come up.

Figure 7 — Configuring BGP Roles.
Figure 8 — Roles are automatically tagged with the OTC attribute.

Figure 8 shows what is happening behind the scenes. An OTC attribute appears on the route, but you are not configuring it; it’s done in code for you. Simple.

So, BGP Roles and the OTC allow you to check your neighbor’s configuration. OTC is a transitive attribute, a signal that may travel from a Tier-1 network or IX to all its direct and indirect downstreams. It’s double-set and double-checked, on both egress and ingress. And compared with a community, the OTC attribute is highly unlikely to be stripped. Critically, it provides an opportunity to detect route leaks even several hops away from the leaking AS.

Right now and towards the future

Of course, real-world implementation of RFC 9234 would be different depending on the role of the ‘Local AS’ that is adopting the BGP Roles. If you’re an IXP, you can use it right now. If you’re an ISP using some form of hardware provided by a vendor — ask the vendor about their plan to implement RFC 9234.

We consider BGP Roles sufficient to prevent and detect 80% of BGP route leaks at major IXPs and the world’s largest operators. The remaining 20% would probably be broken cases, such as BGP optimizers that could simply delete the attribute if they want to, or something else we haven’t thought about. What BGP Roles are not intended to deal with is hacking activity; they are focused on preventing and detecting errors and misconfigurations.

Autonomous System Provider Authorization (ASPA) and Route Origin Authorization (ROA) in combination are able to cover hacking activity related to BGP routing, in our opinion. ASPA is complementary to BGPSec, although both have a way to go before we see wide adoption among ISPs.

Currently, we are aware that patches were applied to the three major open-source implementations (Table 1) but the problem is far from solved. To eliminate route leaks, as a community, we need to show a desire to get rid of these routing incidents, with a similar passion the community shows in eliminating BGP hijacks with ROAs.

Solution  | Status                | Version
BIRD      | +                     | 2.0.11
FRR       | +                     | 8.4
OpenBGPD  | +                     | 7.5
Mikrotik  | Reduced functionality | Predates RFC 9234
TCPdump   | +                     | GitHub
Wireshark | +                     | GitLab
Table 1 — BGP Roles vendor support

If you’re using open-source tools, you can already try to set up BGP Roles. If you are using vendor software, send a request for BGP Role support to your vendor. And if you’re a developer, even better! There is a vast space for improvement if you can contribute to other BGP implementations. You can contribute to BMP parsers, tcpdump implementations, BGP dump tools, and so on.

With thanks to the RFC authors, we hope that RFC 9234 is a start to eliminating BGP anomalies for a better Internet.

Eugene Bogomazov is a Research and Development Engineer at Qrator Labs.

This post is adapted from the original at Qrator Labs.

