Should you update your Route Flap Damping parameters?

By Clemens Mosig on 29 Nov 2021

Category: Tech matters

Route Flap Damping (RFD) is a mechanism to locally suppress BGP update churn on the Internet. RFD default configuration parameters in routers are too strict and cause unwanted prefix update suppression, which leads to reachability issues. A 2010 study focused on IPv4 (and only one router vendor, Cisco) determined a set of configuration parameters to avoid these issues.

This post presents results from a 2021 study conducted by my colleagues and me from Freie Universität Berlin, IIJ/Arrcus, Université de Strasbourg, and HAW Hamburg. We sought to reproduce and extend the 2010 study, this time also considering IPv6 and a second router vendor (Juniper).

Key points:
  • Current configuration recommendations (BCP 194 and ripe-580) are still valid today and will be valid in the future if current trends continue, considering IPv4 and IPv6.
  • In 2020, 3% of all IPv4 prefixes caused 53.9% of BGP updates (74.8% in IPv6).
  • Network operators should check their RFD configurations for harmful vendor defaults, and we suggest that Juniper and Cisco update the default values in their RFD implementations.

3% of all IPv4 prefixes caused 53.9% of BGP updates

We measured BGP churn using public route collector projects Isolario, RIPE RIS, and RouteViews. We removed BGP duplicates (all attributes match) because they did not trigger best path selection and were most likely related to iBGP update activity in the vantage point network (see Labovitz et al and Park et al).
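
The duplicate filter can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the `Update` record and its fields are hypothetical, and it assumes that a duplicate is an update whose attributes all match the previous update for the same prefix from the same peer.

```python
from typing import Iterable, NamedTuple, Optional


class Update(NamedTuple):
    """Hypothetical, simplified BGP update record (not a real library type)."""
    peer: str               # vantage point peer that sent the update
    prefix: str             # e.g. "192.0.2.0/24"
    attrs: Optional[tuple]  # path attributes; None marks a withdrawal


def drop_duplicates(updates: Iterable[Update]) -> list[Update]:
    """Drop updates that repeat the last state seen for (peer, prefix)."""
    last_state: dict[tuple[str, str], Optional[tuple]] = {}
    kept = []
    for u in updates:
        key = (u.peer, u.prefix)
        if key in last_state and last_state[key] == u.attrs:
            continue  # all attributes match the previous update: duplicate
        last_state[key] = u.attrs
        kept.append(u)
    return kept
```

State is tracked per peer and prefix, so the same announcement arriving from two different peers is kept twice.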

Figure 1 shows that absolute BGP churn increased over the last ten years. This growth is proportional to the number of active Autonomous Systems (ASes) on the Internet (topology size) (Jia et al). Also, IPv6 seems to exhibit a much larger update activity per prefix and in total. This difference between IPv4 and IPv6, when normalized by the Internet topology size, is constant over time. At least from this perspective, the growth of routing activity complies with common expectations.

Figure 1 also shows how BGP update activity is distributed across prefixes in 2010 and 2020. Prefixes fall into three groups: inactive prefixes (left part of the plot), prefixes with some update activity (the majority, in the middle), and heavy hitters (right part of the plot), which are the most interesting group. To put this into perspective, in 2020, 3% of all IPv4 prefixes caused 53.9% of BGP updates (74.8% in IPv6).
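
A figure such as "3% of prefixes caused 53.9% of updates" can be computed from per-prefix update counts roughly as shown below. The function name and the input layout (a mapping from prefix to update count) are assumptions made for illustration, not the study's code.

```python
import math


def top_share(update_counts: dict[str, int], top_fraction: float = 0.03) -> float:
    """Fraction of all updates caused by the noisiest `top_fraction` of prefixes."""
    counts = sorted(update_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of prefixes in the top group, at least one.
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / total
```

For example, with four prefixes and `top_fraction=0.25`, the function returns the share of updates caused by the single noisiest prefix.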

Figure 1 — Average number of announcements and withdrawals per prefix across all vantage points between 1-7 June 2010 and 2020.

Figure 2 provides a closer look at the hourly update rate of the top 50 heavy hitter prefixes over one week. We call the churn behaviour of a prefix periodic if its update rate is above 10 updates per hour during the entire measurement period, and erratic otherwise. Because the lines are drawn with a low alpha value, darker lines indicate that several prefixes have the same update rate. When looking at these prefixes over a longer period, for example using RIPEstat, the same patterns persist for weeks and often for multiple months (see the example). We do not see a reason why this update activity is useful.
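
The periodic/erratic distinction amounts to a simple check over the hourly bins. The sketch below assumes the input is a list of hourly update rates for one prefix, already normalized by the vantage point count; the function name is hypothetical.

```python
def classify_churn(hourly_updates: list[float], threshold: float = 10.0) -> str:
    """Return "periodic" if every hourly bin exceeds the threshold, else "erratic"."""
    if not hourly_updates:
        raise ValueError("need at least one hourly bin")
    if all(rate > threshold for rate in hourly_updates):
        return "periodic"
    return "erratic"
```

A single quiet hour in the week is enough to classify a prefix as erratic under this definition.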

Figure 2 — Number of updates binned by the hour for the 50 noisiest prefixes (ranked by cumulative update count) normalized by the vantage point count. A darker colour indicates multiple prefixes having the same update rate because the lines are drawn with a low alpha value. The churn behaviour of a prefix is periodic if the update rate is above 10 updates per hour during the entire measurement period, and otherwise erratic.

So what are optimal configuration parameters for RFD?

To determine optimal parameter configurations, we select a representative subset of vantage points as peers for our RFD emulation software. We then analyse penalty changes in the first week of June 2020. Learn more about our RFD measurement setup via our TMA 2021 paper.

Configuring RFD means tweaking the suppress threshold. A good suppress threshold is neither too low nor too high because you do not want to suppress prefix announcements caused by normal BGP churn, nor do you want to allow for worst-case churn (heavy hitters).
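
To make the role of the suppress threshold concrete, here is a minimal sketch of penalty accumulation for one prefix from one peer. It is not the emulation software used in the study. The class and function names are hypothetical, the default values are taken from Table 1, and the sketch makes simplifying assumptions: a prefix is suppressed once its penalty exceeds the threshold, and the reuse threshold and maximum suppress time are ignored.

```python
from dataclasses import dataclass


@dataclass
class RfdParams:
    suppress_threshold: float = 6000.0      # recommended value in Table 1
    half_life_min: float = 15.0
    withdrawal_penalty: float = 1000.0
    readvertisement_penalty: float = 0.0    # Cisco default in Table 1
    attribute_change_penalty: float = 500.0


def ever_suppressed(events: list[tuple[float, str]], params: RfdParams) -> bool:
    """Return True if the penalty ever exceeds the suppress threshold.

    `events` holds (minute, kind) pairs, where kind is "withdrawal",
    "readvertisement", or "attribute_change".
    """
    penalty_for = {
        "withdrawal": params.withdrawal_penalty,
        "readvertisement": params.readvertisement_penalty,
        "attribute_change": params.attribute_change_penalty,
    }
    penalty = 0.0
    last_t = None
    for t, kind in sorted(events):
        if last_t is not None:
            # Exponential decay: the penalty halves every half_life_min minutes.
            penalty *= 0.5 ** ((t - last_t) / params.half_life_min)
        penalty += penalty_for[kind]
        last_t = t
        if penalty > params.suppress_threshold:
            return True
    return False
```

With these assumptions, three withdrawals one minute apart accumulate a penalty of just under 2900. That is enough to suppress the prefix at a threshold of 2000, but not at 6000.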

To illustrate the impact of a poorly configured router, Figure 3 shows the share of prefixes damped at least once across our set of vantage points. The dashed lines indicate the share of prefixes that have been damped by at least one vantage point. The boxplot below the dashed lines shows the range of suppression levels across vantage points.

With the Cisco default suppress threshold (2000), 29% of IPv4 prefixes and 37% of IPv6 prefixes were damped by at least one vantage point and were therefore unreachable there! Unsurprisingly, the share of suppressed prefixes varies significantly across vantage points. However, we cannot make individual recommendations per network, so 2000 (and 4000) can by no means be the generally recommended suppress threshold.
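
The two quantities in Figure 3 differ in how they aggregate: the boxplot summarizes one share per vantage point, while the dashed line is the share of prefixes damped by any vantage point. A minimal sketch, assuming a hypothetical mapping from vantage point to the set of prefixes it damped at least once:

```python
def damped_shares(
    damped_by_vp: dict[str, set[str]], all_prefixes: set[str]
) -> tuple[dict[str, float], float]:
    """Return (share damped per vantage point, share damped by at least one)."""
    if not all_prefixes:
        raise ValueError("all_prefixes must not be empty")
    n = len(all_prefixes)
    per_vp = {
        vp: len(damped & all_prefixes) / n for vp, damped in damped_by_vp.items()
    }
    # Union over vantage points: damped somewhere, at least once.
    damped_anywhere = set().union(*damped_by_vp.values()) & all_prefixes
    return per_vp, len(damped_anywhere) / n
```

The union share is never smaller than the largest per-vantage-point share, which is why the dashed lines sit above the boxes.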

Figure 3 — Boxplot across vantage points, showing the share of the global RIB that has been damped at least once. One half of the data lies within the box, split by the median, and whiskers are placed at 1.5 IQR. The dashed lines represent the total share of prefixes that have been damped by at least one vantage point.

But which suppress threshold is too high?

In Figure 4, we have plotted a different visualization of how prefixes were damped on average at different suppress thresholds. A given row (for a suppress threshold) shows all prefixes, sorted by the total number of updates from low to high, and how long they are being suppressed.

At the Cisco default suppress threshold, prefixes from all levels of churn are suppressed at least once. At 6000, far fewer prefixes are suppressed, while most of the top 3% still are. We still believe that a suppress threshold of at least 6000, and at most 12000, is most reasonable as a general recommendation. Advanced users of RFD could start at 6000, watch what share of the routing table is being suppressed, and then adjust the suppress threshold to their needs.

You may wonder why, for 97% of prefixes, the colour distribution in Figure 4 looks rather similar for both IP versions, even though absolute churn in IPv6 is much higher. This is because the median update count across prefixes is identical in both IP versions while the mean is ∼4× higher in IPv6. In other words, fewer prefixes cause more updates in IPv6.

Figure 4 — Mean cumulative damp duration (colour) for each prefix at different suppress thresholds. Prefixes are sorted by the total number of updates.

We have determined an RFD parameter recommendation for today’s Internet, but what about the future? The RFD mechanism implements exponential decay based on half-life. This means that the number of (common) BGP updates needs to increase exponentially to render current thresholds unusable. Since such an increase is a very unlikely scenario, RFD parameter recommendations are very unlikely to need changing in the future. If our prediction of BGP churn is wrong and operational prefixes exhibit significantly more churn in the future, other issues will be much more of a concern.
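
As a worked illustration of the half-life decay, assume the standard exponential form described in RFC 2439 and the values in Table 1 (half-life H = 15 min, reuse threshold R = 750). The penalty p and the time needed to decay back to the reuse threshold are then:

```latex
p(t) = p(t_0) \cdot 2^{-(t - t_0)/H}
\qquad\Longrightarrow\qquad
t_{\mathrm{reuse}} = H \log_2 \frac{p(t_0)}{R}

p(t_0) = 6000:\quad t_{\mathrm{reuse}} = 15 \cdot \log_2 \frac{6000}{750} = 15 \cdot 3 = 45 \text{ min}

p(t_0) = 2000:\quad t_{\mathrm{reuse}} = 15 \cdot \log_2 \frac{2000}{750} \approx 21.2 \text{ min}
```

RFC 2439 also caps the penalty at R · 2^(maximum suppress time / H), which gives 750 · 2^(60/15) = 12000 for the Table 1 values. This is consistent with 12000 as the upper end of the suggested range, although that link follows from the arithmetic alone and is not a result taken from the study.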

Ideally, vendors would update the default configuration parameters in their implementations. We understand that such a change may surprise network operators who rely on the default values. However, vendors could implement a simple warning saying that the current configuration is out of date and should be replaced.

RFD parameter recommendations

RFD parameter              | Cisco default | Juniper default | Recommended (BCP 194 / RIPE-580)
Suppress threshold         | 2000          | 3000            | 6000
Readvertisement penalty    | 0             | 1000            | 0/1000
Attributes change penalty  | 500           | 500             | 500
Withdrawal penalty         | 1000          | 1000            | 1000
Half-life (min)            | 15            | 15              | 15
Reuse threshold            | 750           | 750             | 750
Max suppress time (min)    | 60            | 60              | 60

Table 1 — RFD parameter recommendations.

This research was originally published at IEEE/IFIP TMA 2021. Visit rfd.rg.net for more details.

Contributors: Randy Bush, Cristel Pelsser, Thomas C. Schmidt, Matthias Wählisch.

Adapted from original post which appeared on RIPE Labs.

Clemens Mosig studies Computer Science at Freie Universität Berlin and works at the Internet Technologies research lab, advised by Matthias Wählisch.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
