Route Flap Damping (RFD) is a mechanism to locally suppress BGP update churn on the Internet. RFD default configuration parameters in routers are too strict and cause unwanted prefix update suppression, which leads to reachability issues. A 2010 study focused on IPv4 (and only one router vendor, Cisco) determined a set of configuration parameters to avoid these issues.
This post presents results from a 2021 study conducted by my colleagues and me from Freie Universität Berlin, IIJ/Arrcus, Université de Strasbourg, and HAW Hamburg. We sought to reproduce and extend the 2010 study, this time also considering IPv6 and a second router vendor (Juniper).
3% of all IPv4 prefixes caused 53.9% of BGP updates
We measured BGP churn using public route collector projects Isolario, RIPE RIS, and RouteViews. We removed BGP duplicates (all attributes match) because they did not trigger best path selection and were most likely related to iBGP update activity in the vantage point network (see Labovitz et al and Park et al).
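The duplicate filter described above can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the `Update` record, its field names, and the choice to represent a withdrawal as `attrs=None` are assumptions made for the example.

```python
from typing import Hashable, Iterable, List, NamedTuple, Optional


class Update(NamedTuple):
    """Hypothetical record for one BGP update seen at a route collector."""
    peer: str                   # vantage point (collector peer) that sent it
    prefix: str                 # announced or withdrawn prefix
    attrs: Optional[Hashable]   # snapshot of all path attributes, None = withdrawal


_UNSEEN = object()  # sentinel: no update seen yet for this (peer, prefix)


def drop_duplicates(updates: Iterable[Update]) -> List[Update]:
    """Keep an update only if it differs from the previous update
    for the same prefix from the same peer (all attributes compared)."""
    last_attrs = {}
    kept = []
    for u in updates:
        key = (u.peer, u.prefix)
        if last_attrs.get(key, _UNSEEN) == u.attrs:
            continue  # identical to the previous update: a duplicate
        last_attrs[key] = u.attrs
        kept.append(u)
    return kept
```

Duplicates are tracked per peer and prefix, so the same announcement arriving from two different vantage points is kept twice.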
Figure 1 shows that absolute BGP churn increased over the last ten years. This growth is proportional to the number of active Autonomous Systems (ASes) on the Internet (topology size) (Jia et al). IPv6 also seems to exhibit much higher update activity, both per prefix and in total. This difference between IPv4 and IPv6, when normalized by the Internet topology size, is constant over time. At least from this perspective, the growth of routing activity complies with common expectations.
We can also see in Figure 1 how BGP update activity is distributed across prefixes in 2020 and 2010. Prefixes fall into three groups: inactive prefixes (left part of the plot), prefixes with some update activity (the majority, in the middle), and, most interesting, heavy hitters (right part of the plot). To put this into perspective, in 2020, 3% of all IPv4 prefixes caused 53.9% of BGP updates (74.8% in IPv6).
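A concentration figure such as "3% of prefixes caused 53.9% of updates" can be computed from per-prefix update counts roughly like this. The function name and the input format (a mapping from prefix to update count) are assumptions for illustration only.

```python
from typing import Mapping


def top_share(update_counts: Mapping[str, int], top_fraction: float) -> float:
    """Share of all updates caused by the most active `top_fraction` of prefixes.

    `update_counts` maps each prefix to its number of updates in the
    measurement period (hypothetical input format).
    """
    counts = sorted(update_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of prefixes in the top group, at least one.
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total
```

For example, `top_share(counts, 0.03)` returns the fraction of updates attributable to the top 3% of prefixes in `counts`.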
Figure 2 provides a closer look at the hourly update rate of the top 50 heavy hitter prefixes over one week. We call the churn behaviour of a prefix periodic if its update rate stays above 10 updates per hour during the entire measurement period, and erratic otherwise. Because the lines are drawn with a low alpha value, darker lines indicate that several prefixes have the same update rate. When looking at these prefixes over a longer period, for example using RIPEstat, these patterns extend for weeks and often multiple months (see the example). We see no reason why this update activity would be useful.
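The periodic/erratic distinction can be expressed in a few lines. This is a sketch under assumptions: update times are taken to be Unix timestamps in seconds, and the helper names are made up for this example.

```python
from typing import Iterable, List, Sequence


def hourly_counts(timestamps: Iterable[float], start: float, hours: int) -> List[int]:
    """Bin update timestamps (seconds) into `hours` one-hour buckets from `start`."""
    bins = [0] * hours
    for t in timestamps:
        idx = int((t - start) // 3600)
        if 0 <= idx < hours:
            bins[idx] += 1
    return bins


def classify_churn(counts_per_hour: Sequence[int], min_rate: int = 10) -> str:
    """'periodic' if every hour has more than `min_rate` updates, else 'erratic'."""
    if counts_per_hour and all(c > min_rate for c in counts_per_hour):
        return "periodic"
    return "erratic"
```

A prefix that sends one update every five minutes (12 per hour) for the whole period is classified as periodic, while a prefix that sends a single large burst and then goes quiet is erratic.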
So what are optimal configuration parameters for RFD?
To determine optimal parameter configurations, we select a representative subset of vantage points as peers for our RFD emulation software. We then analyse penalty changes in the first week of June 2020. Learn more about our RFD measurement setup via our TMA 2021 paper.
Configuring RFD means tweaking the suppress threshold. A good suppress threshold is neither too low nor too high because you do not want to suppress prefix announcements caused by normal BGP churn, nor do you want to allow for worst-case churn (heavy hitters).
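To make the role of the suppress threshold concrete, here is a minimal penalty model in the spirit of RFC 2439. It is not the emulation software used in the study. The suppress threshold of 2000 and the 60-minute max suppress time come from this post; the half-life of 15 minutes, the reuse threshold of 750, and the per-withdrawal penalty of 1000 used in the usage note are commonly cited defaults that are assumed here, not taken from the post.

```python
class RfdState:
    """Damping state for one prefix from one peer (illustrative sketch)."""

    def __init__(self, suppress=2000.0, reuse=750.0,
                 half_life_s=900.0, max_suppress_s=3600.0):
        self.suppress = suppress        # penalty above which the prefix is damped
        self.reuse = reuse              # penalty below which it is released (assumed)
        self.half_life_s = half_life_s  # penalty halves every half-life (assumed 15 min)
        # Cap the penalty so that it decays to `reuse` within max suppress time.
        self.ceiling = reuse * 2.0 ** (max_suppress_s / half_life_s)
        self.penalty = 0.0
        self.last_s = None
        self.suppressed = False

    def _decay_to(self, now_s):
        # Exponential decay since the last event, then check for release.
        if self.last_s is not None:
            self.penalty *= 0.5 ** ((now_s - self.last_s) / self.half_life_s)
        self.last_s = now_s
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def on_update(self, now_s, increment):
        """Apply one update with the given penalty increment; return damped state."""
        self._decay_to(now_s)
        self.penalty = min(self.penalty + increment, self.ceiling)
        if self.penalty > self.suppress:
            self.suppressed = True
        return self.suppressed

    def is_suppressed(self, now_s):
        self._decay_to(now_s)
        return self.suppressed


def ever_suppressed(events, suppress):
    """Replay (time_s, increment) events and report whether damping ever triggered."""
    state = RfdState(suppress=suppress)
    hit = False
    for now_s, increment in events:
        hit = state.on_update(now_s, increment) or hit
    return hit
```

With these assumed values, three withdrawals one minute apart (1000 penalty each) push the penalty past 2000 and trigger damping, while the same three events stay well below a threshold of 6000. The assumed reuse threshold and half-life also give a penalty ceiling of 750 × 2^(60/15) = 12000.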
To show the impact of a poorly configured router, Figure 3 plots the share of prefixes damped at least once across our set of vantage points. The dashed lines indicate the share of prefixes that have been damped by at least one vantage point. The boxplot below the dashed lines shows the range of suppression levels across vantage points.
With the Cisco default suppress threshold (2000), 29% of IPv4 prefixes and 37% of IPv6 prefixes were damped by at least one vantage point and were therefore unreachable from it! Unsurprisingly, the share of suppressed prefixes varies significantly across vantage points. However, we cannot make individual recommendations per network, so 2000 (and 4000) can by no means be the generally recommended suppress threshold.
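The two quantities discussed here, the share of prefixes damped by at least one vantage point and the per-vantage-point shares, can be computed along these lines. The function name and input format (a set of damped prefixes per vantage point) are assumptions for this sketch.

```python
from typing import Dict, Mapping, Set, Tuple


def damped_shares(damped_by_vp: Mapping[str, Set[str]],
                  all_prefixes: Set[str]) -> Tuple[float, Dict[str, float]]:
    """Return (share damped by at least one vantage point, share per vantage point).

    `damped_by_vp` maps a vantage point name to the set of prefixes it damped
    at least once at a given suppress threshold (hypothetical input format).
    """
    n = len(all_prefixes)
    if n == 0:
        raise ValueError("all_prefixes must not be empty")
    per_vp = {vp: len(damped & all_prefixes) / n
              for vp, damped in damped_by_vp.items()}
    # Union over all vantage points: damped somewhere at least once.
    damped_anywhere = set().union(*damped_by_vp.values()) & all_prefixes
    return len(damped_anywhere) / n, per_vp
```

Running this once per candidate suppress threshold gives the kind of data an operator could watch when adjusting the threshold for their own network.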
But which suppress threshold is too high?
Figure 4 gives a different view of how prefixes were damped on average at different suppress thresholds. Each row (one suppress threshold) shows all prefixes, sorted by their total number of updates from low to high, and how long they were suppressed.
At the Cisco default suppress threshold, prefixes from all levels of churn are suppressed at least once. At 6000, far fewer prefixes are suppressed, while most of the top 3% still are. We still believe that a suppress threshold of at least 6000, and at most 12000, is most reasonable as a general recommendation. Advanced users of RFD could start at 6000, watch what share of the routing table is being suppressed, and then adjust the suppress threshold to their needs.
You may wonder why, for 97% of prefixes, the colour distribution in the plot below looks rather similar for both IP versions, even though absolute churn in IPv6 is much higher. This is because the median update count across prefixes is identical in both IP versions while the mean is ∼4× higher in IPv6, that is, fewer prefixes cause more updates in IPv6.
We have determined an RFD parameter recommendation for today’s Internet, but what about the future? The RFD mechanism implements exponential decay based on a half-life. This means that the number of (common) BGP updates needs to increase exponentially to render current thresholds unusable. Since such an increase is a very unlikely scenario, RFD parameter recommendations are very unlikely to need changing in the future. If our prediction of BGP churn is wrong and operational prefixes exhibit significantly more churn in the future, other issues will be much more of a concern.
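As a small worked example of the half-life decay, consider a prefix that sends an update every τ minutes, each adding a fixed penalty Δ. The symbols H (half-life), S (suppress threshold), Δ, and τ are introduced here for illustration. The numbers assume Δ = 500 (the attributes change penalty from Table 1) and H = 15 minutes, a commonly cited default that is not stated in this post, and they ignore the maximum penalty ceiling.

```latex
\begin{align*}
  % Decay of the penalty between two updates
  p(t) &= p(t_0)\, 2^{-(t - t_0)/H} \\
  % Penalty just after the n-th update, and its steady state
  p_n &= p_{n-1}\, 2^{-\tau/H} + \Delta
  \quad\Longrightarrow\quad
  p^{*} = \frac{\Delta}{1 - 2^{-\tau/H}} \\
  % The prefix is eventually suppressed when the steady state exceeds S
  p^{*} > S &\iff \tau < H \log_2 \frac{S}{S - \Delta}
  \qquad (S > \Delta) \\
  % Worked numbers with H = 15 min and \Delta = 500
  S = 2000 &:\quad \tau < 15 \log_2 \tfrac{2000}{1500} \approx 6.2 \text{ min} \\
  S = 6000 &:\quad \tau < 15 \log_2 \tfrac{6000}{5500} \approx 1.9 \text{ min}
\end{align*}
```

Under these assumptions, a prefix changing attributes once every five minutes would eventually be damped at a threshold of 2000 but not at 6000.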
Ideally, default configuration parameters would be updated in vendor implementations. We understand that this may surprise network operators who rely on the default values. However, vendors could implement a simple warning saying that the current configuration is out-of-date and should be replaced.
RFD parameter recommendations
|RFD parameter|Cisco Default|Juniper Default|Recommendations: BCP 194 / RIPE-580|
|---|---|---|---|
|Attributes change penalty|500|500|500|
|Max suppress time (min)|60|60|60|
Table 1 — RFD parameter recommendations.
This research was originally published at IEEE/IFIP TMA 2021. Visit rfd.rg.net for more details.
Contributors: Randy Bush, Cristel Pelsser, Thomas C. Schmidt, Matthias Wählisch.
Adapted from original post which appeared on RIPE Labs.
Clemens Mosig studies Computer Science at Freie Universität Berlin and works at the Internet Technologies research lab, advised by Matthias Wählisch.