The game theory behind running CUBIC and BBR on the Internet

By on 24 Aug 2021

Category: Tech matters

Tags: , , , ,

Blog home

In 2016, Google introduced BBR, a new congestion control algorithm that abandons the traditional loss-based approach to congestion control and instead uses RTT and bandwidth estimates to infer congestion.

Read: BBR, the new kid on the TCP block

Being largely loss-agnostic, BBR performs well in lossy networks with less variability in end-to-end delay. These properties, coupled with encouraging deployment results from some of BBR’s early adopters, have made BBR an attractive alternative to traditional loss-based congestion control algorithms like TCP CUBIC.

A recent measurement study suggests that BBR is already being deployed by 22% of the Alexa Top 20k websites on the Internet. Given this rapid adoption, we at the National University of Singapore attempted to understand the evolution of the Internet’s congestion control landscape, specifically how reasonable it is to expect the whole Internet to switch to BBR and/or its variants.

Key points:
  • In a network with two flows, there will always be a Nash Equilibrium (NE).
  • The distribution of CUBIC and BBR at the NE depends mainly on the bottleneck buffer size, and marginally on the RTT distribution.
  • Switching to BBR does not always translate into better performance!

Using game-theoretic analysis to model CUBIC and BBR

In our recent paper at APNet ‘21, we described how the decision to deploy CUBIC or BBR can be modelled as a normal form game. With this formulation, we can then apply standard game-theoretic analysis to compute the Nash Equilibria.

In a network setting, a NE would be a stable distribution of CUBIC and BBR where none of the flows can gain anything by switching to the other algorithm.

Should you switch to BBR too?

Unsurprisingly, making the switch to BBR is likely to be a performance-driven decision. All companies that have reported switching to BBR have cited better network performance, particularly higher throughput, as the key reason for making the switch. In principle, websites can usually gauge this performance gain by A/B testing BBR against their currently deployed algorithm and choosing the algorithm that performs better. This system can be modelled as a game where players can switch strategies (running BBR/CUBIC) to maximize some utility (throughput). We can then search for a NE.

Our investigations suggest that in most network settings, the NE lies in a mixed distribution of both CUBIC and BBR flows. In other words, switching to BBR does not always translate into better performance!

BBR’s diminishing returns in bandwidth

In our paper, we discuss four key observations on how CUBIC and BBR interact, and how these interactions affect the underlying normal form game. However, in the interest of brevity, we will only discuss the key observation that led us to make the conjecture that an NE must exist in a general n-flow system.

In our study we showed that a small number of BBR flows can get a disproportionately large share of the bottleneck bandwidth when competing with CUBIC. If we use this property to plot Point A in Figure 1, we observe that when all the flows at the bottleneck are BBR flows, they cannot take up more bandwidth than the bottleneck link speed L (Point B in Figure 1). This means that the trend line between points A and B has to be sub-linear. Consequently, this means that a BBR flow’s average bandwidth will reduce as more flows at the bottleneck switch over to BBR.

Graph showing sub-linear increase in total BBR bandwidth.
Figure 1 — Sub-linear increase in total BBR bandwidth.

We empirically verified this trend by launching 20 flows through a 100 Mbps bottleneck and recording BBR’s average per-flow throughput normalized by the per-flow throughput of the competing CUBIC flows and plot them in Figure 2.

Graph showing  BBR’s per-flow normalized throughput vs percentage of BBR flows at the bottleneck.
Figure 2 — BBR’s per-flow normalized throughput vs percentage of BBR flows at the bottleneck.

We see that BBR’s average bandwidth reduces as the number of BBR flows at the bottleneck increases. In fact, for larger buffer sizes, BBR can perform worse than CUBIC when there are too many BBR flows!

These observations led us to the conjecture that in a general n-flow system, there must also exist a NE where it is no more beneficial for a CUBIC flow to switch to BBR.

NE in Internet congestion control

We empirically verified the claims of our conjecture by launching 6, 9, and 12 flows through a bottleneck link of varying buffer sizes and link capacities. For each network setting, we ran three trials of all the possible combinations of the n senders running either CUBIC or BBR.

To identify the NE, we enumerated all the combinations and checked if there is any combination such that no individual flow in that combination can achieve higher throughput if it switches to the other TCP variant (all other flows remaining fixed).

We found all our experiments to have exactly one NE. In each of these NEs, the flows selecting CUBIC as their congestion control algorithm were the flows with the shortest RTTs. Therefore, in Figures 3 and 4, an NE with x% of CUBIC flows means that when sorted by RTT, the first x% of the flows run CUBIC at the NE.

Three graphs showing the effect of link capacity and the number of flows (for 6, 9 and 12 flows) on the NE.
Figure 3 — The effect of link capacity and the number of flows on the NE.

Three graphs showing the effect of RTT distribution on the NE for 6, 9 and 12 flows.
Figure 4 — The effect of RTT distribution on the NE.

From Figures 3 and 4, we can see that the NE primarily depends on the bottleneck buffer size, and marginally on the RTT distribution. We also note that for very deep buffers, the NE is often at a 50-50 split between CUBIC and BBR flows.

The future of the Internet’s congestion control landscape

Given our observations, it seems unlikely that the entire Internet will switch to BBR anytime soon and that CUBIC is here to stay. That said, the Internet does not strictly follow economic game theory and the predictions of any game-theoretic analysis should be taken with a pinch of salt.

We have also not evaluated the effects of AQMs and video workloads, which may have a significant impact on the Nash Equilibria. Websites and network engineers are also likely to have more sophisticated utility functions rather than consider just purely throughput.

That said, our results expose a previously unexplored dynamic worthy of further investigation, especially since it can have a significant impact on the future Internet congestion control landscape.

Ayush Mishra is a third-year Ph.D. student at the National University of Singapore. 

Rate this article

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

Leave a Reply

Your email address will not be published.

Top