Wide-area network (WAN) routers carry the burden of Internet-scale routing and switching. However, we at Microsoft and Tel Aviv University have recently shown that a large part of the switching and routing work of many WAN routers can be eliminated.
In doing so, we developed Shoofly, a tool for provisioning wide-area backbones that bypasses routers by keeping traffic in the optical domain for as long as possible. In a nutshell, Shoofly reintroduces concepts from circuit-switching into the world of packet-switched cloud networks to improve their overall cost efficiency.
Wide-area backbone networks
WANs span large geographical regions. For example, cloud WANs connect hundreds of globally-distributed data centres. The physical interconnect of modern WANs consists of wavelengths of light multiplexed over a hundred thousand kilometres of optical fibre. These wavelengths are generated and terminated at optical terminals. At the points of generation and termination, wavelengths are converted to electrical signals that get switched by packet switches and routers (Figure 1).
The nodes where optical signals are converted to electrical signals determine the logical topology (for example, the IP layer topology) derived from the physical topology of the optical backbone.
Point-to-point optical backbones
Cloud providers often operate point-to-point optical backbones where optical signals get converted to electrical signals at every intermediate hop between the generation and termination nodes. The conversion of a signal from the optical domain to the electrical domain and back is called an optical-electrical-optical (OEO) conversion, and each one consumes optical transponders, electrical router ports and optical line ports. This hardware accounts for a large portion of the overall cost of provisioning capacity in WANs.
In point-to-point optical backbones, the logical topology of the IP layer closely resembles the underlying physical topology consisting of fibre and optical terminals (Figure 2). Cloud providers chose a point-to-point design to keep the WAN flexible to the emergence of new traffic demands between pairs of data centres.
Is the conversion of optical signals to electrical signals necessary at every hop?
In our recent paper presented at ACM SIGCOMM 2021, we asked if the point-to-point backbone design is appropriate for modern cloud networks.
Specifically, we analysed the traffic demands in a planet-scale cloud WAN and found that, at 30% of WAN routers, roughly 60% of the traffic is simply passing through, or transiting, the router. This traffic neither originates at the router nor terminates at it and yet, in the point-to-point backbone design, it is converted from the optical domain to the electrical domain and back again.
OEO conversions of pass-through traffic are unnecessary and expensive due to the hardware resources consumed by them. Moreover, these patterns in intra-WAN traffic persist over time.
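To make the notion of pass-through traffic concrete, here is a minimal sketch of how a transit share per router could be computed from a list of flows. The function name `transit_fraction`, the flow format and the traffic volumes are all hypothetical and chosen for illustration; this is not the analysis pipeline used in the paper.

```python
from collections import defaultdict

def transit_fraction(flows):
    """Return, per router, the share of its traffic that merely passes through.

    flows: list of (path, gbps) pairs, where path is the ordered list of
    routers a flow visits from source to destination (hypothetical format).
    """
    total = defaultdict(float)    # all traffic that touches the router
    transit = defaultdict(float)  # traffic that neither starts nor ends there
    for path, gbps in flows:
        for i, router in enumerate(path):
            total[router] += gbps
            if 0 < i < len(path) - 1:  # strictly intermediate hop
                transit[router] += gbps
    return {router: transit[router] / total[router] for router in total}

# Illustrative demands only: 60 Gbps crosses B on its way from A to C.
flows = [
    (["A", "B", "C"], 60.0),
    (["A", "B"], 20.0),
    (["B", "C"], 20.0),
]
fractions = transit_fraction(flows)
print(fractions["B"])
```

In this toy example, router B sees 100 Gbps in total, of which 60 Gbps is transit, so its transit share matches the 60% figure quoted above.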
Optical bypass-enabled wide-area backbone networks
We leveraged insights from inter-regional traffic patterns in the cloud WAN to enable optical bypass in the network. We refer to the elimination of an OEO conversion at a router as optically bypassing the router.
When a wavelength of light optically bypasses a router, it saves two router ports and up to two line ports in industry-standard deployments of routers and optical terminals, allowing a significant reduction in the $/Gbps cost of long-haul capacity. However, the ability to bypass a geographical region in the WAN depends on physical constraints imposed by the network topology and by signal quality on fibre.
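The per-bypass saving translates into a simple count. The sketch below tallies the ports freed when several wavelengths each bypass several routers; the function name `ports_freed` and its arguments are assumptions made for illustration, and the line-port figure is only an upper bound because the text says "up to two".

```python
ROUTER_PORTS_PER_BYPASS = 2     # from the text: two router ports per bypass
MAX_LINE_PORTS_PER_BYPASS = 2   # from the text: up to two line ports per bypass

def ports_freed(bypassed_routers, wavelengths):
    """Ports freed when each of `wavelengths` bypasses `bypassed_routers` routers."""
    bypass_instances = bypassed_routers * wavelengths
    return {
        "router_ports": ROUTER_PORTS_PER_BYPASS * bypass_instances,
        "line_ports_upper_bound": MAX_LINE_PORTS_PER_BYPASS * bypass_instances,
    }

# Hypothetical case: 4 wavelengths that each bypass 3 routers.
freed = ports_freed(bypassed_routers=3, wavelengths=4)
print(freed)
```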
Constraints on the optical bypass in the WAN
Optically bypassing geographical regions forces the light signals to travel longer distances on fibre before they can be regenerated. Regeneration corrects the signal impairments that accumulate during transmission on the fibre. Signals travelling longer distances undergo more attenuation, which lowers optical signal quality.
Signal quality, measured by the optical signal-to-noise ratio (OSNR), ultimately decides the data rate of the optical signal: the higher the OSNR, the higher the signal data rate. Thus, wavelengths that bypass routers in the WAN may achieve lower data rates because of the increase in transmission distance.
State-of-the-art optical transponders can achieve three discrete data rates (200 Gbps, 150 Gbps and 100 Gbps) by modulating the signals with 16-QAM, 8-QAM and QPSK formats, respectively, provided the signal quality is high enough to support the corresponding data rate.
Moreover, there is a maximum distance a light signal can travel before it must be regenerated, called the optical reach of the signal. If the signal is not regenerated within this distance, its OSNR is too low to permit error-free decoding at the destination.
Lower-order modulation formats can travel longer distances without regeneration. For example, signals modulated in the QPSK format can travel up to 5,000 km before they require regeneration, whereas higher-order modulation formats such as 8-QAM and 16-QAM have a shorter optical reach (2,500 km and 800 km, respectively). Thus, higher-order modulation formats enable higher data rates but have lower optical reach. Due to limited optical reach, each signal can bypass only a limited number of routers before regeneration becomes essential.
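The trade-off between data rate and reach can be sketched as a small lookup. The data rates and reach values below are the ones quoted above; the function names and the example span lengths are hypothetical. Treating reach as a hard distance cut-off is a simplification, since in practice the usable format depends on the measured OSNR of the path.

```python
# (format, data rate in Gbps, optical reach in km), highest rate first.
FORMATS = [
    ("16-QAM", 200, 800),
    ("8-QAM", 150, 2500),
    ("QPSK", 100, 5000),
]

def best_format(distance_km):
    """Return (format, Gbps) with the highest rate whose reach covers the distance."""
    for name, rate_gbps, reach_km in FORMATS:
        if distance_km <= reach_km:
            return name, rate_gbps
    return None  # the signal must be regenerated somewhere along the path

def max_bypassed_routers(span_lengths_km, reach_km):
    """Count the intermediate routers a signal can skip before exceeding its reach.

    span_lengths_km lists consecutive fibre spans starting at the source;
    each span ends at a router.
    """
    travelled = 0.0
    spans_covered = 0
    for span in span_lengths_km:
        if travelled + span > reach_km:
            break
        travelled += span
        spans_covered += 1
    # Covering n spans skips the n - 1 routers between them.
    return max(spans_covered - 1, 0)

print(best_format(600))    # short path: highest-rate format fits
print(best_format(2000))   # longer path: must drop to a lower-order format
print(max_bypassed_routers([1500, 1500, 1500, 1500], reach_km=5000))
```

With four hypothetical 1,500 km spans, a QPSK signal covers three spans (4,500 km) and so bypasses two routers, while an 8-QAM signal covers only one span and bypasses none.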
Optimal optical bypasses in WANs with Shoofly
Shoofly combines our empirical understanding of the cloud backbone network with the constraints on optical signal quality and reach to formulate the problem of improving the cost efficiency of the cloud WAN. The goal of Shoofly's optimization problem is to maximize the number of router ports that are freed after enabling optical bypasses in the WAN topology.
Shoofly encodes the problem of selecting the best optical bypass opportunities in the WAN in much the same way that centralized traffic engineering algorithms allocate traffic in WANs. The difference is that Shoofly seeks to engineer the topology, not the traffic, in the network.
We expressed the resulting optimization problem as a mixed-integer linear program (MILP) and solved it to identify which wavelengths in the WAN should bypass which routers. While the details of our MILP formulation are beyond the scope of this post, the main idea that makes it possible to solve the MILP efficiently is our ability to enumerate all relevant candidate optical bypasses of 3, 4 or 5 router hops in production networks. We added the new routes as extra edges to an expanded network graph and used constraints to enforce that traffic flowing through the new bypass edges is counted against the links in the original graph.
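The enumeration step can be illustrated with a short sketch. It lists simple paths in a toy fibre graph and records, for each candidate bypass edge, the original links it rides on. Several choices here are assumptions rather than details from the paper: the function names, the toy topology, reading "3, 4 or 5 router hops" as paths that contain 3 to 5 routers, and using the 5,000 km QPSK reach as the length limit. The MILP itself is not shown.

```python
def candidate_bypasses(fibre_km, min_routers=3, max_routers=5, reach_km=5000):
    """Enumerate simple paths of min_routers..max_routers routers within reach.

    fibre_km: {router: {neighbour: fibre length in km}} for an undirected graph.
    Returns a list of (path, total_km), one entry per undirected path.
    """
    found = []

    def extend(path, length_km):
        if len(path) >= min_routers:
            found.append((tuple(path), length_km))
        if len(path) == max_routers:
            return
        for neighbour, km in fibre_km[path[-1]].items():
            if neighbour not in path and length_km + km <= reach_km:
                extend(path + [neighbour], length_km + km)

    for source in fibre_km:
        extend([source], 0)
    # Each undirected path is found once per direction; keep one copy.
    return [(p, km) for p, km in found if p[0] < p[-1]]

def underlying_links(path):
    """Original links that a bypass edge between path[0] and path[-1] occupies."""
    return list(zip(path, path[1:]))

# Toy topology (illustrative lengths): A - B - C - D in a line.
fibre_km = {
    "A": {"B": 1000},
    "B": {"A": 1000, "C": 1200},
    "C": {"B": 1200, "D": 900},
    "D": {"C": 900},
}
bypasses = candidate_bypasses(fibre_km)
for path, km in bypasses:
    print(path[0], "->", path[-1], km, "km via", underlying_links(path))
```

Each result would become an extra edge between the endpoints of the path in the expanded graph, and the list returned by `underlying_links` indicates which original links a constraint would need to charge for traffic placed on that edge.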
What does it take to deploy Shoofly?
Our evaluation shows that implementing bypasses proposed by Shoofly can free nearly 40% of the WAN router ports. Nearly 30% of the wavelengths in the WAN participate in at least one bypass proposed by Shoofly (Figure 3).
We also evaluated the contribution of each instance of optical bypass to the overall hardware cost savings enabled by Shoofly and found that 25% of bypasses accounted for 80% of all hardware cost savings. This shows a high return on the logistical investment of implementing Shoofly's recommended optical bypasses.
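A concentration figure of this kind can be computed by ranking bypasses by their savings and accumulating from the top. The sketch below shows the calculation; the function name `share_needed` and the savings values are made up for illustration and are not measurements from our evaluation.

```python
def share_needed(savings, target=0.8):
    """Smallest fraction of bypasses whose combined savings reach `target` of the total."""
    ordered = sorted(savings, reverse=True)  # largest savings first
    goal = target * sum(ordered)
    running = 0.0
    for count, saving in enumerate(ordered, start=1):
        running += saving
        if running >= goal:
            return count / len(ordered)
    return 1.0

# Hypothetical per-bypass savings in arbitrary cost units.
savings = [50, 35, 3, 3, 3, 2, 2, 2]
share = share_needed(savings)
print(share)
```

In this made-up data set, the top two of eight bypasses cover 85 of 100 cost units, so a quarter of the bypasses is enough to pass the 80% mark.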
For more details about Shoofly, check our technical material.
Rachee Singh is a senior researcher in the office of the CTO at Azure for Operators.