The growth of global Internet traffic has driven a drastic expansion of the submarine cable network, both in terms of the sheer number of links and its total capacity. Today, a complex mesh of hundreds of cables, stretching over one million kilometres, connects nearly every corner of the earth and is instrumental in closing the remaining connectivity gaps. Despite the scale and critical role of the submarine network for both business and society at large, our community has mostly ignored it, treating it as a black box in most Internet studies, from connectivity to inter-domain traffic and reliability.
Figure 1 — Number of active submarine cables based on their ready for service dates (RFS) (left axis). Total length of currently active submarine cables by year (right axis). Includes planned cables for future activations through 2020.
While there are benefits to this kind of abstraction, particularly for application developers, it does create roadblocks for network operators and researchers trying to gain insights into the underlying infrastructure. Both groups rely on diagnostic and measurement tools, such as traceroute, that record observations at the network layer. Despite their undeniable value, the level of abstraction at which they work means that multiple network operators sharing the same physical path may appear as distinct paths at the network layer.
Internet topologists have come up with ways to use these measurement tools to create different visualizations that represent the paths of the Internet. The resulting maps comprise tens of thousands of networks with even more connections between them.
In reality, a failure in one piece of infrastructure could result in simultaneous failures in many seemingly disparate IP connections. With approximately 400 submarine cables being responsible for carrying 99% of transoceanic traffic, a relatively small number of conduits (with long repair times) bear a lot of responsibility for ensuring global bandwidth and connectivity. Simultaneous outages on just a few of these links can result in large regions being cut off entirely from the Internet. This difference between what our tools reveal and the underlying infrastructure can lead us to overestimate the network’s resiliency.
Figure 3 — A relatively small set of approximately 400 cables is responsible for carrying 99% of transoceanic traffic. Source: Telegeography.
We first became interested in this area a few years back when looking at the connectivity of Cuba. We noticed four different networks being used to access the island – Tata, Telefonica, and two satellite providers. By examining our latency measurements, we knew that Tata and Telefonica were sending traffic via submarine cables instead of a satellite connection. However, both operators shared a single cable (ALBA-1) to reach the island, so relying solely on IP-level measurements would have led us to overestimate the number of physical paths by a factor of two. Most submarine cables are shared by many more than two providers.
So how can I ensure reliable international connectivity? Which operators are using which cables?
When we first started discussing our research with the community, we found a lot of interest from organizations that aim to operate reliable services over the Internet but do not have the global presence of larger networks like those of Google, Amazon, Microsoft, or Facebook. Organizations using these smaller networks want to be sure that hiring another transit provider to improve reliability will actually increase redundancy. While a second operator may help protect against certain types of issues (that is, non-physical faults such as configuration issues), a cable fault will affect both operators if they share the same cable.
At the moment, we are working on a service that will take an Internet path (a series of IP addresses such as the output of traceroute), and tag hops with possible submarine cables, identifying the most likely cable. We hope that eventually operators will be able to use this service to evaluate the path independence of their transit providers.
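To make the idea of path independence concrete, here is a minimal sketch of the kind of check an operator might run once hops have been tagged with cables. It is not the service itself: the function names and the input format (a mapping from provider name to the set of cables its paths may use) are assumptions made for illustration.

```python
from itertools import combinations


def shared_cables(provider_cables):
    """Return {(provider_a, provider_b): cables that both providers may use}.

    provider_cables is a hypothetical input: a dict mapping a provider
    name to the set of submarine cables tagged on its paths.
    """
    shared = {}
    for a, b in combinations(sorted(provider_cables), 2):
        common = provider_cables[a] & provider_cables[b]
        if common:
            shared[(a, b)] = common
    return shared


def distinct_cable_count(provider_cables):
    """Count distinct cables across all providers.

    This is an upper bound on the number of independent physical paths,
    and it can be lower than the number of providers.
    """
    return len(set().union(*provider_cables.values()))


# The Cuba case described earlier: two IP-level providers, one cable.
cuba = {"Tata": {"ALBA-1"}, "Telefonica": {"ALBA-1"}}
overlap = shared_cables(cuba)
n_cables = distinct_cable_count(cuba)
```

For the Cuba example, `overlap` reports that Tata and Telefonica both depend on ALBA-1, and `n_cables` is 1 even though two providers are visible at the IP level.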
Accurately mapping an IP hop to a physical cable is not easy. Using over 500 million traceroutes collected by the RIPE Atlas project, together with RIPE’s geolocation service, we were able to calculate whether it was physically possible for IP hops to traverse known submarine cables. We are also working to correlate cable outage and reconfiguration information to improve the accuracy and completeness of our data. For example, the figures below demonstrate the impact of an outage or cable reconfiguration on round-trip time (RTT).
Figure 5 — South East Asia – Middle East – Western Europe 4 (SEA-ME-WE 4) cable reconfiguration in October 2017.
Figure 6 — Asia-America Gateway (AAG) cable reconfiguration for the section S1 to Viet Nam in January 2018.
The hope is that this data can be updated as Atlas probes continue to run measurements.
We are currently in discussions with RIPE to eventually add this functionality to their traceroute visualization tool, increasing the visibility of our service.
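As a rough illustration of the physical-feasibility test mentioned above, the sketch below checks whether an observed RTT between two geolocated hops is large enough for the traffic to have crossed a given cable. This is a simplified sketch, not our actual pipeline: the propagation speed of about 200 km per millisecond in fibre (roughly two thirds of the speed of light), the straight-line distance from each hop to its landing point, and all function and parameter names are assumptions.

```python
import math

# Assumption: light in fibre covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def min_rtt_ms(path_km):
    """Lowest possible round-trip time over a fibre path of this length."""
    return 2 * path_km / FIBER_KM_PER_MS


def cable_is_feasible(hop_a, hop_b, rtt_ms, landing_a, landing_b, cable_km):
    """True if the observed RTT between two hops could include this cable.

    The path is modelled as hop_a -> landing_a -> cable -> landing_b -> hop_b.
    An RTT below the speed-of-light bound for that path rules the cable out;
    an RTT above it only means the cable remains a candidate.
    """
    path_km = (haversine_km(hop_a, landing_a)
               + cable_km
               + haversine_km(landing_b, hop_b))
    return rtt_ms >= min_rtt_ms(path_km)
```

Note that the test can only eliminate cables. Several cables usually remain feasible for a given hop pair, which is why additional signals such as outage and reconfiguration events are needed to pick the most likely one.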
Why don’t submarine cable operators just reveal this physical information?
While there are faults and outages in terrestrial settings that can result in performance degradation or disconnections for large groups of users, the impact of submarine cable outages can be particularly devastating. First, relative to terrestrial networks, an even larger population (in some cases, entire economies) relies on a smaller number of conduits for connectivity.
Additionally, performance degradation caused by rerouting traffic can be even more extreme. In multiple cases in our study, we saw latency measurements between two regions increase more than threefold. Finally, depending on where the cable fault occurs and on ship schedules, repair times can be measured in weeks. Though outages occur more frequently on land, submarine outages are more severe in terms of performance degradation, the number of affected users, and repair times.
One reason why organizations like Microsoft and Google are investing in submarine infrastructure is a lack of reliability. While there are many additional motivations, such as better control over the network, higher reliability and predictability seem to be critical factors. Smaller organizations without the resources to invest in their own submarine infrastructure may be interested in better information, and in subsea cable operators working to improve their service reliability.
How can we prevent submarine outages in the future?
Preventing submarine outages altogether is a high bar, but a more systemic perspective can certainly give us a better understanding of critical infrastructure and recurrent problems. We have come across various reports of previous outages. A more complete compilation that we can cross-reference with other data sets covering natural disasters (such as earthquakes), ship routes, and cable repair times would allow us to assess more accurately the likelihood and potential impact of prolonged outages. It would also help identify the specific regions and cables where further investment may be most beneficial.
Zachary S. Bischof is a Visiting Researcher at the IIJ Research Lab, working on experimental networks, measurement and distributed systems.
Romain Fontugne is a senior researcher at IIJ Research Lab, Japan, who focuses on Internet measurements, traffic analysis and network security.
Fabian E. Bustamante is Professor and Associate Head in the Computer Science Department at Northwestern University.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.