Reality check. Things aren’t always as reported, and a recent problem with access to high-profile websites shows why.
Routing was fine, web pages weren’t visible
The 8 June 2021 incident was not that ‘the Internet went down’. Websites including Amazon, Reddit, The Guardian, New York Times, gov.uk and even the White House were inaccessible, but the cause was that some application services, all hosted on a single content distribution network (CDN), went missing. Internet routing to that CDN worked fine: users’ requests reached the CDN’s web cache servers, which replied with pages saying, in effect, ‘I can’t give you those web pages’.
CDNs serve content and online services from servers that are geographically closer to end users than the origin servers, which makes delivery significantly faster. The trade-off is that when the CDN itself fails, the sites that depend on it can become unavailable all at once.
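As a rough illustration of the behaviour described above, here is a toy model of a single CDN edge server. It is a sketch only: the `EdgeCache` class, its `healthy` flag and the `fetch_from_origin` callback are hypothetical names invented for this example and say nothing about how Fastly or any other CDN is actually built. The point it tries to show is that a request can arrive at the edge perfectly well and still be answered with an error page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeCache:
    """Toy model of one CDN edge server (hypothetical, for illustration only)."""

    fetch_from_origin: Callable[[str], str]   # stand-in for the distant origin server
    healthy: bool = True                      # False models a failure of the CDN service
    store: Dict[str, str] = field(default_factory=dict)

    def handle(self, path: str) -> Tuple[int, str]:
        """Return an (HTTP-like status, body) pair for a request that reached this edge."""
        # By the time this method runs, the request has already arrived:
        # routing did its job, whatever happens next.
        if not self.healthy:
            return 503, "Service Unavailable"
        if path not in self.store:
            # Cache miss: fetch once from the origin and keep a local copy.
            self.store[path] = self.fetch_from_origin(path)
        # Cache hit (or freshly filled): serve from the nearby copy.
        return 200, self.store[path]
```

In this model, setting `healthy = False` makes every cached site fail together, while the path to the edge server is untouched. That is the closed store at the end of a perfectly good road.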
To use an analogy: the roads weren’t blocked or damaged; a store was just closed. In no sense did the ‘Internet break’. A bunch of websites weren’t visible, that’s all.
The conflation of ‘the web’ with ‘the Internet’ isn’t helpful in the long term, either. It is better to remember the distinction between these terms, and identify clearly what did break. Some parts of the world wide web were offline. The Internet? It was fine.
The difference is big, and it matters.
The important distinction here is between layers in the protocol model. The web is an application-layer service; it is about things done over the Internet protocol stack.
The Internet itself is not the applications; it is the end-to-end connectivity between communicating hosts, over IPv4 or IPv6. Internet protocols send packets and, inside those packets, application protocols such as the web conduct their business. What actually broke was a service provided by Fastly, which uses content caches to distribute web content.
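One way to see the layer distinction in practice is to probe each layer separately. The sketch below uses only the Python standard library. The function names (`tcp_reachable`, `http_status`, `diagnose`) and the wording of the verdicts are assumptions made for this example, and a TCP connection attempt is only a rough proxy for ‘routing works’, since a DNS failure or a closed port would also make it fail.

```python
import http.client
import socket
from typing import Optional


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Lower layers: can packets get to the server and back?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution errors.
        return False


def http_status(host: str, path: str = "/", timeout: float = 5.0) -> Optional[int]:
    """Application layer: what does the web service say once we are connected?"""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def diagnose(reachable: bool, status: Optional[int]) -> str:
    """Combine the two probes into a rough verdict on which layer failed."""
    if not reachable:
        return "network: could not connect to the server"
    if status is None:
        return "application: connected, but no valid HTTP response"
    if status >= 500:
        return "application: server reached, but the service failed"
    # Anything below 500 means the service itself answered.
    return "ok"
```

Calling `diagnose(tcp_reachable(host), http_status(host))` for a hostname of your choice gives the combined verdict. By the account above, a site behind the affected CDN on 8 June 2021 would have landed in the ‘application’ branch: the connection succeeded, and the answer that came back was an error.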
No, this doesn’t mean ‘infrastructure needs urgent fixing’
The Guardian ran with the hyperbolic headline: ‘Major Internet outage shows infrastructure needs urgent fixing’.
The actual story was more measured. The subheading, for example, is arguably true: “Experts say outage shows Internet services too centralised and lack resilience.”
Unlike the headline, this is a reasonable assertion given the range of government and non-government agencies worldwide whose websites were unavailable at the time. However, that doesn’t prove the need for ‘urgent fixing’; it raises a governance question: What kind of risks are we willing to accept for optimization of access?
Which leads us back to the issue of Internet centrality that we’ve been questioning for some time.
The best analyses on this topic discuss the double-edged sword of centralized dependency. Geoff Huston discussed this a few days ago in a blog post, and the key points he made are worth reflecting on.
The tendency to centralize has been at work for a long time, justified by claimed benefits in efficiency and scale. If we decide the risks are too great, we need to be mindful of the costs and consequences. Maybe it isn’t earth-shattering if The Guardian is unreachable for some period of time, and we can leave the governance issue to subscribers and the publisher, but what about the gov.uk outage? That’s a more serious consequence.
As long as the tension between efficiency and safety exists, we’re going to need to keep having these discussions.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.