The network operations community is cautiously heading back into a mode of in-person meetings and the NANOG meeting at the start of November was a hybrid affair with a mix of in-person and virtual participation, by both presenters and the attendees. I attended virtually, and these are my notes from the presentations I found to be of personal interest. I hope you’ll find them interesting as well.
Famous Internet outages
The year 2021 has not been a good year for Internet outages. The widely reported Facebook outage in October came after outages from Akamai in July and Fastly in June 2021. These events received widespread coverage in the media at the time. 2021 didn’t differ much from recent years, and we can confidently expect to see more outages in 2022 and beyond. A panel session at the start of NANOG 83 discussed some of the more memorable outages and their causes.
One early outage occurred on 2 November 1988 when virus code devised by Robert Morris ran rampant across the Internet. It exploited a buffer overflow hole in a resident daemon process to inject and then execute code. The code was tailored to DEC VAX and Sun-3 systems, the predominant host systems on the Internet at that time. The issue was a very high replication rate of the virus, injecting multiple copies into systems that bought the host to its knees.
Incidents of hostile malware taking out large parts of the Internet have continued since. There was the 2003 SQL Slammer worm that exploited a buffer overflow bug in Microsoft’s SQL server implementation. The worm code was tiny (just 376 bytes). Transport was via a single UDP packet targeting servers, so it infected the high-capacity core of the Internet extremely rapidly. In 2008 there was a similar replicating malware system in the form of Conficker, again targeted at the Microsoft Windows platforms, with an aggressive spreading algorithm.
Malware incidents that have significant replication factors and easily discernible network signature raise the question of the role of ISPs in such attacks. To what extent should ISPs take on the role of the Internet’s universal firewall? Or should they position themselves as totally neutral carriers and trade all traffic in precisely the same manner? Most ISPs would be keen to assist and protect customers, so they often undertake moves to block malware, but in many instances it’s not quite so black and white. What is their position when the attacker and intended victim are customers of other ISPs and they are acting as a transit? What is the position where there is a request from a foreign law enforcement authority?
It’s not just hostile attacks. Sometimes it’s a case of accidental damage caused by operational slip-ups. One of the well-remembered incidents was the AS7007 event from 1997. AS7007 leaked a large volume of more specific routes (all /24s), all with a new origin AS, namely AS7007. It emerged during the NANOG panel that AS7007 used RIPv1 to pass a collection of routes, learned from the eBGP instances, to the internal routers in the network. Folk with a long memory might remember that in 1997, Classless Routing was still a bit of a novelty, and RIPv1 only used Classful prefixes. When it learned a set of routes from a Classless eBGP session it dropped the prefix length information and added the Class A, B and C prefix lengths derived from the value of the first octet of the address prefix. At this point the eBGP speaker learned these more specific routes and promptly propagated them out into the inter-domain routing system. By today’s standards it was not a big leak (6,000 routes) and didn’t last very long (two hours), but the more specifics caused a large-scale traffic redirection that was noticed at the time!
Outages have been caused by submarine landslides, such as the multi-cable outage in the Luzon Strait close to Taiwan in December 2006, which disrupted six of the seven east-west cable systems linking East Asia with North America. In this case, the repair of 18 individual cable faults took some time due to the complexities in locating the correct cables due to the extent of the cable movement on the seafloor following the event.
There have been train derailments that have taken out fibre runs, as the railway rights-of-way have always been attractive as potential fibre paths for long distance runs.
As well as natural events, collateral damage events and similar, there are also a set of deliberate outages where a local regime acts to shut down all Internet access. Sudan has been in a period of Internet shutdown since late October 2021. Similar shutdowns have occurred in Egypt, Syria, and Bahrain. Over the course of 2020, 29 economies intentionally shut down or slowed their internet communications at least 155 times, according to a new report published by Access Now, a digital rights group.
If practice makes perfect, we are probably only getting better at generating Internet outages!
Who controls the Internet?
Bert Hubert provided a keynote presentation on the topic of who controls the Internet.
There is no shortage of contenders to claim such control. Considering the increasingly common use of deliberate Internet shutdowns or slowdowns, it certainly looks like several governments operate under the perception that their corner of the Internet is under their control. This is a far cry from John Perry Barlow’s 1996 Declaration of the Independence of Cyberspace, although this message is still being carried by activists of various shapes and forms.
The Internet has been built on a foundation of private sector investment, and there is a strong case to be made that the Internet is controlled by these corporate interests from the private sector. However, this is likely to be an over-simplification. The massive shift of advertising expenditure from print media and television into the Internet could sustain the case that the Internet is controlled by advertisers. The perversion of many of the technologies of the Internet into tools to enhance digital surveillance, all to improve the effectiveness of advertising and thereby increase its value to the advertiser, tends to sustain the case that this environment is controlled by advertisers, and without advertiser-funded services the Internet would probably be a rather hollow shell, devoid of any forms of compelling content.
One could also see the Internet as the expression of a deregulated market, where ultimately, it’s the collective summation of user preferences that drives the entire show. From this perspective, the case can be made that it’s you and me, as consumers of digital services, in control of the Internet. Without you and me as subjects of such intense scrutiny and as targets of advertising, there would be no advertisers. Without advertisers there would be no funded services. Without this universe of compelling digital content provided through these services there would be no Internet, or at least no recognizable Internet.
Let’s go through these contenders one by one.
First, let’s look at governments. The role of some governments in controlling the Internet is quite low key, while in others it’s far more evident. The North Korean situation is a relatively extreme case where the digital environment is highly curated and the tools and services, including email, social networking, search, and video conferencing have been heavily adapted to meet local requirements. Surveillance capabilities used by the private sector elsewhere are evident, but in this case the client is the government rather than the advertising broker.
The situation in China has some similarities to North Korea, although on a far larger scale and with one of the more advanced levels of technical sophistication in the world. While China appears to have the capability to essentially disconnect from the rest of the Internet, there are certain windows of visibility that permit Chinese entities to participate in the broader digital environment.
The control measures used in China are shrouded in secrecy. There have been periods where Chinese enterprises have attempted to join in with the larger global market with their own approaches and technologies, but such efforts have been recently rebuffed by Beijing, or rebuffed by the broader Internet market. There has been a long-held belief from other quarters that much of the Chinese technologies are either strongly derivative or unashamed clones of foreign technology.
But these two cases appear to be exceptions to the norm. The efforts in other economies have been less successful in imposing a government agenda on the national digital environment. In Turkey, Indonesia, and Iran, the experience has been that active censorship and exerting control is hard work and there is a continuing escalation between block and counter-block. There are efforts by global technology vendors to increase privacy levels and obscure user transactions and traffic, so national governments attempting to stay one step ahead of these moves find themselves in increasingly challenging positions needing additional technical resources that simply may not be available.
Most governments do not have sufficient local capabilities to effectively block access to proscribed digital services. This is the case in Russia, where efforts to block some services provided via AWS and Digital Ocean were so broad and sweeping that it had negative impacts on the national economy, and the government found itself backing off.
Western Europe has its own issues. The United Kingdom has a mixed track record of effectively protecting intellectual property rights and preventing digital privacy. The European Union’s GDPR and more recently NIS2 have had some impact on the Internet, but in many ways the measures are more cosmetic than substantial. Blocks on domain names and blocks on file sharing are sporadic and effectiveness varies.
The United States has always had a vexed relationship with the Internet. For many years the preeminent position of the US government on the carriage infrastructure elements of DNS names and IP addresses was part of a story that went, informally, along the lines of: ‘We’re here to protect you from the vagaries of other governments. Some of those other governments are seriously scary!’. At the same time, the US has been captured by the intellectual property rights lobby, the draconian provisions of the DMCA, the unilateral ability to intervene in international payments, and the broad reaching powers that are encompassed in national security measures all point to a US set of interests that clearly extend far beyond the lines of their map.
Now let’s look at the case for corporate control of the Internet. Facebook’s Content Oversight Board was not the first (nor will it be the last) corporate tribunal to determine who can use a service, how they can use it, and whom they interact with. Google, YouTube, Microsoft, Apple, Twitter, Flickr, and many others use similar control mechanisms. These oversight bodies are often arbitrary, their decisions are final, binding, do not necessarily adhere to a consistently applied set of procedures, and are sometimes enforced without warning or recourse to appeal. These days, for example, losing access to Facebook or WhatsApp can be a dire situation for a company that relies upon such platforms to sustain their relationships with their customers.
The corporate interest is always present in such actions, and respect for societal and human values, such as privacy, respect, remediation of harms, and the forms of recourse that are found in many legal systems are either missing or provided only at the whim of the corporate entity. In this respect their control structures and behaviour are, intentionally, both absolute and unaccountable.
What about the case for technology control? The evolution of our technology has been a fascinating path. Early computers were ‘open’ in the true sense of the word. Anyone could write code and feed that code to the computer. When we look at portable devices, or browsers, or many applications, the best description of the openness of these platforms is ‘hermetically sealed’. The behaviours of the platform, be it a device or an app, are set by the manufacturer and hidden behind defensive barriers including layers of encryption and obfuscation. A good case in point was the Skype app, where the essential asset of the application was the code itself. It was shipped in an encrypted binary, and almost everything in the binary image was obfuscated.
Transport applications that interact with networks are collapsing into a single protocol, QUIC, exposing just a UDP stream where both the payload and the session control channels are completely obscured from third parties outside a very limited circle of visibility. This includes governments, network operators, the device platform, and other applications. Incidentally it also includes you, the end user. From this perspective, all forms of control are passed to the application.
The question of control can be phrased as a tussle between contenders, including governments, corporates, and the technology itself.
What powers will governments exercise to compromise the device and open it up for inspection, and what powers will the application use to hide its activity from both the device and the network?
While application behaviours have not normally been the subject of regulation will we see the scope of regulations broaden to attempt to curb how applications may behave, particularly with encryption and obscured data?
When a device vendor exercises complete control of the device’s app developer ecosystem, preventing the device from loading unauthorized applications, and that control builds in a whopping 30% sales revenue (yes, I’m referring to Apple if you hadn’t guessed) then who is really in control here?
It’s unclear that governments have any effective control these days. The pace and scope of change has left many regimes following in the wake of control change, trying to regain ground through regulation and taxation rather than setting out a rule-based framework in advance. The big technology companies have gained much in the way of control, including centralizing the points of control into internal functions that have little in the way of broader public accountability. Privacy is good, particularly when the lack of privacy has been weaponized in this surveillance economy. Encryption is good as a means of strengthening privacy, and freely available information is good. But neither governments nor technology companies are effective in protecting the public interest through this process of change, and the typical user is very much a victim in all this. We really need clearer role accountability in this sector.
Internet Routing Registries
ARIN’s Brad Gorman presented on the topic of Internet Routing Registries (IRR) Spring Cleaning. IRRs are a means of listing the combination of address prefixes and routing intention, intended to be used to validate the information being passed through the routing system. IRRs contain route objects, linking an address prefix to an originating Autonomous System Number (ASN) and aut-num objects describing an AS, its adjacent ASNs, and the routing policies it applies to these ASNs. There are set objects that allow groups of route objects or aut-num objects to be grouped together and treated as a single entity.
So, what’s the spring cleaning at ARIN? Historically, the IRR operated by ARIN operated as a common resource for network operators, and there was little in the way of validation of entries that were placed in the IRR. Over time, the IRR contained both accurate and up-to-date information and earlier information that had lost relevance over time, and clients of the IRR have issues in discriminating between the two. The response from the Regional Internet Registries (RIRs) has been two-fold.
The first is Resource Public Key Infrastructure (RPKI) where IP address and AS holders can associate their public key with the resource in an RIR-issued digital certificate. However, the RPKI framework is missing explicit attribution and routing policies, both of which have their uses in the network operations community. In response to this ongoing use of IRR and a desire to explicitly label IRR entries that are current and authentic, ARIN has also introduced the concept of ‘authenticated resources’ where the association of the address prefix or ASN with a resource holder is based on current information in ARIN’s resource records. For some time now, ARIN has been operating essentially two databases — one describing Authenticated ARIN resources, and a second legacy database of resources that are not authenticated by ARIN. ARIN has indicated that it will be withdrawing the IRR-NONAUTH database on the 31 March 2022. You’ve been warned!
Deeper peering
Lumen’s Guy Tal presented on Lumen’s proposed changes to its peering policies. Peering in the Internet ISP sector has been a thorny problem ever since we started using a multi-provider network. When a packet transits multiple networks on its journey from source to destination then who pays whom? In multi-provider activities, including the telephone industry and the postal system, each transaction was individually valued and funded by the sender and the revenue was apportioned to each contributing provider according to either a mutually agreed tariff or a universal settlement tariff. But the Internet was never able to establish a common definition of a transaction or an associated tariff model. Packets are just too small and too random. Instead, retailers quickly adopted a simple flat-fee model. The implication of this move was that inter-provider settlements needed to reflect the retail model. The ISP industry adopted an equally simple inter-provider model — either you’re the customer and you pay me, or I’m the customer and I pay you, or we just don’t pay each other anything at all. This last option, peering, is used when the two parties are of equal size.
It sounds fine in most respects, except that the attribute of ‘equality’ is not really an objective judgement. Instead, it’s a negotiation, and there are obvious rewards for a smaller provider to peer with a larger provider. It’s in the interests of larger providers to make their ‘equality threshold’ as high as it can. It’s in the interests of competitive smaller providers, and the market regulators who have an interest in continued competitive presence in the market, to make equality thresholds far lower.
Lumen’s latest effort is a play right out of the old telephone book. Instead of setting up a small number of peering points and exposing the entire network to the other network, the intent is to divide the network into a large set of ‘local call zones’. If you peer in a local call zone you only get to see the routes for the customers of the peer who are also in the same call zone. The provider is not exposing its transit network to leverage through peering because smaller entities derive no competitive advantage as they need to have the same breadth of coverage in order to avoid being treated as a customer. As Guy concluded in his presentation: “Lumen has updated peering requirements to mandate peering in all Tier 1 interconnectivity markets and 2/3 of Tier 2 interconnectivity markets in the US.” If Lumen’s idea is to place pressure on regional or smaller ISPs and wean them off peering with Lumen by exposing them to the additional costs of transit then yes, this might just do that.
Scanners
Scanning the IPv4 address space has been a popular sport for years. It became popular when the ZMap scanner code was released by researchers at the University of Michigan. There are many such tools these days, but the ZMap tools are still popular, and as they say: “… we’ve built nearly a dozen open-source tools and libraries for performing large-scale empirical analysis of Internet devices.”
Many of these scanners operate blindly, walking through the entire IPv4 space, but others appear to be more selective in their scanning, targeting networks within a particular geography, or targeting networks associated with a category, such as residential, enterprise, or educational campus, according to a presentation from Max Resing of the University of Twente.
I’m a little skeptical about these conclusions, as the way the experiment is conducted can have a strong impact on the resultant data. Earlier work in setting up entire /8 prefixes as honeypots for scanners showed a strong pattern for blind scanning across the entire address range, which is inconsistent with targeted scanning.
Anycast
Is anycast an art or a science? Anycast is used extensively in today’s Internet. It is used extensively in the DNS for both open recursive resolvers and authoritative name servers. It’s used by several content data networks. It’s used in mitigation of DDoS attacks. But how good is it?
The ideal is that anycast distribution should relate to geography, and each anycast distribution point should attract nearby clients. The reality is a lot messier as the anycast boundaries are blurred and do not correlate well to either distancer or latency. Previous studies have indicated that around one third of end points are routed to a suboptimal service point. It’s also been observed that simply increasing the population of the anycast constellation does not necessarily improve the outcome as the metric is average latency of an anycast service point.
One factor behind this is that BGP minimizes AS hop counts in its route selection process, not latency. A global transit network still counts as one AS. The second is that on the Internet AS paths are, on average, very short. This implies that the routing system is not capable of fine-grained discrimination between anycast service points.
This suggests that the construction of a large-scale anycast service constellation poses some challenges. If one wants to predict the impact on an anycast service by adding or remove an anycast service point, the most effective way to do it is setting up a testable anycast service constellation on the Internet, setting up a large set of test points distributed across access networks and perform the maximum set of reachability tests on each configuration.
The research work being undertaken by a team at Duke University is attempting to provide similar outcomes using a far more constrained testing setup. In an experiment using 15 sites, each peering with one of six transit providers, their prototype AnyOpt predicted site catchments of 15,300 clients with 94.7% accuracy and client round-trip times (RTTs) with a mean error of 4.6%. AnyOpt identified a subset of 12 sites, announcing to which, lowered the mean RTT to clients by 33ms compared to a greedy approach that enables the same number of sites with the lowest average unicast latency.
Password recovery weaknesses
If you have ever forgotten your password, and that’s most of us by now, the common approach is to click on the ‘forgotten password’ link and the service sends a password recovery link to your nominated mail address. What if the provider’s DNS resolver does not perform DNSSEC validation, or the domain name of the client’s email address is not DNSSEC-signed? Then the attacker can perform a DNS attack and cause the recovery message to be sent to the attacker’s mailbox.
The underlying messages are simple. To service providers with user accounts — passwords alone are a lousy security mechanism. Use Two-Factor Authentication at a minimum! Oh, and use a DNSSEC-validating recursive resolver for all your domain name queries. No exceptions. To clients and client service providers — use DNSSEC to sign your domain name.
We’ve known these messages for years, but many service providers prefer to make it ‘simple’ for their customers and persist in password-only account protection mechanisms. I see this as no different to treating these same customers with careless contempt!
Predictive routing
A lot of effort has gone into the repair mechanisms used in the BGP. The nature of a distance vector algorithm is that a BGP speaker may not have all the information at hand to immediately repair the routing state when it receives a withdrawal message. Instead, the BGP speaker withdraws the route internally, announces this withdrawal to its BGP neighbours, and then patiently waits for a neighbour to announce a replacement route.
There have been several approaches to mitigate this BGP repair delay, BGP Add Path being one of the more recent. In this approach, BGP not only propagates its ‘best’ path to its neighbours, but also any viable alternate paths that are viable. This way, a BGP speaker can process alternate paths when processing a withdrawal and recover quickly.
In predictive routing this is taken a step further. Why not just allow the BGP speaker to remember previously learned viable next hop forwarding decisions and when the primary next hop decision is withdrawn just assume that a previously learned route still exists and just use it while awaiting an explicit BGP-based repair? If this sounds a little dubious it should be remembered that the use of a default route has a similar outcome, which is telling the BGP speaker of a universal forwarding action to take when there is no more explicit routing information available.
IPv6 — The next 10 years
If you had asked anyone back in the year 2011 or so whether we would still be running IPv4 to support the Internet in 2021, I’m pretty sure that answer would be a very definite no! We were chewing through 400 million IPv4 addresses every year at that point, and we were very dubious of the ability of NATs in IPv4 to scale up year on year. Ten years later when asked if we will still be using IPv4 on the Internet in 10 years from now, then the answer from Comcast’s John Brzozowski is a surprising ‘Yes!’
The efforts in transitioning to a full dual-stack environment are proceeding, albeit at a somewhat ponderous pace. Services are migrating over to dual-stack platforms, and John’s analysis of popular websites points to a continuing uptake on IPv6 capability. At the same time, the deployment of dual-stack in access networks is following a similarly steady pattern. One the whole, user deployment of IPv6 follows the pattern of economies with a higher GDP per capita level, although Sweden, Norway, and Spain are exceptions at one end of the GDP per capita spectrum and India’s ranking as the highest level of IPv6 deployment is a marked exception at the other end.
The end point of all of this should be that once the level of dual-stack deployment reaches some yet to be determined critical mass it will then be viable to sustain IPv6 clients and IPv6-only services. At that point, the pressure will be on for remaining IPv4-only services and access networks to install IPv6 services, and this would see the rapid decline in the use and value of IPv4. If we can’t determine in advance when we will reach that point, then what indicator would signal to us that such a point is nigh? Counting services or user populations is fine, but it does not really assist in identifying the threshold point of change.
My preferred metric is the price of IPv4 addresses on the transfer market. When price starts to decline that’s a clear signal the market has made the transition and won’t see IPv4 as a critical-to-have resource. Are we there yet?
We’re nowhere near that point using that line of reasoning. Over the past 12 months the market price of IPv4 addresses has doubled to around USD 50 per address, according to the reports from IPv4.GLOBAL (Figure 1).
How long will this protracted transition continue? Nobody knows.
Quantum key distribution
In December 1926, Albert Einstein wrote in response to a letter from Max Born describing the random and uncertain heartbeat of quantum mechanics that: “The theory produces a good deal but hardly brings us closer to the secret of the Old One. I am at all events convinced that He does not play dice.”
I will not pretend that I have any useful clue in the area of quantum mechanics, quantum computing, and quantum cryptography, and while I truly appreciate the effort that Melchior Aelmans put into his presentation on Demystifying Quantum Key Distribution, I think the best I can do here is to note that in the area of cryptography, at the very least, this is an important topic, and secondly, you really need to review Melchoir’s material yourself and not any second-hand interpretation or commentary that I could offer here. His presentation pack has a useful reading list as well for those who are motivated to dig deeper into this fascinating topic.
NANOG 84
And that was NANOG 83 for me! Next time it’s NANOG 84 in Austin, Texas, running 14-16 February 2022. And I guess we are all saying hopefully to each other ‘see you there!’.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.