This article was originally posted on APNIC Labs.
NANOG 65 was once again your typical NANOG meeting: a set of operators, vendors, researchers and others gathered for 3 days, this time in Montreal in October. Here are my impressions of the meeting.
The opening keynote, from Jack Waters of Level 3, looked back over the past 25 years of the Internet, and was interesting to me in its reference to the “Kingsbury Letter”. As the referenced Washington Post article points out, in exchange for this government-sponsored monopoly, AT&T agreed in 1913 to operate as a public utility, eventually providing high-quality phone service to the vast majority of Americans regardless of income or geography. Kingsbury’s commitment to President Wilson would later be formalized and expanded by Congress into the legal obligations that still bind the modern successors of the old AT&T: Verizon, Sprint, and the entity that today carries the AT&T name. At the time the US Government had undertaken anti-trust actions against monopolies in the rail, oil, and tobacco industries, and was contemplating similar action against AT&T. AT&T managed to take the initiative and create, in effect, a government-sanctioned, privately owned monopoly.
Things are again changing. The Open Internet Order from the FCC is putting the formerly unregulated Internet business under more overt regulatory purview, and while the current rules are vague and not highly prescriptive, the FCC is now equipped with the clear ability to proceed with rule making in response to poor behaviours by Internet access service operators. We have already seen TWC’s proposed merger with Comcast being blocked, and instead TWC has bulked up with Charter. It certainly appears that the number of providers in this sector is shrinking as market expansion stops and operators start to look hard at their costs, and the three-way tussle between carriage, content and regulation appears to be one that will keep a set of lawyers employed for many years.
Another interesting perspective I picked up from the presentation was that Jack felt that silicon switch capability was outpacing fibre. He pointed out the issues of fibre with 100G per usable channel, and the resort to ECMP and LAG in routers as a means of synthesising higher capacities that are not being provided on the fibre plant. What he was claiming was that silicon switches are more “dense” than WDM in fibres, so the string is now skinnier than the terminations. He is sceptical about the ability of this year’s SDN fashion, or NFV for that matter, to bridge this gap. It’s hard to reconcile these calls for ever higher aggregate channel capacity out of fibre with the issues of memory speeds and the limitations imposed by memory on the packet rates that come from higher channel capacities.
One other observation he made that struck me was that the dynamics of peering and interconnection have altered in recent times and this is due in part to John Oliver’s “expose” of this business. Has the role of determining communications policy in the United States now passed from the Federal Courts of Appeal to the host of a satirical comedy television show?
Finally, the Security elephant in the room: with an Internet of Things, whose responsibility is it to fix it when the things inevitably turn toxic?
Roland Dobbins’ presentation on DDoS for Bitcoins offered an interesting perspective on the activity of a particular criminal element performing online extortion. One interesting aspect here was that it was not clear from the presentation whether this was a very active individual or an organised group.
The later DDoS tutorial was a walk through the toxic behaviour taxonomy. I’m not sure that there was anything “new” in terms of attack behaviour or the nature of the weaknesses that these attacks are exploiting. The hard truth is that the Internet of Stupid Things is the protein of Internet toxicity, and there is no real incentive for remediation here. It’s not a happy outlook given the poor quality of software that we are embedding into these things.
Avi Freedman presented on the many uses of sflow.
Many flow tools present aggregated information that is already “combed” by the flow tool. The disturbing part of this presentation is the concept of pervasive network level snooping on the data flows. It seems that part of the reason why routers have computing grunt is to tap the packets as they fly past and perform a basic flow analysis to spit out an aggregated view of user traffic.
Frankly, if you are at all concerned about the level of unauthorised and uncontrolled surveillance of users’ online activity, then much of this talk is about as good an advertisement for using encrypted connections for all user connections as any I have seen, or for adoption of encrypted VPN services. In a similar vein it seems to me that Google’s decision to use UDP for QUIC is an even better idea, simply to confound the current TCP fume sniffers that sit out there in the network’s infrastructure!
The DNS track was a re-run of some presentations from the DNS OARC meeting held immediately prior to the NANOG meeting. I have already written on these presentations.
Being NANOG, there was no shortage of BGP-related presentations.
A session looked at the state of efforts to secure BGP. OpenDNS has taken over BGPmon. It’s a routing anomaly detector that attempts to discern the difference between anomalous routing updates and the normal cut and thrust of link level unreliability, and the intended target of the anomaly detector is routing hijacks and other hostile forms of manipulating the routing system. Of course some routing attacks are such that everyone sees the information, but other more subtle attacks are directed to a particular network, and the propagation of the false information is deliberately limited. It’s not easily determined just how fine a granularity is used in the collection of BGPmon data, which means that it’s unclear to what extent focussed routing attacks are detectable by this setup.
The session had another presentation on IRR tools, and a repeat of a proposal currently circulating round the traps to try and find a consistent subset of IRR route objects. There is a salutary lesson here, and it’s probably more about human nature than technology: it may be easy to solve your particular problem by building just another instance of a routing registry that meets your particular needs, but in the end such actions add to the entropy of the Internet rather than reduce it!
There is a long term underlying issue with BGP security that we have been working on for a couple of decades now. That’s a solid indicator that it’s a difficult problem, and what was more evident from this session was not the specifics of the various approaches considered here, but the more general observation that we are just not finding a feasible approach to distinguishing “good” from “bad” in inter-domain routing that meshes with the practical constraints of a diverse set of players in the routing system. These operational responses strike me as more band-aids. We are still searching for some form of approach that supports piecemeal deployment, accommodates a variety of operational models, and does not get bogged down in a morass of heavyweight crypto.
BGP BMP has been slowly gestating in the GROW WG of the IETF for many years now. The specification has now celebrated its 10th anniversary of sitting in the IETF drafts repository!
I suspect, sadly, that this is not a record for today’s IETF. BGP monitoring used to either put the BGP speaker into debug mode and push out as debug messages all the input and output BGP messages, or just sit as a BGP peer and collect the post-processing state updates of the BGP speaker being monitored. BMP is a standardisation of the debug approach, allowing a BGP monitor to track the state of the BGP speaker, tracing both the incoming BGP updates and the state changes that occur within the BGP speaker.
This presentation reported on combining BMP with Apache Kafka, which is an open source message bus. The presentation looked at a number of scenarios where the combination of BMP and Kafka could be used as an effective local BGP anomaly detector, including route leak detection and inconsistent origination.
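As an illustration of the "inconsistent origination" case, here is a minimal sketch of the kind of check a local monitor could run. It is not the presenters' implementation: it assumes the BMP messages have already been read off the message bus and decoded into hypothetical `(prefix, as_path)` tuples, and it uses documentation prefixes and private-use AS numbers.

```python
from collections import defaultdict

def find_inconsistent_origins(updates):
    """Return {prefix: set of origin ASNs} for prefixes seen with more than one origin."""
    origins = defaultdict(set)
    for prefix, as_path in updates:
        if as_path:  # ignore updates that carry an empty AS path
            origins[prefix].add(as_path[-1])  # the origin AS is the last AS in the path
    return {p: o for p, o in origins.items() if len(o) > 1}

# Hypothetical decoded updates (assumed format, not a real BMP or Kafka API)
updates = [
    ("192.0.2.0/24", [64500, 64501, 64510]),
    ("198.51.100.0/24", [64500, 64502, 64511]),
    ("192.0.2.0/24", [64500, 64503, 64666]),
]
conflicts = find_inconsistent_origins(updates)
print(conflicts)
```

A prefix with more than one origin is not necessarily hostile (anycast and multi-homing can produce the same picture), so a check like this is only a first filter for a human or a further policy step.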
Much has been said about the growth in the BGP routing tables, and much more will be said no doubt. The size of the IPv4 BGP routing table is now at 575,000 routes, and there are no signs that this growth is slowing down anytime soon. Of course while all these route entries are jammed into the high speed memory of every ‘core’ router in the Internet, the issue is that most of this information is unused and is unnecessary.
At any point in the network a BGP speaker carries some 250,000 superfluous routes that are more specific routes of covering aggregate routes that reflect the same intended path to the destination. DRAGON is an attempt to try and perform proxy aggregation in a router to try and remove much of this superfluous information. Other than as an academic exercise, I’m really not sure what the value is.
If you really wanted to reduce the route decision space a FIB compression algorithm has a far greater potential outcome in terms of reduction of the size of the decision space, or if you really wanted to compress the routing state then perhaps looking at actual data would help.
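To make the notion of a "superfluous" more specific route concrete, the sketch below flags any prefix whose nearest covering prefix points to the same next hop. This is a deliberately simple comparison under assumed inputs (a dictionary of prefix to next-hop label); it is neither DRAGON nor a complete FIB compression algorithm.

```python
import ipaddress

def redundant_more_specifics(routes):
    """routes maps prefix text to a next hop.

    Returns the sorted prefixes whose nearest covering prefix has the same
    next hop, so dropping them would not change forwarding.
    """
    table = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
    redundant = []
    for net, next_hop in table.items():
        # Walk up from the immediate parent towards the default route
        for plen in range(net.prefixlen - 1, -1, -1):
            parent = net.supernet(new_prefix=plen)
            if parent in table:
                if table[parent] == next_hop:
                    redundant.append(str(net))
                break  # only the nearest covering route matters
    return sorted(redundant)

# Hypothetical forwarding table with next hops labelled "A" and "B"
fib = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",   # same next hop as its cover: redundant
    "10.2.0.0/16": "B",   # differs from its cover: must stay
    "10.2.3.0/24": "B",   # same next hop as 10.2.0.0/16: redundant
    "192.0.2.0/24": "A",  # no covering route: must stay
}
print(redundant_more_specifics(fib))
```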
As Brian Field of Comcast reported at NANOG 64, in a 6-day period in their network they observed that some 415K entries had no traffic at all. Some 90% of the data traffic handled by the routers was directed to 3,156 distinct routing prefixes, and 99% of the traffic was sent to 25,893 prefixes. One possible response is to load the in-line FIBs with a far smaller “core” of active IPv4 prefixes, and send a default route via a tunnel to a nearby Internet egress point.
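Given per-prefix traffic counters, the size of such a "core" set is easy to compute. The following sketch assumes a hypothetical dictionary of byte counts per prefix (the numbers are invented for illustration) and returns the heaviest prefixes that together carry a chosen share of the traffic.

```python
def prefixes_covering(traffic_by_prefix, fraction):
    """Return the heaviest prefixes that together carry at least `fraction` of all bytes."""
    total = sum(traffic_by_prefix.values())
    chosen, running = [], 0
    ranked = sorted(traffic_by_prefix.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, nbytes in ranked:
        if running >= fraction * total:
            break
        chosen.append(prefix)
        running += nbytes
    return chosen

# Invented byte counts per prefix, for illustration only
counters = {
    "192.0.2.0/24": 700,
    "198.51.100.0/24": 210,
    "203.0.113.0/24": 50,
    "10.0.0.0/8": 40,
    "172.16.0.0/12": 0,  # a route that saw no traffic at all
}
core_90 = prefixes_covering(counters, 0.90)
core_99 = prefixes_covering(counters, 0.99)
print(len(core_90), len(core_99))
```

Everything outside the chosen set would then follow the default route towards the egress point.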
The relationship between Content Distribution Networks and other parts of the Internet has never been completely smooth. The work on Client Subnet in the DNS is one example of efforts to make the DNS a little chattier about who is asking the question to help the responder give the “right” answer. But of course the relationship between two addresses on the Internet depends on the routing state, and this is the topic of the “Whack-a-Mole” routing presentation.
BGP does not use “performance” or even “latency” as a routing metric. It tries to minimise the AS path between any two addresses and sometimes that’s not optimal for some consumers of the outcome (such as the CDNs in this case). To date there is no clear answer: routing does not allow the form of differentiated forwarding that is being called for, and no Virginia, SDN won’t help here. One very common response for the larger CDN providers is to place replicated content sources everywhere, so that long line routes are minimised simply by making every client “close” to an instance of the data source. Or you can alter the content application to be more tolerant of longer latency and higher jitter without exposing all of the vagaries of the long line connection to the end user. Or you can complain at NANOG that BGP routing doesn’t do exactly what you want it to do!
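The mismatch can be shown with two candidate paths to the same prefix. This sketch isolates the AS path length comparison only; the real BGP decision process also weighs local preference, MED and other attributes, and the peer names, AS numbers and round-trip times here are invented.

```python
# Two hypothetical candidate paths towards the same destination prefix
paths = [
    {"via": "peer-A", "as_path": [64500, 64510], "rtt_ms": 180.0},
    {"via": "peer-B", "as_path": [64501, 64502, 64510], "rtt_ms": 35.0},
]

# What an AS-path-length comparison picks: the path with the fewest ASes
bgp_choice = min(paths, key=lambda p: len(p["as_path"]))

# What a latency-sensitive content provider would prefer
latency_choice = min(paths, key=lambda p: p["rtt_ms"])

print(bgp_choice["via"], latency_choice["via"])
```

The shorter AS path wins in the first comparison even though it is the slower path, which is the kind of outcome that motivates the workarounds described above.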
WiFi: See slide 5 of the presentation. Enough said! Actually both the slides and the presentation itself are worth looking through: there is some excellent material here for consumers of WiFi services just as much as for designers of WiFi service networks.
I sat in on Rafal Szarecki’s presentation on the internals of packet buffering inside routers. He explains that sometimes the external behaviour of a switching system may be completely anomalous, such as when a flow through the unit sees packet loss even when the ingress and egress port utilisation rates are less than 1% of the port capacity.
Internally routers are multi-processing systems, and at the simplest level can be considered as a collection of ingress packet forwarding engines connected to a collection of egress packet forwarding engines, connected by a switch fabric. Evidently these days, routers use “virtual output queuing”, where instead of queuing packets against each output interface after they have been switched, each ingress element has as many virtual input queues as there are egress ports, and each egress grants switch access to requesting input ports.
When coupled with a crossbar switching element, which can switch multiple inputs to multiple outputs simultaneously, the result is highly efficient. However, this can lead to some surprising pathologies under load, where high flow rates across some ports can lead to high drop rates on supposedly unrelated ports. The presentation pack is well worth a review if you are into this aspect of networking.
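The queuing arrangement described above can be modelled in a few lines. This is a toy model with a greedy, single-pass round-robin arbiter, assumed purely for illustration; it does not represent any vendor's scheduler or the pathologies discussed in the presentation.

```python
from collections import deque

class VoqSwitch:
    """Toy crossbar: every ingress keeps one queue per egress port."""

    def __init__(self, ports):
        self.ports = ports
        # voq[i][e] holds packets waiting at ingress i for egress e
        self.voq = [[deque() for _ in range(ports)] for _ in range(ports)]
        self.next_grant = [0] * ports  # round-robin pointer for each egress

    def enqueue(self, ingress, egress, packet):
        self.voq[ingress][egress].append(packet)

    def tick(self):
        """One cycle: each egress grants one ingress, each ingress sends at most one packet."""
        delivered, busy_ingress = [], set()
        for egress in range(self.ports):
            for offset in range(self.ports):
                ingress = (self.next_grant[egress] + offset) % self.ports
                if ingress not in busy_ingress and self.voq[ingress][egress]:
                    packet = self.voq[ingress][egress].popleft()
                    delivered.append((ingress, egress, packet))
                    busy_ingress.add(ingress)
                    self.next_grant[egress] = (ingress + 1) % self.ports
                    break
        return delivered

switch = VoqSwitch(3)
switch.enqueue(0, 2, "a")  # ingress 0 and ingress 1 both want egress 2
switch.enqueue(1, 2, "b")
switch.enqueue(1, 0, "c")  # queued behind "b" in arrival order, but for an idle egress
first = switch.tick()
second = switch.tick()
print(first, second)
```

In the first cycle packet "c" is switched even though it arrived at ingress 1 after "b", because it sits in its own queue rather than behind a packet waiting for the contended egress 2.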
Cables and Carriage
The Telegeography presentation by Tim Stronge was exceptionally interesting. He talked on the current state of undersea cables.
Notably, there is the rise of purely machine-driven communication. When you look at the use of trans-Pacific and trans-Atlantic cables over time, the large content providers now haul more volume of data than ISPs. People are not the reason why the bulk of data moves across these systems: it’s data storage systems priming their local storage, so, as Tim puts it, the major traffic component is now machine-to-machine. ISPs barely exceed 50% of capacity.
Buying power on cables has shifted away from carriage resellers (carriers) to buyers (content). Telegeography estimate that there is some 12Tbps of traffic due to ISP operations, and 14Tbps of traffic due to content loading on the aggregate of the Trans-Atlantic systems. The content folk are now in a position to purchase fibre pairs of their own, or be a submarine cable partner.
Secondly, he observed the disruptive issues of price erosion on cables. The Monet cable is significant as an illustration of erosive price pressures on cable systems. The price today for Miami-Sao Paulo is four times the price of London-NY, LA-Tokyo and HK-Tokyo per 10G (these are selling at approximately $10K per month per 10G), so the massive price premium on the Miami-Brazil run exposes a build opportunity, which is being taken up by Monet. This is a content provider core anchor cable.
Some time back, Google was a major partner in Unity with a handful of carrier partners, and the carrier partners dumped their capacity onto the market. The LA-Tokyo trans-Pacific prices post-Unity for 10Gbps went down from $70K p.m. to $30K p.m. due to the strong influence these dumping actions have over the market. Google is a core owner in Monet, and this time the other three are Algar Telecom (Brazilian SP consortium), Angola Cables (wanted to link this to a BR-Angola cable which is frozen, so it’s a stranded asset) and Antel Uruguay (dramatically over-provisioned), so it looks like Monet will also dump into the market and Miami-Sao Paulo prices will plummet.
It should be noted that this business is not just a technology business, but a financial venture business. If the wildest expectations of a cable’s backers are achieved then they will be in a position to calibrate release of capacity in a cable at a level that continues to leave unmet demand, so that the price of access to a cable will include a hefty scarcity premium, and if all goes according to plan, the cable owners will generate a revenue stream that repays the original capital investment within two years, and continues to pay solid returns for up to two decades. But today such stories of financial windfalls are few and far between, and the environment has its fair share of failed venture partners who are forced into fire sales of stranded assets (e.g. Angola Cables).
Despite dumping and price slumps, undersea fibre pairs are disappearing: book capacity is now being sold off by the cable owners. Before today’s data centre content priming rush no one needed that much capacity, but these days the fibre pairs are being sold off quickly, so new cable development is likely in 2016 – 2017 (there has been no new trans-Atlantic cable since 2003). These days there are new cables coming up but it’s Microsoft, Facebook, Google, etc. driving this. They are driving this not as resellers, but as owner/operators, and are not interested in creating artificial scarcity conditions, and the result is that resellers are seeing dwindling margins as the content industry pushes a regime of cost-based pricing with marginal rates of return on the original build capital.
In this business, time can be crucial. If you can be faster relative to others then you can sell this speed differential. High Frequency Traders (HFTs) found that a microwave link between Chicago and New York was faster than fibre because of the inherent advantage of propagation speed through air (slightly lower than c) over propagation speed in fibre (0.65 c). Hibernia trenched and armored the trans-Atlantic run across the heavily fished Grand Banks. Telegeography calculate the advantage they managed to get from this slightly more direct route to be 2.9ms. HFTs will pay up to 20x (yes, twenty times!) the ISP rate for the fastest route. But HFTs do not have infinite demand, so the cable consortium needs to sell its remaining book inventory, which they will do with an induced latency via an extra spool on the cable so that the HFTs retain the relative time edge that they paid for, and ISPs will see a “longer” cable with propagation times comparable to other cable systems.
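The arithmetic behind these numbers is straightforward. The sketch below uses the 0.65 c figure for fibre quoted above; the velocity factor for air (0.9997) and the 1,200 km path length are assumptions chosen for illustration, and the 2.9ms figure is treated simply as the amount of delay to be added back.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond

def one_way_delay_ms(distance_km, velocity_factor):
    """Propagation delay over a path at a given fraction of c."""
    return distance_km / (C_KM_PER_MS * velocity_factor)

def spool_length_km(extra_delay_ms, velocity_factor=0.65):
    """Length of fibre needed to add a given delay."""
    return extra_delay_ms * C_KM_PER_MS * velocity_factor

distance_km = 1200  # hypothetical path length, not a measured route
fibre_ms = one_way_delay_ms(distance_km, 0.65)    # fibre, per the figure above
air_ms = one_way_delay_ms(distance_km, 0.9997)    # air, assumed velocity factor
spool_km = spool_length_km(2.9)

print(round(fibre_ms, 2), round(air_ms, 2), round(spool_km, 1))
```

Under these assumptions the same 1,200 km takes roughly 6.2ms in fibre against roughly 4.0ms through air, and adding 2.9ms of delay would take on the order of 565 km of spooled fibre.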
Carriers don’t know how to price 100G capacity. Submarine cables were slower to bring 100G to market than terrestrial networks, and the factor Telegeography are using here to reach this conclusion is the price multiplier between 10G and 100G services, i.e. is 100G 10x more expensive than 10G? Or 6x? They observe a lot of variance in the multiplier at present. There is a view that this multiplier for 100G services will stabilise at 4x to 5x 10G prices.
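What the multiplier means for the buyer is clearer when expressed as a price per Gbps. This small sketch uses an illustrative 10G price of $10K per month (the approximate figure quoted earlier for the major routes) and a few candidate multipliers; none of the resulting numbers are reported prices.

```python
def cost_per_gbps(price_10g, multiplier):
    """Return (per-Gbps price on a 10G service, per-Gbps price on a 100G service)."""
    price_100g = price_10g * multiplier
    return price_10g / 10, price_100g / 100

price_10g = 10_000  # illustrative monthly price in dollars
for multiplier in (4, 5, 6, 10):
    per_10g, per_100g = cost_per_gbps(price_10g, multiplier)
    print(f"{multiplier}x: ${per_10g:.0f}/Gbps on 10G vs ${per_100g:.0f}/Gbps on 100G")
```

At a 10x multiplier there is no unit-price benefit in moving to 100G, while at 4x to 5x the per-Gbps price roughly halves.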
It appears that new sources of demand for international capacity (the current large scale demands for content priming) are leading to new infrastructure builds that displace traditional carrier resellers with owner/operators who use the capacity within their own business. The dumping of excess capacity on the reseller market disrupts this market, leading to continued price erosion.
IPv6 at Telus
Telus, the connectivity provider to the meeting, gave a presentation on their experience leading to their deployment of IPv6 in their retail products. Their results are clearly visible. They use MPLS on V4 in the core network, and implemented V6 as 6PE, which was a very minor change to their network architecture. However, Telus is a consumer ISP, so a large part of this is a story of the high investment by their consumer ISP division in process and in having staff familiar with the process. In this context IPv6 is not the same as IPv4, and it can require careful consideration in some cases. There was a long lead time for this work: Telus started including IPv6 peering in 2012, some 3 years ago. Interestingly, their provision of IPv6 at IETF, NANOG and ARIN conferences taught them some valuable lessons about IPv6 in their production environment. They upgraded their managed Home Gateways, and sequenced this in a way that did not produce a large scale flood, using DHCPv6 as their prefix distribution mechanism.
They are doing V6 in their mobile network and evidently using NAT64/DNS64 for Apple iOS and 464XLAT for Android, though it appears that this is not a completed deployment: he cited a rather unique problem of Canadian users roaming too close to the US border and getting a handover dropout that lasted for minutes. They are working on this. They also had a rather strange problem of the PMTU packets being too big and getting dropped!
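For readers unfamiliar with the DNS64 half of that pairing, the sketch below shows how an IPv6 address is synthesised from an IPv4 address by embedding it in a /96 prefix. It uses the well-known prefix 64:ff9b::/96 from RFC 6052 as an assumption; the presentation summary above does not say which prefix Telus uses.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052 (an assumption for this example)
WELL_KNOWN_PREFIX = ipaddress.ip_network("64:ff9b::/96")

def synthesize_aaaa(ipv4_text, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))

def extract_ipv4(ipv6_addr):
    """Recover the embedded IPv4 address, as the NAT64 translator would."""
    return ipaddress.IPv4Address(int(ipv6_addr) & 0xFFFFFFFF)

synthetic = synthesize_aaaa("192.0.2.33")
print(synthetic, extract_ipv4(synthetic))
```

An IPv6-only handset connects to the synthesised address, and the NAT64 device maps it back to the original IPv4 destination.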
Quick talks for quick takeaways: my takeaway is that the CZ.NIC folk, who have done great work with BIRD for BGP and the Knot authoritative server, are working on Knot as a DNSSEC-validating resolver. QNAME minimisation is included, and not only is it on by default, you can’t turn it off!
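QNAME minimisation means the resolver reveals only as much of the query name as each level of the DNS hierarchy needs. The sketch below lists the successive names a minimising resolver would ask about, under the simplifying assumption that there is a zone cut at every label; it is not Knot's implementation and ignores query types and caching.

```python
def minimised_qnames(qname):
    """Names a minimising resolver would expose, from the top of the tree down."""
    labels = qname.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]

steps = minimised_qnames("www.example.com")
print(steps)
```

Without minimisation the full name www.example.com. would be sent to the root and to the com. servers as well as to the servers for example.com.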
As usual with NANOG, the presentations, both as slides and as video, are all archived. https://www.nanog.org is a good starting place to explore these for yourself.