Since the start of 2022, DE-CIX had been working towards implementing Ethernet Virtual Private Network (EVPN) on its peering platform. By mid-December 2022, the company had performed its largest-ever network upgrade during ongoing operations, upgrading the DE-CIX Apollon platform to what we call ‘Peering LAN 2.0’. This post explains how we did it without a single disruption for customers.
Challenges of large Peering LANs
If you run large Peering LANs, as DE-CIX does, there are a number of challenges. DE-CIX Frankfurt is our largest Peering LAN with over 1,000 ASNs and a lot of peering sessions — at peak times there are 14.4Tbps of traffic over this Peering LAN.
If you imagine all these routers connected to the Peering LAN, using IPv4 and IPv6, the result is a lot of Address Resolution Protocol/Neighbor Discovery (ARP/ND) traffic, which is a form of broadcast, unknown-unicast, and multicast (BUM) traffic. DE-CIX Frankfurt, for example, counted 2 Mbps of ARP/ND traffic, which is just too much for some of the routers. Another issue is that Proxy ARP/ND on customer routers is an attack vector against the whole Peering LAN.
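To put the 2 Mbps figure in perspective, it helps to convert it into a packet rate, since every router on the LAN has to receive and inspect broadcast ARP/ND requests. The sketch below is a rough back-of-the-envelope estimate only. The frame sizes are assumptions (a padded 64-byte ARP frame and a roughly 90-byte IPv6 Neighbor Solicitation frame, both including the frame check sequence), and real traffic is a mix of both.

```python
# Rough estimate of how many ARP/ND packets per second 2 Mbps represents.
# Frame sizes are assumptions: 64 bytes is the minimum Ethernet frame
# (a padded ARP packet); 90 bytes approximates a Neighbor Solicitation frame.

def packets_per_second(rate_bps: float, frame_bytes: int) -> float:
    """Convert a bit rate into a packet rate for a fixed frame size."""
    return rate_bps / (frame_bytes * 8)

ARP_ND_RATE_BPS = 2_000_000  # 2 Mbps, the figure quoted for DE-CIX Frankfurt

for label, size in (("ARP, 64-byte frames", 64), ("ND, 90-byte frames", 90)):
    rate = packets_per_second(ARP_ND_RATE_BPS, size)
    print(f"{label}: ~{rate:,.0f} packets/s")
```

Under these assumptions the load works out to somewhere between roughly 2,800 and 3,900 packets per second that each connected router has to look at, which gives a sense of why smaller routers struggle.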
Motivation: What led to this migration?
Usually, the traffic just flows between routers. However, a router can have proxy ARP/ND switched on, which means that it replies to ARP/ND requests on behalf of other routers. In some circumstances, all the traffic then goes to this particular router and overwhelms it.
We worked on tools to detect if something like this is happening, but we couldn’t prevent such an incident technically; we could only react to it. It’s catastrophic for the Peering LAN because traffic is not flowing smoothly anymore. We did our best to monitor for routers that switched on proxy ARP/ND and to shut them off automatically. The impact still lasted 10 to 20 minutes, depending on how quickly we could detect it. Even if it only happened twice a year, depending on how carefully people configure their routers, we wanted to get rid of it.
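The kind of detection described above can be illustrated with a small sketch. This is not DE-CIX's tooling. It assumes the operator already holds a table of expected IP-to-MAC pairs and has some way of collecting observed ARP replies; the names `ArpReply` and `find_proxy_arp_suspects` are hypothetical, and the addresses come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpReply:
    sender_ip: str   # IP address the reply claims to resolve
    sender_mac: str  # MAC address offered for that IP


def find_proxy_arp_suspects(replies, bindings):
    """Return {mac: set of IPs} for MACs that answered for someone else's IP.

    `bindings` maps each assigned peering IP to the MAC registered for it.
    Replies for IPs without a registered binding are ignored in this sketch.
    """
    suspects = {}
    for reply in replies:
        expected = bindings.get(reply.sender_ip)
        observed = reply.sender_mac.lower()
        if expected is not None and expected.lower() != observed:
            suspects.setdefault(observed, set()).add(reply.sender_ip)
    return suspects


# Hypothetical example data.
bindings = {
    "192.0.2.1": "00:00:5e:00:53:01",
    "192.0.2.2": "00:00:5e:00:53:02",
}
replies = [
    ArpReply("192.0.2.1", "00:00:5e:00:53:01"),  # legitimate answer
    ArpReply("192.0.2.2", "00:00:5e:00:53:01"),  # answering for a neighbour
]
print(find_proxy_arp_suspects(replies, bindings))
```

A check like this can only flag the problem after misleading replies are already on the LAN, which is the limitation the paragraph above describes.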
Consequently, we had been working on introducing EVPN on the peering platform since the beginning of 2022. Considering the increasing number of participants (especially in New York and Frankfurt), the introduction of EVPN including proxy ARP/ND was required to get the exponentially growing broadcast/multicast traffic in the Peering LANs under control and to reduce the load on customer routers.
Additionally, further security features based on a proxy ARP/ND agent were activated in accordance with RFC 9161, and the protocol stack of the DE-CIX global network will be expanded to include Resource Reservation Protocol-Traffic Engineering (RSVP-TE) and Seamless Bidirectional Forwarding Detection (SBFD).
Rolling EVPN out with minimum downtime
We spent eight months preparing everything and doing all the tests. If you want to build a new network you can easily deploy Peering LAN 2.0, but if you have a large network like DE-CIX, with 40 locations globally and over 3,000 customers connected, you need to plan the rollout carefully and be very cautious with the migration.
We looked at different migration scenarios and chose to do it service by service, Peering LAN by Peering LAN. With the goal of absolutely minimizing downtime for our customers, we needed to test it — there would only be one chance to get it right when we did the migration.
We built parts of the networks, for example, Frankfurt or New York, in our lab to do tests. Hardware tests are good, but software tests are even better because you can make changes quickly. A lot of the networks were emulated in software so that we could run tests repeatedly and detect if, for instance, something was wrong with our configuration.
We found out that one type of line card we were using wasn’t capable of running this feature set. After fixing this issue, we tested via hardware and software. For the hardware, we recreated critical parts of the network in our lab. For the software, we emulated the network. We detected anomalies in the configuration of the Peering LAN and tested cases for configuration generators.
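As an illustration of the sort of anomaly a software test can catch, the sketch below checks a list of generated entries before they reach a device. It is a hypothetical example rather than our actual test suite: it assumes the generated configuration contains IP and MAC address pairs for customer ports, and the function name, prefixes, and addresses are made up for the example.

```python
import ipaddress
import re

# Colon-separated lowercase MAC address, for example 00:00:5e:00:53:0a.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def find_config_anomalies(entries, lan_prefixes):
    """Return human-readable problems found in a list of (ip, mac) entries."""
    networks = [ipaddress.ip_network(prefix) for prefix in lan_prefixes]
    problems = []
    seen_ips = set()
    for ip_text, mac in entries:
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            problems.append(f"{ip_text}: not a valid IP address")
            continue
        if not any(ip in net for net in networks):
            problems.append(f"{ip}: outside the Peering LAN prefixes")
        if ip in seen_ips:
            problems.append(f"{ip}: assigned more than once")
        seen_ips.add(ip)
        if not MAC_RE.match(mac.lower()):
            problems.append(f"{ip}: malformed MAC address {mac!r}")
    return problems


entries = [
    ("192.0.2.10", "00:00:5e:00:53:0a"),
    ("192.0.2.10", "00:00:5e:00:53:0b"),   # duplicate IP
    ("198.51.100.1", "00:00:5e:00:53:0c"),  # wrong prefix
    ("2001:db8::1", "not-a-mac"),           # malformed MAC
]
for problem in find_config_anomalies(entries, ["192.0.2.0/24", "2001:db8::/64"]):
    print(problem)
```

Because a check like this runs in seconds, it can be repeated after every change to a configuration generator, which is the advantage of software tests noted above.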
We worked mainly with Nokia to standardize how (what we call) Peering LAN 2.0 works. This resulted in RFC 9161, where we basically enhance EVPN for the Peering LAN use-case. As we take port security very seriously at DE-CIX, we ask our customers for the MAC address of their routers that connect to the Peering LAN.
We also know the IP address we hand out to the customer, so we can feed these bindings between IP and MAC addresses into the ARP/ND Proxy Agent. The agent snoops the ARP/ND requests that come from customer routers and makes sure they don’t flood the Peering LAN. Because of this static binding, the DE-CIX router can send back the ARP/ND response directly. As a result, there is no flooding of ARP/ND requests, the load on customer routers is reduced, and the attack vector disappears even if a customer router switches on Proxy ARP/ND.
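The idea of answering from static bindings can be shown with a toy model. This is a simplified sketch and not the router implementation or the full behaviour specified in RFC 9161; the class name `StaticProxyAgent` and the addresses are hypothetical.

```python
class StaticProxyAgent:
    """Toy model of an ARP/ND proxy agent backed by static IP-to-MAC bindings."""

    def __init__(self, bindings):
        # Normalise MACs once so replies are consistent regardless of input case.
        self._bindings = {ip: mac.lower() for ip, mac in bindings.items()}

    def handle_request(self, target_ip):
        """Return the MAC to reply with, or None if the target is unknown.

        A known target is answered locally, so the request is never flooded.
        What happens to requests for unknown targets is a policy choice that
        this sketch leaves to the caller.
        """
        return self._bindings.get(target_ip)


# Hypothetical bindings for one customer router (IPv4 and IPv6).
agent = StaticProxyAgent({
    "192.0.2.10": "00:00:5E:00:53:0A",
    "2001:db8::10": "00:00:5E:00:53:0A",
})
print(agent.handle_request("192.0.2.10"))
print(agent.handle_request("192.0.2.99"))
```

The point of the design is that the table is filled only from the provisioned bindings and never from replies seen on the wire, so a router that answers for addresses it does not own has no way to influence the result.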
On 1 November 2022, we began the rollout with DE-CIX Phoenix. Because it had fewer customers connected, it was a pretty simple setup from our perspective. The migration was done in a few minutes, but we waited a week to see if it was stable before moving on directly to DE-CIX Frankfurt. We then migrated one service at a time on a weekly basis. On 29 November 2022, we migrated all the remaining Peering LANs in the US, and six weeks after that, we completed the migration of all the other interconnection services, such as DirectCLOUD and Closed User Groups (CUGs). It took us six weeks, and it went flawlessly.
All customer connections were migrated to an EVPN, which offers improved stability and additional security features.
After the successful migration, we can now see that the Peering LAN 2.0 solution with EVPN has reduced the ARP/ND traffic significantly. In addition, EVPN prevents certain attack vectors as well as some sources of error, thus increasing the security and robustness of the entire platform.
Customers tell us that CPU usage on their routers connected to DE-CIX has dropped by as much as 25% due to the elimination of ARP/ND noise. This is an impressive increase in efficiency, especially in times when there is an ever-growing focus on the energy efficiency of data centres and IT in general.
The introduction of EVPN reduces unnecessary network noise on the platform, thus lowering the load on all connected routers and strengthening the security of DE-CIX’s IXs. The version of EVPN now deployed on the DE-CIX exchanges is based on an extension of the protocol, specified in RFC 9161, that includes additional security features. DE-CIX played a significant role in that project.
Thomas presented this work at NANOG 87; a recording of the talk is available.
Dr. Thomas King is the Chief Technology Officer (CTO) at DE-CIX, interested in pushing the boundaries of what is possible in terms of high-bandwidth access technology and security solutions for IX platforms, and trailblazing the automation of IX services.