When a network provider invests in building its network infrastructure, the design goal is a scalable network that can handle not only current demand but also future traffic growth, with as few upgrades and as little capital expenditure as possible.
Traffic engineering is the solution that network providers use when they need to optimize their existing IP/MPLS networks and provide more services to customers.
Most IP networks use hop-by-hop routing, where packets follow the shortest path between a source and destination, even when that path is not the best-performing one.
Traffic engineering allows the provider to override the shortest-path rule by sending traffic over longer but less congested links. This helps alleviate network congestion and enables the network provider to maximize the use of the existing network infrastructure.
Yet, the widely used traffic engineering solutions of today also carry a few challenges. Here’s a look at how traffic engineering has evolved from an offline model to the on-device model in use today, the challenges this model brings, and how software-defined networking (SDN) can make it a whole lot easier.
The traffic engineering evolution
Offline model
Network engineers have used traffic engineering for a long time. The early implementations used an offline model, which involved loading the current network topology and traffic demand matrix into a planning tool that calculated the best paths using optimization algorithms. This method, while workable, could not keep up with network changes because of several challenges. The biggest issue came from having to load the network topology into the planning tool. This is easy when there are only a handful of routers, but it’s a major challenge when the network has several hundred or even thousands of routers.
The difficulty became even greater when the network topology changed before the next traffic engineering adjustments, which meant adding the updated topology once again to the planning tool. The other challenges were calculating the traffic demand matrix and how long it took the optimization algorithms to calculate new paths, something that typically ranged from a few hours to even a couple of days in very large networks. All this meant that when a link failure caused congestion, the network provider could not find new paths quickly enough to overcome the issue.
On-device traffic engineering
Then came on-device traffic engineering using RSVP-TE and the constrained shortest path first (CSPF) algorithm. Here the router, rather than an external planning tool, manages the traffic engineering function. This method relies on an extension to the IGP in which routers also flood the available bandwidth on each link. Traffic engineering tunnels are then set up between routers, and the utilization on these tunnels is monitored to understand the traffic demand matrix.
When increased utilization across a tunnel causes congestion, the headend router runs its CSPF algorithm to re-optimize the path and uses RSVP-TE to signal the new path to the other routers along the path. The CSPF algorithm used here is extremely fast, and because this model works from the routers themselves, the network topology is readily available. This allows it to respond in real time and overcome the limitations of offline traffic engineering. This is especially useful in alleviating congestion that is caused by link failures.
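The CSPF step described above can be sketched in a few lines. This is a minimal illustration, assuming a single bandwidth constraint and a hypothetical four-router topology; the function name `cspf`, the `LINKS` table, and all costs and bandwidth values are made up for the example and do not come from any router implementation.

```python
import heapq


def cspf(links, source, destination, required_bw):
    """Return (cost, path) for the cheapest path whose links all have at
    least required_bw available, or None if no such path exists.

    links maps a directed edge (node_a, node_b) to (igp_cost, available_bw).
    """
    # Step 1 (constraint): prune links that cannot carry the tunnel.
    adjacency = {}
    for (a, b), (cost, available_bw) in links.items():
        if available_bw >= required_bw:
            adjacency.setdefault(a, []).append((b, cost))

    # Step 2 (shortest path first): plain Dijkstra on the pruned graph.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in adjacency.get(node, []):
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + link_cost, neighbour, path + [neighbour])
                )
    return None


# Hypothetical topology: A-B-D is the IGP shortest path, but link B-D
# has only 2 units of bandwidth left.
LINKS = {
    ("A", "B"): (10, 8),
    ("B", "D"): (10, 2),
    ("A", "C"): (15, 9),
    ("C", "D"): (15, 9),
}

print(cspf(LINKS, "A", "D", 1))   # (20, ['A', 'B', 'D'])
print(cspf(LINKS, "A", "D", 5))   # (30, ['A', 'C', 'D'])
print(cspf(LINKS, "A", "D", 10))  # None
```

A tunnel that needs 5 units is steered onto the longer A-C-D path because the shortest path no longer has room, which is the behaviour the headend router relies on when it re-optimizes.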
When does RSVP-TE become a challenge?
Here are the reasons why on-device traffic engineering using RSVP-TE is not the optimal solution.
One major challenge is that RSVP-TE requires tunnels to be formed between every pair of routers in the network to get the complete traffic matrix. This leads to what is known as the “n-squared” or “full-mesh” problem, where too many tunnels are created. For example:
- A small service provider with 75 routers and 300 links can have 1,600 tunnels.
- A medium-sized service provider with 450 routers can have up to 20,000 tunnels.
- A large service provider with 1,900 routers and 8,000 links can end up with 132,000 tunnels.
It becomes extremely hard for a network engineer to configure and manage these tunnels.
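To see why the problem is called "n-squared", here is a small worked calculation of the worst case, assuming every router builds one unidirectional tunnel to every other router. The function name is hypothetical. The tunnel counts quoted above are well below this bound, presumably because only a subset of routers act as tunnel endpoints in practice; that reading is an assumption, not something stated in the source.

```python
def full_mesh_tunnels(routers):
    """Unidirectional tunnels needed for a full mesh among `routers` nodes."""
    return routers * (routers - 1)


for n in (75, 450, 1900):
    print(n, full_mesh_tunnels(n))
# 75 5550
# 450 202050
# 1900 3608100
```

Doubling the number of meshed routers roughly quadruples the number of tunnels, which is why configuration and monitoring effort grows so quickly.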
The n-squared problem also creates another issue. In addition to the link status updates it carries, the IGP must flood available-bandwidth updates across the network as tunnels reserve and release capacity. With so many tunnels, the IGP ends up propagating more information about available bandwidth than about links going up and down.
The other challenge arising from RSVP-TE is the race conditions it triggers when a link fails. Take the case of the medium-sized service provider with 450 routers, 2,000 links, and 20,000 tunnels. While most of the links in this provider network may carry fewer than 200 tunnels, there can be one link that carries around 1,000 tunnels. If that one link fails, all the 1,000 tunnels on that link must find a new path. And that means all the headend routers have to run CSPF and re-optimize the tunnels.
This triggers a race condition where each headend router associated with the tunnels is independently optimizing for itself, going after the same finite amount of bandwidth without being aware of the other network routers’ requirements. In such a condition, some of the tunnels fail to find a path and re-optimize. The result is that, on a network-wide scale, re-optimization can be considered to have failed because not all tunnels have found a path.
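The race can be illustrated with a deliberately simplified model: two detour links, four displaced tunnels, and a single round of signalling. Every headend works from the same flooded snapshot of free bandwidth and picks the roomiest link, unaware of the others. The function name, link names, and numbers are all hypothetical, and real routers would retry after the next flooding update rather than give up.

```python
def independent_reoptimize(available, demands):
    """Each head-end picks a detour link from the same stale snapshot.

    available: dict of link -> free bandwidth as last flooded by the IGP.
    demands: list of (tunnel, bandwidth) pairs that lost their path.
    Returns (placed, failed) after signalling is admitted in list order.
    """
    snapshot = dict(available)  # what every head-end believes
    actual = dict(available)    # what the links really have left
    placed, failed = {}, []
    for tunnel, bw in demands:
        # Head-end decision: the roomiest link that fits, judged only
        # from the snapshot, with no knowledge of other head-ends.
        candidates = [link for link, free in snapshot.items() if free >= bw]
        if not candidates:
            failed.append(tunnel)
            continue
        choice = max(candidates, key=lambda link: snapshot[link])
        # Admission control on the link uses the real remaining bandwidth.
        if actual[choice] >= bw:
            actual[choice] -= bw
            placed[tunnel] = choice
        else:
            failed.append(tunnel)
    return placed, failed


detours = {"X": 10, "Y": 8}
displaced = [("t1", 4), ("t2", 4), ("t3", 4), ("t4", 4)]
print(independent_reoptimize(detours, displaced))
# ({'t1': 'X', 't2': 'X'}, ['t3', 't4'])
```

The two links together have 18 units free against 16 units of demand, so every tunnel could have been placed. Two of them fail only because all four headends chased the same link.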
Combined with the n-squared problem, this means that in a medium-sized service provider network, about 5% of the tunnels, which is nearly 1,200 tunnels, are down most of the time. Imagine how high these numbers can go in larger networks. Triaging why 1,200 tunnels are down is very time-consuming for engineers.
How can SDN address traffic engineering challenges?
Each of the challenges has a different solution. Operators can achieve adaptive traffic engineering, that is, a quick response to a failure, with a real-time model of the network. The n-squared problem is addressed by creating only as many tunnels as necessary, which reduces both the IGP and the signalling overhead. Network providers can overcome the race conditions, which are triggered when each router independently tries to reserve bandwidth, by going back to the global view of the network and its demands that came with the offline traffic engineering model. Finally, software can help with the manageability issues that arise from having n-squared tunnels.
There is an approach that incorporates all these solutions: software-defined networking, or SDN.
The SDN-based approach builds on several recent network enhancements. Segment Routing, which can replace RSVP-TE (although RSVP-TE can still be used in this approach), is less complex and simplifies the IP/MPLS control plane. It can set up any type of path in the network with low overhead. Push-based streaming telemetry, built on YANG data models, provides the traffic matrix, replacing NetFlow, which has traditionally been used to create traffic matrices. The real-time topology of the network comes from the SDN controller, which is part of the network control plane.
Finally, the actual task of traffic engineering is once again removed from the router and moved onto an SDN application. The SDN application not only responds to traffic demands based on the real-time network state, but can also be programmed to handle future traffic demand. For example, it can reserve bandwidth for a big game later in the week, which cannot be done by routers that respond only to current demands. In addition, SDN applications do not need the IGP to learn bandwidth availability; instead, they can build traffic matrices from both push-based telemetry (using YANG models) and traditional NetFlow.
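Reserving bandwidth for a future event amounts to keeping a calendar of bookings per link and checking the peak booked load over the requested window. The sketch below is one hypothetical way an SDN application might do this; the `LinkCalendar` class, its methods, and the hour-based timestamps are illustrative assumptions, not part of any SDN controller API.

```python
class LinkCalendar:
    """Tracks bandwidth bookings on one link over time (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bookings = []  # (start, end, bandwidth), with end exclusive

    def peak_usage(self, start, end):
        """Highest booked bandwidth at any instant in [start, end)."""
        # Booked load only rises when a booking starts, so it is enough
        # to check `start` and every booking start inside the window.
        points = {start} | {s for s, _, _ in self.bookings if start <= s < end}
        return max(
            sum(bw for s, e, bw in self.bookings if s <= t < e)
            for t in points
        )

    def reserve(self, start, end, bandwidth):
        """Book the bandwidth if it fits for the whole window."""
        if self.peak_usage(start, end) + bandwidth > self.capacity:
            return False
        self.bookings.append((start, end, bandwidth))
        return True


link = LinkCalendar(capacity=10)
print(link.reserve(18, 22, 6))  # True: the big game, hours 18 to 22
print(link.reserve(20, 24, 6))  # False: overlaps the game and exceeds 10
print(link.reserve(22, 24, 6))  # True: starts once the game booking ends
```

A router reacting only to current utilization has no equivalent of the second call, where a request is refused today because of load that is expected later.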
The SDN application now takes care of computing the paths and allocating the bandwidth for the entire network. This centralized approach addresses both key challenges of on-device traffic engineering:
- It overcomes race conditions because the SDN app has a global network view, which allows for network-wide resource optimization, and
- It provides the ability to overcome the n-squared problem by creating tunnels only if needed and if they will have a positive impact.
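As a rough illustration of the first point, the sketch below places displaced tunnels from a single shared view of remaining bandwidth, updating that view after every placement. It uses a simple greedy heuristic (largest demand first, roomiest link first) rather than a real optimization algorithm, and the function name, link names, and numbers are hypothetical.

```python
def central_place(available, demands):
    """Place tunnels one by one against a single, always-current view.

    available: dict of link -> free bandwidth.
    demands: list of (tunnel, bandwidth) pairs needing a new path.
    Returns (placed, failed).
    """
    remaining = dict(available)
    placed, failed = {}, []
    # Largest demands first, so big tunnels are not squeezed out later.
    for tunnel, bw in sorted(demands, key=lambda d: d[1], reverse=True):
        link = max(remaining, key=remaining.get)  # roomiest link right now
        if remaining[link] >= bw:
            remaining[link] -= bw  # every later decision sees this update
            placed[tunnel] = link
        else:
            failed.append(tunnel)
    return placed, failed


detours = {"X": 10, "Y": 8}
displaced = [("t1", 4), ("t2", 4), ("t3", 4), ("t4", 4)]
print(central_place(detours, displaced))
# ({'t1': 'X', 't2': 'Y', 't3': 'X', 't4': 'Y'}, [])
```

Because each decision is made against the updated totals, the four tunnels are spread across both links and none is left without a path.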
Network providers also benefit from freedom of choice if traffic engineering is shifted to a stand-alone third-party application rather than being tied to the SDN controller. This way, if they come across a new SDN application with more features or a better algorithm, they can move to it without being locked in by the SDN controller or the routers that depend on it.
This is why the SDN application-based approach is the best solution for overcoming traffic engineering challenges. Network providers can transform their networks to be scalable and flexible to handle the ever-changing demands of users.
Cengiz Alaettinoglu is CTO and founding member of Packet Design’s R&D. He is currently working on real-time SDN analytics and developing service applications.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.