Following on from my earlier post on where traffic should exit a network, in this post, I’ll cover the ways to give surrounding Autonomous Systems (ASes) a hint about where traffic should enter a network.
This is one of the most vexing problems in BGP policy, so there will be a lot of notes across these posts about why some solutions don’t work all that well, or when they will and won’t work.
There are at least three reasons an operator may want to control the point at which traffic enters their network, including:
- Controlling the inbound load on each link. It might be important to balance the inbound and outbound load to maintain settlement-free peering, to equally use all available inbound bandwidth, or to ensure the quality of experience is not impacted by overusing a single link.
- Accounting for geographically dispersed entry points. For instance, while the two entry points into AS65001 might appear to be topologically close, they might be geographically diverse, with one being in South America and the other being in North America.
- Ensuring flows requiring symmetric paths are properly handled. A common use case is the use of stateful packet filters or port address translators, both of which require inbound and outbound traffic to be routed through a single device.
All these reasons apply to all kinds of network operators, so this section will examine the various techniques used to control traffic entry points from the perspective of AS65001 in the following network:
Policies designed to control the point at which traffic enters an operator’s network will often conflict with policies designed to control the point at which traffic exits some other operator’s network. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network from AS65002 may conflict with AS65002’s policy that all traffic destined to 100::/64 leave its network by being forwarded to AS65003.
This effect is not just seen between directly connected ASes. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network through AS65002 may conflict with AS65004’s policy that all traffic to that same destination exit the network by being forwarded to AS65003.
The original intent of BGP policy was that the policy of the sender overrides the policy of the receiver, as expressed in the design of the metrics (the multiple exit discriminator, discussed below, has a lower priority than the preference). In real deployments, however, exit and entry policies are more fluid and entangled. These relationships will be considered in each of the sections below, each of which describes a different way to influence or control where traffic destined to a single reachable destination enters the network.
Multiple Exit Discriminator
Multiple Exit Discriminator (MED) is a suggestion or request to neighboring ASes to forward traffic to reachable destinations along a particular path. For instance, AS65001 may want traffic destined to 100::/64 to enter at B (Figure 1), rather than at A or through its link to AS65003.
However, the MED is not a transitive attribute of a BGP route. This means that if AS65001 sets the MED so that entry B is preferred, and sends this MED to AS65003, AS65003 will strip (or reset) the MED before advertising 100::/64 to either AS65004 or AS65002.
MED, in this case, would be useful to help AS65002 determine whether to send this traffic to A or B, but not whether to send the traffic to AS65001 or AS65003. AS65002 will, instead, rely on local policy, primarily preference, to determine which exit point to use. If AS65002 determines the best path to 100::/64 is through one of its direct connections to AS65001 (either A or B), and there is no other reason for AS65002 to choose one path over the other, the MED will be used to determine which path to use.
Because AS65003 only has one connection to AS65001, the MED will not impact its bestpath decision at all. Because AS65001’s MED has been reset or stripped in all the routes to 100::/64 AS65004 receives, AS65001’s MED will not play a role in any bestpath decision there, either (AS65002 or AS65003 may set the MED when sending routes to AS65004, which may influence the path AS65004 chooses, but again only when choosing between multiple connections to the same peering AS).
Because MED is considered only nominally useful, it is often stripped from routes when they are received from another AS.
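To make the ordering concrete, here is a minimal sketch of the part of the bestpath decision described above, seen from AS65002. It is not a full BGP implementation: it only models preference, AS-PATH length, and MED, and the `Route` class, the `best_path` function, the MED values, and the final tie-break on the entry label are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int
    as_path: Tuple[int, ...]
    med: int
    entry: str  # hypothetical label for the link the route was learned over

    @property
    def neighbor_as(self) -> int:
        # The first AS in the path is the neighbor that sent the route.
        return self.as_path[0]


def best_path(routes: List[Route]) -> Route:
    # Highest local preference wins first.
    top_pref = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top_pref]
    # Then the shortest AS-PATH.
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]
    # MED (lower is better) is only compared between routes
    # received from the same neighboring AS.
    survivors = [
        r for r in candidates
        if not any(o.neighbor_as == r.neighbor_as and o.med < r.med
                   for o in candidates)
    ]
    # Assumption: a deterministic stand-in for the remaining tie-breakers.
    return min(survivors, key=lambda r: r.entry)


# AS65002's view of 100::/64, with AS65001 asking for entry at B.
routes = [
    Route("100::/64", 100, (65001,), 20, "A"),
    Route("100::/64", 100, (65001,), 10, "B"),
    Route("100::/64", 100, (65003, 65001), 0, "via AS65003"),
]
print(best_path(routes).entry)  # B
```

If AS65002 raises the preference of the route learned from AS65003, that route wins before MED is ever consulted, which is the limitation described above.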
AS-PATH prepending
Since the length of the AS-PATH plays a role in choosing which path to use when forwarding traffic towards a given reachable destination, many (if not most) operators prepend the AS-PATH when advertising routes to a peer. Thus, an AS-PATH of {65001}, when advertised towards AS65003, can become {65001,65001} by adding one prepend, {65001,65001,65001} by adding two prepends, and so forth. Most BGP implementations allow an operator to prepend as many times as they would like, so it is possible to see twenty, thirty, or even higher numbers of prepends. Note, the usefulness of prepending is generally restricted to around two or three prepends, as the average length of an AS-PATH in the global Internet is around four hops.
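The mechanics of prepending are simple enough to sketch in a few lines. The `prepend` helper below is a hypothetical name, not a router command; it just shows how the path grows with each prepend.

```python
from typing import Tuple


def prepend(as_path: Tuple[int, ...], local_as: int, count: int) -> Tuple[int, ...]:
    """Return as_path with local_as added to the front count times."""
    if count < 0:
        raise ValueError("count must not be negative")
    return (local_as,) * count + tuple(as_path)


print(prepend((65001,), 65001, 1))  # (65001, 65001)
print(prepend((65001,), 65001, 2))  # (65001, 65001, 65001)
```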
If AS65001 would like traffic destined to 100::/64 to enter from AS65003 rather than AS65002, it can prepend the AS-PATH at every peering point with AS65002 (A and B) with two hops (sending {65001,65001,65001} to AS65002). If preference, MED, and all other metrics are equal, AS65002 would then prefer the path with the shorter AS-PATH through AS65003, rather than the path directly into AS65001 (either through A or B).
That all metrics are equal is not likely, however. AS65002 will probably have a preference set so routes learned directly from customers (such as AS65001) are selected over routes learned from peers (such as AS65003). The impact of prepending on route selection by directly connected peers is, therefore, uncertain.
Moving one step out in the network, consider the routes received by AS65004 to reach 100::/64. There will be one route with an AS-PATH of {65002,65001,65001,65001}, and another with an AS-PATH of {65003,65001}. All other things being equal (same preference, and so forth), AS65004 will choose to send traffic destined to 100::/64 through AS65003 rather than AS65002. How likely is it that all the other BGP metrics will be equal at AS65004? So long as AS65004’s peerings with AS65003 and AS65002 are of the same type, the odds are high. Prepending can, therefore, help move some (not all) traffic from one inbound link to another.
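The two cases above, a directly connected AS with unequal preferences and a more distant AS with equal preferences, can be compared with a small sketch. The `pick` function and the preference values are assumptions for illustration; real implementations consider many more attributes.

```python
from typing import List, Tuple

# (neighbor name, local preference, AS-PATH as received)
Candidate = Tuple[str, int, Tuple[int, ...]]


def pick(candidates: List[Candidate]) -> str:
    # Higher preference wins; AS-PATH length only matters among equals.
    return max(candidates, key=lambda c: (c[1], -len(c[2])))[0]


# AS65004: same preference for both neighbors, so the prepends decide.
at_65004 = [
    ("AS65002", 100, (65002, 65001, 65001, 65001)),
    ("AS65003", 100, (65003, 65001)),
]
print(pick(at_65004))  # AS65003

# AS65002: the customer route is preferred, so the prepends are never compared.
at_65002 = [
    ("AS65001", 200, (65001, 65001, 65001)),
    ("AS65003", 100, (65003, 65001)),
]
print(pick(at_65002))  # AS65001
```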
Because AS-PATH prepending has variable results over time, operators using this technique often ‘just try it’ to see what the effect will be. There’s no real way to predict how effective prepending any number of times will be in moving traffic from one inbound link to another.
What if AS65001 does not want traffic destined to 100::/64 to traverse AS65005? For instance, suppose AS65006 is across an ocean, mountain range, or other difficult-to-cross geographic feature. AS65005 crosses this geography via a satellite link, while AS65004 crosses the same geography via an optical cable. Since optical cable runs can provide better delay and jitter than a satellite link, AS65001 may desire to choose which of these two ASes is traversed to reach 100::/64.
This cannot be directly accomplished using AS-PATH prepend, as both AS65004 and AS65005 will receive the same prepended path.
To express this kind of policy, some operators allow their customers to set communities that cause the operator to remotely prepend a given route advertisement. For instance, NTT allows their customers to set a community that will cause NTT to prepend specific routes when those routes are advertised to specific ASes; in this case, AS65001 could add the community 65421:65005 to the advertisement for 100::/64, which would cause NTT to prepend AS65001 when advertising 100::/64 to AS65005, and not prepend anything when advertising 100::/64 to AS65004.
This technique is subject to the same caveats as using AS-PATH prepend locally — it may work in some situations, or it may not — because the local operator does not have visibility into the policies of the operators they are trying to influence.
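A rough sketch of how a provider might act on such a community follows. The community prefix 65421 comes from the example above; everything else is an assumption, including the `export_as_path` name, the use of AS65002 as the provider, and the choice to repeat the provider’s own AS number (which AS number is actually inserted is provider-specific).

```python
from typing import Set, Tuple

PREPEND_ONCE_TO = 65421  # first half of the community, as in the example above


def export_as_path(as_path: Tuple[int, ...],
                   communities: Set[Tuple[int, int]],
                   provider_as: int,
                   peer_as: int) -> Tuple[int, ...]:
    """AS-PATH the provider sends to peer_as for a customer route."""
    # Assumption: one extra copy of the provider's AS when the customer
    # tagged the route with 65421:<peer_as>; otherwise the normal single copy.
    extra = 1 if (PREPEND_ONCE_TO, peer_as) in communities else 0
    return (provider_as,) * (1 + extra) + tuple(as_path)


tags = {(65421, 65005)}
print(export_as_path((65001,), tags, 65002, 65005))  # (65002, 65002, 65001)
print(export_as_path((65001,), tags, 65002, 65004))  # (65002, 65001)
```

The point of the sketch is that the decision is made per peer, so AS65004 and AS65005 no longer receive the same path.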
Local preference via communities
This brings us to the remaining tools: local preference set via communities, longer prefix match, and conditional advertisement, again from the perspective of AS65001 in the example network (Figure 1).
Communities and local preference
As noted above, MED is the tool ‘designed into’ BGP for selecting an entrance point into the local AS for specific reachable destinations. MED is not very effective, however, because a route’s preference will always win over MED, and because it is not carried between ASes.
Some operators provide an alternative for MED in the form of communities that set a route’s preference within the AS. For instance, assume 100::/64 is geographically closer to the {65001, 65003} link than either of the {65001, 65002} links, so AS65001 would prefer traffic destined to 100::/64 enter through AS65003.
In this case, AS65001 can advertise 100::/64 to AS65002 with a community that makes AS65002 prefer the route through AS65003 over the direct route to AS65001 (see 2914:450 on NTT’s list of customer set communities as an example).
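As a sketch of what happens inside the provider, the function below maps a customer-set community to a lower preference. The community value 65002:450 and all three preference numbers are hypothetical, chosen only to show the ordering; a real provider’s published list is the authority for actual values.

```python
from typing import Set, Tuple

# Hypothetical values for illustration only.
CUSTOMER_PREF = 120
PEER_PREF = 100
LOWERED_PREF = 90
LOWER_PREF_COMMUNITY = (65002, 450)


def local_pref_for(learned_from: str, communities: Set[Tuple[int, int]]) -> int:
    """Preference AS65002 assigns to a route on import."""
    if learned_from == "customer":
        # The customer can ask to be ranked below peer routes.
        if LOWER_PREF_COMMUNITY in communities:
            return LOWERED_PREF
        return CUSTOMER_PREF
    return PEER_PREF


direct = local_pref_for("customer", {LOWER_PREF_COMMUNITY})
via_peer = local_pref_for("peer", set())
print(direct < via_peer)  # True: the route through AS65003 now wins
```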
Note, many of the communities described here have regional versions for more specific use cases. These operate on the same principles, just in a more restricted topological or geographical area.
Longer prefix match
While MED is often ineffective, and using communities is both restricted in range and complex to configure and manage, advertising a longer-prefix match always works, is simple to configure, and is easy to deploy.
For instance, if AS65001 would like traffic destined to 100::/64 to only enter from AS65003, it may advertise an aggregated route, say 2001:db8:3e8:100::/63, to both AS65003 and AS65002, and then advertise 100::/64 only to AS65003. Because all routing systems will select the prefix with the longest match first, the /64 through AS65003 will be selected over the /63 through AS65002 and AS65003, so the traffic always enters AS65001 the way the operator desires.
The overlapping, or covering, aggregate is advertised to provide backup reachability. If the {AS65001, AS65003} link (or peering) fails for any reason, traffic destined to 100::/64 will follow the /63 route, entering from AS65002. This is not optimal from the perspective of AS65001, but it keeps connectivity in place while any problems can be traced down and repaired.
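The selection and fallback behavior can be checked with Python’s standard `ipaddress` module. This is a sketch of a remote router’s forwarding lookup, not of BGP itself; the `lookup` function, the next-hop labels, and the specific documentation prefixes are assumptions for illustration.

```python
import ipaddress
from typing import Dict, Optional

Table = Dict[ipaddress.IPv6Network, str]


def lookup(table: Table, destination: str) -> Optional[str]:
    """Return the next hop of the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


covering = ipaddress.ip_network("2001:db8:0:100::/63")
specific = ipaddress.ip_network("2001:db8:0:100::/64")

# A remote router that prefers AS65002 for the covering /63.
table: Table = {covering: "AS65002", specific: "AS65003"}
print(lookup(table, "2001:db8:0:100::1"))  # AS65003

# The {AS65001, AS65003} link fails and the /64 is withdrawn.
del table[specific]
print(lookup(table, "2001:db8:0:100::1"))  # AS65002
```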
According to Geoff Huston, a large percentage of the routes in the current global table are advertised for traffic engineering — to manipulate the point at which traffic destined to specific reachable destinations enters an AS.
Note, using longer prefix routes to control inbound route flows represents a ‘tragedy of the commons’ problem to the global Internet. There has been work into various mechanisms designed to remove these more specific routes from the routing table when they are no longer needed, but little progress has been made in implementing them, and none of these solutions has achieved widespread adoption and deployment.
Conditional advertisement
What if AS65001 has signed a contract with AS65003 to carry traffic only if both its links to AS65002 fail? In this case, AS65001 could advertise longer-prefix, more specific routes through AS65002 and one shorter covering route through AS65003.
This strategy, however, has two flaws. First, it requires AS65001 to manage the more specifics and covering routes as a set, making certain the pairs are correctly configured. Second, it could be that AS65001 does not want anyone to know about this backup arrangement unless and until it is used. This is sometimes the case when two competitors agree to back one another up, and neither wants anyone to know what their backup arrangements are.
To resolve these (and other) policy problems, operators can use conditional advertisement.
Conditional advertisement is conceptually simple. If a router does not have some route, x, in its routing table, it advertises some other route (given the route is in the local tables so it can be advertised). For instance, AS65001 might configure the router at C to advertise 100::/64 only when it does not have some other route.
The hardest part of configuring conditional advertisement is knowing when to trigger the advertisement of the alternate path. Using the lack of reachability to the destination itself (100::/64 in this case) as the trigger will fail in some circumstances, and will always require the global table to converge before the alternate path is advertised. Instead, conditional advertisement is often triggered by the lack of a route between the BGP speakers being ‘watched’ (in this case, the two {65001, 65002} links) learned from within the AS (within AS65001, rather than through the global routing table).
Triggering on the internal state of a link directly connected to a router managed by the local operator, and carried through internal convergence, removes external convergence from the time required to begin advertising the alternate path.
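The trigger logic itself fits in a few lines. In this sketch the router at C watches two internally learned routes that stand in for the {65001, 65002} links; the `should_advertise` name and the watched prefixes are hypothetical, and real implementations express this in vendor-specific configuration rather than code.

```python
from typing import Iterable, Set


def should_advertise(local_table: Set[str],
                     watched: Iterable[str],
                     backup: str) -> bool:
    """Advertise backup only when no watched route remains in the table."""
    any_watched_present = any(prefix in local_table for prefix in watched)
    # The backup route must itself be in the local table to be advertised.
    return backup in local_table and not any_watched_present


# Hypothetical internal routes representing links A and B to AS65002.
watched = ["2001:db8:ffff:a::/64", "2001:db8:ffff:b::/64"]

table = {"100::/64", "2001:db8:ffff:a::/64", "2001:db8:ffff:b::/64"}
print(should_advertise(table, watched, "100::/64"))  # False

table = {"100::/64"}  # both links to AS65002 have failed
print(should_advertise(table, watched, "100::/64"))  # True
```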
In the next post in this series, I’ll look at do not transit options in a network. Feel free to leave a comment if you have any questions.
Russ White is a Network Architect at LinkedIn.
This post is adapted from a series at Rule 11.