Anyone familiar with how Internet routing works will know the Border Gateway Protocol (BGP), the fundamental protocol that exchanges reachability information and so steers the vast quantities of traffic that cross the Internet each day.
The exchange of BGP information tends to be complex and, at times, error-prone, and there has never been a streamlined, standardized toolset for monitoring how routes are exchanged and why particular paths are or are not chosen. This often lends these processes an air of mystery and makes it hard to see how well things are running. And although it is possible to use BGP to monitor itself, this offers less insight than an independent monitoring plane for the protocol.
The situation is improving, however, as a new protocol, known as the BGP Monitoring Protocol (BMP) and defined in RFC 7854, is being adopted, promising some key advances and unprecedented opportunities to streamline BGP data collection. Solutions built on this protocol offer a clean, standardized way to collect BGP data, one that is straightforward to operate and well suited to the age of big data.
Big benefits, big data
With vendors now shipping the first releases of software that include the BMP code, the improved visibility opens several key use cases that will improve operations around BGP:
- Tighter security, such as in cases of attempted route hijacking.
- The ability to gather far more granular statistics to check the stability of routes and their historic performance.
- The ability to evaluate and troubleshoot issues linked to propagation of information both within a single router and among different routers.
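To make the first use case a little more concrete, here is a minimal sketch of an origin check that a monitoring station could run over the routes it learns. It is only an illustration: the `EXPECTED_ORIGINS` table, the `suspicious_origin` function, and the documentation prefixes and private-use AS numbers are all assumptions made for the example, not part of BMP itself.

```python
import ipaddress

# Hypothetical table: prefixes we care about and the origin ASNs allowed to announce them.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): {64500},
    ipaddress.ip_network("2001:db8::/32"): {64501},
}


def suspicious_origin(prefix: str, origin_as: int) -> bool:
    """Return True if the prefix sits inside a known prefix but has an unexpected origin AS."""
    net = ipaddress.ip_network(prefix)
    for known, allowed in EXPECTED_ORIGINS.items():
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if net.version == known.version and net.subnet_of(known):
            return origin_as not in allowed
    # Prefix not covered by the table: this sketch has no opinion on it.
    return False
```

For example, `suspicious_origin("192.0.2.0/25", 64511)` flags a more-specific announcement from an unexpected origin, while the expected `192.0.2.0/24` from AS 64500 passes.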
A key advantage of BMP is that it allows all paths heard by a peering router to be learned rather than only the best one, as with traditional BGP monitoring. This means that greater insight is provided into why a certain path or set of paths was chosen. Additionally, route monitoring messages carry a timestamp in their per-peer header, giving further insight into the chain of events.
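As a rough illustration of where that timestamp lives, the sketch below decodes the 6-byte common header and the 42-byte per-peer header laid out in RFC 7854. The function name `parse_headers` and the shape of the returned dictionary are choices made for this example; a real collector would also go on to decode the BGP payload that follows.

```python
import ipaddress
import struct
from datetime import datetime, timedelta, timezone

# Common header: version (1 byte), message length (4 bytes), message type (1 byte).
COMMON_HEADER = struct.Struct("!BIB")
# Per-peer header: peer type, flags, distinguisher, address, AS, BGP ID, timestamp sec/usec.
PER_PEER_HEADER = struct.Struct("!BB8s16sI4sII")

MESSAGE_TYPES = {
    0: "Route Monitoring", 1: "Statistics Report", 2: "Peer Down Notification",
    3: "Peer Up Notification", 4: "Initiation", 5: "Termination", 6: "Route Mirroring",
}
# Initiation and Termination messages do not carry a per-peer header.
MESSAGES_WITH_PEER_HEADER = {0, 1, 2, 3, 6}


def parse_headers(data: bytes) -> dict:
    """Decode the BMP common header and, where present, the per-peer header."""
    version, length, msg_type = COMMON_HEADER.unpack_from(data, 0)
    info = {"version": version, "length": length,
            "type": MESSAGE_TYPES.get(msg_type, "Unknown")}
    if msg_type not in MESSAGES_WITH_PEER_HEADER:
        return info

    (_peer_type, flags, _distinguisher, addr, peer_as, _bgp_id,
     sec, usec) = PER_PEER_HEADER.unpack_from(data, COMMON_HEADER.size)
    # V flag (0x80): IPv6 peer. IPv4 addresses occupy the last 4 of the 16 bytes.
    if flags & 0x80:
        address = ipaddress.IPv6Address(addr)
    else:
        address = ipaddress.IPv4Address(addr[-4:])
    info.update(
        peer_address=str(address),
        peer_as=peer_as,
        post_policy=bool(flags & 0x40),  # L flag: post-policy Adj-RIB-In
        timestamp=datetime.fromtimestamp(sec, tz=timezone.utc)
        + timedelta(microseconds=usec),
    )
    return info
```

Note that a router that cannot supply a meaningful time may send a zero timestamp, so a collector would probably want to record its own receive time as well.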
By making BGP more observable, BMP contributes to the correct functioning of the control plane. In addition, through the correlation of its data to the reliability of pipes and volumes of traffic, the protocol aids adherence to Service Level Agreements (SLAs). Finally, it enables closed-loop operations, whereby information verifying whether processes are functioning as intended is fed back to reduce errors and boost stability.
Using BMP, key statistics can also be obtained at each phase of the routing process, showing all reachability information at every processing stage. In this way, it allows analysis of big data relating to the BGP system, enabling it to be improved over time.
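One place these statistics surface is the Statistics Report message. The sketch below assumes the layout given in RFC 7854, a 4-byte count followed by type/length/value entries, and decodes the body that follows the per-peer header. The name `parse_stats` and the choice of which counters to label are assumptions for the example.

```python
import struct

# A few of the statistic types defined in RFC 7854; others fall back to "type N".
STAT_NAMES = {
    0: "prefixes rejected by inbound policy",
    7: "routes in Adj-RIBs-In",
    8: "routes in Loc-RIB",
}


def parse_stats(body: bytes) -> dict:
    """Decode the body of a Statistics Report (the bytes after the per-peer header)."""
    (count,) = struct.unpack_from("!I", body, 0)
    offset = 4
    stats = {}
    for _ in range(count):
        stat_type, stat_len = struct.unpack_from("!HH", body, offset)
        offset += 4
        # Simplification: treat every value as one unsigned integer. That fits the
        # 32-bit counters and 64-bit gauges of types 0 to 8, but not the
        # per-AFI/SAFI statistics, which prefix the gauge with address-family fields.
        value = int.from_bytes(body[offset:offset + stat_len], "big")
        offset += stat_len
        stats[STAT_NAMES.get(stat_type, f"type {stat_type}")] = value
    return stats
```

Comparing the rejected-prefix counter with the Adj-RIBs-In and Loc-RIB gauges over time is one simple way to see how much information survives each processing stage.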
Furthermore, BMP has been under development since around 2008, which has given it plenty of time to evolve and deal with several key issues through the joint efforts of many operators, vendors, industry bodies, and others. And the work is not finished, with improvements and standardization being carried out on an ongoing basis to make the system ever more efficient and remove bugs.
One way that BMP is being improved is through work to enable support for so-called type-length-values (TLVs), which unlock a wide range of extra information on routes, such as identifying which paths are active and which are being used as backup, the specific node and associated policy responsible for filtering out a route, and potentially countless other applications.
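For readers unfamiliar with the TLV idea, the sketch below walks a buffer of TLVs using the layout RFC 7854 already defines for the information TLVs in Initiation messages: a 2-byte type, a 2-byte length, then the value. The TLVs being worked on for other message types may well use a different layout, so treat this purely as an illustration of the encoding; `parse_tlvs` is a name invented for the example.

```python
import struct

# Information TLV types defined for Initiation messages in RFC 7854.
INFO_TLV_TYPES = {0: "string", 1: "sysDescr", 2: "sysName"}


def parse_tlvs(data: bytes) -> list[tuple[str, str]]:
    """Walk a buffer of (2-byte type, 2-byte length, value) TLVs."""
    tlvs = []
    offset = 0
    while offset + 4 <= len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        value = data[offset:offset + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        offset += length
        name = INFO_TLV_TYPES.get(tlv_type, f"type {tlv_type}")
        tlvs.append((name, value.decode("utf-8", errors="replace")))
    return tlvs
```

Because each entry announces its own type and length, a receiver can skip types it does not recognize, which is what makes the format easy to extend.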
There is also work being done to speed up the recovery process when a session is broken between a router and a monitoring station, avoiding potentially lengthy delays involved in the resending of information when the session restarts.
All this will lead to the creation of a stronger BMP over time and solve longstanding issues, with the protocol itself spurring more efficient functioning, opening unprecedented insight into BGP, and aiding the overall stability of the Internet to the benefit of all. Now is the time to make people aware of its benefits, allowing these key advantages to be harnessed as soon as possible.
Paolo Lucente is Principal Software Development Engineer and part of the Network Tools team at GIN, the Global IP Network of NTT. He works in the areas of network telemetry data analysis and collection.