Border Gateway Protocol (BGP) correctness and security are becoming increasingly relevant for Internet-scale operators. Mistakes can have catastrophic effects, as the 2012 Australian BGP outage showed, and attacks in routing space have serious implications, as shown by the recent BGP-based attacks on Bitcoin that sparked new research into these types of events.
Luckily, with more attention, initiatives such as Resource Public Key Infrastructure (RPKI), strict route whitelisting, and BGP Security (BGPsec) are gaining traction and will make global operations more reliable. Equally important is the need to monitor for BGP anomalies.
For close to a decade, Colorado State University’s BGPMon project, in partnership with RouteViews, has collected and archived BGP data and championed making it more accessible to researchers and the general public. We believe increasing the visibility of BGP is needed to ensure the trouble-free operation of our networks.
The problem, of course, is the sheer volume of data that we need to go through to find a single event. A quick ‘back of the envelope’ calculation goes as follows: the archive served by BGPMon and RouteViews currently spans 20 TB of data, and the median BGP message size is around 200 bytes. This means the data comprises around 10^11 (100 billion) events.
Currently, BGPMon and RouteViews together have 26 geographically diverse collectors. Each of these collectors peers with approximately 25 peers, each of which gives the projects a full-table view of how it sees routing on the Internet. This means approximately 500 peers in total are propagating BGP updates to the collectors.
The archive currently holds around 20 TB of data. With a median BGP message size of 200 bytes, this is approximately 100 billion messages collected since 2001.
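The estimate above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: it assumes 1 TB means 10^12 bytes and treats the median message size as if it were the typical size, so the result is an order-of-magnitude figure rather than an exact count. The function and constant names are ours, chosen for illustration.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
ARCHIVE_BYTES = 20 * 10**12   # 20 TB, assuming 1 TB = 10^12 bytes
MEDIAN_MESSAGE_BYTES = 200    # median BGP message size quoted in the text


def estimate_message_count(archive_bytes: int, message_bytes: int) -> int:
    """Approximate message count, treating the median size as typical."""
    return archive_bytes // message_bytes


estimated_messages = estimate_message_count(ARCHIVE_BYTES, MEDIAN_MESSAGE_BYTES)
print(f"{estimated_messages:.1e} messages")  # 1.0e+11 messages
```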
As such, we are working on a number of features to help users research this large volume of binary data.
Protoparse is a low-level library that we have implemented, which parses binary messages into Protocol Buffer messages. This lets users take our Protocol Buffer specification and parse and analyse messages in their language of choice.
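To give a feel for what parsing binary BGP messages involves, here is a minimal Python sketch that decodes only the fixed 19-byte BGP message header defined in RFC 4271 (a 16-byte marker, a 2-byte length, and a 1-byte type). It is not Protoparse's API or implementation; the function name `parse_bgp_header` is hypothetical, and archived MRT files wrap each BGP message in an additional MRT header that this sketch does not handle.

```python
import struct

# RFC 4271 header: 16-byte marker, 2-byte length, 1-byte type (network order).
BGP_HEADER = struct.Struct("!16sHB")
BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_bgp_header(data: bytes) -> dict:
    """Decode the fixed BGP header (hypothetical helper, not Protoparse)."""
    if len(data) < BGP_HEADER.size:
        raise ValueError("need at least 19 bytes for a BGP header")
    marker, length, msg_type = BGP_HEADER.unpack_from(data)
    if marker != b"\xff" * 16:
        raise ValueError("marker must be all ones")
    if length < BGP_HEADER.size:
        raise ValueError("length field is smaller than the header itself")
    return {
        "length": length,
        "type": BGP_TYPES.get(msg_type, f"UNKNOWN({msg_type})"),
    }


# A KEEPALIVE message consists of the header only.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
header = parse_bgp_header(keepalive)
print(header)  # {'length': 19, 'type': 'KEEPALIVE'}
```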
gobgpdump is a straightforward tool that is based on Protoparse and acts like the standard bgpdump. Due to the nature of protocol buffers, we can easily export the messages into plain text, JSON, XML or other formats with minimal changes to the tool. The tool is also made to understand standard directory structures that can contain MRT exported data and can work in parallel, maximizing the use of system resources. It has basic filtering based on AS path and prefix.
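The kind of filtering described above can be illustrated with a short Python sketch. It does not reflect gobgpdump's actual options or code. It assumes messages have already been decoded into dictionaries with a `prefix` string and an `as_path` list, a record shape we made up for this example, and it keeps records whose AS path contains a given AS number and whose prefix falls inside a given covering prefix.

```python
import ipaddress
from typing import Iterable, Iterator, Optional


def filter_records(
    records: Iterable[dict],
    asn: Optional[int] = None,
    covering_prefix: Optional[str] = None,
) -> Iterator[dict]:
    """Yield records matching the AS-path and prefix criteria that are set."""
    net = ipaddress.ip_network(covering_prefix) if covering_prefix else None
    for rec in records:
        # AS-path filter: the AS number must appear somewhere in the path.
        if asn is not None and asn not in rec["as_path"]:
            continue
        # Prefix filter: the announced prefix must sit inside the covering one.
        if net is not None:
            announced = ipaddress.ip_network(rec["prefix"])
            if announced.version != net.version or not announced.subnet_of(net):
                continue
        yield rec


# Hypothetical decoded records using documentation prefixes and AS numbers.
sample = [
    {"prefix": "192.0.2.0/24", "as_path": [64500, 64501]},
    {"prefix": "198.51.100.0/24", "as_path": [64500, 64502]},
    {"prefix": "2001:db8::/32", "as_path": [64501]},
]
hits = list(filter_records(sample, asn=64501, covering_prefix="192.0.2.0/24"))
print(hits)
```

Writing the filter as a generator means records can be streamed one at a time, which matters when the input is a large archive rather than a three-item list.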
We have set up an online archive service, giving users access to the whole archive of BGPMon and RouteViews data. Users can export messages in one of four forms — json, pb, mrt, and text — as well as access basic statistics on archived data.
We are actively developing the ability to make messages even more available and queryable by backing this vast amount of data with SQL databases. We have developed ways to peer with BGP routers and ingest the data in real time without touching the filesystem, so that users can dig into interesting events by writing expressive SQL.
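As an illustration of the kind of question SQL makes easy to ask, the sketch below loads a few made-up rows into an in-memory SQLite database and looks for prefixes announced by more than one origin AS. The `updates` table, its columns, and the sample rows are assumptions for the example only; they do not describe the BGPMon schema or the database engine the project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per announced prefix in an UPDATE message.
conn.execute(
    """
    CREATE TABLE updates (
        ts         INTEGER,  -- UNIX timestamp of the message
        collector  TEXT,     -- collector that received it
        peer_asn   INTEGER,  -- AS number of the peer
        prefix     TEXT,     -- announced prefix
        origin_asn INTEGER   -- last AS in the AS path
    )
    """
)

# Made-up sample rows using documentation prefixes and AS numbers.
rows = [
    (1500000000, "collector-a", 64496, "192.0.2.0/24", 64500),
    (1500000060, "collector-b", 64497, "192.0.2.0/24", 64510),
    (1500000120, "collector-a", 64496, "198.51.100.0/24", 64501),
    (1500000180, "collector-b", 64497, "198.51.100.0/24", 64501),
]
conn.executemany("INSERT INTO updates VALUES (?, ?, ?, ?, ?)", rows)

# Prefixes seen with more than one origin AS, a common anomaly signal.
query = """
    SELECT prefix, COUNT(DISTINCT origin_asn) AS origins
    FROM updates
    GROUP BY prefix
    HAVING COUNT(DISTINCT origin_asn) > 1
    ORDER BY prefix
"""
multi_origin = conn.execute(query).fetchall()
print(multi_origin)  # [('192.0.2.0/24', 2)]
```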
We are also planning to use a multi-tenant distributed database to build more complex deployments that would allow other organizations to either:
- Have their own private installations and sync data tables from the BGPMon project in order to have a vast, Internet-scale view of routing on their prefixes of interest, or
- Easily share their views on BGP in our ever-growing repository.
Stay tuned for more updates. In the meantime, get in touch if you’d like to peer with us at email@example.com, or leave a comment below on what other features you’d like in BGPMon.
Spiros Thanasoulas is a researcher at the Network Security Group of Colorado State University.