Micro BGP Suite: the Swiss Army Knife of routing analysis

By on 16 Nov 2021

Category: Tech matters

Tags: , , ,

2 Comments

Blog home

It has been a few years since my colleagues and I at the Isolario project launched ​​BGP Scanner to assist researchers with analysing increasing amounts of routing data in a reasonable time. This has continued to be a passion project of mine and has led me to develop the Micro BGP Suite project. In this post, I’ll discuss its evolution from BGP Scanner, its advantages, and possible uses.

From BGP Scanner to µbgpsuite

BGP Scanner proved to be a useful tool for many network analysis projects, including lib_bgp_data, to wade through the vast amount of MRT data provided by the multitude of benevolent route-collecting projects.

Much of its usefulness is owed to its powerful filtering engine and its easily parsable text output, which allow painless integration with larger data-processing workflows, and possibly scripting languages or external tools, by virtue of simple Unix-style output piping.

Despite this, BGP Scanner has some issues with its filtering capabilities mostly due to backwards compatibility concerns with its predecessor — MRT_data_reader.

This, among other reasons, motivated me to take a rather radical approach, and take the tool to a stand-alone project, rewriting the entire codebase from scratch, taking advantage of my past experience, and freeing myself from the burden of backwards compatibility.

The result of this choice is the Micro BGP Suite (µbgpsuite or ubgpsuite for short, depending on your keyboard proficiency with Greek).

So, how’s the Micro BGP Suite better than BGP Scanner?

BGP Scanner had a tradition of being fast at what it did. The Micro BGP Suite continues this with recent benchmarks demonstrating that bgpgrep, the successor to the bgpscanner command within µbgpsuite, can deliver between 12% to 20% additional performance on a middle-end laptop.

There are more improvements within bgpgrep too, such as better compatibility with less common MRT dump formats and several bugfixes, which make it more reliable. But, by far, its new command line interface and filtering options vastly outshine any other achievement.

To illustrate, let’s put bgpgrep to use for a significant MRT data set. Assume we wish to find out how many bogon ASNs and bogon addresses appeared during July 2021.

First, we’ll lay down our study by fetching the relevant data from a route collecting project of our choice. In this case, we’ll use RIPE RIS RRC00. By taking the first RIB snapshot of the month and every subsequent update, we obtain 50GB worth of gzip-compressed MRT data (corresponding to about 646GB uncompressed).

$ du -hc rrc00/bview.*.gz | tail -n1    # RIB size
1.4G    total
$ du -hc rrc00/updates.*.gz | tail -n1  # updates size
49G     total
$ gzip -l rrc00/*.gz | \
    awk '{ megs+=$2/1000000 } END { printf "%.2fG\n", megs/1000 }'
645.88G

It is relatively easy to use bgpgrep’s filtering option to find out whether a BGP message contains any bogon ASNs in its AS_PATH. However, it is harder to tell whether a bogon address is present, because bogon addresses change over time. Thankfully, Team Cymru provides an up-to-date list that we can use for our study.

The following script downloads the up-to-date full bogon address list from Team Cymru and adapts it with some sed-fu to generate a suitable filter list for bgpgrep; we’ll save it under our current directory with the name download-fullbogons.sh:

#!/bin/sh

# Be nice and only download bogons once a day.
if [ -f fullbogons.txt ] && [ $(find fullbogons.txt -mtime -1) ]; then
    echo "Bogon address list already up to date."
    exit 1
fi

# Generate updated bogon list
echo "Writing bogon list to: fullbogons.txt"

( wget -O - https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt
  echo
  wget -O - https://www.team-cymru.org/Services/Bogons/fullbogons-ipv6.txt
) | sed 's/#.*$//g' >fullbogons.txt

Our resulting directory layout is:

$ ls
download-fullbogons.sh
rrc00

Now we are all set for our analysis. For ease of demonstration, we simply print the number of messages containing bogons:

$ sh ./download-fullbogons.sh
Writing bogon list to: fullbogons.txt
$ bgpgrep rrc00/*.gz \
    -bogon-asn -or -subnet fullbogons.txt | \
    grep -E '^+|^-|^=' | wc -l
1344411

The bgpgrep command reads as follows:

For every gzip file in rrc00/, print each BGP message containing bogon ASN, or containing subnets of any prefix listed within fullbogons.txt.

Results are piped to grep, which will leave only entries coming from announcements, withdrawals and RIB snapshots, respectively. Every line of output by bgpgrep is prefixed by a character identifying its nature:

  • = snapshot
  • + announcement
  • – withdrawal
  • # BGP state change

Events unrelated to BGP updates aren’t subject to filtering rules and might cause miscounts in our analysis if left there. In the end, wc enters into play to count the total number of lines — bgpgrep produces one line of output for every BGP message or event. We choose to include withdrawals in our count; a slight alteration of the grep expression may easily restrict matching to announcements alone.

To dig further into our demonstration, let’s elaborate on this example and do some timing. What follows is not a benchmark, but rather a regular execution under a KDE Plasma session running various applications in the background (such as Firefox), on a middle-end laptop with the following configuration:

  • CPU: Intel© Core™ i7-8565 @ 1.80GHz, 4 physical cores, (8 with hyperthreading)
  • Memory: 16 GB RAM DDR4
  • Hard disk: SAMSUNG MZALQ512HALU-000L1
  • Kernel: Linux 5.10.62-1-lts SMP x86_64 GNU/Linux

This is an honest performance you might get analysing that amount of compressed data on a similar machine, under average load.

Timing of the example above:

$ time bgpgrep rrc00/*.gz \
    -bogon-asn -or -subnet fullbogons.txt | \
    grep -E '^+|^-|^=' | wc -l
1344411

real    59m0,248s
user    58m43,747s
sys     1m41,248s

Restricting the match to the first seven days of July, note how the more restrictive the match, the faster the performance:

$ time bgpgrep rrc00/*.gz \
    -timestamp ">=2021-07-01" -and -timestamp "<2021-07-08" \
    -and \( -bogon-asn -or -subnet fullbogons.txt \) | \
    grep -E '^+|^-|^=' | wc -l
273210

real    28m36,568s
user    28m14,241s
sys     1m9,260s

Obtaining a list of direct peers propagating the bogons:

$ time bgpgrep rrc00/*.gz \
    -bogon-asn -or -subnet fullbogons.txt | \
    grep -E '^+|^-|^=' | \
    cut -f9 -d'|' | sort -u >peerlist.txt

real    49m15,492s
user    49m7,445s
sys     1m36,730

This last example further relies on the bgpgrep output format always placing the peer address and ASN of each BGP message at the 9th field, thus making it trivial to extract and process it by itself. With more relaxed filtering specifications, performance is closer to the first experiment.

For reference, what follows is a gross estimate of the decompression overhead. We measure this by simply inflating all data to /dev/null. This serves to better grasp how much time bgpgrep has to pay to solely decompress its input — a necessary cost unless MRT data is decompressed in advance:

$ time zcat rrc00/*.gz >/dev/null

real    30m57,084s
user    29m38,916s
sys     0m36,911s

Take a dive into the Suite

bgpgrep is a decent Swiss Army knife for quick and dirty network analysis on huge MRT data sets. Its flexible and intuitive filtering options, coupled with a convenient output format, go to great lengths to provide interoperability with other familiar command line tools and scripting languages.

The Micro BGP Suite strives to provide similar tools, essentially simple and efficient in their operations, capable of achieving complex results when combined. Clear task division and effective interoperation are encouraged, rather than hidden behind a wall of complexity. Such flexibility is key to overcoming unanticipated needs.

There is more where bgpgrep came from, including peerindex — see my post on The Micro BGP Suite.

If you want to know more, be sure to check out The DoubleFourteen Code Forge website and its git code repository.

Lorenzo Cogotti is the founder and lead of the DoubleFourteen Code Forge, a non-profit community promoting independent research and good software practices.

Rate this article

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

2 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Top