BeauCoup: Answering many network traffic queries, one memory update at a time

By on 7 Jan 2021

Category: Tech matters

Tags: , , ,

Blog home

Multiple diverse queries, strict memory constraints

Network administrators often want to run diverse monitoring queries, to diagnose performance problems and discover security threats. As networks grow faster, traditional methods that analyse packet samples offline suffer from low accuracy due to lower sampling rates. 

An alternative approach is to run measurement algorithms directly in the network’s data plane, where packets are individually parsed and forwarded within nanoseconds. Modern programmable switches support running customized algorithms in the data plane alongside basic forwarding functionalities at line rate, as long as the algorithms satisfy some hardware resource constraints.

However, most existing data-plane measurement algorithms are designed for only one monitoring query, and focus on only a single packet header field. For example, we can use HyperLogLog, an algorithm to approximately count the number of unique elements, for finding super-spreaders (defined as the Source IPs sending to many unique Destination IPs). We could also identify DDoS victims, by finding the destination IPs receiving from many distinct source IPs. Here, the two queries count different header fields, and we need to run two separate instances of HyperLogLog.

It is very challenging to run multiple instances of data-plane measurement algorithms simultaneously at line rate. To achieve line-rate performance, today’s programmable switches impose strict limits on how many times an algorithm can access data-plane memory while processing a packet. Even though prior works focused extensively on saving memory space, such algorithms still require several memory accesses per packet, which scales proportionally when we run multiple instances of those algorithms. As switches allow only a small, constant number of times memory is accessed per packet, it is infeasible to run dozens of different measurement queries together.

Key points:

  • BeauCoup is a network-measurement algorithm that supports simultaneously running multiple count-distinct queries, and replacing them on the fly. 
  • Compared to a state-of-the-art measurement sketch, BeauCoup uses 4x-10x fewer memory access requirements to achieve the same accuracy.
  • Applying BeauCoup is a first step towards running most, if not all, network-monitoring queries simultaneously in the data plane. 
Image of a table showing Beaucoup running multiple queries
Figure 1 — Many network monitoring tasks can be expressed in the form of counting or count-distinct queries. BeauCoup supports running multiple queries with different keys simultaneously at line rate.

Achieving constant memory access per packet

BeauCoup is a network-measurement algorithm, developed by me and my colleagues at Princeton University, that supports running multiple count-distinct queries with different keys, using only a small constant number of total memory accesses per packet. 

To achieve constant memory access regardless of the number of queries, BeauCoup uses a mathematical technique called coupon collectors to implement individual queries with fewer than one memory access per packet on average, and combines and coordinates all queries to never use more than a constant number of accesses per packet.

Using coupon collector for count-distinct

We borrow the coupon collector concept to count distinct items using fewer than one memory access per packet. Originally, the coupon collector’s problem described a mathematical puzzle where we repeatedly draw from a set of different coupons uniformly at random, with replacement, and we calculate the number of draws needed to collect at least one of each coupon. For example, if Alice receives one of four coupons randomly at every draw, she needs 8.33 draws to expect to collect all four coupons.

BeauCoup uses coupon collectors to count distinct elements by building a random mapping. For example, to find DDoS victims, we randomly map source IPs to coupons. Every new, unseen source IP is mapped to a random coupon, resembling a random draw, while a repeated source IP is mapped to the same coupon, with no effect on the coupon collector. With enough coupons collected for one victim, we can conclude there are a large number of distinct source IPs, and report this destination IP as a victim. 

A table showing the process behind the coupon collector
Figure 2 — BeauCoup maps query attributes to coupons randomly, and duplicate attributes always map to the same coupon. When many different coupons are collected for one query key, we report this key as having many distinct attributes.

Furthermore, BeauCoup only records a small fraction of coupons in memory while ignoring a majority of them, so it does not need to access memory. For example, we could conceptually designate 1,024 coupons (each drawn with a probability of 1/1024) while only recording the first 16 coupons in memory, thus using only 1/64 memory access per packet on average.

This coupon-based probabilistic approach is approximate and will introduce errors in counting. As such we evaluated our approach by comparing BeauCoup’s accuracy against the state-of-the-art measurement sketch that supports count-distinct, and showed that it achieves better accuracy under the same per-packet memory access constraint, and only needs 4x-10x fewer memory access requirements to achieve the same accuracy.

Combining multiple queries

Once we achieve fewer than one memory access per query, it’s time to run them simultaneously. With the right per-query parameters, a naïve solution running all queries independent of each other can easily achieve constant memory access per packet on average. However, in the worst-case scenario, different queries might need to collect coupons while processing the same packet, leading to a larger memory access per packet in the worst case. Thus, we need to coordinate among different traffic queries.

We first combine the queries that count on the same thing. For example, if two or three queries all count the number of distinct destination IPs (while using different query keys), they all need random mappings from destination IPs to coupons. In this case, we build a single random hash function that maps the destination IPs to all coupons across these queries, mapping to, at most, one coupon at a time while following the probabilities required by different queries. We designate one such hash function for every packet header field being counted, such as source IPs and port numbers, and categorize each query under the corresponding hash function. Afterwards, we design an additional tie-breaking step to ensure, at most, one coupon is produced across all the hash functions. Now, for each packet, we collect, at most, one coupon, which translates to a fixed number of memory accesses per packet.

Running BeauCoup in the data plane

As part of a study we recently presented at ACM SIGCOMM 2020, we demonstrated a prototype of BeauCoup on the Barefoot Tofino programmable switch using the P4 programming language. Since BeauCoup collects strictly, at most, one coupon per packet, it fits comfortably within the hardware constraint for per-packet memory access. 

We implemented the hash functions using CRC32 hash with random seeds, and efficiently matched their output to coupons using Ternary Content-Addressable Memory (TCAM) matching rules. Both features are readily available in programmable network devices as they’re already commonly used for checksum and forwarding.

Furthermore, unlike some prior works, our prototype supports updating traffic queries on the fly without the need to recompile new P4 programs. This is because all logic and probability parameters pertinent to matching coupons are embedded in the TCAM rules, while the logic for hashing, tiebreaking, and collecting coupons is relatively fixed and is independent of query parameters. Therefore, BeauCoup minimizes downtime for switches carrying production traffic, while still serving the changing and diverse monitoring needs of network administrators.

As more measurement algorithms are designed with memory access constraints in mind, we look forward to future work combining even more queries and arbitrating their memory space and access requirements, achieving line-rate performance.

Read our paper “BeauCoup: Answering Many Network Traffic Queries, One Memory Update at a Time” to learn more.

Xiaoqi Chen is a PhD student at Princeton University. His research interests include designing and implementing approximate algorithms to run in programmable network switches, for performing network measurement and optimization.

Rate this article

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

Leave a Reply

Your email address will not be published. Required fields are marked *

Top