Programmatically analyse packet captures with GoPacket

By Kevin Ku on 12 May 2021

Category: Tech matters

Tcpdump and Wireshark are great, but what if you want to programmatically analyse network traffic?

At Akita, we’ve built a tool that analyses API traffic to build API models. One of the ways Akita collects API traffic is by passively watching packets on the network. To watch network traffic in a minimally invasive way, we’ve built a custom packet processor in Golang using the GoPacket library.

In this post, I’ll walk through how to use GoPacket to capture and analyse network traffic, explaining the key concepts of the library, along with working examples you can try at home. You can also see this in action by checking out the Akita CLI on GitHub.

tcpdump + Wireshark: good for one-off debugging

You may be familiar with using the command-line packet analyser tcpdump to create a packet capture:

tcpdump -i lo -w lo.pcap

Tcpdump makes it possible to inspect network traffic by letting you print the contents of packets on a network interface that match a given filter.

You can then load the output in Wireshark, which provides a nice GUI to perform all sorts of analysis on the packets.

This setup is great for one-off debugging sessions. But it’s not easy to programmatically run the analysis, let alone package up custom analyses to ship off to your users.

Using GoPacket for general purpose packet processing

Now I’ll show how to automate packet processing with GoPacket, a general purpose packet processing library. You can perform all sorts of analysis on packets using the awesome Go programming language.

But it’s a little more complicated than a couple of lines of code, so I’ll walk you through what you’ll need to do. In this post, we’ll focus on using GoPacket to:

  1. Collect packets from network interfaces (replacing tcpdump)
  2. Reassemble TCP streams (a Wireshark feature)

Collecting packets

GoPacket provides a nice mechanism to interface with libpcap, the underlying library powering tcpdump. This means you can capture packets right from your Go program!

The GoPacket/pcap interface is fairly straightforward. For example, here’s an equivalent of `tcpdump -i lo "port 3030"` with GoPacket:

package main

import (
	"github.com/google/gopacket"
	_ "github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)


const (
	// The same default as tcpdump.
	defaultSnapLen = 262144
)

func main() {
	handle, err := pcap.OpenLive("lo", defaultSnapLen, true,
				     pcap.BlockForever)
	if err != nil {
		panic(err)
	}
	defer handle.Close()

	if err := handle.SetBPFFilter("port 3030"); err != nil {
		panic(err)
	}

	packets := gopacket.NewPacketSource(
		handle, handle.LinkType()).Packets()
	for pkt := range packets {
		// Your analysis here!
		_ = pkt // placeholder so the example compiles; Go rejects unused variables
	}
}

GoPacket also allows you to import a packet capture file that you’ve previously collected with tcpdump. For example:

package main

import (
	"github.com/google/gopacket"
	_ "github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)

func main() {
	handle, err := pcap.OpenOffline("/tmp/lo.pcap")
	if err != nil {
		panic(err)
	}
	defer handle.Close()

	packets := gopacket.NewPacketSource(
		handle, handle.LinkType()).Packets()
	for pkt := range packets {
		// Your analysis here!
		_ = pkt // placeholder so the example compiles; Go rejects unused variables
	}
}

You can see our code for calling into GoPacket on GitHub here.
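As one idea for what could go where the "Your analysis here!" placeholder sits, here is a rough sketch that pulls a few fields out of a raw TCP segment. It uses only the Go standard library so the header layout is visible. In practice GoPacket's layers package decodes these fields for you, so treat this as an illustration rather than how Akita does it. The names tcpSummary and parseTCP are made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tcpSummary is a hypothetical struct holding a handful of TCP header fields.
type tcpSummary struct {
	SrcPort, DstPort uint16
	Seq              uint32
	SYN, ACK, FIN    bool
	Payload          []byte
}

// parseTCP decodes the fixed part of a TCP header from raw segment bytes
// (the bytes that follow the IP header). TCP options are skipped, not parsed.
func parseTCP(b []byte) (tcpSummary, error) {
	if len(b) < 20 {
		return tcpSummary{}, errors.New("segment shorter than the minimum TCP header")
	}
	// The data offset is the upper 4 bits of byte 12, counted in 32-bit words.
	hdrLen := int(b[12]>>4) * 4
	if hdrLen < 20 || hdrLen > len(b) {
		return tcpSummary{}, errors.New("invalid TCP data offset")
	}
	flags := b[13]
	return tcpSummary{
		SrcPort: binary.BigEndian.Uint16(b[0:2]),
		DstPort: binary.BigEndian.Uint16(b[2:4]),
		Seq:     binary.BigEndian.Uint32(b[4:8]),
		FIN:     flags&0x01 != 0,
		SYN:     flags&0x02 != 0,
		ACK:     flags&0x10 != 0,
		Payload: b[hdrLen:],
	}, nil
}

func main() {
	// Build a fake segment by hand: port 51000 -> 3030, seq 1000, PSH+ACK.
	seg := make([]byte, 20)
	binary.BigEndian.PutUint16(seg[0:2], 51000)
	binary.BigEndian.PutUint16(seg[2:4], 3030)
	binary.BigEndian.PutUint32(seg[4:8], 1000)
	seg[12] = 5 << 4 // 20-byte header, no options
	seg[13] = 0x18   // PSH and ACK flags
	seg = append(seg, "hello"...)

	s, err := parseTCP(seg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d seq=%d ack=%t payload=%q\n",
		s.SrcPort, s.DstPort, s.Seq, s.ACK, s.Payload)
}
```

The sequence number is the field that matters most for the next section, since it is what lets a reassembler put segments back in order.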

Reassembling TCP streams from packets

Once you’ve collected your packets, the next thing you might want to do is reconstruct TCP streams from those packets. This is useful for examining higher level protocols such as HTTP that run over TCP. Now I’ll show how to do that. In addition to this guide, you can check out the full code for how Akita reconstructs TCP streams and parses HTTP here.

As a quick refresher, a TCP stream is a sequential flow of data exchanged between two hosts on the network. To allow the network to accommodate different bandwidths, the networking stack splits each TCP stream into multiple packets. Since the underlying IP network does not guarantee in-order delivery, your packet capture may contain duplicate or out-of-order packets for each stream. 😱

But worry not, GoPacket can help you remove the noise and reassemble those TCP streams from packets.
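To make the problem concrete, here is a toy sketch of what reassembly has to do: sort segments by sequence number, drop bytes that were already seen, and stop when a segment is missing. This is not GoPacket's algorithm. The segment type and reassemble function are hypothetical, and the sketch ignores sequence-number wraparound, SYN/FIN handling, and buffering across gaps, all of which a real assembler must handle.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical, simplified stand-in for a captured TCP segment:
// a starting sequence number plus the payload bytes.
type segment struct {
	seq     uint32
	payload []byte
}

// reassemble returns the contiguous byte stream starting at initialSeq.
// Duplicate and overlapping bytes are delivered once; output stops at the
// first gap. Sequence-number wraparound is deliberately not handled.
func reassemble(initialSeq uint32, segs []segment) []byte {
	sorted := make([]segment, len(segs))
	copy(sorted, segs)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].seq < sorted[j].seq
	})

	var out []byte
	next := initialSeq // next sequence number we expect to deliver
	for _, s := range sorted {
		end := s.seq + uint32(len(s.payload))
		if end <= next {
			continue // nothing new: a retransmission of delivered data
		}
		if s.seq > next {
			break // gap: a segment is missing from the capture
		}
		// Skip any prefix that overlaps data we already delivered.
		out = append(out, s.payload[next-s.seq:]...)
		next = end
	}
	return out
}

func main() {
	segs := []segment{
		{seq: 106, payload: []byte("world")},  // arrived early
		{seq: 100, payload: []byte("hello ")},
		{seq: 100, payload: []byte("hello ")}, // retransmission
	}
	fmt.Printf("%s\n", reassemble(100, segs))
}
```

Even this simplified version needs to know where the stream starts and which connection each segment belongs to, which is bookkeeping the reassembly package takes care of.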

To use the reassembly package, you need to implement two interfaces:

  1. Stream: each stream represents a reassembled TCP stream and is the mechanism through which the reassembly package passes data from TCP packets to you
  2. StreamFactory: a wrapper for constructing a new Stream for each TCP stream

See our implementation of Stream below — and the full context here.

// tcpStream represents a pair of uni-directional tcpFlows. It
// implements reassembly.Stream interface to receive reassembled
// packets for BOTH flows, which it then directs to the correct
// tcpFlow.
type tcpStream struct {
	clock  clockWrapper     // constant
	bidiID akinet.TCPBidiID // constant

	// Network layer flow.
	netFlow gopacket.Flow

	// flows is populated upon seeing the first packet.
	flows map[reassembly.TCPFlowDirection]*tcpFlow

	factorySelector akinet.TCPParserFactorySelector
	outChan         chan<- akinet.ParsedNetworkTraffic
}

And here is newTCPStream, the constructor that our implementation of StreamFactory relies on to build a tcpStream for each new connection.

func newTCPStream(clock clockWrapper, netFlow gopacket.Flow, 
		  outChan chan<- akinet.ParsedNetworkTraffic,
		  fs akinet.TCPParserFactorySelector) *tcpStream {
	return &tcpStream{
		clock:           clock,
		bidiID:          akinet.TCPBidiID(uuid.New()),
		netFlow:         netFlow,
		factorySelector: fs,
		outChan:         outChan,
	}
}

You will then need to wrap your StreamFactory in a StreamPool, which creates a new stream with the factory when data from a new TCP stream arrives and passes data to the existing stream otherwise. The StreamPool in turn is used by an assembler, which contains all the fancy logic for reconstructing TCP streams from packets and handling the associated edge cases (out-of-order packets, early connection termination, and similar). To process packets, your program simply hands them to the assembler. I summarize the interaction in this figure:

Figure 1 — How the reassembly components interact.
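The pool-and-factory relationship can also be sketched in a few lines of plain Go. This is a simplified model, not the reassembly package's code: flowKey, stream, streamFactory, streamPool, bufferStream, and bufferFactory are all hypothetical names, and the sketch keys streams on a single direction, whereas the tcpStream above handles both directions of a connection.

```go
package main

import "fmt"

// flowKey identifies one direction of a connection by its endpoints.
// All types in this sketch are hypothetical, not GoPacket's.
type flowKey struct {
	src, dst string // "ip:port" endpoints
}

// stream receives the in-order bytes for one flow.
type stream interface {
	Reassembled(data []byte)
}

// streamFactory builds a stream the first time a flow is seen.
type streamFactory interface {
	New(key flowKey) stream
}

// streamPool returns the existing stream for a flow, or asks the factory
// to create one when the flow is new.
type streamPool struct {
	factory streamFactory
	streams map[flowKey]stream
}

func newStreamPool(f streamFactory) *streamPool {
	return &streamPool{factory: f, streams: make(map[flowKey]stream)}
}

func (p *streamPool) get(key flowKey) stream {
	s, ok := p.streams[key]
	if !ok {
		s = p.factory.New(key)
		p.streams[key] = s
	}
	return s
}

// bufferStream is a trivial stream that just accumulates bytes.
type bufferStream struct{ data []byte }

func (b *bufferStream) Reassembled(data []byte) { b.data = append(b.data, data...) }

// bufferFactory creates bufferStreams and counts how many it has made.
type bufferFactory struct{ created int }

func (f *bufferFactory) New(flowKey) stream {
	f.created++
	return &bufferStream{}
}

func main() {
	factory := &bufferFactory{}
	pool := newStreamPool(factory)

	a := flowKey{"10.0.0.1:51000", "10.0.0.2:3030"}
	b := flowKey{"10.0.0.3:51001", "10.0.0.2:3030"}

	// In a real program the assembler would make these calls as it
	// reorders incoming packets.
	pool.get(a).Reassembled([]byte("GET / "))
	pool.get(b).Reassembled([]byte("POST /x "))
	pool.get(a).Reassembled([]byte("HTTP/1.1"))

	fmt.Println("streams created:", factory.created)
	fmt.Printf("%s\n", pool.get(a).(*bufferStream).data)
}
```

Keeping construction behind a factory means the pool never needs to know what a stream does with its bytes, which is why you only have to supply those two pieces yourself.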

See how we do it at Akita here, with the relevant code below:

func (p *NetworkTrafficParser) ParseFromInterface(
	interfaceName, bpfFilter string, signalClose <-chan struct{},
	fs ...akinet.TCPParserFactory,
) (<-chan akinet.ParsedNetworkTraffic, error) {
	// Read in packets, pass to assembler
	packets, err := p.pcap.capturePackets(signalClose,
					      interfaceName,
					      bpfFilter)
	if err != nil {
		return nil, errors.Wrapf(
			err, "failed begin capturing packets from %s", 
			interfaceName)
	}

	// Set up assembly
	out := make(chan akinet.ParsedNetworkTraffic, 100)
	streamFactory := newTCPStreamFactory(
		p.clock, out, akinet.TCPParserFactorySelector(fs))
	streamPool := reassembly.NewStreamPool(streamFactory)
	assembler := reassembly.NewAssembler(streamPool)
  ...

What’s next?

If you want to see more of how we listen to and parse network traffic, check out the Akita CLI on GitHub. We’ll also have a few more blog posts coming out soon about how our tool works.

With thanks to Nelson Elhage, Mark Gritter, Cole Schlesinger, and Jean Yang for comments.

Kevin Ku (thethoughtfulkoala.com) did this work during his time as a Founding Engineer at Akita Software. Prior to Akita, Kevin was a Software Engineer at Google, with a focus on networking and distributed infrastructure.

This post was originally published at Akita Software Blog. Join the Akita beta to give feedback on their tool for catching regressions in API behavior.


