Network traffic telemetry on modern routers: Part 1

By Pavel Odintsov on 27 Mar 2025

Tags: Guest Post, How to, monitoring, protocols, security

During NANOG 87 in Seattle, I had to pay a visit to the authors of NetFlow.

This article focuses solely on traffic telemetry protocols that export information about network packets (either in parsed format or as the first X bytes of the packet itself) observed or forwarded by network equipment. Protocols such as gNMI, NETCONF, and SNMP are out of scope for this discussion.

As the CTO and co-founder of FastNetMon LTD, my main field of interest is Distributed Denial-of-Service (DDoS) defence. In this article, I’ll share my perspective on each protocol from the standpoint of DDoS detection. If your primary use of network traffic telemetry is for traffic visibility, billing, or law enforcement, my experience may be less relevant to you.

The first stable version of FastNetMon, released in 2014, supported only port mirroring, as it was the only protocol compatible with our network equipment at the time. After receiving initial feedback from the networking community about the need to support more widely adopted network traffic telemetry protocols, I began work on expanding compatibility. After nine months of active development, I introduced support for NetFlow v5, v9, sFlow v5, and IPFIX in 2015.

Due to the large number and diversity of FastNetMon users, we had access to a wide variety of network equipment models and were able to test our product’s compatibility with the majority of products available on the market.

In this article, I’ll share insights on different network traffic telemetry protocols from both operational (how specific vendors implement them) and implementation (how to write code to handle particular protocols) perspectives.

The main motivation for adding new state-of-the-art traffic telemetry protocols was to achieve faster attack detection times — now as quick as 1.5 seconds — and to gather more detailed information about DDoS attack traffic. This enables us to develop the most efficient mitigation strategies possible.

Network telemetry on modern routers

Modern telco-grade routers support multiple traffic telemetry protocols, including:

NetFlow v5
NetFlow v9
IPFIX
sFlow v5
Port mirroring
Sampled port mirroring (including the GRE option)
Raw headers over IPFIX or NetFlow v9

Does that look overwhelming? It certainly can be. Choosing the right protocol to meet your needs is a significant challenge. You have to consider numerous theoretical characteristics of each protocol while also taking into account the strengths and weaknesses of how vendors have implemented them.

Let’s dive into the details of each one.

NetFlow v5

Let’s start with the oldest protocol, NetFlow v5. Surprisingly, you can still find support for it even in very modern routers (though personally, I think it should be deprecated and removed).

NetFlow v5 is a fixed-format protocol, meaning it can only export a limited number of predefined fields that describe each flow. Let’s take a look at the protocol structure, as kindly provided by Cisco.

Bytes	Contents	Description
0-1	version	NetFlow export format version number
2-3	count	Number of flows exported in this packet (1-30)
4-7	SysUptime	Current time in milliseconds since the export device booted
8-11	unix_secs	Current count of seconds since 0000 UTC 1970
12-15	unix_nsecs	Residual nanoseconds since 0000 UTC 1970
16-19	flow_sequence	Sequence counter of total flows seen
20	engine_type	Type of flow-switching engine
21	engine_id	Slot number of the flow-switching engine
22-23	sampling_interval	First two bits hold the sampling mode; remaining 14 bits hold value of sampling interval

Table 1 — Netflow v5 header format. Source.

This part doesn’t contain any information about the traffic itself but is essential for implementing a NetFlow collector. For example, the sampling_interval field is absolutely necessary for our needs, as it carries the sampling rate: the collector needs it to account for the packets that were skipped (sampled out) during observation.
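To make the header layout concrete, here is a minimal Python sketch that decodes the 24 bytes from Table 1 and splits sampling_interval into its two parts. The names V5Header and parse_v5_header are my own illustrations, not part of any library, and the example datagram is synthetic.

```python
import struct
from typing import NamedTuple

# Layout from Table 1: 24 bytes, network (big-endian) byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")


class V5Header(NamedTuple):
    version: int
    count: int
    sys_uptime_ms: int
    unix_secs: int
    unix_nsecs: int
    flow_sequence: int
    engine_type: int
    engine_id: int
    sampling_mode: int
    sampling_interval: int


def parse_v5_header(datagram: bytes) -> V5Header:
    """Decode the fixed NetFlow v5 header (hypothetical helper)."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram is shorter than a NetFlow v5 header")
    (version, count, uptime, secs, nsecs, sequence,
     engine_type, engine_id, sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError(f"not NetFlow v5: version={version}")
    return V5Header(
        version, count, uptime, secs, nsecs, sequence, engine_type, engine_id,
        sampling_mode=sampling >> 14,         # first two bits
        sampling_interval=sampling & 0x3FFF,  # remaining 14 bits
    )


# Synthetic example: 2 flows, sampling mode 1, sampling interval 1024.
example = V5_HEADER.pack(5, 2, 360_000, 1_700_000_000, 0, 42, 0, 0,
                         (1 << 14) | 1024)
header = parse_v5_header(example)
print(header.count, header.sampling_mode, header.sampling_interval)
```

Because the structure is static, a single struct format string covers the whole header, which is a large part of why NetFlow v5 parsers are so simple.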

Now, let’s look at the flow encoding format. Each NetFlow v5 packet can carry multiple flows (usually 12–15), which describe the traffic observed by the router. You can find the full list below.

Bytes	Contents	Description
0-3	srcaddr	Source IP address
4-7	dstaddr	Destination IP address
8-11	nexthop	IP address of next hop router
12-13	input	SNMP index of input interface
14-15	output	SNMP index of output interface
16-19	dPkts	Packets in the flow
20-23	dOctets	Total number of Layer 3 bytes in the packets of the flow
24-27	First	SysUptime at start of flow
28-31	Last	SysUptime at the time the last packet of the flow was received
32-33	srcport	TCP/UDP source port number or equivalent
34-35	dstport	TCP/UDP destination port number or equivalent
36	pad1	Unused (zero) bytes
37	tcp_flags	Cumulative OR of TCP flags
38	prot	IP protocol type (for example, TCP = 6; UDP = 17)
39	tos	IP type of service (ToS)
40-41	src_as	Autonomous System number (ASN) of the source, either origin or peer
42-43	dst_as	ASN of the destination, either origin or peer
44	src_mask	Source address prefix mask bits
45	dst_mask	Destination address prefix mask bits
46-47	pad2	Unused (zero) bytes

Table 2 — Netflow v5 flow encoding format. Source.
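The flow records from Table 2 can be decoded in the same way. The sketch below assumes the records start right after the 24-byte header and are 48 bytes each; parse_v5_records and the sample record are hypothetical and exist only for illustration.

```python
import ipaddress
import struct

# Layout from Table 2: 48 bytes per flow record, network byte order.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")
V5_HEADER_SIZE = 24

FIELD_NAMES = (
    "srcaddr", "dstaddr", "nexthop", "input", "output", "dPkts", "dOctets",
    "First", "Last", "srcport", "dstport", "pad1", "tcp_flags", "prot", "tos",
    "src_as", "dst_as", "src_mask", "dst_mask", "pad2",
)


def parse_v5_records(datagram: bytes, count: int) -> list[dict]:
    """Decode `count` flow records that follow the v5 header."""
    records = []
    for i in range(count):
        offset = V5_HEADER_SIZE + i * V5_RECORD.size
        values = V5_RECORD.unpack_from(datagram, offset)
        record = dict(zip(FIELD_NAMES, values))
        # Addresses arrive as 4 raw bytes; render them in dotted form.
        for key in ("srcaddr", "dstaddr", "nexthop"):
            record[key] = str(ipaddress.IPv4Address(record[key]))
        records.append(record)
    return records


# Synthetic example: one UDP flow of 10 packets and 5200 bytes.
raw = V5_RECORD.pack(
    bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]), bytes(4),
    1, 2, 10, 5200, 1000, 2000, 53, 40000,
    0, 0, 17, 0, 64500, 64501, 24, 24, 0,
)
flows = parse_v5_records(bytes(V5_HEADER_SIZE) + raw, count=1)
print(flows[0]["srcaddr"], flows[0]["dstaddr"], flows[0]["dOctets"])
```

In a real collector, the count argument would come from the count field of the header.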

NetFlow and IPFIX flow aggregation

The NetFlow and IPFIX protocol family generally (with a few minor exceptions that I’ll explain later) assumes that network equipment (usually a router) has a flow tracking subsystem that performs packet aggregation.

What is a flow? A flow is typically defined as a 5-tuple consisting of the following fields:

Source IP
Source port
Destination IP
Destination port
Protocol

To implement flow tracking, the router creates a memory entry for each unique 5-tuple and then maintains multiple metrics for each flow. Typically, it counts the number of packets and bytes transferred by that flow.

Since we’re discussing memory, it’s important to emphasize that it’s a finite resource. The very nature of DDoS attacks revolves around exhausting finite resources — typically network capacity, but it can also include the router’s compute or memory resources. This is one of the primary disadvantages of this protocol family for attack detection, as it can itself become a vector for a denial-of-service attack against network equipment.

What is packet aggregation? Instead of sending information about every single packet handled by the network device, packet aggregation exports data only for unique flows. If thousands of packets belong to the same flow, they are exported just once, saving significant resources. Essentially, multiple packets that belong to the same flow are aggregated into a single flow record.
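A toy model of this aggregation, written in Python, looks roughly like the sketch below. It is only an illustration of the idea: real routers do this in hardware or in a dedicated flow cache, and the names FlowKey, FlowCounters and aggregate are my own.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


class FlowCounters:
    """Per-flow metrics kept in the flow table."""
    def __init__(self) -> None:
        self.packets = 0
        self.octets = 0


def aggregate(packets):
    """Fold (FlowKey, length_in_bytes) pairs into one entry per flow."""
    table = defaultdict(FlowCounters)
    for key, length in packets:
        counters = table[key]  # creates the entry on first sight
        counters.packets += 1
        counters.octets += length
    return table


dns = FlowKey("192.0.2.1", 40000, "198.51.100.7", 53, 17)
web = FlowKey("192.0.2.2", 51000, "198.51.100.7", 443, 6)
table = aggregate([(dns, 80), (web, 1500), (dns, 120)])
for key, counters in table.items():
    print(key, counters.packets, counters.octets)
```

Three packets become two flow records here. Note that every new 5-tuple allocates a new entry, which is exactly how an attack with randomized source addresses or ports can exhaust the flow table.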

As part of the flow tracking implementation, we need to export information about the traffic transferred by each flow to the NetFlow collector. Let’s imagine a medium-sized network with 100G+ of traffic. Such networks can easily have tens of millions of active flows at any given moment.

Can we send millions of flows every second? Technically, yes, but it would be extremely challenging for the router and would likely overload the NetFlow collector.

What is the solution? The most common approach is to scan the table with all active flows in the network every 5, 15, or 60 seconds (this period is known as the active flow timeout) and send to the collector only information about flows that have had at least one packet transferred during that period.

This approach reduces the load on the router (since there is more time to scan the table) and significantly decreases the amount of data sent to the collector.
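The periodic scan can be sketched as follows. This is a simplified model under my own assumptions (a single table-wide timer and counters that reset on export); actual implementations differ between vendors, and FlowTable and maybe_export are hypothetical names.

```python
class FlowTable:
    """Toy flow table that is scanned once per active flow timeout."""

    def __init__(self, active_timeout: float) -> None:
        self.active_timeout = active_timeout
        self.flows = {}          # 5-tuple -> [packets, octets] since last scan
        self.last_export = 0.0   # time of the previous scan, in seconds

    def observe(self, key, length: int) -> None:
        counters = self.flows.setdefault(key, [0, 0])
        counters[0] += 1
        counters[1] += length

    def maybe_export(self, now: float):
        """Return flows that saw traffic, but only when the timeout expired."""
        if now - self.last_export < self.active_timeout:
            return []            # too early: the collector sees nothing yet
        self.last_export = now
        exported = [(key, pkts, octets)
                    for key, (pkts, octets) in self.flows.items() if pkts > 0]
        for counters in self.flows.values():
            counters[0] = counters[1] = 0
        return exported


key = ("192.0.2.1", 40000, "198.51.100.7", 53, 17)
table = FlowTable(active_timeout=15)
table.observe(key, 80)                  # packet arrives at t = 1 s
early = table.maybe_export(now=5)       # nothing is exported yet
on_time = table.maybe_export(now=15)    # the flow is exported 14 s late
idle = table.maybe_export(now=30)       # no new packets, nothing to export
print(early, on_time, idle)
```

The packet seen at the first second reaches the collector only when the timer fires, which is the export delay that matters so much for fast detection.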

What is the issue with this approach? The main problem is that it delays the export of information about packets observed by the network by up to the full active flow timeout. This delay can be extremely dangerous during a DDoS attack, where bandwidth can increase to hundreds of gigabits in just 30 seconds. Such a delay could result in network downtime and connectivity issues for customers.

Many vendors set the minimum value for this timeout to relatively large values, like 15 or even 60 seconds, to prevent control plane overload. Unfortunately, this makes fast traffic monitoring impossible.

Sampling

The main way to deal with flow table overload issues is through sampling. Instead of sending all traffic to the flow tracking engine, we can discard 99% of it and only pass a small fraction of the traffic. This results in far fewer flow tracking entries, which allows for much faster flow table scanning. For example, on a 10G port, you can expect very accurate results when using a 1 in 1,024 sampling ratio. From our experience, the vast majority of large telco installations rely on sampling to manage traffic efficiently.
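The arithmetic behind sampling is simple: the collector multiplies whatever it receives by the sampling rate. Below is a small Python sketch that assumes deterministic 1-in-N sampling (many devices use random sampling instead); both function names are hypothetical.

```python
def sample_every_nth(packet_lengths, rate: int):
    """Deterministic 1-in-N sampling: keep every Nth packet, drop the rest."""
    return [length
            for i, length in enumerate(packet_lengths, start=1)
            if i % rate == 0]


def estimate_totals(sampled_lengths, rate: int):
    """Scale sampled counters back up to an estimate of the real traffic."""
    return len(sampled_lengths) * rate, sum(sampled_lengths) * rate


# 102,400 packets of 1,000 bytes each, sampled at 1 in 1,024.
traffic = [1000] * 102_400
sampled = sample_every_nth(traffic, rate=1024)
estimated_packets, estimated_octets = estimate_totals(sampled, rate=1024)
print(len(sampled), estimated_packets, estimated_octets)
```

Only 100 packets reach the flow tracking engine in this example, yet the estimate matches the real totals because the traffic is uniform. With real traffic, the estimate carries a statistical error that grows as the number of sampled packets shrinks, which is why the sampling rate has to be chosen carefully.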

Sampling is a very powerful tool with almost no drawbacks for DDoS attack detection purposes. However, you need to be extremely careful when selecting the sampling rate value, as choosing an incorrect rate can impact the accuracy and effectiveness of detection.

Let’s summarize our field experience with NetFlow v5.

Benefits of NetFlow v5

Supported even by very old equipment
Simple parser implementation due to static structures
Simple sampling rate encoding (available in each packet)

Issues with NetFlow v5

Official standard does not exist
Lack of IPv6 support
Lack of support for 32-bit (4-byte) ASNs
Sampling cannot exceed 1:16383 due to 14-bit field length
Impossible to extend due to static structures
Flow delays in the range of 1-30 seconds before export

If you’re still using NetFlow v5, it’s time to stop as soon as possible. This protocol no longer meets the needs of modern networks and should not be used. So, what should you use instead? Please keep reading to find out.

NetFlow v9

NetFlow v9 is one of the most widely adopted protocols in the industry. Used by nearly everyone, it can be found in almost all modern routers and is supported by the majority of NetFlow collectors (including FastNetMon).

It’s a truly great protocol, capable of carrying virtually any information from the router to the collector. In practice, however, we are limited by the fields selected by the vendor. Figures 1 and 2 show examples of field lists for two leading telco vendors.

Figure 1 — Netflow v9 field set, vendor A.

Figure 2 — Netflow v9 field set, vendor B.

As Figures 1 and 2 show, the list of exported fields in NetFlow v9 is much longer than the list supported by NetFlow v5. The best part? We can easily add new fields, and the protocol supports this flexibility.

However, this flexibility comes with a hidden cost for NetFlow collector developers — template management. Every collector that supports NetFlow v9 must track all lists of fields (known as ‘data templates’) announced by routers, which can change over time. This information is crucial for decoding the data received from routers.

Naturally, decoding data that uses NetFlow v9 is much more complex compared to NetFlow v5, as the structure is dynamic for each device, and even a single device can use multiple formats.
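To illustrate what template management means for a collector, here is a minimal Python sketch. It assumes the template record layout from RFC 3954 (template ID, field count, then type and length pairs) and knows only a handful of field types. It is not how any particular collector is implemented; learn_template and decode_record are names I made up for this example.

```python
import ipaddress
import struct

# A few field type numbers from RFC 3954; real exporters announce many more.
FIELD_NAMES = {1: "IN_BYTES", 2: "IN_PKTS", 4: "PROTOCOL", 7: "L4_SRC_PORT",
               8: "IPV4_SRC_ADDR", 11: "L4_DST_PORT", 12: "IPV4_DST_ADDR"}

# (exporter address, source ID, template ID) -> list of (field type, length)
templates = {}


def learn_template(exporter: str, source_id: int, record: bytes) -> int:
    """Remember one template record announced by a router."""
    template_id, field_count = struct.unpack_from("!HH", record, 0)
    fields = [struct.unpack_from("!HH", record, 4 + 4 * i)
              for i in range(field_count)]
    templates[(exporter, source_id, template_id)] = fields
    return template_id


def decode_record(exporter: str, source_id: int, template_id: int,
                  record: bytes):
    """Decode one data record, or return None if the template is unknown."""
    fields = templates.get((exporter, source_id, template_id))
    if fields is None:
        return None  # template not seen yet: the data cannot be decoded
    decoded, offset = {}, 0
    for field_type, length in fields:
        raw = record[offset:offset + length]
        offset += length
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        if field_type in (8, 12):
            decoded[name] = str(ipaddress.IPv4Address(raw))
        else:
            decoded[name] = int.from_bytes(raw, "big")
    return decoded


# Synthetic template 256: source address, destination address, protocol, packets.
template = struct.pack("!HH", 256, 4) + struct.pack("!8H", 8, 4, 12, 4, 4, 1, 2, 4)
data = (bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7]) + bytes([17])
        + (10).to_bytes(4, "big"))

before = decode_record("203.0.113.1", 0, 256, data)  # template still unknown
learn_template("203.0.113.1", 0, template)
after = decode_record("203.0.113.1", 0, 256, data)
print(before, after)
```

The first decode attempt fails because the collector has not seen the template yet. The same thing happens in production after a collector restart, until the router sends its templates again.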

Sampling encoding

NetFlow v9 also performs flow aggregation, so all the issues discussed in the section about NetFlow v5 regarding this process apply to NetFlow v9 as well. It does support sampling too, but the logic to export the actual sampling rate is much more complicated. Essentially, we have special types of packets (options and option templates) that deliver this information from the device to the collector.
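On the collector side, the handling of sampling information can be sketched roughly like this. The example assumes that options records have already been decoded into a mapping from field type to value, and that the rate arrives in field 34 (SAMPLING_INTERVAL) or field 50 (FLOW_SAMPLER_RANDOM_INTERVAL) from RFC 3954. Which field a given vendor uses, and whether the rate is scoped per interface or per sampler, varies, so treat this as an illustration only; the function names are hypothetical.

```python
SAMPLING_INTERVAL = 34             # field type from RFC 3954
FLOW_SAMPLER_RANDOM_INTERVAL = 50  # field type from RFC 3954

# exporter address -> last sampling rate announced by that device
sampling_rates = {}


def handle_options_record(exporter: str, fields: dict) -> None:
    """Remember the sampling rate carried in a decoded options record."""
    for field_type in (SAMPLING_INTERVAL, FLOW_SAMPLER_RANDOM_INTERVAL):
        rate = fields.get(field_type)
        if rate:
            sampling_rates[exporter] = rate
            return


def scale_flow(exporter: str, packets: int, octets: int, default_rate: int = 1):
    """Scale flow counters by the announced rate (assumption: one per device)."""
    rate = sampling_rates.get(exporter, default_rate)
    return packets * rate, octets * rate


handle_options_record("203.0.113.1", {SAMPLING_INTERVAL: 1024})
known = scale_flow("203.0.113.1", packets=10, octets=5200)
unknown = scale_flow("203.0.113.2", packets=10, octets=5200)
print(known, unknown)
```

Until the options data arrives, the collector has to fall back to a default or a manually configured rate, which is one reason this part of the protocol is painful in practice.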

Let’s take a look at example formats for sampling encoding (Figures 3 and 4). Personally, I consider this part of the protocol to be the most over-engineered and excessively complicated.

Figure 4 — Sampling encoding, vendor B (please ignore yellow highlighting).

Let’s summarize our field experience with NetFlow v9.

Benefits of NetFlow v9

Supported by almost all vendors
IPv6 support
Can carry sampling rate in any range
Well documented, and most vendor implementations stay reasonably close to the original one
Offers almost unlimited extensibility
Some fields are documented as part of IPFIX RFCs

Issues with NetFlow v9

Complicated data encoding for collector
Sampling encoding is complicated and vendor-specific
Issues with flow duration encoding on some vendors

IPFIX

IPFIX is the successor and further development of NetFlow v9. Occasionally, it may be referred to as NetFlow v10, highlighting the vast number of similarities in protocol design.
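The ‘NetFlow v10’ nickname is visible on the wire: all three protocols start with a 16-bit version field, and IPFIX uses the value 10. A collector listening on a single port can use that field to pick a parser, as in this small Python sketch (classify is a hypothetical helper; the header sizes are those of NetFlow v5, RFC 3954 and RFC 7011).

```python
import struct

PROTOCOL_NAMES = {5: "NetFlow v5", 9: "NetFlow v9", 10: "IPFIX"}
HEADER_SIZES = {5: 24, 9: 20, 10: 16}  # bytes


def classify(datagram: bytes):
    """Return the protocol name based on the version field, or None."""
    if len(datagram) < 2:
        return None
    (version,) = struct.unpack_from("!H", datagram)
    if len(datagram) < HEADER_SIZES.get(version, 0):
        return None  # truncated datagram
    return PROTOCOL_NAMES.get(version)


# Synthetic IPFIX header: version, length, export time, sequence, domain ID.
ipfix_header = struct.pack("!HHIII", 10, 16, 1_700_000_000, 0, 1)
print(classify(ipfix_header))
print(classify(b"\x00\x05" + bytes(22)))
print(classify(b"\x00\x07" + bytes(22)))
```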

IPFIX is essentially the first widely adopted version of a network telemetry standard, developed by the IETF and published across numerous RFCs. From my own experience, I would describe IPFIX as NetFlow v9 with proper documentation.

Should you migrate to IPFIX because NetFlow v9 is outdated? The answer depends on the vendor and specific model. Personally, I recommend using IPFIX for all new deployments.

Now, let’s take a look at an example list of IPFIX fields. As you’ll see in Figure 5, it clearly resembles the list used by NetFlow v9.

Did it fix all the issues present in NetFlow v9? Clearly, no. Sampling encoding became even more complicated, and each vendor handles it differently.

Let’s summarize our view of IPFIX.

Benefits of IPFIX

Well documented RFC standard
IPv6 support
Unlimited flexibility

Issues with IPFIX

Complicated encoding for collector
Tricky encoding for traffic dropped by BGP Flow Spec (some vendors)
Some vendors still do not support it
Limited by subset of fields selected by vendor

What’s next?

In the second part of this article, I’ll cover protocols such as sFlow v5, sampled port mirror, sampled port mirror over GRE, and the most innovative protocols that deliver raw headers over IPFIX or NetFlow v9 (IPFIX 315 or ‘inline monitoring services’). Stay tuned!

Pavel is the author of FastNetMon, an open source DDoS detection tool with a variety of traffic capture methods. He works in software development and community management.

Adapted from the original on Pavel’s blog.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.