Network traffic telemetry on modern routers: Part 1

By on 27 Mar 2025

Category: Tech matters



During NANOG 87 in Seattle, I had the opportunity to visit the authors of NetFlow.

This article focuses solely on traffic telemetry protocols that export information about network packets (either in parsed format or as the first X bytes of the packet itself) observed or forwarded by network equipment. Protocols such as gNMI, NETCONF, and SNMP are out of scope for this discussion.

As the CTO and co-founder of FastNetMon LTD, my main field of interest is Distributed Denial-of-Service (DDoS) defence. In this article, I’ll share my perspective on each protocol from the standpoint of DDoS detection. If your primary use of network traffic telemetry is for traffic visibility, billing, or law enforcement, my experience may be less relevant to you.

The first stable version of FastNetMon, released in 2014, supported only port mirroring, as it was the only protocol compatible with our network equipment at the time. After receiving initial feedback from the networking community about the need to support more widely adopted network traffic telemetry protocols, I began work on expanding compatibility. After nine months of active development, I introduced support for NetFlow v5, v9, sFlow v5, and IPFIX in 2015.

Due to the large number and diversity of FastNetMon users, we had access to a wide variety of network equipment models and were able to test our product’s compatibility with the majority of products available on the market.

In this article, I’ll share insights on different network traffic telemetry protocols from both operational (how specific vendors implement them) and implementation (how to write code to handle particular protocols) perspectives.

The main motivation for adding new state-of-the-art traffic telemetry protocols was to achieve faster attack detection times — now as quick as 1.5 seconds — and to gather more detailed information about DDoS attack traffic. This enables us to develop the most efficient mitigation strategies possible.

Network telemetry on modern routers

Modern telco-grade routers support multiple traffic telemetry protocols, including:

  • NetFlow v5
  • NetFlow v9
  • IPFIX
  • sFlow v5
  • Port mirroring
  • Sampled port mirroring (including the GRE option)
  • Raw headers over IPFIX or NetFlow v9

Does that look overwhelming? It certainly can be. Choosing the right protocol to meet your needs is a significant challenge. You have to consider numerous theoretical characteristics of each protocol while also taking into account the strengths and weaknesses of how vendors have implemented them.

Let’s dive into the details of each one.

NetFlow v5

Let’s start with the oldest protocol, NetFlow v5. Surprisingly, you can still find support for it even in very modern routers (though personally, I think it should be deprecated and removed).

NetFlow v5 is a fixed-format protocol, meaning it can only export a limited number of predefined fields describing each flow. Let's take a look at the protocol structure, as kindly provided by Cisco.

Bytes | Contents | Description
0-1 | version | NetFlow export format version number
2-3 | count | Number of flows exported in this packet (1-30)
4-7 | SysUptime | Current time in milliseconds since the export device booted
8-11 | unix_secs | Current count of seconds since 0000 UTC 1970
12-15 | unix_nsecs | Residual nanoseconds since 0000 UTC 1970
16-19 | flow_sequence | Sequence counter of total flows seen
20 | engine_type | Type of flow-switching engine
21 | engine_id | Slot number of the flow-switching engine
22-23 | sampling_interval | First two bits hold the sampling mode; remaining 14 bits hold value of sampling interval
Table 1 — Netflow v5 header format. Source.

This part doesn't contain any information about the traffic itself, but it is essential for implementing a NetFlow collector. For example, the sampling_interval field is absolutely necessary for our needs: it tells the collector how many packets were skipped (sampled out) for each packet observed, so that counters can be scaled back up to the real traffic volume.
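To make the header layout concrete, here is a minimal parsing sketch in Python using the standard struct module. It follows the byte offsets in Table 1; the names V5Header, parse_v5_header and effective_sampling_rate are hypothetical, and treating a zero sampling_interval as "unsampled" is an assumption about exporter behaviour rather than something the format guarantees.

```python
import struct
from typing import NamedTuple

# NetFlow v5 header: 24 bytes, big-endian (network byte order), offsets as in Table 1
V5_HEADER = struct.Struct("!HHIIIIBBH")


class V5Header(NamedTuple):
    version: int
    count: int
    sys_uptime_ms: int
    unix_secs: int
    unix_nsecs: int
    flow_sequence: int
    engine_type: int
    engine_id: int
    sampling_mode: int      # top 2 bits of bytes 22-23
    sampling_interval: int  # remaining 14 bits of bytes 22-23


def parse_v5_header(data: bytes) -> V5Header:
    if len(data) < V5_HEADER.size:
        raise ValueError("packet is shorter than a NetFlow v5 header")
    (version, count, uptime, secs, nsecs, seq,
     engine_type, engine_id, sampling) = V5_HEADER.unpack_from(data, 0)
    if version != 5:
        raise ValueError(f"unexpected NetFlow version {version}")
    # Split the 16-bit field into 2-bit mode and 14-bit interval
    return V5Header(version, count, uptime, secs, nsecs, seq,
                    engine_type, engine_id, sampling >> 14, sampling & 0x3FFF)


def effective_sampling_rate(header: V5Header) -> int:
    # Assumption: exporters with sampling disabled leave the interval at 0
    return header.sampling_interval or 1
```

Because only 14 bits are available, the largest interval this field can express is 16383.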

Now, let's look at the flow encoding format. Each NetFlow v5 packet can carry multiple flow records (up to 30, usually 12–15), which describe the traffic observed by the router. You can find the full list of fields in each record below.

Bytes | Contents | Description
0-3 | srcaddr | Source IP address
4-7 | dstaddr | Destination IP address
8-11 | nexthop | IP address of next hop router
12-13 | input | SNMP index of input interface
14-15 | output | SNMP index of output interface
16-19 | dPkts | Packets in the flow
20-23 | dOctets | Total number of Layer 3 bytes in the packets of the flow
24-27 | First | SysUptime at start of flow
28-31 | Last | SysUptime at the time the last packet of the flow was received
32-33 | srcport | TCP/UDP source port number or equivalent
34-35 | dstport | TCP/UDP destination port number or equivalent
36 | pad1 | Unused (zero) bytes
37 | tcp_flags | Cumulative OR of TCP flags
38 | prot | IP protocol type (for example, TCP = 6; UDP = 17)
39 | tos | IP type of service (ToS)
40-41 | src_as | Autonomous System number (ASN) of the source, either origin or peer
42-43 | dst_as | ASN of the destination, either origin or peer
44 | src_mask | Source address prefix mask bits
45 | dst_mask | Destination address prefix mask bits
46-47 | pad2 | Unused (zero) bytes
Table 2 — Netflow v5 flow encoding format. Source.
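Since every record has the same 48-byte layout, decoding the records that follow the 24-byte header is a simple loop. The sketch below assumes the caller has already read the count field from the header; V5Flow and iter_v5_flows are hypothetical names, and a production collector would also validate count against the 1-30 range.

```python
import ipaddress
import struct
from typing import Iterator, NamedTuple

V5_HEADER_LEN = 24
# One flow record: 48 bytes, field order and sizes as in Table 2
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


class V5Flow(NamedTuple):
    src: ipaddress.IPv4Address
    dst: ipaddress.IPv4Address
    next_hop: ipaddress.IPv4Address
    input_if: int
    output_if: int
    packets: int
    octets: int
    first: int
    last: int
    src_port: int
    dst_port: int
    tcp_flags: int
    protocol: int
    tos: int
    src_as: int   # only 16 bits, so 32-bit ASNs cannot be represented
    dst_as: int
    src_mask: int
    dst_mask: int


def iter_v5_flows(packet: bytes, count: int) -> Iterator[V5Flow]:
    """Yield `count` flow records that follow the NetFlow v5 header."""
    if len(packet) < V5_HEADER_LEN + count * V5_RECORD.size:
        raise ValueError("truncated NetFlow v5 packet")
    for i in range(count):
        (src, dst, nh, inp, out, pkts, octs, first, last, sport, dport,
         _pad1, flags, proto, tos, src_as, dst_as, smask, dmask, _pad2
         ) = V5_RECORD.unpack_from(packet, V5_HEADER_LEN + i * V5_RECORD.size)
        yield V5Flow(ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst),
                     ipaddress.IPv4Address(nh), inp, out, pkts, octs, first,
                     last, sport, dport, flags, proto, tos, src_as, dst_as,
                     smask, dmask)
```

The fixed 4-byte address fields are also why the format has no room for IPv6.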

NetFlow and IPFIX flow aggregation

The NetFlow and IPFIX protocol family generally (with a few minor exceptions that I’ll explain later) assumes that network equipment (usually a router) has a flow tracking subsystem that performs packet aggregation.

What is a flow? A flow is typically defined as a 5-tuple consisting of the following fields:

  • Source IP
  • Source port
  • Destination IP
  • Destination port
  • Protocol

To implement flow tracking, the router creates a memory entry for each unique 5-tuple and then maintains multiple metrics for each flow. Typically, it counts the number of packets and bytes transferred by that flow.

Since we're discussing memory, it's important to emphasize that it's a finite resource. The very nature of DDoS attacks revolves around exhausting finite resources: typically network capacity, but also the router's compute or memory resources. This is one of the primary disadvantages of this protocol family for attack detection, as flow tracking can itself become a vector for a denial-of-service attack against network equipment. For example, an attack with randomised source addresses or ports creates a new flow entry for almost every packet.

What is packet aggregation? Instead of sending information about every single packet handled by the network device, packet aggregation exports data only for unique flows. If thousands of packets belong to the same flow, they are exported just once, saving significant resources. Essentially, multiple packets that belong to the same flow are aggregated into a single flow record.
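The flow tracking and aggregation described above can be sketched as a dictionary keyed by the 5-tuple. This is a simplified model, not any vendor's implementation: FlowKey, FlowTable and the max_entries limit are hypothetical, and what a real router does when its table is full (evict, drop, or stop tracking) varies.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


@dataclass
class FlowCounters:
    packets: int = 0
    octets: int = 0


class FlowTable:
    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries          # memory is finite
        self.flows: Dict[FlowKey, FlowCounters] = {}
        self.dropped_new_flows = 0

    def account(self, key: FlowKey, length: int) -> None:
        entry = self.flows.get(key)
        if entry is None:
            if len(self.flows) >= self.max_entries:
                # Table full: this sketch simply refuses to track the new flow
                self.dropped_new_flows += 1
                return
            entry = self.flows[key] = FlowCounters()
        # Aggregation: every packet of the same flow updates one entry
        entry.packets += 1
        entry.octets += length
```

A thousand packets of one connection cost a single entry, whereas a flood with random source ports needs a new entry per packet and quickly hits max_entries.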

As part of the flow tracking implementation, the router needs to export information about the traffic transferred by each flow to the NetFlow collector. Let's imagine a medium-sized network with 100 Gbps or more of traffic. Such networks can easily have tens of millions of active flows at any given moment.

Can we send millions of flows every second? Technically, yes, but it would be extremely challenging for the router and would likely overload the NetFlow collector.

What is the solution? The most common approach is to scan the table with all active flows in the network every 5, 15, or 60 seconds (this period is known as the active flow timeout) and send to the collector only information about flows that have had at least one packet transferred during that period.

This approach reduces the load on the router (since there is more time to scan the table) and significantly decreases the amount of data sent to the collector.
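The periodic scan can be modelled as follows. This is a hypothetical sketch (ActiveTimeoutExporter and its methods are illustrative names): it exports per-period counters for flows that saw traffic and then resets them, whereas some implementations remove the entry instead. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple


@dataclass
class Counters:
    packets: int = 0
    octets: int = 0


class ActiveTimeoutExporter:
    """Hypothetical model of scanning the flow table once per active timeout."""

    def __init__(self, active_timeout: float, start: float = 0.0) -> None:
        self.active_timeout = active_timeout
        self.last_scan = start
        self.flows: Dict[Hashable, Counters] = {}

    def account(self, key: Hashable, length: int) -> None:
        c = self.flows.setdefault(key, Counters())
        c.packets += 1
        c.octets += length

    def scan(self, now: float) -> List[Tuple[Hashable, int, int]]:
        if now - self.last_scan < self.active_timeout:
            return []  # nothing is exported until the timeout has elapsed
        self.last_scan = now
        exported = []
        for key, c in self.flows.items():
            if c.packets:  # only flows with traffic during this period
                exported.append((key, c.packets, c.octets))
                c.packets = c.octets = 0
        return exported
```

With a 60-second timeout, a packet counted just after a scan only reaches the collector at the next scan, almost a minute later.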

What is the issue with this approach? The main problem is that it delays the export of information about packets observed by the network by up to the length of the active flow timeout. This delay can be extremely dangerous during a DDoS attack, when traffic can grow to hundreds of gigabits per second in just 30 seconds. Such a delay could result in network downtime and connectivity issues for customers.

Many vendors set the minimum value for this timeout to relatively large values, like 15 or even 60 seconds, to prevent control plane overload. Unfortunately, this makes fast traffic monitoring impossible.

Sampling

The main way to deal with flow table overload issues is through sampling. Instead of sending all traffic to the flow tracking engine, we can discard 99% of it and only pass a small fraction of the traffic. This results in far fewer flow tracking entries, which allows for much faster flow table scanning. For example, on a 10G port, you can expect very accurate results when using a 1 in 1,024 sampling ratio. From our experience, the vast majority of large telco installations rely on sampling to manage traffic efficiently.

Sampling is a very powerful tool with almost no drawbacks for DDoS attack detection purposes. However, you need to be extremely careful when selecting the sampling rate value, as choosing an incorrect rate can impact the accuracy and effectiveness of detection.
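The arithmetic behind sampling is simple: keep roughly one packet in N, then multiply the sampled counters by N. The sketch below uses random 1-in-N selection (routers may instead use deterministic every-Nth sampling); sample_and_estimate is a hypothetical helper, and the seed only makes the example repeatable.

```python
import random
from typing import Iterable, Tuple


def sample_and_estimate(packet_lengths: Iterable[int], rate: int,
                        seed: int = 1) -> Tuple[int, int]:
    """Randomly keep ~1 in `rate` packets, then scale counters back up."""
    rng = random.Random(seed)
    sampled_pkts = sampled_bytes = 0
    for length in packet_lengths:
        if rng.randrange(rate) == 0:  # each packet kept with probability 1/rate
            sampled_pkts += 1
            sampled_bytes += length
    # The collector multiplies by the sampling rate to estimate real volume
    return sampled_pkts * rate, sampled_bytes * rate
```

The relative error shrinks as the number of sampled packets grows, which is why a given rate works well on busy links and can be too coarse for low-volume ones.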

Let's summarize our field experience with NetFlow v5.

Benefits of NetFlow v5

  • Supported even by very old equipment
  • Simple parser implementation due to static structures
  • Simple sampling rate encoding (available in each packet)

Issues with NetFlow v5

  • No official standard exists
  • No IPv6 support
  • No support for 32-bit (4-byte) ASNs
  • Sampling interval cannot exceed 1:16383 due to the 14-bit field length
  • Impossible to extend due to static structures
  • Flow delays in the range of 1-30 seconds before export

If you’re still using NetFlow v5, it’s time to stop as soon as possible. This protocol no longer meets the needs of modern networks and should not be used. So, what should you use instead? Please keep reading to find out.

NetFlow v9

NetFlow v9 is one of the most widely adopted protocols in the industry. Used by nearly everyone, it can be found in almost all modern routers and is supported by the majority of NetFlow collectors (including FastNetMon).

It’s a truly great protocol, capable of carrying virtually any information from the router to the collector. In practice, however, we are limited by the fields selected by the vendor. Figures 1 and 2 show examples of field lists for two leading telco vendors.

Figure 1 — Netflow v9 field set, vendor A.
Figure 2 — Netflow v9 field set, vendor B.

As Figures 1 and 2 show, the list of fields exported in NetFlow v9 is much longer than the one supported by NetFlow v5. The best part? New fields can be added easily, because the protocol itself is designed for this flexibility.

However, this flexibility comes with a hidden cost for NetFlow collector developers: template management. Every collector that supports NetFlow v9 must track all lists of fields (known as 'data templates') announced by each router, and these lists can change over time. Without the matching template, the data received from a router cannot be decoded.

Naturally, decoding data that uses NetFlow v9 is much more complex compared to NetFlow v5, as the structure is dynamic for each device, and even a single device can use multiple formats.
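A minimal template cache might look like the sketch below. It assumes the FlowSet bodies have already been split out of the packet (4-byte FlowSet header removed), that templates arrive in FlowSet ID 0 and data records in a FlowSet whose ID equals the template ID (256 or higher), and that templates are keyed by exporter address, header Source ID and template ID. TemplateCache is a hypothetical name, and for brevity every field is decoded as an unsigned integer.

```python
import struct
from typing import Dict, List, Tuple

TemplateKey = Tuple[str, int, int]   # (exporter IP, Source ID, template ID)
Template = List[Tuple[int, int]]     # [(field type, field length), ...]


class TemplateCache:
    def __init__(self) -> None:
        self.templates: Dict[TemplateKey, Template] = {}

    def learn(self, exporter: str, source_id: int, body: bytes) -> None:
        """Parse the body of a template FlowSet (FlowSet ID 0)."""
        offset = 0
        while offset + 4 <= len(body):
            template_id, field_count = struct.unpack_from("!HH", body, offset)
            if template_id < 256:  # IDs below 256 are reserved: treat as padding
                break
            offset += 4
            fields = [struct.unpack_from("!HH", body, offset + 4 * i)
                      for i in range(field_count)]
            offset += 4 * field_count
            # A re-announced template with the same ID replaces the old one
            self.templates[(exporter, source_id, template_id)] = fields

    def decode(self, exporter: str, source_id: int, template_id: int,
               body: bytes) -> List[Dict[int, int]]:
        fields = self.templates.get((exporter, source_id, template_id))
        if not fields:
            return []  # template not seen yet: these records cannot be decoded
        record_len = sum(length for _, length in fields)
        if record_len == 0:
            return []
        records, offset = [], 0
        while offset + record_len <= len(body):  # trailing bytes are padding
            record = {}
            for field_type, length in fields:
                record[field_type] = int.from_bytes(body[offset:offset + length], "big")
                offset += length
            records.append(record)
        return records
```

For example, a template announcing field types 8 (IPV4_SRC_ADDR), 2 (IN_PKTS) and 1 (IN_BYTES), four bytes each, turns every 12 bytes of the matching data FlowSet into one record.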

Sampling encoding

NetFlow v9 also performs flow aggregation, so all the issues with this process discussed in the NetFlow v5 section apply to NetFlow v9 as well. It supports sampling too, but the logic for exporting the actual sampling rate is much more complicated. Essentially, special record types (options templates and options data records) deliver this information from the device to the collector.
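One common pattern can be sketched as a small lookup table: the options data record announces a sampler ID together with its interval, and each flow record carries the sampler ID that produced it. The sketch assumes options and flow records have already been decoded into dictionaries keyed by NetFlow v9 field type (34 SAMPLING_INTERVAL, 48 FLOW_SAMPLER_ID, 50 FLOW_SAMPLER_RANDOM_INTERVAL); which fields a given vendor actually sends varies, and falling back to a single per-exporter rate is an assumption. SamplingRates is a hypothetical name.

```python
from typing import Dict, Mapping, Optional, Tuple

SAMPLING_INTERVAL = 34              # NetFlow v9 field types
FLOW_SAMPLER_ID = 48
FLOW_SAMPLER_RANDOM_INTERVAL = 50


class SamplingRates:
    """Remember sampling rates announced in v9 options data records."""

    def __init__(self) -> None:
        self.by_sampler: Dict[Tuple[str, int], int] = {}
        self.default: Dict[str, int] = {}

    def learn_options(self, exporter: str, options: Mapping[int, int]) -> None:
        rate = options.get(FLOW_SAMPLER_RANDOM_INTERVAL,
                           options.get(SAMPLING_INTERVAL))
        if not rate:
            return
        if FLOW_SAMPLER_ID in options:
            self.by_sampler[(exporter, options[FLOW_SAMPLER_ID])] = rate
        else:
            # Assumption: no sampler ID means one rate for the whole exporter
            self.default[exporter] = rate

    def rate_for(self, exporter: str, flow: Mapping[int, int]) -> Optional[int]:
        sampler = flow.get(FLOW_SAMPLER_ID)
        if sampler is not None and (exporter, sampler) in self.by_sampler:
            return self.by_sampler[(exporter, sampler)]
        return self.default.get(exporter)  # None: rate not announced yet
```

Until the options data arrives, a collector has no way to scale the flows it receives, which is one more piece of state it has to manage per device.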

Let’s take a look at example formats for sampling encoding (Figures 3 and 4). Personally, I consider this part of the protocol to be the most over-engineered and excessively complicated.

Figure 3 — Sampling encoding, vendor A.
Figure 4 — Sampling encoding, vendor B (please ignore yellow highlighting).

Let's summarize our experience with NetFlow v9.

Benefits of NetFlow v9

  • Supported by almost all vendors
  • IPv6 support
  • Can carry sampling rate in any range
  • Well documented, and most implementations stay reasonably close to the original specification
  • Offers almost unlimited extensibility
  • Some fields are documented as part of IPFIX RFCs

Issues with NetFlow v9

  • Complicated data decoding for collectors
  • Sampling encoding is complicated and vendor-specific
  • Issues with flow duration encoding by some vendors

IPFIX

IPFIX is the successor to, and further development of, NetFlow v9. It is occasionally referred to as NetFlow v10 (its message header carries version number 10), highlighting the many similarities in protocol design.

IPFIX is essentially the first widely adopted network telemetry standard, developed by the IETF and published across numerous RFCs. From my own experience, I would describe IPFIX as NetFlow v9 with proper documentation.
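The resemblance to NetFlow v9 shows up directly in the wire format. The sketch below, based on the message and set layout in RFC 7011, parses the 16-byte message header and splits the message into sets (set ID 2 for templates, 3 for options templates, 256 and above for data). It also decodes template field specifiers, whose top bit marks an enterprise-specific field followed by a 4-byte enterprise number. The function names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

IPFIX_HEADER = struct.Struct("!HHIII")  # version, length, export time, sequence, domain
SET_HEADER = struct.Struct("!HH")       # set ID, set length (both include the header)


class IpfixHeader(NamedTuple):
    version: int
    length: int           # total message length in bytes (v9 carries a record count)
    export_time: int
    sequence: int
    observation_domain: int


def parse_ipfix(message: bytes) -> Tuple[IpfixHeader, List[Tuple[int, bytes]]]:
    if len(message) < IPFIX_HEADER.size:
        raise ValueError("message shorter than IPFIX header")
    hdr = IpfixHeader(*IPFIX_HEADER.unpack_from(message, 0))
    if hdr.version != 10:
        raise ValueError(f"unexpected version {hdr.version}")
    if hdr.length > len(message):
        raise ValueError("truncated IPFIX message")
    sets, offset = [], IPFIX_HEADER.size
    while offset + SET_HEADER.size <= hdr.length:
        set_id, set_len = SET_HEADER.unpack_from(message, offset)
        if set_len < SET_HEADER.size or offset + set_len > hdr.length:
            raise ValueError("malformed set length")
        sets.append((set_id, message[offset + SET_HEADER.size:offset + set_len]))
        offset += set_len
    return hdr, sets


def parse_field_specifiers(body: bytes, count: int, offset: int = 0
                           ) -> Tuple[List[Tuple[int, int, Optional[int]]], int]:
    """Return [(element ID, length, enterprise number or None)] and the new offset."""
    fields = []
    for _ in range(count):
        raw_id, length = struct.unpack_from("!HH", body, offset)
        offset += 4
        enterprise = None
        if raw_id & 0x8000:  # enterprise bit: a 4-byte enterprise number follows
            (enterprise,) = struct.unpack_from("!I", body, offset)
            offset += 4
        # A length of 65535 marks a variable-length field
        fields.append((raw_id & 0x7FFF, length, enterprise))
    return fields, offset
```

The template set body itself starts with a template ID and field count, just like a NetFlow v9 template record.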

Should you migrate to IPFIX because NetFlow v9 is outdated? The answer depends on the vendor and specific model. Personally, I recommend using IPFIX for all new deployments.

Now, let’s take a look at an example list of IPFIX fields. As you’ll see in Figure 5, it clearly resembles the list used by NetFlow v9.

Figure 5 — IPFIX list of fields.

Did it fix all the issues present in NetFlow v9? Clearly, no. Sampling encoding became even more complicated, and each vendor handles it differently.

Figure 6 — IPFIX Options template.

Let’s summarize our view of IPFIX.

Benefits of IPFIX

  • Well documented RFC standard
  • IPv6 support
  • Unlimited flexibility

Issues with IPFIX

  • Complicated decoding for collectors
  • Tricky encoding of traffic dropped by BGP Flow Spec rules (some vendors)
  • Some vendors still do not support it
  • Limited to the subset of fields selected by the vendor

What’s next?

In the second part of this article, I’ll cover protocols such as sFlow v5, sampled port mirror, sampled port mirror over GRE, and the most innovative protocols that deliver raw headers over IPFIX or NetFlow v9 (IPFIX 315 or ‘inline monitoring services’). Stay tuned!


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
