A Cloudflare blog post from late October, published before the recent outage that has dominated the blogosphere, is worth reading. Titled Measuring Characteristics of TCP Connections at Internet Scale, it focuses on TCP and network measurement from the perspective of a ‘man-in-the-middle’.
Normally, when we talk about a man-in-the-middle, it’s in the context of cryptography or interception, highlighting the risks of someone intruding on your private end-to-end communications. But Cloudflare exists to be a man-in-the-middle on your behalf. They act as a front-end, making your content visible across their network so that the maximum number of users can access it with the speed and latency of a local connection, rather than dealing with the variable delays of a long path to your origin server. On top of that, Cloudflare can provide protection against Distributed Denial-of-Service (DDoS) attacks and improve overall availability, thanks to its scale and redundancy.
Nonetheless, they are a man-in-the-middle, and that vantage point means they can also observe TCP flows heading to ‘your’ content within their caching infrastructure. Cloudflare now reports that around 20% of all websites and web traffic worldwide pass through its network, which represents a very significant footprint across global connections. Their capacity to sample packets and analyse the behaviour of TCP for millions of connections every day is certainly worth examining.
The blog post is written to present a potential dataset and its performance characteristics in terms of observable packet behaviour ‘on the wire’. The dataset is a uniform 1% sample of TCP traffic seen from 7 to 15 October 2025. It was collected at Cloudflare’s user-facing edge (the ‘outside edge’) on the premise that this introduces less bias than gathering data from within their core network.
They’ve deliberately limited the dataset to ‘useful’ TCP flows, meaning they only include sessions where they observe at least one TCP FIN (finish) packet and actual data transfer within the session, typically real HTTP traffic. By excluding TCP flows without a FIN, they filter out many of the bogus, unidirectional DDoS packets. Surprisingly, roughly one-tenth of the valid TCP sessions they see never carry any actual HTTP data. It raises the question: Why establish a two-way connection, exchange identity information, and perform TLS cryptography, only to end up doing… nothing?
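To make the filtering rule concrete, here is a minimal Python sketch of the idea as described above. It is not Cloudflare's pipeline: the FlowRecord type, its field names, and the toy records are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical per-flow summary; these fields are not Cloudflare's schema."""
    flow_id: str
    fin_seen: bool      # at least one TCP FIN observed in the flow
    payload_bytes: int  # TCP payload bytes seen in either direction
    http_bytes: int     # application (HTTP) bytes carried, if any


def is_useful(flow: FlowRecord) -> bool:
    """Keep flows that closed with a FIN and moved some data."""
    return flow.fin_seen and flow.payload_bytes > 0


def idle_fraction(flows: list[FlowRecord]) -> float:
    """Fraction of kept flows that carried no HTTP data at all."""
    kept = [f for f in flows if is_useful(f)]
    if not kept:
        return 0.0
    return sum(1 for f in kept if f.http_bytes == 0) / len(kept)


# Toy records, invented for illustration only.
flows = [
    FlowRecord("a", True, 5200, 4100),  # normal request and response
    FlowRecord("b", False, 0, 0),       # no FIN and no data: dropped
    FlowRecord("c", True, 1800, 0),     # handshake and TLS only, no HTTP
    FlowRecord("d", True, 9000, 7600),
]
useful = [f.flow_id for f in flows if is_useful(f)]
```

In this toy input, flow "b" is dropped by the FIN rule, while flow "c" survives the filter but counts towards the sessions that end up doing nothing.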
The post provides details on session duration, packet counts, and bytes sent and received. The classic ‘elephants and mice’ distribution of flows is observed and documented, alongside in-flow characteristics of TCP sessions, such as changes to the window size and the influence of congestion control behaviours.
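For readers unfamiliar with the term, ‘elephants and mice’ describes a distribution where a few very large flows carry most of the bytes while most flows are tiny. One rough way to check for it in your own flow logs is to ask what share of bytes the largest flows carry. The helper name top_share and the sample sizes below are assumptions for illustration, not figures from the post.

```python
def top_share(flow_bytes: list[int], top_fraction: float = 0.1) -> float:
    """Share of all bytes carried by the largest `top_fraction` of flows."""
    if not flow_bytes:
        return 0.0
    ordered = sorted(flow_bytes, reverse=True)
    # Always count at least one flow, even for very small inputs.
    k = max(1, int(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Invented per-flow byte counts: one elephant and nine mice.
sizes = [1_000_000, 500, 400, 300, 200, 200, 100, 100, 100, 100]
share = top_share(sizes, top_fraction=0.1)
```

A share close to 1.0 for a small top_fraction suggests a heavily skewed mix, which matters when sizing buffers or choosing congestion control settings.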
It’s a great read and an excellent example of how improved telemetry and access to data from millions of connections can help you understand what to expect when tuning your services for the modern Internet. Even if you don’t use Cloudflare as a front for your website, there’s plenty of data here to consider.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.