A friend (Brendan, but I’m keeping his social media ID private) just asked me this in a Facebook chat:
The Radio National program he was referring to is called The Big Picture on Big Data.
And it set me thinking: how do we compare these kinds of aggregate data? Brendan has a science background and a PhD in chemistry, so he understands the issues well: journalism demands snappy punchlines and short messages. 1.1 exabytes is a huge number, but it’s also suspiciously imprecise. Was that really 1.09 exabytes? Or was it closer to 1.14? I suspect that the underlying premise of “how much” isn’t actually well defined.
The source is the Australian Bureau of Statistics (ABS) data. Hmm… well that’s statutory reporting, and perhaps I have to be more generous. The number is the number. Let’s work with it a bit.
This kind of data is typically collected in one of several ways: ISPs are asked to provide basic information on traffic flows over their systems; measurements are made at exchange points and offshore links and then multiplied up to get a rough overall figure; or customer-billing data is re-purposed to meet the goal. The first two approaches both have their problems. Link counters overflow: SNMP, the basic tool of network management, is notorious for counter values that wrap at 32 bits, so a counter cannot go past about 4.3 billion (2^32) without resetting to zero. You have to read the counters frequently enough to catch the resets and work out ‘how many times round the clock’ you’ve gone, so traffic flows are sometimes, of necessity, an estimate. Likewise, data flow at an exchange point is a good sign of who talks to whom, but only for the public traffic. Lots of networks now have off-exchange links, or embed content inside their ISP network with content distribution services like Akamai and CloudFlare, so the exchange-point measure isn’t a good sense of what end users do any more.
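To make the ‘how many times round the clock’ problem concrete, here is a minimal sketch of how a poller might turn raw 32-bit counter readings into byte counts. It assumes the counter wraps at most once between two polls. That assumption gets shaky on busy links: at 1 Gbit/s a 32-bit octet counter wraps roughly every 34 seconds, which is why the 64-bit ifHCInOctets counters in the IF-MIB exist alongside the 32-bit ifInOctets. The function names here are illustrative, not part of any SNMP library.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps after 4,294,967,295


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Bytes counted between two polls, assuming at most one wrap in between."""
    # If the counter went round the clock once, current < previous, and the
    # modular difference still recovers the true increment.
    return (current - previous) % modulus


def total_bytes(samples: list[int], modulus: int = COUNTER32_MODULUS) -> int:
    """Sum the increments over a series of polls of the same counter."""
    return sum(counter_delta(a, b, modulus) for a, b in zip(samples, samples[1:]))


# Three polls; the counter wraps between the second and third reading.
print(total_bytes([4_000_000_000, 4_294_000_000, 500_000_000]))  # 794967296
```

If the poller misses a whole wrap, the modular arithmetic silently under-counts by 2^32 bytes, which is exactly why these link measurements end up as estimates.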
On the other hand, Australia is a land of download quotas: we’re billed on the quota and throttled on the quota. So if the figure is based on returns showing which customers received how many bytes over what period of time, it might be a reasonable basis for measurement, because with as many users as Australia has, the billing returns cover a statistically large sample.
APNIC Labs has a report on Internet address distributions by economy, which usefully notes Australia has around 23 million people and 20 million end users. (The ABS actually reports 12 million subscribers, but it’s very likely a subscriber is not the same as a user: we know people carry around multiple devices and live in houses with multiple people, so I’m not so worried by this disparity. It’s within an order of magnitude, so it won’t alter the numbers by too much.) So, by judicious hand-waving and big numbers, we can be pretty confident the billing returns for download volumes by users in the big ISPs are statistically close to what people really do. And it is all about the big ISPs. Another Labs report on market share by ASN shows that for Australia, the biggest 10 ISPs have 85% of the market. (You have to sort the table data by economy code and do some arithmetic, but it’s not too hard to find this.) The behaviour of users across just 10 companies is very strongly indicative of what’s going on in Australia.
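The ‘sort by economy code and do some arithmetic’ step can be sketched in a few lines. This assumes the market-share table has been exported as (economy code, ASN, share percent) rows; that row layout is an assumption, not the actual format of the Labs report, and the sample rows below are made up using the documentation-only ASN range (64496 to 64511), so they say nothing about real Australian ISPs.

```python
def top_n_share(rows: list[tuple[str, int, float]], economy: str, n: int = 10) -> float:
    """Combined market share (percent) of the n largest networks in one economy.

    Each row is a hypothetical (economy_code, asn, share_percent) tuple standing
    in for one line of a per-ASN market-share table.
    """
    # Keep only this economy's rows, largest share first.
    shares = sorted((share for code, _asn, share in rows if code == economy), reverse=True)
    return sum(shares[:n])


# Hypothetical rows with documentation ASNs; not real market data.
rows = [
    ("AU", 64496, 40.0),
    ("AU", 64497, 25.0),
    ("AU", 64498, 10.0),
    ("AU", 64499, 5.0),
    ("NZ", 64500, 50.0),
]
print(top_n_share(rows, "AU", n=3))  # 75.0
```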
So. We have 1.1 exabytes, but we bill our download-capped accounts in gigabytes.
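1.1 exabytes is 1,181,116,006 (or so) gigabytes if you count in binary units (2^30 bytes to the gigabyte). That’s 1,181 million gigabytes, or thereabouts; in decimal units it’s a rounder 1,100 million.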
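Both conversions are just a change of scale factor, and it’s worth seeing them side by side because the choice of binary or decimal units alone moves the answer by about 7%. A minimal sketch, assuming ‘exabyte’ and ‘gigabyte’ mean either the decimal (10^18, 10^9 bytes) or binary (2^60, 2^30 bytes) units consistently:

```python
GB_PER_EB_DECIMAL = 10 ** 9  # 10**18 bytes per EB / 10**9 bytes per GB
GB_PER_EB_BINARY = 2 ** 30   # 2**60 bytes per EiB / 2**30 bytes per GiB


def exabytes_to_gigabytes(exabytes: float, binary: bool = False) -> float:
    """Convert exabytes to gigabytes in either decimal or binary units."""
    factor = GB_PER_EB_BINARY if binary else GB_PER_EB_DECIMAL
    return exabytes * factor


print(round(exabytes_to_gigabytes(1.1, binary=True)))   # 1181116006
print(round(exabytes_to_gigabytes(1.1, binary=False)))  # 1100000000
```

Given that a reported 1.1 could itself be anything from about 1.05 to 1.15, the units question is well within the noise.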
If we work with the ABS subscriber count, 12,000,000 people did this. So 1,100 million GB was downloaded by 12 million people, and by knocking out the millions (remember your arithmetic?) one person on average downloaded 1/12 of 1,100 GB, which comes to about 91 GB in the three months, or only around 30 GB per month per subscriber. And if we work with the APNIC/ITU figure of 20,000,000 users, it’s more like 18-20 GB per user per month.
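The same back-of-envelope division, written out so you can swap in other population figures. It assumes, as the text does, that the total covers three months and is spread evenly across everyone; the function name is purely illustrative.

```python
def monthly_gb_per_user(total_gb: float, users: int, months: int = 3) -> float:
    """Average download per user per month, spreading the total evenly."""
    return total_gb / users / months


decimal_total = 1_100_000_000  # ~1.1 EB in decimal gigabytes
binary_total = 1_181_116_006   # ~1.1 EB in binary gigabytes

print(round(monthly_gb_per_user(decimal_total, 12_000_000), 1))  # 30.6 (ABS subscribers)
print(round(monthly_gb_per_user(decimal_total, 20_000_000), 1))  # 18.3 (APNIC/ITU users)
print(round(monthly_gb_per_user(binary_total, 20_000_000), 1))   # 19.7 (binary units)
```

The 18-20 GB range for 20 million users is simply the decimal and binary readings of the same 1.1 exabytes.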
These figures begin to sound remarkably plausible. Many home ADSL contracts are capped at 20, 40 or 100 GB per month, and we’re comfortably within that range. And since we know the world divides into the heavy hitters, downloaders and the rest of us, I’m not too worried that this is really a curve: some people downloaded a lot more than the average, and a lot downloaded less.
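Why a skewed curve leaves most people below the average is easy to see with a toy sample. The ten usage figures below are invented for illustration, chosen so the mean lands near the ~30 GB estimate; they are not measured data. A couple of heavy hitters pull the mean well above the median, so most users sit under the ‘average’.

```python
from statistics import mean, median


def summarise(usage_gb: list[float]) -> tuple[float, float, int]:
    """Return (mean, median, number of users below the mean)."""
    avg = mean(usage_gb)
    below = sum(1 for u in usage_gb if u < avg)
    return avg, median(usage_gb), below


# Hypothetical monthly downloads (GB) for ten users: illustrative only.
usage_gb = [5, 8, 10, 12, 15, 18, 20, 25, 60, 130]
print(summarise(usage_gb))  # (30.3, 16.5, 8)
```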
By gum! those clever ABS people are probably on the money!
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.