Those who measure blindly tend to produce rubbish results

There is a saying in my native Germany: “Wer misst, misst Mist”. Roughly translated, it’s a warning that those who measure blindly tend to produce rubbish results.

In a longer post on Discover.ISIF.Asia, I’ve described how several “out of left field” effects have caused problems in our research, which simulates satellite Internet traffic to small island Internet providers. (I’ve also written about our satellite project on the APNIC Blog previously, if you’d like more background.)

With over 110 machines in a complex configuration, there’s plenty that can go wrong.

To overcome these effects and ensure we’re producing solid data, we perform the following four tasks:

Configuring, verifying and testing: This includes checking that servers and clients can handle the load, that link setups have the correct bandwidth configured, and that all machines are up and running at the beginning of an experiment.
Automated analysis of log files: When we compare the log files from either side of the link, we check when each client and server was first and last seen, and how much traffic went to/from the respective machine. Whenever a machine starts late or disappears early, or its traffic stats deviate, we issue a warning.
Double checking results: Two of us ask the following questions to decide whether the results are plausible: Are throughput and goodput within capacity limits? Do the graphs that we produce to assess quality show what we’d expect, or do they show artefacts?
Scripting experiment configuration: Configuring experiments requires setting a label, seven parameters for the link simulation, 14 different RTT latencies and jitters for the servers, load and timeout configurations for 94 client machines, an iperf download size, plus the orchestrated execution of everything at the right time. Configuring all of this manually would be a recipe for disaster, so we script as much as we can – this takes care of a lot of typos!
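
The automated log analysis described above can be sketched roughly as follows. This is a minimal illustration rather than our actual tooling: the HostStats record, the check_hosts function, the slack and tolerance thresholds, and the reading of “deviate” as “byte counts differ between the two sides of the link” are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Per-host summary from one side's log (hypothetical layout)."""
    first_seen: float  # seconds after experiment start
    last_seen: float   # seconds after experiment start
    bytes_seen: int    # total bytes to/from this host


def check_hosts(side_a, side_b, duration, slack=5.0, tolerance=0.1):
    """Return warning strings for hosts that look suspicious.

    side_a and side_b map host names to HostStats, one dict per side
    of the link. slack (seconds) and tolerance (relative difference)
    are assumed thresholds, not values from the experiments.
    """
    warnings = []
    for host in sorted(set(side_a) | set(side_b)):
        a = side_a.get(host)
        b = side_b.get(host)
        if a is None or b is None:
            warnings.append(f"{host}: missing from one side's log")
            continue
        first = min(a.first_seen, b.first_seen)
        last = max(a.last_seen, b.last_seen)
        if first > slack:
            warnings.append(f"{host}: started late ({first:.1f}s)")
        if last < duration - slack:
            warnings.append(f"{host}: disappeared early ({last:.1f}s)")
        larger = max(a.bytes_seen, b.bytes_seen)
        if larger > 0 and abs(a.bytes_seen - b.bytes_seen) / larger > tolerance:
            warnings.append(f"{host}: traffic differs between sides")
    return warnings
```

An empty list means every host was present for the whole run with consistent traffic on both sides. Anything else is a prompt for manual inspection before the run's data is used.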

Sweeping provides further verification

Another factor we need to consider in our research is that underperforming satellite links could simply be a matter of bad link configuration rather than a fundamental problem with TCP congestion control. It would be all too easy to take a particular combination of link capacity and queue capacity to demonstrate an effect without asking what influence these parameters have on the effect.

This is why we also perform a further verification step: We “sweep” through a range of feasible parameters. This allows us to measure whether observables change in the expected direction when we change parameters such as load or queue capacity.
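
As a rough sketch of what such a direction check could look like, the function below takes (parameter, observable) pairs from a sweep and reports the steps where the observable moved the wrong way. The function name, the noise allowance and the idea of flagging adjacent pairs are illustrative assumptions, not a description of our scripts.

```python
def unexpected_steps(points, increasing=True, noise=0.0):
    """Return (param_before, param_after) pairs where the observable
    moved against the expected direction by more than `noise`.

    points: iterable of (parameter, observable) pairs from one sweep.
    increasing: True if the observable should grow with the parameter.
    noise: assumed allowance for run-to-run variation.
    """
    ordered = sorted(points)
    sign = 1 if increasing else -1
    violations = []
    for (p0, y0), (p1, y1) in zip(ordered, ordered[1:]):
        if sign * (y1 - y0) < -noise:
            violations.append((p0, p1))
    return violations
```

An empty result says the sweep behaved as expected. A non-empty one points at the parameter range that deserves a repeat run or a closer look at the logs.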

For example, in an experiment where we studied a 16 Mbps link, we initially swept across 11 potential queue capacities between 30 kB and 800 kB. For each capacity, we swept up to nine load levels between 10 and 600 client channels. That’s up to 99 combinations, each of which takes around 20 minutes to simulate, plus whatever time we then take for subsequent manual inspection. Multiply this by the number of possible link bandwidths of interest in GEO and MEO configurations, plus additional sweeps across sub-ranges to obtain more detail and repeats for experiments with quality control issues, and we’ve got our work cut out for us.
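
To put a number on that, here is a back-of-the-envelope calculation using the figures quoted above (11 queue capacities, up to nine load levels, around 20 minutes per run). The function name is made up for the example, and the result is an upper bound for a single link bandwidth, before manual inspection and repeats.

```python
def sweep_upper_bound(n_queue_capacities, max_load_levels, minutes_per_run):
    """Return (max number of runs, simulation hours) for one sweep."""
    runs = n_queue_capacities * max_load_levels
    hours = runs * minutes_per_run / 60
    return runs, hours


runs, hours = sweep_upper_bound(11, 9, 20)
print(f"up to {runs} runs, about {hours:.0f} hours of simulation")
```

That is up to 99 runs and roughly 33 hours of pure simulation time for one sweep of one link configuration.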

Having said this, when it comes to comparing the performance of different technologies, we want to ensure that we are putting our best foot forward.

Ulrich Speidel is a senior lecturer in Computer Science at the University of Auckland with research interests in fundamental and applied problems in data communications, information theory, signal processing and information measurement.
