Deploying new protocols on the Internet is hard.
Though the end-to-end principle and the four-layer TCP/IP architecture suggest that what happens above the IPv4 or IPv6 header isn’t any of the network’s business, the widespread deployment of firewalls, network address translators, proxies, and other middleboxes at layers four and seven means that, in practice, TCP usually works, UDP usually works, and for everything else, your mileage may vary.
The design and implementation of new protocols and protocol extensions, such as Multipath TCP (MPTCP), must consider the potential actions of middleboxes in modifying their headers or dropping their packets, with correspondingly complex fallback code.
These design decisions, though, should be backed up by measurement: while it makes sense to detect and respond to an impairment that happens 1% of the time, it makes less sense to do so for one that happens on one path in ten million. In addition, as protocol extensions like MPTCP and Explicit Congestion Notification (ECN), and new protocols such as QUIC, see increased deployment, quantifying and localizing the extent of middlebox-driven impairments to these protocols is useful to the operations community in debugging connectivity and performance problems linked to their use.
What is the PTO and how does it work?
A path through the Internet is said to be transparent to a given protocol when packets using that protocol are received at their destination without drops or modification. The Path Transparency Observatory (PTO) is designed to meet the goals of comparability and repeatability in a way that is agnostic to the measurement method used to determine the presence of impairments to path transparency.
We learned some lessons from a pilot release of the PTO in March — a limited dataset is available at the MAMI Project Observatory — and have gone back to the drawing board for a fully RESTful design for the PTO, centred around a common data model aimed at path transparency measurements, and built with ‘medium data’ technologies.
The Observatory is a collection of observations, each of which is a statement that, at a given time along a given path, a given condition held.
Paths are expressed in terms of IP addresses, prefixes, or BGP Autonomous System Numbers, allowing multiple resolutions of data depending on the sensitivity of the raw address information. Conditions are defined in terms of the feature (for example, ECN, TCP Fast Open) and the aspect of that feature (for example, connectivity impairment, ability to negotiate, ability to signal) they measure.
Each feature and aspect is associated with a number of states: for example, the ‘ecn.connectivity’ aspect has the states ‘works’ (trying to use ECN doesn’t have any impact on connectivity), ‘broken’ (trying to use ECN makes a connection impossible, for example because ECN SYN packets are dropped), or ‘offline’ (no measurement was possible because the far endpoint wasn’t online).
Data in the raw store is normalized into observations in terms of these conditions and paths, which makes it possible to query across heterogeneous source data produced by different measurement tools, as long as they are measuring the same aspects of the same features.
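To make the data model concrete, here is a minimal sketch in Go of an observation as described above. It is illustrative only and not the PTO's actual schema: the type names, the dotted ‘feature.aspect.state’ spelling of a condition, and the CoarsenAddr helper are all assumptions made for this example. The helper shows one way an address could be reduced to a prefix when the raw address is too sensitive to publish.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
	"time"
)

// Condition names a feature, an aspect of that feature, and the observed
// state. The dotted "feature.aspect.state" form is an assumption of this
// sketch, not a documented PTO format.
type Condition struct {
	Feature string
	Aspect  string
	State   string
}

// ParseCondition splits a dotted condition name into its three parts.
func ParseCondition(s string) (Condition, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Condition{}, fmt.Errorf("condition %q: want feature.aspect.state", s)
	}
	for _, p := range parts {
		if p == "" {
			return Condition{}, fmt.Errorf("condition %q: empty component", s)
		}
	}
	return Condition{Feature: parts[0], Aspect: parts[1], State: parts[2]}, nil
}

// Observation states that a condition held on a path at a given time.
type Observation struct {
	Time      time.Time
	Path      []string // path elements: addresses, prefixes, or AS numbers
	Condition Condition
}

// CoarsenAddr (hypothetical helper) replaces an IP address with its
// enclosing prefix of the given length, lowering the path's resolution.
func CoarsenAddr(addr string, bits int) (string, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return "", err
	}
	p, err := a.Prefix(bits)
	if err != nil {
		return "", err
	}
	return p.String(), nil
}

func main() {
	c, err := ParseCondition("ecn.connectivity.works")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Documentation addresses, used here as made-up path endpoints.
	src, err := CoarsenAddr("192.0.2.57", 24)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	dst, err := CoarsenAddr("2001:db8::1234", 32)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	obs := Observation{
		Time:      time.Date(2017, time.November, 1, 12, 0, 0, 0, time.UTC), // made-up timestamp
		Path:      []string{src, dst},
		Condition: c,
	}
	fmt.Println(obs.Time.Format(time.RFC3339), obs.Path, obs.Condition)
}
```

Because every tool's output is reduced to the same three-part condition and the same path vocabulary, observations from different tools can sit in one table and be compared directly.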
PTO design allows for repeatability
The PTO is implemented as a RESTful API around two data stores, as shown in the figure above:
- An observation store, backed by a relational database whose tables implement the PTO data model. Observations can be uploaded and downloaded as sets, or returned from queries over the entire database. Observations, and the queries run over them, are generally made public.
- A raw data store containing unmodified measurement results received as flat files from measurement tools such as PATHspider. Normalization and analysis refine this raw data into observations for public access. Access to raw data is generally limited to the owner of that data.
Each query over the observation store has metadata referencing the observations and raw data from which it is derived, and the analyses that ran to derive them. Each object in the PTO thereby stores its provenance, fostering repeatability of experiments and analyses.
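One way to picture this provenance chain is as a small graph in which every object lists the objects it was derived from. The Go sketch below is an assumption-laden illustration, not the PTO's metadata format: the Object type, its field names, the Kind values, and the RawSources function are all hypothetical. It walks from a query result back to the raw files it ultimately depends on.

```go
package main

import (
	"fmt"
	"sort"
)

// Kind distinguishes the hypothetical object types in this sketch.
type Kind string

const (
	Raw            Kind = "raw"
	ObservationSet Kind = "observation-set"
	Query          Kind = "query"
)

// Object is a stored item together with its provenance metadata.
type Object struct {
	ID          string
	Kind        Kind
	Analysis    string   // analysis or normalizer that produced it; empty for raw data
	DerivedFrom []string // IDs of the objects it was directly derived from
}

// RawSources follows DerivedFrom links transitively and returns the sorted
// IDs of all raw data objects that the given object depends on.
func RawSources(store map[string]Object, id string) []string {
	seen := map[string]bool{}
	var raws []string
	var walk func(cur string)
	walk = func(cur string) {
		if seen[cur] {
			return // already visited; also guards against cycles
		}
		seen[cur] = true
		obj, ok := store[cur]
		if !ok {
			return // dangling reference: nothing to follow
		}
		if obj.Kind == Raw {
			raws = append(raws, cur)
			return
		}
		for _, parent := range obj.DerivedFrom {
			walk(parent)
		}
	}
	walk(id)
	sort.Strings(raws)
	return raws
}

func main() {
	// Made-up identifiers and analysis names, for illustration only.
	store := map[string]Object{
		"raw-1": {ID: "raw-1", Kind: Raw},
		"raw-2": {ID: "raw-2", Kind: Raw},
		"set-1": {ID: "set-1", Kind: ObservationSet, Analysis: "normalize-ecn", DerivedFrom: []string{"raw-1"}},
		"set-2": {ID: "set-2", Kind: ObservationSet, Analysis: "normalize-ecn", DerivedFrom: []string{"raw-2"}},
		"q-1":   {ID: "q-1", Kind: Query, Analysis: "count-by-condition", DerivedFrom: []string{"set-1", "set-2"}},
	}
	fmt.Println(RawSources(store, "q-1"))
}
```

With links like these, anyone holding a query result can name the exact inputs and analyses needed to reproduce it, even when the raw data itself remains private to its owner.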
PTO release will help with locating network impairments on the Internet
As reported earlier this year at ANRW in Prague, the normalization and analysis of path transparency measurements at scale that the PTO offers has already led to interesting insights about the nature of path transparency on the Internet.
Impairments to the use of ECN, which were already found to be rare, appear to depend on the path between the source and destination, and occur primarily in jurisdictions with documented deployments of heterogeneous, TCP-intercepting Internet censorship infrastructure. Impairment to ECN in the network is therefore most likely a side effect of deliberate interference with traffic, rather than a harder-to-debug transient accident of middlebox implementation, which makes ECN less risky to deploy.
Our current work focuses on using the PTO to join heterogeneous datasets:
- Determining the extent to which impairments in various protocols are correlated.
- Joining conditions measured by PATHspider with AS-level topology and impairment information to improve localization of impairments on the Internet.
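As a rough illustration of the first item, the Go sketch below counts, for two features, the paths on which one, both, or neither was found impaired. It is a simple co-occurrence table chosen for this example, not the analysis the PTO itself runs; the Obs and Overlap types, the CountOverlap function, the feature names, and the AS-pair path keys are all hypothetical.

```go
package main

import "fmt"

// Obs is a simplified observation: whether a feature was impaired on a path.
type Obs struct {
	Path    string // path key, here a made-up "source AS->destination AS" pair
	Feature string // for example "ecn" or "tfo"
	Broken  bool
}

// Overlap counts paths by which of two features were impaired on them.
type Overlap struct {
	OnlyA, OnlyB, Both, Neither int
}

// CountOverlap considers only paths on which both features were measured,
// and treats a feature as impaired on a path if any observation says so.
func CountOverlap(obs []Obs, a, b string) Overlap {
	type state struct{ seenA, seenB, brokenA, brokenB bool }
	paths := map[string]*state{}
	for _, o := range obs {
		s := paths[o.Path]
		if s == nil {
			s = &state{}
			paths[o.Path] = s
		}
		if o.Feature == a {
			s.seenA = true
			s.brokenA = s.brokenA || o.Broken
		}
		if o.Feature == b {
			s.seenB = true
			s.brokenB = s.brokenB || o.Broken
		}
	}
	var out Overlap
	for _, s := range paths {
		if !s.seenA || !s.seenB {
			continue // cannot compare features on a path measured for only one
		}
		switch {
		case s.brokenA && s.brokenB:
			out.Both++
		case s.brokenA:
			out.OnlyA++
		case s.brokenB:
			out.OnlyB++
		default:
			out.Neither++
		}
	}
	return out
}

func main() {
	// Made-up observations using documentation AS numbers.
	obs := []Obs{
		{"AS64496->AS64500", "ecn", true},
		{"AS64496->AS64500", "tfo", true},
		{"AS64496->AS64501", "ecn", false},
		{"AS64496->AS64501", "tfo", true},
		{"AS64496->AS64502", "ecn", false},
		{"AS64496->AS64502", "tfo", false},
		{"AS64496->AS64503", "ecn", true}, // no TFO measurement: skipped
	}
	fmt.Printf("%+v\n", CountOverlap(obs, "ecn", "tfo"))
}
```

A table like this only works because both tools' results share the same path keys, which is the point of normalizing into a common data model before joining.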
Public release of our new lightweight PTO will happen in the coming weeks. It is based on an implementation of a RESTful API in bare Go (that is, without any web frameworks), with raw data storage backed by the filesystem, and the observation store in the venerable PostgreSQL RDBMS.
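For readers unfamiliar with what ‘bare Go’ looks like, here is a minimal sketch of a read-only JSON endpoint built with nothing but the standard library's net/http package. It is not the PTO's API: the /obs route, the condition query parameter, the JSON field names, and the in-memory slice standing in for PostgreSQL are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// Observation is a simplified, hypothetical JSON shape for this sketch.
type Observation struct {
	Time      string `json:"time"`
	Path      string `json:"path"`
	Condition string `json:"condition"`
}

// obsHandler serves observations from an in-memory slice that stands in
// for the relational observation store.
type obsHandler struct{ store []Observation }

func (h obsHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.Header().Set("Allow", http.MethodGet)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// An optional ?condition=... parameter filters the result.
	want := r.URL.Query().Get("condition")
	out := []Observation{} // non-nil so an empty result encodes as []
	for _, o := range h.store {
		if want == "" || o.Condition == want {
			out = append(out, o)
		}
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(out); err != nil {
		log.Printf("encode response: %v", err)
	}
}

// newMux wires the handler to a route using only the standard library.
func newMux(store []Observation) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/obs", obsHandler{store: store})
	return mux
}

// get exercises a handler in-process, without opening a network socket.
func get(h http.Handler, method, target string) (int, []Observation) {
	req := httptest.NewRequest(method, target, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var out []Observation
	_ = json.Unmarshal(rec.Body.Bytes(), &out) // out stays nil for error bodies
	return rec.Code, out
}

func main() {
	// Made-up observations for illustration.
	mux := newMux([]Observation{
		{"2017-11-01T12:00:00Z", "192.0.2.0/24 198.51.100.0/24", "ecn.connectivity.works"},
		{"2017-11-01T12:00:05Z", "192.0.2.0/24 203.0.113.0/24", "ecn.connectivity.broken"},
	})
	code, obs := get(mux, http.MethodGet, "/obs?condition=ecn.connectivity.broken")
	fmt.Println(code, len(obs))
	// A real server would instead call http.ListenAndServe(":8080", mux).
}
```

The appeal of this approach is that routing, request parsing, and JSON encoding all come from the standard library, so the service has no framework dependencies to track.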
Brian Trammell is a Senior Researcher at ETH Zürich whose work focuses on Internet measurement and Internet architecture.