RIPE 77: Geo is hard

Ingrid Wijte led a great discussion on the third day of the RIPE 77 meeting, held this week in Amsterdam, about country codes in the extended delegated statistics files and the RIPE Database.

Watch the session via the RIPE 77 video archive.

The problem under discussion was the use of ISO 3166 economy codes. These are often called country codes (CC), but that is clearly a misnomer, since the code set includes: regions; economic areas such as Hong Kong or Macau; former Soviet Union member states, now collectively known as ‘transition states’; and even functional bodies such as the African Regional Industrial Property Organization, the titular holder of the ‘AP’ code. It is a very complex and fraught space.

Some people want the codes to refer to the location of the legal entity, because that is what the registry can validate in its normal process with resource holders.

Others want the freedom to specify the economy for their own purposes, such as geolocating IP addresses for Intellectual Property Rights (IPR) filtering or for directing CDN traffic.

The choice of code can also be tied to addresses undergoing inter-RIR transfer, or to a range of other issues, none of which combine easily into the public information sources a Regional Internet Registry (RIR) typically maintains, such as whois or statistical summaries.

The current RIR statistics and extended-statistics file formats are the result of a historical process among the RIRs, with roots in a file format initially proposed and implemented by APNIC and subsequently adopted in common across the RIRs. The simple stats file format was designed to account for resources in use, and it combined historical data from disparate sources into a single canonical format.

Subsequently, during the final IPv4 rundown, it became vital to account for as-yet unallocated resources. An extended file format was developed to account for reserved and available resources in the RIR system as well as resources held by IANA, and also to begin to identify the unique entities holding addresses.
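As a rough illustration of what a consumer of these files deals with, here is a minimal Python sketch that reads records from an extended statistics file. The pipe-delimited field order (registry, economy code, type, start, value, date, status, opaque holder ID) is my assumption based on the RIRs' published format descriptions, not something stated in this post, and the sample lines are invented using documentation address ranges.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Record:
    registry: str
    economy: str              # ISO 3166 two-letter code; may be empty
    rtype: str                # "asn", "ipv4" or "ipv6"
    start: str
    value: int                # address count (ipv4), prefix length (ipv6), ASN count
    date: str                 # YYYYMMDD; its exact meaning is what is in dispute
    status: str
    opaque_id: Optional[str]  # present only in the extended format


def parse_stats(lines: Iterable[str]) -> Iterator[Record]:
    """Yield resource records, skipping comments, the header and summary lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if fields[0][:1].isdigit():      # version header, e.g. "2.3|apnic|..."
            continue
        if fields[-1] == "summary":      # per-type summary line
            continue
        if len(fields) < 7:              # not a resource record
            continue
        opaque = fields[7] if len(fields) > 7 and fields[7] else None
        yield Record(fields[0], fields[1], fields[2], fields[3],
                     int(fields[4]), fields[5], fields[6], opaque)


# Invented sample data, for illustration only.
SAMPLE = """\
# invented sample
2.3|apnic|20181018|3|19830613|20181017|+1000
apnic|*|ipv4|*|2|summary
apnic|AU|ipv4|192.0.2.0|256|20110811|assigned|A91872ED
apnic|HK|ipv6|2001:db8::|32|20100101|allocated|A9173591
apnic||ipv4|198.51.100.0|256||available|
"""

records = list(parse_stats(SAMPLE.splitlines()))
for r in records:
    print(r.economy or "-", r.rtype, r.start, r.value, r.status)
```

Note that the parser can only hand back the economy code and date as given. Nothing in the record says whether the code is the holder's place of registration or where the addresses are used, or whether the date is the original delegation or the latest change.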

It has become clear that consumers of the files and related whois data are now conscious of a divergence in the meaning of the various fields in these files and sources. This goes beyond economy codes to include the date fields: is the date the origination date of the resource, or the moment of its most recent transactional change? This divergence makes the fields much harder to apply to questions in the real world. We really do need to re-converge on a meaning for these values and understand them consistently.

During the discussion, some interesting points came up:

Carlos Martinez, from LACNIC, observed that they now regard the RIR file as best suited to recording legal entity status: the economy of company registration, which is a validated input during engagement with the holding entity. They can stand by the value because they have a process behind it.
Related to this point, to address the needs of asset holders, LACNIC now publishes new files, which are (at this time) unique to LACNIC and permit the geolocation attributes of Internet resources to be asserted by the holder. This is now clearly not an RIR assertion; the RIR is acting as a publication point for self-asserted values. Carlos said that at least one of these GEO forms was based on an expired IETF draft, so it is potentially a format that is openly specified and in wider use.

I took the opportunity to also note that CDN providers believe they have significantly better data from their global BGP view, but that this represents company intellectual property and isn’t likely to be shared. To me, this suggests a publicly maintained GEO service remains something of interest.

I also observed that some other agencies in our ecosystem prefer to see the RIR as a source of ‘proof of possession’. They would like to be able to do a SHA-hash exchange with an entity (the address holder) and have that entity show it controls the resource by publishing the hash in a public view like whois. They could then begin a discussion with the holder about things like local peering, whitelisting and geolocation.
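To make the hash exchange idea concrete, here is a minimal Python sketch of one way it could work. This is my own hypothetical scheme, not a protocol any RIR has specified: the function names, the token construction and the idea of placing the token in a whois remarks line are all assumptions for illustration.

```python
import hashlib
import secrets
from typing import Tuple


def expected_token(prefix: str, nonce: str) -> str:
    """SHA-256 over the prefix and a one-time nonce (hypothetical construction)."""
    return hashlib.sha256(f"{prefix}|{nonce}".encode("utf-8")).hexdigest()


def make_challenge(prefix: str) -> Tuple[str, str]:
    """Verifier side: create a random nonce and the token the holder must publish."""
    nonce = secrets.token_hex(16)
    return nonce, expected_token(prefix, nonce)


def verify(prefix: str, nonce: str, published_text: str) -> bool:
    """Verifier side: check that the public record for the prefix carries the token."""
    return expected_token(prefix, nonce) in published_text


# The verifier sends the nonce privately to the claimed holder of 192.0.2.0/24.
nonce, token = make_challenge("192.0.2.0/24")

# The holder computes the same token and publishes it; simulated here as whois text.
whois_text = f"inetnum: 192.0.2.0 - 192.0.2.255\nremarks: {token}\n"

print(verify("192.0.2.0/24", nonce, whois_text))
```

The hash itself proves nothing. The evidence of control comes from the fact that only the registered holder can change the public record, which is why the publication point being the RIR matters.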

Geoff Huston noted during the discussion that, as both a producer and consumer of this data, he increasingly finds commercial service sources, such as MaxMind, more viable. That is a shame, as this space feels like one that should be much simpler and better understood. Why is it so hard?

My overarching concern here is that the current stats file format was a delicate negotiation between the RIRs. In reviewing the meaning of fields and considering future changes to the file format and structure, we need to be mindful of the need to converge. We want this report to be consistent in form and meaning across the entire system. That’s what makes these reports so valuable.

