Should we trust the geolocation databases to geolocate routers?

By Manaf Gharaibeh on 3 Nov 2017

Category: Tech matters



As a network researcher or operator, you might be in a situation where you need to know the real-world location of some routers in a network path.

Research examples include the detection of routing paths that experience international detours (that is, paths that start and end in the same country but visit other countries in between) and the study of where censorship and monitoring activities happen around the world.
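As a minimal sketch of the detour definition above, the following Python function takes the per-hop country codes of a path (assumed to be already obtained from some geolocation source) and reports the foreign countries visited when the path starts and ends in the same country. The function name and the input format are illustrative assumptions, not something taken from the paper.

```python
from typing import List, Optional, Set


def international_detour(hop_countries: List[Optional[str]]) -> Set[str]:
    """Return the foreign countries visited by a path that starts and ends
    in the same country; return an empty set if there is no detour.

    hop_countries holds one country code per hop, or None where a hop
    could not be geolocated (unknown intermediate hops are ignored).
    """
    # The first and last hops must be known and must agree.
    if not hop_countries or hop_countries[0] is None:
        return set()
    if hop_countries[0] != hop_countries[-1]:
        return set()
    home = hop_countries[0]
    return {c for c in hop_countries[1:-1] if c is not None and c != home}


print(international_detour(["US", "US", "CA", "US"]))  # {'CA'}
print(international_detour(["US", "US", "US"]))        # set()
```

Note that the result is only as reliable as the per-hop locations fed into it, which is exactly why router geolocation accuracy matters for this kind of study.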

For a network operator, location information might provide more context to debug emergent network issues.

Geolocation databases are often used by both researchers and network operators to learn the real-world location of a given IP address. But how reliable are they in terms of coverage and accuracy at both country- and city-level resolution?

Previous evaluation studies of geolocation databases are dominated by results over end-host addresses. As a result, users are left unsure about the reliability of geolocation databases over infrastructure IP addresses, such as those of router interfaces.

In our paper ‘A Look at Router Geolocation in Public and Commercial Databases’, which was recently accepted by the ACM Internet Measurement Conference 2017 committee, we focus on router geolocation in databases in an attempt to complement previous database evaluation studies.

We studied four popular geolocation databases: two free (MaxMind GeoLite and IP2Location DB11.Lite) and two commercial (MaxMind GeoIP2 and Digital Envoy NetAcuity). We quantified the coverage and accuracy of the databases at country- and city-level, and compared their performance by region to provide recommendations to users who want to use databases to geolocate routers.

Geolocation database coverage

To evaluate database coverage, we extracted a dataset of around 1,638k router interface IP addresses from the CAIDA topology dataset and geolocated them in all databases. Table 1 shows the coverage results at both country- and city-level. Perhaps the most interesting result here is the low city-level coverage of the MaxMind databases.

Geo-DB  | IP2Location-Lite | NetAcuity | MaxMind-GeoLite | MaxMind-Paid
Country | 100%             | 100%      | 99.3%           | 99.3%
City    | 99.9%            | 99.9%     | 43%             | 61.6%

Table 1: Geolocation databases country- and city-level coverage.
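Coverage here is the fraction of queried addresses for which a database returns an answer at a given resolution. A minimal sketch, assuming lookup results have already been loaded into a dictionary that maps each IP address to a (country, city) pair, with None for missing fields; the data layout and the sample addresses (taken from documentation ranges) are hypothetical:

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup results: IP -> (country, city); None means "no answer".
Lookup = Dict[str, Tuple[Optional[str], Optional[str]]]


def coverage(results: Lookup, total: int) -> Tuple[float, float]:
    """Return (country_coverage, city_coverage) as fractions of `total`
    queried addresses. Addresses absent from `results` count as uncovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    country_hits = sum(1 for country, _ in results.values() if country)
    city_hits = sum(1 for _, city in results.values() if city)
    return country_hits / total, city_hits / total


sample: Lookup = {
    "192.0.2.1": ("US", "Denver"),
    "192.0.2.2": ("US", None),        # country known, city missing
    "198.51.100.7": ("DE", "Berlin"),
    "203.0.113.9": (None, None),      # no usable answer
}
country_cov, city_cov = coverage(sample, total=len(sample))
print(f"country: {country_cov:.1%}, city: {city_cov:.1%}")
```

Keeping country and city coverage separate matters because, as Table 1 shows, a database can answer at country-level for almost every address and still leave many addresses without a city.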

Quantifying the overall accuracy of geolocation databases

The accuracy of the databases is evaluated using a ground truth set of 16,586 router IP addresses and their city-level locations.

Our ground truth was created using two approaches: a DNS-based approach and a delay-based approach (RTT-proximity). Table 2 shows, for each approach, the number of extracted addresses and their regional distribution. Note that transit Autonomous Systems announce 99.9% of the addresses in the DNS-based set and 74.5% of the addresses in the RTT-inferred set.

Ground truth       | DNS-based | RTT-proximity
IP count           | 11,857    | 4,838
Countries          | 53        | 118
Unique coordinates | 238       | 1,347
ARIN               | 9,588     | 1,123
APNIC              | 560       | 372
AFRINIC            | 0         | 131
LACNIC             | 0         | 52
RIPE NCC           | 1,709     | 3,160

Table 2: Regional distribution of the DNS-based and RTT-proximity ground truth IP addresses (available via Impact).

The databases have surprisingly low accuracy at country-level. IP2Location-Lite and the two MaxMind databases are comparable, with accuracy ranging between 77.5% and 78.6%. NetAcuity is better at 89.4%.
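Country-level accuracy can be read as the share of ground truth addresses for which the database returns the same country as the ground truth. The sketch below assumes both are available as dictionaries keyed by IP address. Excluding addresses without a database answer from the denominator is an assumption of this sketch, so check the paper for the exact definition.

```python
from typing import Dict, Optional


def country_accuracy(db: Dict[str, Optional[str]],
                     truth: Dict[str, str]) -> float:
    """Fraction of ground truth IPs, among those the database covers,
    whose database country equals the ground truth country."""
    covered = [ip for ip in truth if db.get(ip)]
    if not covered:
        return 0.0
    correct = sum(1 for ip in covered if db[ip] == truth[ip])
    return correct / len(covered)


# Hypothetical ground truth and database answers (documentation addresses).
truth = {"192.0.2.1": "US", "192.0.2.2": "US", "198.51.100.7": "DE"}
db = {"192.0.2.1": "US", "192.0.2.2": "CA", "198.51.100.7": "DE"}
print(f"{country_accuracy(db, truth):.1%}")
```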

In terms of the distribution of the geolocation error (city-level), IP2Location-Lite has high coverage but low accuracy. MaxMind-Paid has about 41% coverage and 52% city-level accuracy, which is about 11% and 5% better than MaxMind-GeoLite’s coverage and accuracy, respectively. NetAcuity outperforms the other databases with near-perfect coverage and about 73% accuracy.

Figure 1: Databases vs. ground truth geolocation error at city-level. The number of addresses in each CDF is enclosed in parentheses. The error is computed as the distance between the location (the coordinates) from the database and the ground truth for each IP address.
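The error distance described in the caption can be approximated with the haversine great-circle formula. This is a sketch under assumptions: the post does not say which distance formula the paper uses, the Earth is treated as a sphere of radius 6,371 km, and the 40 km cut-off in the usage lines is a placeholder threshold chosen for illustration. The coordinate pairs are made up.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def fraction_within(errors_km: Iterable[float], threshold_km: float) -> float:
    """Empirical CDF value: share of errors at or below threshold_km."""
    errors = list(errors_km)
    if not errors:
        return 0.0
    return sum(e <= threshold_km for e in errors) / len(errors)


# Hypothetical (database, ground truth) coordinate pairs.
pairs = [
    ((40.5853, -105.0844), (40.5853, -105.0844)),  # identical location
    ((39.7392, -104.9903), (40.5853, -105.0844)),  # Denver vs. Fort Collins
    ((51.5074, -0.1278), (48.8566, 2.3522)),       # London vs. Paris
]
errors = [haversine_km(db, truth) for db, truth in pairs]
print([round(e) for e in errors])
print(fraction_within(errors, threshold_km=40.0))
```

Evaluating `fraction_within` over a range of thresholds gives the points of an error CDF like the one in Figure 1.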

Quantifying the regional accuracy of geolocation databases

When looking at country-level accuracy for each RIR region, NetAcuity is the most accurate database in all regions. We also observed that IP2Location-Lite and the two MaxMind databases’ country-level accuracy results are comparable in all regions except for APNIC, where IP2Location-Lite is significantly less accurate.

Note: the accuracy of the databases varies greatly from one country to another, especially for the IP2Location and MaxMind databases; see the paper for full results.

Figure 2: Country-level accuracy breakdown by RIR for the ground truth IP addresses. Each column shows the number of correctly and incorrectly geolocated addresses. The percentage above each column shows the fraction of incorrectly geolocated addresses.

Finally, we evaluated the city-level accuracy by region. Figures 3-6 show the distribution of geolocation errors for all databases with a breakdown by RIR.

Figure 3: IP2Location-Lite (99.7% of ground truth).
Figure 4: MaxMind-GeoLite (30.4% of ground truth).
Figure 5: MaxMind-Paid (41.3% of ground truth data).
Figure 6: NetAcuity (99.6% of ground truth data).

IP2Location-Lite has almost perfect city-level coverage in all regions but has relatively low accuracy, especially for ARIN addresses.

Apart from ARIN, MaxMind seems to provide city-level locations only when it has some confidence in them, which could explain its low city-level coverage and relatively good city-level accuracy.

NetAcuity shows high coverage for all regions, with accuracy comparable to or better than that of the other databases. Like the other databases, NetAcuity is least accurate in ARIN with only 69.2% accuracy, which is still about 30% better than that of MaxMind-Paid.

Which database to use?

Overall, all geolocation databases have room to improve their router geolocation accuracy at both country- and city-level. Researchers and network operators need to be aware of inaccuracies and their impact on their results.

That said, if you intend to use one of the geolocation databases we tested, here are our recommendations:

  • NetAcuity – we recommend using this to geolocate routers if using a geolocation database is the only available option. NetAcuity has the best combination of coverage and accuracy across all regions.
  • MaxMind – we recommend using the commercial version of MaxMind over the public version if better city-level accuracy and coverage are required. That said, we don’t recommend MaxMind databases if high city-level accuracy and coverage are required. The city-level accuracy is especially bad in the ARIN region, although we do see relatively good city-level results for MaxMind in the RIPE NCC and APNIC regions.
  • IP2Location-Lite – we don’t recommend this service because its overall accuracy is too low.

Note: These recommendations come with two caveats. First, they are mostly meaningful in the ARIN, RIPE NCC and APNIC regions, where most of our ground truth IP addresses are located. Second, NetAcuity might have benefitted from the nature of the DNS-based ground truth data: it is the only database that shows clearly better city-level accuracy over the DNS-based data than over the RTT-proximity data in all regions. We still argue that it has the best city-level accuracy and coverage, as the results over both ground truth datasets show.

One last recommendation is that users must be extra careful when geolocating ARIN addresses at city-level, regardless of the geolocation database used. For example, traceroute and latency measurements might be used to identify obvious geolocation errors in the databases.
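One way to use latency measurements for such a check is a speed-of-light test: a round-trip time measured from a vantage point at a known location caps how far away the router can be. The sketch below assumes signals travel through fiber at roughly two thirds of the speed of light in vacuum (about 200 km per millisecond one way), which is a common rule of thumb rather than a figure from the paper. The function names are illustrative.

```python
# Rule-of-thumb propagation speed in fiber: about 2/3 of c, i.e. ~200 km/ms.
FIBER_KM_PER_MS = 200.0


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on the distance to a target given a round-trip time.
    Only half of the RTT is spent travelling in each direction."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return (rtt_ms / 2) * FIBER_KM_PER_MS


def violates_speed_of_light(claimed_distance_km: float, rtt_ms: float) -> bool:
    """True if the database location is farther from the vantage point
    than the measured RTT physically allows."""
    return claimed_distance_km > max_distance_km(rtt_ms)


# A router answers in 4 ms, but the database places it 1,500 km away.
print(max_distance_km(4.0))                  # 400.0
print(violates_speed_of_light(1500.0, 4.0))  # True
print(violates_speed_of_light(300.0, 4.0))   # False
```

A violation shows that the database location cannot be right, but passing the test does not prove the location is correct, since queuing delays and indirect paths inflate round-trip times.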

We plan to extend this work in the future to include IPv6 addresses and we also plan to generate more ground truth data in all regions, especially for the AFRINIC and LACNIC Regional Internet Registries. If you’re interested in learning more, read our paper and let us know what you think below.

Contributors: Anant Shah, Bradley Huffaker, Han Zhang, Roya Ensafi, and Christos Papadopoulos

Manaf Gharaibeh is a PhD Candidate at Colorado State University. His main area of interest is Internet measurement with a focus on IP geolocation research.

 


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

2 Comments

  1. Tim

    This is a biased research paper. The research was sponsored by NetAcuity for their commercial database. They are comparing Maxmind and IP2Location free edition instead of comparing their commercial edition.

    APNIC should publish this important disclaimer inside this post or remove this post.

    1. Manaf

      Tim, thanks for your comment. However, you got your facts wrong. As is clearly stated in the paper, the main message of this work is that none of the studied geolocation databases is reliable enough for router geolocation, and that researchers who use them for that purpose should be aware of the impact the inaccuracies might have on their research results. The post and the paper are clear about which databases are used in the study, which include MaxMind GeoIP2 (the MaxMind commercial version). NetAcuity DID NOT sponsor this work; they only agreed to make their database available for free.
