Opened 5 months ago

Last modified 4 months ago

#26585 new enhancement

improve AS number and name coverage (switch maxmind to RIPE Stat)

Reported by: nusenu Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Onionoo currently uses maxmind for IP to as_number and as_name resolution.
This is fast as it is a local DB lookup but it is less up-to-date and has less coverage than RIPEstat https://stat.ripe.net/

This is a problem for tools that depend on onionoo's as_name and as_number fields like Relay Search, OrNetStats and OrNetRadar
(I don't know, maybe there are also others that are affected?)

Currently it might take weeks or months before new ASes get added to maxmind, so this information
is also missing when people look up relays on Relay Search.

As of today onionoo is missing AS level data for about 100 relays,
but this value depends on how far we are away from the last maxmind update.

How about we use the RIPEstat API as a data source, plus a local cache?

To minimize the amount of required online queries against the RIPEstat API we can do the following
to create the IP to AS map initially (pseudocode):

if ip_prefix in cache
    use cached entry
else
    perform an online lookup (query RIPEstat API)
    add new prefix entry to cache

expire cache entries after 15 days?
(it makes sense to log how many entries changed
after 15 days so we know whether this value is too large or too small)

This will significantly reduce the amount of required online API calls.
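
A minimal Python sketch of the caching idea above, not an Onionoo implementation. The class name `PrefixCache`, the injected `lookup` callable (standing in for the online RIPEstat query) and the `online_lookups` counter are hypothetical names introduced for illustration. The 15-day expiry is the value proposed above.

```python
import ipaddress
import time

CACHE_TTL_SECONDS = 15 * 24 * 3600  # proposed 15-day expiry


class PrefixCache:
    """Cache of BGP prefix -> AS number with a fixed time-to-live."""

    def __init__(self, lookup, ttl=CACHE_TTL_SECONDS, clock=time.time):
        # lookup(ip) is assumed to return (prefix_string, as_number);
        # in practice it would wrap an online RIPEstat query.
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # ip_network -> (as_number, expires_at)
        self.online_lookups = 0

    def resolve(self, ip):
        addr = ipaddress.ip_address(ip)
        now = self._clock()
        # Drop expired entries before matching.
        self._entries = {
            net: entry for net, entry in self._entries.items() if entry[1] > now
        }
        matches = [
            net for net in self._entries
            if net.version == addr.version and addr in net
        ]
        if matches:
            # Use the most specific cached prefix that covers the address.
            best = max(matches, key=lambda net: net.prefixlen)
            return best, self._entries[best][0]
        # Cache miss: one online lookup, then remember the whole prefix.
        prefix, as_number = self._lookup(ip)
        self.online_lookups += 1
        network = ipaddress.ip_network(prefix)
        self._entries[network] = (as_number, now + self._ttl)
        return network, as_number
```

Because the cache is keyed by prefix rather than by address, a second relay in an already-cached prefix costs no additional online query.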

To give you an idea of the scale (based on onionoo data from a random
day in May 2018)

total relay records: 8116
unique IPv4 addresses: 7794
unique IPv4 BGP prefixes: 3884

each day about 50 new relays appear,
let's assume the worst case (every new relay is not in an already cached prefix).

So the estimated daily number of queries would be around
4000/15+50 = ~320 requests/day = 1 req every ~4 minutes
which appears acceptable.
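
The estimate above can be reproduced with a few lines of Python. The function name `estimated_daily_queries` is made up for this sketch; the inputs are the rounded figures from the description (about 4000 prefixes, a 15-day expiry, about 50 new relays per day).

```python
def estimated_daily_queries(cached_prefixes, ttl_days, new_relays_per_day):
    """Refreshes of expiring prefixes plus worst-case lookups for new relays."""
    return cached_prefixes / ttl_days + new_relays_per_day


per_day = estimated_daily_queries(4000, 15, 50)
minutes_between_requests = 24 * 60 / per_day
print(f"{per_day:.0f} requests/day, one every {minutes_between_requests:.1f} minutes")
```

With the exact count of 3884 prefixes instead of the rounded 4000, the result is slightly lower, so the rounded figure is the more conservative one.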

IP to prefixes (this can return multiple matches), as_number and as_name lookup:
https://stat.ripe.net/data/related-prefixes/data.json?resource=103.114.160.21

IP to prefix and as number (no as_name) lookup:
https://stat.ripe.net/data/network-info/data.json?resource=103.114.160.21

ASN to as_name lookup:
https://stat.ripe.net/data/as-overview/data.json?resource=AS40676

documentation:
https://stat.ripe.net/docs/data_api
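
For illustration, the three query URLs above follow one pattern and could be built like this. The helper name `ripestat_url` is hypothetical; the sketch only constructs URLs and makes no assumption about the structure of the JSON that comes back.

```python
from urllib.parse import urlencode

RIPESTAT_BASE = "https://stat.ripe.net/data"


def ripestat_url(data_call, resource):
    """Build a RIPEstat data API URL for a data call and a resource (IP or ASN)."""
    query = urlencode({"resource": resource})
    return f"{RIPESTAT_BASE}/{data_call}/data.json?{query}"


prefix_and_asn_url = ripestat_url("network-info", "103.114.160.21")
as_name_url = ripestat_url("as-overview", "AS40676")
```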

Child Tickets

Ticket  Type         Status  Owner         Summary
#27155  enhancement  new     metrics-team  Include BGP prefix information in details documents

Attachments (1)

comparison.csv (617.2 KB) - added by irl 5 months ago.
MaxMind and RIPEstat Comparison CSV


Change History (7)

comment:1 Changed 5 months ago by irl

For country codes, there are 321 relays where there is disagreement and 7837 in agreement (κ = 0.959 excluding relays for which MaxMind had no country code). There were no relays for which RIPEstat did not return a country code, but there were 21 relays for which MaxMind was missing a country code. This leaves 300 relays for which both MaxMind and RIPEstat had a country code but there was disagreement.

RIPEstat does return 7 relays with the country code "eu" and 1 relay with the country code "ap" for Europe and Asia/Pacific respectively. MaxMind have documentation indicating that they also use these codes, but did not return any results with these codes. In all of these cases, MaxMind did not have a country code.

Without ground truth to compare to, it is not possible to say whether MaxMind or RIPEstat is correct in the cases where there was disagreement. It is also possible that MaxMind and RIPEstat agree on a country code that is incorrect.

For AS numbers, there are 269 relays where there is disagreement and 7889 in agreement (κ = 0.979 excluding relays for which either MaxMind or RIPEstat had no AS number). There were 101 relays for which MaxMind did not return an AS number and 2 relays for which RIPEstat did not return an AS number. Both of the relays for which RIPEstat did not return an AS number were in the 1.0.0.0/8 BGP prefix, which has the "cn" country code from RIPEstat but the "au" country code from MaxMind. MaxMind placed these relays in AS 4804.
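
As a reference for how an agreement statistic of this kind can be computed, here is a small Python sketch. It assumes that κ above denotes Cohen's kappa, which the comment does not state explicitly, and the function name `cohens_kappa` is introduced only for this example. Relays with a missing value from either source would be filtered out before calling it.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each source's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources always give the same single label
    return (observed - expected) / (1 - expected)
```

Called with one list of country codes (or AS numbers) per source, in the same relay order, it returns a value of at most 1, where 1 means complete agreement and 0 means agreement no better than chance.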

It is not clear to me what our threshold on agreement should be. As the MaxMind database is distributed to users and can be used, for example, to disable/prefer the use of exit relays in specific countries, it may be dangerous to users if they get mixed information about the country code assigned to relays. It may be equally dangerous to incorrectly assign country codes, but without ground truth to compare to it is not possible to say whether a switch would improve that situation or not.

We should conduct an analysis of the different databases and feeds available to us, to determine which best fits our requirements. As for querying RIPEstat, I have a tool which I have used in the above analysis and which would make it easier to integrate this into Onionoo if we were to choose to integrate data from RIPEstat.

I don't believe we should consider outright replacing MaxMind with RIPEstat for the reason that we distribute this to end clients and we need a database that we can do this with. However, I can see that having additional information when MaxMind does not have any, and also adding the BGP prefix information (finer-grained topology information than just the AS), would be valuable to some users.

What do you think about the addition of two new fields: 'country_source' and 'as_source' to indicate the source of country/as information? We could then supplement the MaxMind data with data from RIPEstat where MaxMind does not have the information while being able to make it clear to users where the information has come from if that is important to them.

We could additionally add a 'bgp_prefix' field with prefix data from RIPEstat.
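
A rough Python sketch of what this supplementing could look like. The field names 'country_source', 'as_source' and 'bgp_prefix' are the ones proposed above; the function name `merge_lookups`, the input dictionaries and the source values "maxmind" and "ripestat" are assumptions made for illustration, not an agreed format.

```python
def merge_lookups(maxmind, ripestat):
    """Prefer MaxMind values, fall back to RIPEstat, and record the source."""
    result = {}
    for field, source_field in (
        ("country", "country_source"),
        ("as_number", "as_source"),
    ):
        if maxmind.get(field) is not None:
            result[field] = maxmind[field]
            result[source_field] = "maxmind"
        elif ripestat.get(field) is not None:
            result[field] = ripestat[field]
            result[source_field] = "ripestat"
    # Prefix data would only come from RIPEstat in this proposal.
    if ripestat.get("bgp_prefix") is not None:
        result["bgp_prefix"] = ripestat["bgp_prefix"]
    return result


# Example with documentation-range values (not real relay data).
merged = merge_lookups(
    {"country": "de"},
    {"country": "nl", "as_number": "AS64496", "bgp_prefix": "192.0.2.0/24"},
)
```

In this example MaxMind supplies the country, while the AS number and the prefix are filled in from RIPEstat and marked as such.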

Changed 5 months ago by irl

Attachment: comparison.csv added

MaxMind and RIPEstat Comparison CSV

comment:2 Changed 5 months ago by irl

Keywords: metrics-geoip added

comment:3 in reply to:  1 Changed 5 months ago by cypherpunks

Keywords: metrics-geoip removed

Replying to irl:

> For country codes

This ticket is about AS data (number and name).

(RIPEstat uses maxmind itself, so it does not make much sense to use RIPEstat for geolocation
to replace maxmind?)
https://stat.ripe.net/docs/data_api#Geoloc

> 2 relays for which RIPEstat did not return an AS number

when posting numbers, please include the specific APIs used and their parameters (as some RIPEstat APIs filter prefixes with low visibility by default)

> It is not clear to me what our threshold on agreement should be. As the MaxMind database is distributed to users and can be used, for example, to disable/prefer the use of exit relays in specific countries,

onionoo data is not directly used by tor client daemons in any way (or to disable specific exits)

> We should conduct an analysis of the different databases and feeds available to us, to determine which best fits our requirements.

what are the specific onionoo requirements?

> As for querying RIPEstat, I have a tool which I have used in the above analysis

Note that canid does not use the specific APIs listed in the first post of this ticket, according to its README on github.

> I don't believe we should consider outright replacing MaxMind with RIPEstat for the reason that we distribute this to end clients

what is an 'end client' in the onionoo context? is RS an end client?

> to make it clear to users where the information has come from if that is important to them.

transparency is always positive

This ticket is preparation for adding BGP ROA enums in the future.

comment:4 Changed 5 months ago by irl

https://bgpstream.caida.org/docs/api/broker could also provide further information.

comment:5 Changed 4 months ago by nusenu

from practically using RIPEstat I can now say that
https://stat.ripe.net/data/network-info/data.json
is more reliable in terms of "has data for every IP we asked for" than
https://stat.ripe.net/data/related-prefixes/data.json
so the former is preferred.

We should also specify how we deal with IPs announced by multiple ASes.

Last edited 4 months ago by nusenu

comment:6 Changed 4 months ago by nusenu

related: #27235
