Bring back the relays-by-country graph

added component::metrics/website owner::metrics-team priority::low severity::normal status::assigned type::enhancement labels

Trac:
Type: defect to enhancement
Summary: Fix and re-enable relays-by-country graph on metrics website to Bring back the relays-by-country graph
Sponsor: N/A to N/A
Severity: N/A to Normal
Reviewer: N/A to N/A

Handing over to metrics-team, because I'm not currently working on this.

Trac:
Status: new to assigned
Owner: karsten to metrics-team

What kind of resources are required?

Would a dedicated server allocated solely to this task help bring back the relays-by-country graphs?

Trac:
Cc: N/A to anadahz

Replying to anadahz:

What kind of resources are required?

Would a dedicated server allocated solely to this task help bring back the relays-by-country graphs?

Unfortunately, it's not just a question of hardware. The code used for the blog post is good enough to run once for a blog post, but it needs more work before it can run periodically. Here are a few issues:

Every time this code runs, it processes all descriptors in the in/ directory. In a production environment we'd want it to skip descriptors it has processed before and reuse the aggregations previously computed from them.
Updating geoip files is a manual step. In fact, we're currently using the very same geoip file for a graph covering years of data. We'll need to find a way to automate geoip file updates. And we need to define which geoip file we're using for any given consensus. That last point alone is far from trivial if we want to ensure that two people have a chance to independently produce the same graph.
Everything here works with files, but we'll want to use a database, or we'll be sad whenever the server reboots at the wrong moment. And we want the database schema to scale for the next five years.
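
To make the first and third issues concrete, here is a minimal sketch (not the code used for the blog post) of skipping descriptors that were handled in an earlier run while keeping state in a database. The function name `process_new_descriptors`, the `processed` table, and the `aggregate` callback are all hypothetical, and SQLite only stands in for whatever database would actually be chosen.

```python
import hashlib
import sqlite3
from pathlib import Path


def process_new_descriptors(in_dir, db_path, aggregate):
    """Aggregate only descriptor files that no previous run has handled.

    `aggregate(conn, path)` is a hypothetical callback that parses one
    descriptor file and writes its aggregates using the given connection.
    Keep `db_path` outside `in_dir`, or the database file itself would be
    picked up as a descriptor.
    """
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS processed ("
            "digest TEXT PRIMARY KEY, path TEXT NOT NULL)"
        )

    newly_processed = []
    for path in sorted(Path(in_dir).rglob("*")):
        if not path.is_file():
            continue
        # Identify files by content, so renamed or re-downloaded copies
        # are not counted twice.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        seen = conn.execute(
            "SELECT 1 FROM processed WHERE digest = ?", (digest,)
        ).fetchone()
        if seen:
            continue
        # One transaction per file: the aggregates and the "processed"
        # marker are committed together or rolled back together, so a
        # reboot mid-run cannot leave a half-counted descriptor behind.
        with conn:
            aggregate(conn, path)
            conn.execute(
                "INSERT INTO processed (digest, path) VALUES (?, ?)",
                (digest, str(path)),
            )
        newly_processed.append(path)

    conn.close()
    return newly_processed
```

A second call over an unchanged in/ directory returns an empty list, because every digest is already recorded. This sketch still reads every file to compute its digest; a production version would probably want a cheaper check, and it says nothing about how the schema should scale.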

Replying to karsten:

Replying to anadahz:

What kind of resources are required?

Would a dedicated server allocated solely to this task help bring back the relays-by-country graphs?

Unfortunately, it's not just a question of hardware. The code used for the blog post is good enough to run once for a blog post, but it needs more work before it can run periodically. Here are a few issues:

Every time this code runs, it processes all descriptors in the in/ directory. In a production environment we'd want it to skip descriptors it has processed before and reuse the aggregations previously computed from them.

Updating geoip files is a manual step. In fact, we're currently using the very same geoip file for a graph covering years of data. We'll need to find a way to automate geoip file updates. And we need to define which geoip file we're using for any given consensus. That last point alone is far from trivial if we want to ensure that two people have a chance to independently produce the same graph.

Aren't these the same GeoIP files as the ones used for Tor metrics currently?

Everything here works with files, but we'll want to use a database, or we'll be sad whenever the server reboots at the wrong moment. And we want the database schema to scale for the next five years.

Nonetheless, do you think these issues could be filed as separate sub-tickets?

Replying to anadahz:

Replying to karsten:

Updating geoip files is a manual step. In fact, we're currently using the very same geoip file for a graph covering years of data. We'll need to find a way to automate geoip file updates. And we need to define which geoip file we're using for any given consensus. That last point alone is far from trivial if we want to ensure that two people have a chance to independently produce the same graph.

Aren't these the same GeoIP files as the ones used for Tor metrics currently?

Well, Onionoo uses the latest of these GeoIP files in MaxMind's format. But nothing else in Tor Metrics uses these files. None of this is hard; it's just a couple of substeps that need to be done.
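
As an illustration of one such substep, defining which geoip file applies to a given consensus could be as simple as a fixed, documented rule. The sketch below assumes the rule "use the newest geoip file released on or before the consensus valid-after time"; that rule, the function name `geoip_for_consensus`, and the idea of keeping a list of release dates are assumptions for illustration, not something decided on this ticket.

```python
from bisect import bisect_right
from datetime import date, datetime


def geoip_for_consensus(valid_after, geoip_releases):
    """Pick the geoip file to use for one consensus.

    `valid_after` is the consensus valid-after time as a datetime.
    `geoip_releases` is a list of (release_date, filename) tuples; how
    that list is obtained and kept up to date is left open here.

    Assumed rule: the newest file released on or before the valid-after
    date. Because the choice depends only on the inputs, two people with
    the same release list get the same file for the same consensus.
    """
    releases = sorted(geoip_releases)
    release_dates = [released for released, _ in releases]
    # Index of the first release strictly after the consensus date.
    idx = bisect_right(release_dates, valid_after.date())
    if idx == 0:
        raise LookupError(
            f"no geoip file released on or before {valid_after.date()}"
        )
    return releases[idx - 1][1]
```

For example, given releases dated 2016-01-05 and 2016-02-02 (hypothetical dates), a consensus valid after 2016-01-20 12:00 resolves to the January file. Reproducibility then hinges on everyone using the same archived release list, which is the part that still needs to be defined.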

Everything here works with files, but we'll want to use a database, or we'll be sad whenever the server reboots at the wrong moment. And we want the database schema to scale for the next five years.

Nonetheless, do you think these issues could be filed as separate sub-tickets?

Not really. These were just some examples, not a list of things that need to be done to resolve this ticket. I'd like to leave the implementation steps to whoever implements this.

mentioned in issue #8279 (moved)

mentioned in issue #17736 (moved)
