Opened 7 years ago

Closed 7 years ago

#2512 closed enhancement (implemented)

Provide CSV file with the number of running relays per country over time

Reported by: karsten Owned by: karsten
Priority: High Milestone:
Component: Metrics/CollecTor Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

We have been asked for a CSV file with the number of running relays per country over time. This CSV file would be downloaded once an hour and would contain numbers with a resolution of 1 hour. I assume it's okay to limit the CSV file to the last 7 or 14 days, because only the recent development will be visualized. Also, I assume it's okay if the CSV file download takes up to 1 minute, which might be necessary to generate it on demand.

The CSV file would most likely have the following format (example data):

datetime,country,relays
2011-02-08 11:00:00,de,385
2011-02-08 11:00:00,fr,332
2011-02-08 11:00:00,it,98
2011-02-08 11:00:00,us,434
...
2011-02-08 12:00:00,de,392
2011-02-08 12:00:00,fr,341
...

The CSV file would be generated on demand. The database is updated every hour between :15 and :25, so it should be safe to download the CSV file at :30 every hour.

In order to implement this, we'll have to

  • extend the metrics-db database schema to store a GeoIP database,
  • add a function get_country(address CHARACTER VARYING(15)) to resolve an IP address in dotted notation to a two-letter country code,
  • add a view that calculates aggregates for the CSV file, and
  • add a new CSV file type to metrics-web.

Future work would be to

  • turn the view into a materialized view and add a graph to the metrics website, and
  • update the GeoipDB regularly.

Child Tickets

Change History (4)

comment:1 Changed 7 years ago by arma

If you find yourself with extra time, exporting a rolling-24-hour-window csv of users-by-country would probably be exciting too.

As would figuring out some way to specify where on the map the relay is, rather than just "all US relays are in this one central dot in Nebraska" (as Vidalia does) or "let's mark the entire north half of Asia red to show that we have a relay in Russia"

comment:2 in reply to:  description Changed 7 years ago by arma

Replying to karsten:

Also, I assume it's okay if the CSV file download takes up to 1 minute, which might be necessary to generate it on demand.

[...]

The CSV file would be generated on demand. The database is updated every hour between :15 and :25, so it should be safe to download the CSV file at :30 every hour.

If it really takes a whole minute to generate, we might want to kick off an automatic generation request when the db is done being updated. Then the output would just be a static text file that people could fetch as many times as they wanted. That approach will probably reduce headaches for people who fetch it, since I imagine they're just whipping some scripts together too.

comment:3 Changed 7 years ago by karsten

Status: newassigned

I made some progress here by adding a GeoIP database including country codes, latitude, and longitude.

What's left is writing an SQL view and the R code to export CSV files.

Changing the CSV generation mechanism to something that pre-generates results would be a new ticket though. Right now, downloading CSV files means that the database has to generate them first.

comment:4 Changed 7 years ago by karsten

Resolution: implemented
Status: assignedclosed

The CSV output is now implemented, even based on a materialized view that is updated every three hours. This file contains daily averages of relays by country in the following format:

date,country,relays
2011-01-01,a1,3
2011-01-01,ae,0
2011-01-01,aq,1
2011-01-01,ar,5
2011-01-01,at,38
[...]

Closing this ticket.

Note: See TracTickets for help on using tickets.