Opened 6 years ago

Closed 5 years ago

#10331 closed enhancement (implemented)

Provide per-bridge usage statistics in Onionoo

Reported by: karsten
Owned by: karsten
Priority: Low
Milestone:
Component: Metrics/Onionoo
Version:
Severity:
Keywords:
Cc: phw, rndm, lunar@…
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Bridge operators often wonder if their bridge is useful and if they're actually helping users in censoring countries. In theory, we have data available that would tell them how many users connect to their bridge every day. We could aggregate these data and make them available in Onionoo, similar to how we aggregate bandwidth histories and path-selection weights histories. Atlas, Globe, and other web applications could then visualize these statistics.

Here's the data we have, coming from an extra-info descriptor (only partially shown here):

extra-info bzoum C19309EB35EBC06CFDD5B6E5BED937184DF7D10C
dirreq-stats-end 2013-11-21 16:56:40 (86400 s)
dirreq-v3-resp ok=744,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=200,busy=0
bridge-ips ir=560,us=200,sy=192,??=184,my=64,ru=48,gb=40,in=40,de=24,fr=24,th=24,au=16,bd=16,br=16,ca=16,cn=16,eg=16,fi=16,it=16,jp=16,mx=16,nl=16,pk=16,ro=16,se=16,tr=16,a1=8,ae=8,af=8,at=8,be=8,bg=8,bh=8,bn=8,by=8,ch=8,cl=8,co=8,cr=8,cu=8,cy=8,cz=8,dk=8,dz=8,ee=8,es=8,ge=8,gr=8,gt=8,gy=8,hk=8,hu=8,id=8,ie=8,il=8,ke=8,kr=8,kz=8,lb=8,lu=8,lv=8,ma=8,mk=8,mu=8,ng=8,no=8,nz=8,om=8,pa=8,pe=8,ph=8,pl=8,pt=8,rs=8,sa=8,sg=8,sk=8,sn=8,tn=8,tw=8,ua=8,uz=8,ve=8,ye=8
bridge-ip-versions v4=1680,v6=0
bridge-ip-transports <OR>=8,obfs2=968,obfs3=712

The most relevant number is in the dirreq-v3-resp line, which tells us that this bridge gave out 744 consensuses to clients. We assume that clients make 10 such requests per day, so there were on average 74 clients connected to this bridge in the 24 hours until 2013-11-21 16:56:40 (from the dirreq-stats-end line).

We can now multiply 74 with the fraction of connecting IP addresses coming from a given country from the bridge-ips line, e.g., 560 of 2104 IP addresses coming from Iran means 74 * 560 / 2104 = 20 clients. The same goes for 74 * 1680 / 1680 = 74 users using IP version 4 (from bridge-ip-versions line) and 74 * 712 / 1688 = 31 users using transport obfs3 (from bridge-ip-transports line).
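The arithmetic above can be sketched in Python as follows. This is only an illustration under the ticket's stated assumption of 10 directory requests per client per day, not the code used by Onionoo; the names `parse_kv`, `estimate_clients`, and `REQUESTS_PER_CLIENT_PER_DAY` are made up for this sketch.

```python
# Assumption from the ticket: each client makes about 10 consensus requests per day.
REQUESTS_PER_CLIENT_PER_DAY = 10

# bridge-ips line from the extra-info descriptor quoted above.
BRIDGE_IPS = (
    "ir=560,us=200,sy=192,??=184,my=64,ru=48,gb=40,in=40,de=24,fr=24,th=24,"
    "au=16,bd=16,br=16,ca=16,cn=16,eg=16,fi=16,it=16,jp=16,mx=16,nl=16,pk=16,ro=16,se=16,tr=16,"
    "a1=8,ae=8,af=8,at=8,be=8,bg=8,bh=8,bn=8,by=8,ch=8,cl=8,co=8,cr=8,cu=8,cy=8,cz=8,dk=8,dz=8,"
    "ee=8,es=8,ge=8,gr=8,gt=8,gy=8,hk=8,hu=8,id=8,ie=8,il=8,ke=8,kr=8,kz=8,lb=8,lu=8,lv=8,ma=8,"
    "mk=8,mu=8,ng=8,no=8,nz=8,om=8,pa=8,pe=8,ph=8,pl=8,pt=8,rs=8,sa=8,sg=8,sk=8,sn=8,tn=8,tw=8,"
    "ua=8,uz=8,ve=8,ye=8"
)


def parse_kv(line):
    """Turn 'ir=560,us=200' into {'ir': 560, 'us': 200}."""
    pairs = (item.rsplit("=", 1) for item in line.split(","))
    return {key: int(value) for key, value in pairs}


def estimate_clients(responses_ok, counts):
    """Split the estimated daily clients proportionally over the given counts."""
    clients = responses_ok / REQUESTS_PER_CLIENT_PER_DAY
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: clients * n / total for key, n in counts.items()}


by_country = estimate_clients(744, parse_kv(BRIDGE_IPS))
by_version = estimate_clients(744, parse_kv("v4=1680,v6=0"))
by_transport = estimate_clients(744, parse_kv("<OR>=8,obfs2=968,obfs3=712"))

print(round(by_country["ir"]), round(by_version["v4"]), round(by_transport["obfs3"]))
```

The sketch keeps the unrounded 74.4 clients instead of rounding to 74 first; for the three numbers discussed above the rounded results are the same.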

All the math here is the same as what we use to estimate total user numbers in the network. The only difference is that, for total numbers in the network, we have to compensate for bridges not reporting statistics.

Note that we cannot apply this approach to relays, because not every relay is a directory mirror. So, a directory mirror would show many more daily clients than it actually has, whereas other relays would show no clients at all. Directory guards probably make this even more difficult for directory mirrors. But the described approach should work fine for bridges.

One problem that remains is that I don't know a good data format for these statistics. There are so many countries in the world that we cannot include graph data for every single one of them in Onionoo documents. Ideally, the new documents containing these usage statistics shouldn't be significantly larger than bandwidth or weights documents. I have no clue yet how to achieve that.

Another problem is that Atlas, Globe, and other clients would have to visualize these new data. Cc'ing phw and rndm to get some feedback on possible data formats.

Change History (8)

comment:1 Changed 5 years ago by phw

Could we aggregate the usage statistics and perhaps only track which countries contributed to the aggregated statistics? Rather than saving the exact number for every country?

So instead of:

China 0 0 2 5 0
Iran  2 1 0 2 1
Syria 0 0 0 1 0
...

like this:

Total     2  1  2  8        1
Countries ir ir cn cn,ir,sy ir
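A rough Python sketch of this aggregation, using the three example rows above. The function name `aggregate` and the dictionary layout are hypothetical and only serve to illustrate the idea.

```python
# Per-country counts per interval, taken from the example rows above.
per_country = {
    "cn": [0, 0, 2, 5, 0],
    "ir": [2, 1, 0, 2, 1],
    "sy": [0, 0, 0, 1, 0],
}


def aggregate(per_country):
    """Return (totals, contributors): one total and one country list per interval."""
    codes = list(per_country)
    totals, contributors = [], []
    # zip(*...) walks the rows column by column, i.e. interval by interval.
    for counts in zip(*per_country.values()):
        totals.append(sum(counts))
        contributors.append(sorted(c for c, n in zip(codes, counts) if n > 0))
    return totals, contributors


totals, contributors = aggregate(per_country)
print(totals)        # [2, 1, 2, 8, 1]
print(contributors)  # [['ir'], ['ir'], ['cn'], ['cn', 'ir', 'sy'], ['ir']]
```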

comment:2 Changed 5 years ago by karsten

I like the idea of tracking totals and only giving out aggregate per-country details. But I see two drawbacks in your suggestion: 1) the number of countries contributing to statistics can be really long (see the bridge-ips line in the ticket description above); and more importantly, 2) I don't see how Atlas or Globe would visualize these per-country data.

Based on your suggestion, how about we provide per-country fractions over the entire time period? We would probably want to limit the list of countries to the top-10 or so and include the rest as "other". We could do the same for per-transport and per-IP-version fractions. Using your example data, that could be:

Total: 2, 1, 2, 8, 1
By-Country: cn=0.50, ir=0.43, sy=0.07 
By-Transport: <OR>=0.50, obfs3=0.35, obfs2=0.15
By-Version: v4=1.00

Atlas and Globe could visualize these data in a) a single time plot per time period and b) tables or pie charts with fractions per country, transport, or IP version.

Would that be useful?
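For illustration, the per-country fractions in this proposal could be computed along these lines. This is a sketch, not the eventual implementation; `fractions_over_period`, its `top_n` parameter, and the "other" key are assumptions made for the example.

```python
# Per-country counts per interval, from phw's example in comment:1.
per_country = {
    "cn": [0, 0, 2, 5, 0],
    "ir": [2, 1, 0, 2, 1],
    "sy": [0, 0, 0, 1, 0],
}


def fractions_over_period(per_key, top_n=10):
    """Sum each key over the whole period and return its share of the total.

    Only the top_n keys are listed individually; the rest is lumped into "other".
    """
    sums = {key: sum(values) for key, values in per_key.items()}
    total = sum(sums.values())
    if total == 0:
        return {}
    ranked = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
    result = {key: n / total for key, n in ranked[:top_n]}
    rest = sum(n for _, n in ranked[top_n:])
    if rest:
        result["other"] = rest / total
    return result


shares = fractions_over_period(per_country)
print({key: round(value, 2) for key, value in shares.items()})
# {'cn': 0.5, 'ir': 0.43, 'sy': 0.07}
```

The same function would apply unchanged to per-transport and per-IP-version counts.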

comment:3 in reply to:  2 Changed 5 years ago by phw

Replying to karsten:

> I like the idea of tracking totals and only giving out aggregate per-country details. But I see two drawbacks in your suggestion: 1) the number of countries contributing to statistics can be really long (see the bridge-ips line in the ticket description above); and more importantly, 2) I don't see how Atlas or Globe would visualize these per-country data.

You are right. Fractions are better for visualisation.

> Total: 2, 1, 2, 8, 1
> By-Country: cn=0.50, ir=0.43, sy=0.07
> By-Transport: <OR>=0.50, obfs3=0.35, obfs2=0.15
> By-Version: v4=1.00
>
> Atlas and Globe could visualize these data in a) a single time plot per time period and b) tables or pie charts with fractions per country, transport, or IP version.
>
> Would that be useful?

I like it and I think it's a good compromise. Focusing on the top-n countries also sounds reasonable.

comment:4 Changed 5 years ago by lunar

Cc: lunar@… added

comment:5 Changed 5 years ago by karsten

Owner: set to karsten
Status: new → assigned

Working on this.

comment:6 Changed 5 years ago by karsten

Very minor note, but just so it doesn't get lost: we'll probably want to include country fractions with at least 1% of users in the considered time period, not only the top-n countries.
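A minimal sketch of such a cutoff, assuming per-country counts have already been summed over the considered time period. The function name `significant_fractions` and the sample counts (a small subset of the bridge-ips line in the ticket description) are chosen only for illustration.

```python
def significant_fractions(counts, threshold=0.01):
    """Return fractions per key, keeping only keys with at least `threshold` share."""
    total = sum(counts.values())
    if total == 0:
        return {}
    fractions = {key: n / total for key, n in counts.items()}
    return {key: f for key, f in fractions.items() if f >= threshold}


# Subset of the bridge-ips counts from the ticket description, for illustration only.
counts = {"ir": 560, "us": 200, "sy": 192, "a1": 8}
print(sorted(significant_fractions(counts)))  # ['ir', 'sy', 'us']
```

Here "a1" has 8 of 960 addresses, which is below 1%, so it is left out.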

comment:7 Changed 5 years ago by karsten

I finished a first version of this code today. Here's a sample clients document:

{"hashed_fingerprint":"DE6397A047ABE5F78B4C87AF725047831B221AAB",
"average_clients":{
"1_week":{"first":"2014-02-26 12:00:00","last":"2014-02-28 12:00:00","interval":86400,"factor":0.096168168,"count":3,"values":[999,949,985],"countries":{"ca":0.0273,"de":0.0103,"gb":0.0177,"in":0.0173,"ir":0.5179,"ru":0.0231,"sy":0.0607,"tr":0.0140,"us":0.0960},"transports":{"obfs2":0.9849,"obfs3":0.0120},"versions":{"v4":1.0000}},
"1_month":{"first":"2014-02-02 12:00:00","last":"2014-02-28 12:00:00","interval":86400,"factor":0.139049349,"count":27,"values":[384,371,354,349,374,432,503,485,458,493,536,null,509,524,576,607,622,null,635,509,566,774,999,945,690,656,681],"countries":{"??":0.1725,"cn":0.0187,"in":0.1801,"ir":0.2421,"ru":0.0101,"se":0.1733,"sy":0.0316,"us":0.0395},"transports":{"<??>":0.5471,"obfs2":0.4460},"versions":{"v4":1.0000}},
"3_months":{"first":"2013-12-03 12:00:00","last":"2014-02-28 12:00:00","interval":86400,"factor":0.139049349,"count":88,"values":[250,299,293,285,300,287,277,316,311,285,391,486,465,null,411,432,542,661,655,615,578,550,499,521,570,547,523,533,539,450,314,309,345,340,339,367,351,362,380,369,347,347,373,362,298,null,302,316,348,393,420,430,412,358,441,499,525,516,428,366,373,384,371,354,349,374,432,503,485,458,493,536,null,509,524,576,607,622,null,635,509,566,774,999,945,690,656,681],"countries":{"??":0.2935,"in":0.1843,"ir":0.1114,"nl":0.0261,"se":0.2938,"sy":0.0118,"us":0.0147},"transports":{"<??>":0.8312,"obfs2":0.1662},"versions":{"v4":1.0000}},
"1_year":{"first":"2013-05-14 00:00:00","last":"2014-02-28 00:00:00","interval":172800,"factor":0.123398448,"count":146,"values":[482,435,458,416,431,487,390,357,340,289,310,343,453,399,412,400,null,254,361,395,364,436,406,463,527,504,509,394,359,364,344,405,428,null,421,377,399,398,392,388,484,410,456,478,466,448,393,372,376,430,397,424,359,362,356,348,343,378,383,462,360,333,292,242,252,257,217,228,254,327,287,326,324,321,354,298,321,298,326,null,357,313,287,298,286,305,288,291,306,296,342,359,323,308,362,367,389,348,277,269,271,247,300,326,331,334,336,460,524,480,678,716,635,575,629,595,557,351,386,398,402,422,391,414,336,351,418,479,434,529,587,447,427,408,408,500,531,586,null,620,695,716,621,999,922,757],"countries":{"??":0.4255,"cn":0.0261,"gb":0.0221,"in":0.2083,"ir":0.0509,"nl":0.0744,"se":0.1419,"us":0.0111},"transports":{"<??>":0.9363,"obfs2":0.0628},"versions":{"v4":1.0000}}}
}

The data format is the same as for bandwidth and weights documents, except that there are three additional fields:

  • "countries": Object containing fractions of clients by country in the considered time period. Keys are two-letter lower-case country codes as found in a GeoIP database. Values are numbers between 0 and 1 standing for the fraction of clients by country. A country is only included if at least 1% of clients came from this country.
  • "transports": Object containing fractions of clients by transport in the considered time period. Keys are transport names, or "<OR>" for the default onion-routing transport protocol or "<??>" for unknown transports. Values are numbers between 0 and 1 standing for the fraction of clients by transport.
  • "versions": Object containing fractions of clients by IP version in the considered time period. Keys are either "v4" for IPv4 or "v6" for IPv6. Values are numbers between 0 and 1 standing for the fraction of clients by version.
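As a sketch of how a client might read one of these history objects: the code below assumes that, as in bandwidth documents, each entry in "values" is a normalized number that has to be multiplied by "factor" to get the original value, and that null marks a missing data point. The function name `decode_history` is hypothetical.

```python
from datetime import datetime, timedelta

# "1_week" history object from the sample document above.
one_week = {
    "first": "2014-02-26 12:00:00",
    "last": "2014-02-28 12:00:00",
    "interval": 86400,
    "factor": 0.096168168,
    "count": 3,
    "values": [999, 949, 985],
    "countries": {"ca": 0.0273, "de": 0.0103, "gb": 0.0177, "in": 0.0173,
                  "ir": 0.5179, "ru": 0.0231, "sy": 0.0607, "tr": 0.0140,
                  "us": 0.0960},
    "transports": {"obfs2": 0.9849, "obfs3": 0.0120},
    "versions": {"v4": 1.0000},
}


def decode_history(history):
    """Return a list of (timestamp, average clients or None) pairs."""
    first = datetime.strptime(history["first"], "%Y-%m-%d %H:%M:%S")
    step = timedelta(seconds=history["interval"])
    factor = history["factor"]
    return [
        (first + i * step, None if value is None else value * factor)
        for i, value in enumerate(history["values"])
    ]


for timestamp, clients in decode_history(one_week):
    print(timestamp, "n/a" if clients is None else round(clients, 1))
```

The three new fields are not touched by this decoding step; they apply to the time period as a whole rather than to individual data points.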

Thoughts on the data format?

comment:8 Changed 5 years ago by karsten

Resolution: implemented
Status: assigned → closed

This is now implemented and deployed. Here's a sample clients document (https://onionoo.torproject.org/clients?lookup=DE6397A047ABE5F78B4C87AF725047831B221AAB) in the deployed format:

{"relays_published":"2014-03-11 07:00:00",
"relays":[
],
"bridges_published":"2014-03-11 06:37:04",
"bridges":[
{"fingerprint":"DE6397A047ABE5F78B4C87AF725047831B221AAB",
"average_clients":{
"1_week":{"first":"2014-03-04 12:00:00","last":"2014-03-10 12:00:00","interval":86400,"factor":0.077587187,"count":7,"values":[954,998,986,906,967,939,962],"countries":{"bd":0.0103,"de":0.0141,"gb":0.0238,"in":0.0188,"ir":0.5333,"ru":0.0188,"sy":0.0613,"tr":0.0104,"ua":0.0120,"us":0.0894},"transports":{"obfs2":0.9825,"obfs3":0.0141},"versions":{"v4":1.0000}},
"1_month":{"first":"2014-02-09 12:00:00","last":"2014-03-10 12:00:00","interval":86400,"factor":0.139049349,"count":30,"values":[485,458,493,536,null,null,524,576,607,622,null,635,null,566,774,999,945,690,656,674,683,650,590,532,557,550,505,539,524,536],"countries":{"ca":0.0110,"cn":0.0157,"gb":0.0145,"in":0.1103,"ir":0.3656,"ru":0.0144,"se":0.0999,"sy":0.0454,"us":0.0599},"transports":{"obfs2":0.6798},"versions":{"v4":1.0000}},
"3_months":{"first":"2013-12-09 12:00:00","last":"2014-03-10 12:00:00","interval":86400,"factor":0.139049349,"count":92,"values":[282,316,311,285,391,null,465,null,null,432,542,661,655,615,578,550,499,521,570,547,523,533,539,450,314,309,345,340,339,367,351,362,380,369,347,347,373,362,298,null,null,316,348,393,420,430,412,358,441,499,525,516,428,366,373,384,371,354,349,374,432,null,485,458,493,536,null,null,524,576,607,622,null,635,null,566,774,999,945,690,656,674,683,650,590,532,557,550,505,539,524,536],"countries":{"in":0.1610,"ir":0.1713,"nl":0.0141,"se":0.2574,"sy":0.0186,"us":0.0245},"transports":{"obfs2":0.2788},"versions":{"v4":0.9515}},
"1_year":{"first":"2013-05-14 00:00:00","last":"2014-03-10 00:00:00","interval":172800,"factor":0.123398448,"count":151,"values":[482,435,458,416,431,487,390,357,340,302,291,343,453,399,412,400,null,254,361,395,364,436,406,null,518,509,509,394,359,364,344,405,428,null,421,377,399,398,392,388,484,410,456,478,466,448,393,372,376,430,397,424,359,362,356,351,332,378,383,462,360,333,292,242,252,257,217,228,254,327,296,338,324,321,354,298,321,298,326,null,357,313,287,298,286,305,288,291,306,296,342,359,323,308,362,367,389,348,277,null,271,247,282,334,330,330,336,460,524,480,678,716,635,575,629,595,557,351,386,398,402,422,391,414,336,351,418,479,434,529,587,447,427,408,408,500,531,586,null,620,695,716,621,999,922,750,751,632,624,589,599],"countries":{"cn":0.0248,"gb":0.0221,"in":0.1986,"ir":0.0759,"nl":0.0708,"se":0.1348,"us":0.0149},"transports":{"obfs2":0.1098},"versions":{"v4":0.3749}}}
}
]}

I'm not convinced yet that the "countries", "transports", and "versions" fields are the way to go. They make these history objects quite distinct from the other history objects, which makes it harder to re-use code in Onionoo clients. I marked these fields as BETA in the protocol specification. Maybe we'll want to do something better there. Let's see.
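To illustrate what a client has to do with these extra fields, here is a small sketch that walks a response in the deployed format and reports, per time period, the share of clients not attributed to any listed key. The response is abbreviated to the "1_week" history of the sample above; the names `unlisted_share` and `summarize` are made up for this example. For "countries", the unlisted share presumably corresponds to countries below the 1% cutoff.

```python
import json

# Abbreviated copy of the deployed sample document (only the "1_week" history).
RESPONSE = json.loads("""
{"relays_published":"2014-03-11 07:00:00",
"relays":[],
"bridges_published":"2014-03-11 06:37:04",
"bridges":[
{"fingerprint":"DE6397A047ABE5F78B4C87AF725047831B221AAB",
"average_clients":{
"1_week":{"first":"2014-03-04 12:00:00","last":"2014-03-10 12:00:00",
"interval":86400,"factor":0.077587187,"count":7,
"values":[954,998,986,906,967,939,962],
"countries":{"bd":0.0103,"de":0.0141,"gb":0.0238,"in":0.0188,"ir":0.5333,
"ru":0.0188,"sy":0.0613,"tr":0.0104,"ua":0.0120,"us":0.0894},
"transports":{"obfs2":0.9825,"obfs3":0.0141},
"versions":{"v4":1.0000}}}}
]}
""")

BETA_FIELDS = ("countries", "transports", "versions")


def unlisted_share(fractions):
    """Share of clients not covered by the listed keys (never negative)."""
    return max(0.0, 1.0 - sum(fractions.values()))


def summarize(response):
    """Map (fingerprint, period, field) to the unlisted share of that field."""
    summary = {}
    for bridge in response.get("bridges", []):
        for period, history in bridge.get("average_clients", {}).items():
            for field in BETA_FIELDS:
                if field in history:
                    key = (bridge["fingerprint"], period, field)
                    summary[key] = unlisted_share(history[field])
    return summary


for (fingerprint, period, field), share in summarize(RESPONSE).items():
    print(fingerprint[:8], period, field, round(share, 4))
```

Because the fields are marked BETA, the sketch checks for their presence instead of assuming they exist in every history object.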

The feature is implemented; changing it deserves a new ticket. Closing.
