Opened 7 years ago

Closed 6 years ago

#8462 closed task (implemented)

Implement new bridge user counting algorithm (was Why don't .ir bridge users fall off when Tor gets censored by DPI?)

Reported by: arma Owned by: karsten
Priority: Medium Milestone:
Component: Metrics Utilities Version:
Severity: Keywords:
Cc: karsten, nikita@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

https://metrics.torproject.org/users.html?graph=direct-users&start=2012-12-01&end=2013-03-13&country=ir&events=points#direct-users sees a pretty clear fall-off. But https://metrics.torproject.org/users.html?graph=bridge-users&start=2012-12-01&end=2013-03-13&country=ir#bridge-users doesn't show any sort of drop-off.

Similarly, https://metrics.torproject.org/users.html?graph=direct-users&start=2012-12-01&end=2013-03-13&country=sy&events=points#direct-users sees very clear blocking of Tor users, and I assume it's by DPI also (actually I've been assuming it's all SSL), but then the bridge graph doesn't look at all like this: https://metrics.torproject.org/users.html?graph=bridge-users&start=2012-12-01&end=2013-03-13&country=sy#bridge-users

Is our new bridge user counting algorithm wrong in some way? Or are the connections arriving at the bridges, and they're counting them, even if the SSL handshake doesn't finish? Or does it finish but just not proceed past that?

At this point I'm avoiding showing the bridge graphs to funders, since they make no sense to me.

Child Tickets

Attachments (3)

userstats-ir-2013-04-24.png (55.5 KB) - added by karsten 7 years ago.
userstats-ir-2013-04-24-2.png (53.7 KB) - added by karsten 7 years ago.
results-delay-2013-05-06.png (119.8 KB) - added by karsten 7 years ago.


Change History (13)

comment:1 Changed 7 years ago by arma

Cc: karsten added

comment:2 Changed 7 years ago by karsten

The bridge graphs are still based on the old bridge user counting algorithm. That means we simply show the sum of all unique connecting IP addresses over 24 hours. My guess is that we either count IPs even if the SSL handshake doesn't finish, or that we also count IPs of connections to the unencrypted Dir port. The new bridge user counting algorithm would probably show the same pattern as the direct user graphs. Somebody should implement it.
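To make the old counting approach concrete, here is a minimal sketch of "sum of all unique connecting IP addresses over 24 hours". The record layout, the function name `old_style_bridge_count`, and the per-bridge grouping are assumptions made for illustration, not the actual metrics code. The `handshake_completed` field is deliberately ignored to show how a connection blocked mid-handshake would still be counted under this guess.

```python
from collections import defaultdict

def old_style_bridge_count(connections):
    """Sum unique client IPs per bridge over one 24-hour window.

    `connections` is an iterable of (bridge_id, client_ip, handshake_completed)
    tuples. This layout is hypothetical and only serves the illustration.
    """
    unique_ips = defaultdict(set)
    for bridge_id, client_ip, _handshake_completed in connections:
        # The IP is recorded regardless of whether the handshake finished.
        unique_ips[bridge_id].add(client_ip)
    return sum(len(ips) for ips in unique_ips.values())

observed = [
    ("bridgeA", "198.51.100.1", True),
    ("bridgeA", "198.51.100.1", False),  # same IP again, still one unique IP
    ("bridgeA", "198.51.100.2", False),  # blocked mid-handshake, still counted
    ("bridgeB", "198.51.100.1", True),   # same IP at another bridge counts again
]
print(old_style_bridge_count(observed))
```

If counting worked this way, DPI that cuts connections during or after the handshake would not reduce the reported number, which would match the flat bridge graphs.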

comment:3 Changed 7 years ago by arma

Component: Analysis → Metrics Utilities
Summary: Why don't .ir bridge users fall off when Tor gets censored by DPI? → Implement new bridge user counting algorithm (was Why don't .ir bridge users fall off when Tor gets censored by DPI?)
Type: defect → task

Changed 7 years ago by karsten

Attachment: userstats-ir-2013-04-24.png added

comment:4 Changed 7 years ago by karsten

Owner: set to karsten
Status: new → assigned

I implemented the new user counting algorithm for bridges (and relays). See the attached graph for Iranian Tor users in 2013. Both lines use the same user-counting algorithm, which is based on directory requests. Does that graph make sense to you?

comment:5 Changed 7 years ago by rransom

The ‘bridge’ subgraph of userstats-ir-2013-04-24.png does not have a y=0 line. It should.

Changed 7 years ago by karsten

Attachment: userstats-ir-2013-04-24-2.png added

comment:6 in reply to:  5 Changed 7 years ago by karsten

Replying to rransom:

The ‘bridge’ subgraph of userstats-ir-2013-04-24.png does not have a y=0 line. It should.

You're right. See the updated graph.

Changed 7 years ago by karsten

Attachment: results-delay-2013-05-06.png added

comment:7 Changed 7 years ago by karsten

An important requirement for the new implementation of our user-counting algorithm was to reduce the delay between observing and reporting Tor usage. The earlier we know about user numbers dropping off in a given country or for a given pluggable transport, the better we can respond.

I attached a graph visualizing this delay for relays (bridges are expected to be similar). This graph probably needs some explanation:

  • Uptime is the average number of relays running on a given day. The left-most line shows relay uptime for April 4: it starts at 00:00 (all times are UTC) of April 4 and reaches its maximum at 23:00. The reason is simply that we learn about uptime from consensuses, and these are published once per hour.
  • Bytes are bandwidth histories published by relays in their extra-info descriptors. Similarly, the April 4 line starts early on April 4, but it reaches its maximum at about 18:00 of April 5. The reason is that it can take up to 18 hours for some relays to publish their next descriptor containing their bandwidth history.
  • Responses are directory request statistics published by relays. Here, we see that it can take until 12:00 of April 6 to report all responses for April 4. The reason for this is that directory request statistics intervals are 24 hours long and can end at any time of the day, possibly on April 4 at 23:59. And then it takes another 12 hours (possibly even 18 hours) for relays to publish their next descriptor.
  • Frac is the fraction of responses, weighted by bandwidth, that are available to estimate daily Tor users. The higher this fraction the better the estimation. A value of 0.1 might be a fine lower limit for believing results at least a little bit and a value of 0.5 means results are about as rock-solid as this estimation can ever be. 0.1 is reached during the evening of April 4, 0.5 at the end of April 5.

tl;dr: we can make our very first guess about Tor users on a given day by the evening of that day and will have credible numbers by the end of the next day.
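As a rough illustration of how the frac thresholds above could be applied, here is a small sketch. The function names, the label strings, and the `requests_per_user_per_day` parameter (with its default of 10) are assumptions that do not come from this ticket; only the 0.1 and 0.5 thresholds are taken from the explanation above.

```python
def classify_frac(frac):
    """Map a frac value to a rough confidence label.

    Thresholds follow the comment above: 0.1 as a lower limit for believing
    results a little, 0.5 as about as solid as the estimate gets.
    """
    if frac < 0.1:
        return "insufficient"
    if frac < 0.5:
        return "preliminary"
    return "credible"

def estimate_users(reported_responses, frac, requests_per_user_per_day=10.0):
    """Extrapolate daily users from the responses reported so far.

    Dividing by frac scales the partial reports up to the whole network.
    requests_per_user_per_day is an assumed parameter, not stated in the ticket.
    """
    if frac <= 0:
        raise ValueError("frac must be positive to extrapolate")
    return reported_responses / frac / requests_per_user_per_day

print(classify_frac(0.12), estimate_users(1000, 0.5))
```

With these thresholds, the April 4 numbers would be labeled "preliminary" from the evening of April 4 and "credible" from the end of April 5.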

comment:8 Changed 7 years ago by karsten

New graphs on estimated daily Tor users by country, transport, or IP version available: https://metrics.torproject.org/users.html#userstats

comment:9 Changed 7 years ago by nikita

Cc: nikita@… added

comment:10 Changed 6 years ago by karsten

Resolution: implemented
Status: assigned → closed

The code produced for this ticket has been running for a couple of months now, and the new graphs have replaced the earlier user number estimates on https://metrics.torproject.org/users.html. Closing as implemented.
