Opened 3 weeks ago

Last modified 3 weeks ago

#28529 new task

Confirm that the strange onionoo flood is resolved

Reported by: arma
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords:
Cc: teor, pastly, n8fr8, dgoulet, sysrqb, irl
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor: Sponsor19

Description

Context part one: Georgetown and NRL et al have a paper at IMC 2018 that found "that ~40% of the sites accessed over Tor have a torproject.org domain name". At the time it was a mystery why.

Context part two: pastly or teor or somebody in Mexico City realized that Orbot had a design mistake where it did an onionoo lookup (over Tor) of the exit relay for its circuit, rather than shipping its own geoip db.

Context part two-b: They also discovered that around the time of that feature roll-out in Orbot, our onionperf graphs start to look uglier especially in terms of performance variance.

Context part three: I think Nathan thinks he fixed the issue in the "Orbot 16.0.3-BETA-2" release on Oct 12. I remember we tracked down what we thought was the fix to a commit that had an unrelated commit message.

Today I saw another Orbot release mail, for "Orbot 16.0.5-RC-1-tor-0.3.4.9".

So: is there a graph or stats somewhere about adoption of Orbot versions, to get a handle on how many Orbot users have this fixed version vs how many don't?

And, can we see any changes on the onionperf results as the fix got rolled out?

And, once we think the fix has been rolled out to most Orbot users, could we do another privcount run on some exits to see if the anomaly went away?

Change History (10)

comment:1 Changed 3 weeks ago by arma

(If somebody wants to put this into a better trac component, please do.)

Also, if anybody has graphs or pointers or supporting facts for any of my vague statements above, to get us closer to having some specifics here, that would be grand.

David spoke earlier of doing some "Tor network retrospectives" when weird things happen and we track them down and resolve them, so we can document some case studies for posterity. That might be a fun plan here too.

comment:2 Changed 3 weeks ago by arma

Sponsor: Sponsor19

comment:3 in reply to: description Changed 3 weeks ago by teor

Replying to arma:

> Context part one: Georgetown and NRL et al have a paper at IMC 2018 that found "that ~40% of the sites accessed over Tor have a torproject.org domain name". At the time it was a mystery why.

onionoo was 40% of the domains accessed by the first stream on each circuit, excluding circuits where the first stream used an IP address.

Page 6 of https://www.ohmygodel.com/publications/tor-usage-imc18.pdf

> Context part two: pastly or teor or somebody in Mexico City realized that Orbot had a design mistake where it did an onionoo lookup (over Tor) of the exit relay for its circuit, rather than shipping its own geoip db.

Matt Finkel discovered the relevant code in Orbot:
https://gitweb.torproject.org/orbot.git/tree/orbotservice/src/main/java/org/torproject/android/service/TorEventHandler.java#n226

This code performs an Onionoo lookup for every relay in every circuit
built by Tor. Each lookup creates one stream, and possibly one circuit
(if the previous Onionoo circuit has timed out).

Tor always tries to keep 6 preemptive circuits open, leading to 6*3 = 18
lookups when Orbot starts.
Every new circuit triggers another 3 lookups, one for each relay in the
circuit. (There is no caching, as far as I can see.)
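
As a rough sketch of the arithmetic above (this is not Orbot's code): assuming 3 relays per circuit, 6 preemptive circuits, and no caching, the number of Onionoo lookups grows linearly with the number of circuits built. The function name and constants here are illustrative only.

```python
RELAYS_PER_CIRCUIT = 3   # assumption: standard 3-hop circuits
PREEMPTIVE_CIRCUITS = 6  # figure quoted above


def onionoo_lookups(circuits_built, relays_per_circuit=RELAYS_PER_CIRCUIT):
    """Estimate lookups, assuming one uncached lookup per relay per circuit."""
    if circuits_built < 0:
        raise ValueError("circuits_built must be non-negative")
    return circuits_built * relays_per_circuit


# Lookups triggered at startup by the preemptive circuits alone.
print(onionoo_lookups(PREEMPTIVE_CIRCUITS))  # 18

# Each additional circuit adds another 3 lookups.
print(onionoo_lookups(PREEMPTIVE_CIRCUITS + 1) - onionoo_lookups(PREEMPTIVE_CIRCUITS))  # 3
```

Under these assumptions a single client that builds a few dozen circuits per session would generate on the order of a hundred lookups, which is consistent with Onionoo showing up so prominently in first-stream measurements.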

> Context part two-b: They also discovered that around the time of that feature roll-out in Orbot, our onionperf graphs start to look uglier especially in terms of performance variance.

The feature was added in September 2016:
https://gitweb.torproject.org/orbot.git/tree/orbotservice/src/main/java/org/torproject/android/service/TorEventHandler.java#n25

The initial version of the code was correlated with a significant rise
in circuit download times across the whole Tor network:
https://metrics.torproject.org/torperf.html?start=2015-01-01&end=2018-09-30&source=all&server=public&filesize=50kb

(I'm not sure why download times became much more consistent in
mid-2017. Perhaps the Orbot code changed, or we improved tor network
performance another way.)

> Context part three: I think Nathan thinks he fixed the issue in the "Orbot 16.0.3-BETA-2" release on Oct 12. I remember we tracked down what we thought was the fix to a commit that had an unrelated commit message.

> Today I saw another Orbot release mail, for "Orbot 16.0.5-RC-1-tor-0.3.4.9".

> So: is there a graph or stats somewhere about adoption of Orbot versions, to get a handle on how many Orbot users have this fixed version vs how many don't?

Good question.

> And, can we see any changes on the onionperf results as the fix got rolled out?

I see no changes yet, but so far these are only beta and RC releases.

https://metrics.torproject.org/torperf.html?start=2018-09-01&end=2018-11-20&source=all&server=public&filesize=50kb

> And, once we think the fix has been rolled out to most Orbot users, could we do another privcount run on some exits to see if the anomaly went away?

I no longer control any exits, but the rest of the deployment might still exist.
I still have the configurations for all the measurements we did for the paper.

comment:4 Changed 3 weeks ago by arma

Cc: sysrqb added

comment:5 Changed 3 weeks ago by sysrqb

Yep. It looks like the lookup was disabled, and that change is included in the new RC:
https://gitweb.torproject.org/orbot.git/commit/?id=bcae0035532ef214ef015bd4cf26ec87400a24bc

The private class ExternalIPFetcher is now commented out.

comment:6 Changed 3 weeks ago by arma

Cc: irl added
Summary: Confirm that Orbot geoip lookup flaw is resolved → Confirm that the strange onionoo flood is resolved

Switching the ticket to a more productive destination: let's convince ourselves that the onionoo flood is resolved by this bugfix.

And that makes me realize another angle we can look at: the onionoo webserver has a varnish in front of it to handle the request flood (what's the right ticket for referencing that flood and our decision to put a varnish front-end in place? #15766?). Can we get some graphs of how that flood is going?

If we don't have that data, let's start keeping it, so we can be in a position to answer this mystery from both sides.

comment:7 Changed 3 weeks ago by n8fr8

Yes, it is done!

comment:8 Changed 3 weeks ago by n8fr8

> So: is there a graph or stats somewhere about adoption of Orbot versions, to get a handle on how many Orbot users have this fixed version vs how many don't?

I can check in on Google Play stats in a few days to see where we are in upgrades to the new release. It has been pushed out to all 2.5M active users.

comment:9 Changed 3 weeks ago by n8fr8

As of right now 1,082,156 users have upgraded to the latest version.
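
For a rough sense of scale, the two figures in this thread can be turned into an adoption fraction. This is only a sketch: it treats "2.5M active users" as exactly 2,500,000 (it is a rounded figure), and the function name is illustrative.

```python
def adoption_fraction(upgraded_users, active_users):
    """Return the share of active users who are on the latest version."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return upgraded_users / active_users


UPGRADED = 1_082_156  # users on the latest version, from this comment
ACTIVE = 2_500_000    # assumption: "2.5M active users" taken as an exact count

share = adoption_fraction(UPGRADED, ACTIVE)
print(f"{share:.1%} upgraded, {1 - share:.1%} not yet upgraded")
# prints "43.3% upgraded, 56.7% not yet upgraded"
```

Note that "not yet upgraded" is an upper bound on users still running the lookup code, since some of them may already be on an earlier beta that included the fix.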

comment:10 Changed 3 weeks ago by irl

These graphs came from a service (munin) that no longer exists.
