When upgrading Briar's Tor binaries from 0.2.9.16 to 0.3.4.8, we noticed a difference in Tor's startup time on older Android phones.
Measuring the startup time of several recent Tor versions revealed an interesting pattern. The time that elapses between starting the Tor process and the creation of the authentication cookie file hasn't changed across versions, but the time between the creation of the cookie file and the response to the AUTHENTICATE command has changed substantially. (Briar sends the AUTHENTICATE command as soon as the cookie file's created.)
I measured five runs of each version on a Motorola Moto G 4G running Android 5.1. Here are the min and max times in seconds for each version:
= Tor version =
= Min =
= Max =
0.2.9.16
3.5
4.3
0.2.9.17
3.5
4.2
0.3.0.1-alpha
4.8
13.0
0.3.1.1-alpha
9.9
16.2
0.3.2.1-alpha
15.3
19.9
0.3.3.1-alpha
15.8
18.5
0.3.4.1-alpha
16.1
18.4
0.3.4.8
16.2
20.9
0.3.4.9
16.1
23.7
The min and max have both increased substantially since 0.2.9, and the distribution has widened. This is having a noticeable impact on how long it takes for Briar to connect to contacts when the app's started.
I'll repeat these experiments on Linux x64 to see whether this is Android-specific.
Trac: Username: akwizgran
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
Results on Linux x86_64 are similar, although the device is much faster so all the times are scaled down:
= Tor version =
= Min =
= Max =
0.2.9.16
0.29
0.51
0.2.9.17
0.30
0.44
0.3.0.1-alpha
0.64
0.88
0.3.1.1-alpha
0.73
0.95
0.3.2.1-alpha
0.88
1.05
0.3.3.1-alpha
0.85
1.37
0.3.4.1-alpha
0.77
9.46 (outlier)
0.3.4.8
0.88
1.18
0.3.4.9
0.99
1.50
0.3.5.5-alpha
0.54
0.75
There doesn't seem to be a corresponding change in CPU time (user or system). Any thoughts on what might be causing this, and whether we can expect it to get better or worse in future versions?
Hm, interesting. That result suggests that the problem is indeed CPU-based, if it scales that cleanly with with a faster CPU. TBH that wasn't what I had expected; I had expected something far more complicated.
In this case, a CPU profile would probably be useful.
I've done a quick-and-dirty startup profile to try to track this down, using a hack branch called "exit_immediate" in my public repository. It seems that the biggest offender in 0.3.5 is stuff under "router_reload_consensus_networkstatus". It looks like we're spending our time re-parsing and re-validating stuff from storage.
Can you confirm that your results are significantly slow if Tor starts with cached-* files?
We already made some improvements here in 0.4.0, with #27225 (moved). I think we can do better, though.
I've made some callgraphs with perf, and I'm attaching them here. To make them I used my "exit_immediate" branch with:
To follow up: what would be most useful for me at this points is cpu profiling of tor on Android for this use case, if you can get that: I want to confirm that the hotspots on Android are the same as on my desktop.
I'll run some tests with and without cached files, and look into what kind of profiling I can do on Android.
In the meantime here's some more data from the tests I already ran, which seems to show there isn't a significant difference in CPU usage across versions (or at least, nothing big enough to explain the difference in startup time). This was measured on Linux x86_64 with bash's time builtin. Min and max over 5 runs per version, as before.
User time:
= Tor version =
= Min =
= Max =
0.2.9.16
3.55
4.02
0.2.9.17
3.50
3.88
0.3.0.1-alpha
3.54
3.79
0.3.1.1-alpha
3.56
3.87
0.3.2.1-alpha
3.44
3.92
0.3.3.1-alpha
3.49
3.86
0.3.4.1-alpha
3.52
3.78
0.3.4.8
3.53
3.76
0.3.4.9
3.57
3.96
0.3.5.5-alpha
3.43
3.78
System time:
= Tor version =
= Min =
= Max =
0.2.9.16
0.46
0.50
0.2.9.17
0.39
0.56
0.3.0.1-alpha
0.44
0.56
0.3.1.1-alpha
0.38
0.52
0.3.2.1-alpha
0.44
0.54
0.3.3.1-alpha
0.40
0.52
0.3.4.1-alpha
0.40
0.57
0.3.4.8
0.42
0.57
0.3.4.9
0.39
0.46
0.3.5.5-alpha
0.42
0.54
Real time:
= Tor version =
= Min =
= Max =
0.2.9.16
30.37
30.40
0.2.9.17
30.38
30.39
0.3.0.1-alpha
30.38
30.39
0.3.1.1-alpha
30.38
30.39
0.3.2.1-alpha
30.37
30.39
0.3.3.1-alpha
30.37
30.39
0.3.4.1-alpha
30.38
30.39
0.3.4.8
30.38
30.39
0.3.4.9
30.38
30.39
0.3.5.5-alpha
30.37
30.39
These are from the same runs as the startup times above, so we'd be looking for a difference of about 0.7 seconds to explain the difference in startup time. I'm not seeing anything on that scale, or any overall pattern. But you're right that the difference in scale between these results and the Android results suggests that some resource provided by the device is the bottleneck.
Could it be disk rather that CPU that's the bottleneck resource? That would fit with your theory about cached files. I'll get back to you when I have those results.
Hm, that's quite confusing to me. I think that our measurements are different enough that we may be measuring the time between different points in the startup process. I'm measuring from the time Tor is started to the time that the initialization is done and we enter run_main_loop_until_done(). I had assumed that this would be the same as what you were measuring, but now I wonder.
Do you have a script or something I can use to reproduce the start/stop events that you are using? I don't have an easy script that lets me launch a control connection and send AUTHENTICATE as soon as the cookie file is created, but I'm guessing you do?
I'm not at work today, but I'll post the scripts and build instructions on Monday. I'm afraid my script is based on a headless version of Briar with the Tor binary embedded in the JAR, so it will take some setup, but Gradle should download everything you need except the JDK.
I re-ran the experiments last night, deleting the cache-* files before each run. That reduces the min startup time by a factor of 10 and seems to eliminate the difference between Tor versions, or at least push it down into the noise. I haven't analysed the results properly yet though; I'll post them on Monday.
Startup times for Linux x64_64 with cache-* files deleted:
= Tor version =
= Min =
= Max =
0.2.9.16
0.051
0.27
0.2.9.17
0.072
0.14
0.3.0.1-alpha
0.058
1.02 (outlier)
0.3.1.1-alpha
0.067
0.20
0.3.2.1-alpha
0.044
0.17
0.3.3.1-alpha
0.055
0.39
0.3.4.1-alpha
0.053
0.30
0.3.4.8
0.063
0.38
0.3.4.9
0.056
0.54
0.3.5.5-alpha
0.048
0.14
The times are much faster than before, and it looks like the trend across versions may have disappeared, though it's hard to judge given the amount of noise.
I've reworked my scripts to remove the Briar dependency; the results are still the same. I'm attaching the following files:
native.sh - bash script that starts Tor, authenticates to the control port and stops Tor, printing the time (seconds and nanoseconds since the epoch) at each step. Specify the location of the Tor binary (and dynamic libs if required) with TOR_DIRECTORY. Requires GNU sleep, hexdump, expect, nc.
run-native.sh - bash script that runs native.sh with binaries for various Tor versions, 5 runs per version. Specify the location of the binaries with TOR_BINARIES_DIRECTORY.
startup-time-native.pl - Perl script that extracts elapsed times from a log produced by run-native.sh. Feed the log to stdin.
I used statically linked binaries built with tor-reproducer. native.sh should also work with Tor Browser binaries, but you'll need to copy the dynamic libs into the cwd before running run-native.sh.