Opened 8 months ago

Last modified 4 weeks ago

#28481 assigned defect

Tor's startup time is getting slower on Android

Reported by: akwizgran Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.3.4.9
Severity: Normal Keywords: android startup performance controller
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

When upgrading Briar's Tor binaries from 0.2.9.16 to 0.3.4.8, we noticed a difference in Tor's startup time on older Android phones.

Measuring the startup time of several recent Tor versions revealed an interesting pattern. The time that elapses between starting the Tor process and the creation of the authentication cookie file hasn't changed across versions, but the time between the creation of the cookie file and the response to the AUTHENTICATE command has changed substantially. (Briar sends the AUTHENTICATE command as soon as the cookie file's created.)

I measured five runs of each version on a Motorola Moto G 4G running Android 5.1. Here are the min and max times in seconds for each version:

Tor version Min Max
0.2.9.163.54.3
0.2.9.173.54.2
0.3.0.1-alpha4.813.0
0.3.1.1-alpha9.916.2
0.3.2.1-alpha15.319.9
0.3.3.1-alpha15.818.5
0.3.4.1-alpha16.118.4
0.3.4.816.220.9
0.3.4.916.123.7

The min and max have both increased substantially since 0.2.9, and the distribution has widened. This is having a noticeable impact on how long it takes for Briar to connect to contacts when the app's started.

I'll repeat these experiments on Linux x64 to see whether this is Android-specific.

Child Tickets

TicketStatusOwnerSummaryComponent
#28837closednickmUse openssl's SHA3 implementations when they are faster.Core Tor/Tor
#28838closednickmMake curve25519_basepoint_spot_check() fasterCore Tor/Tor
#28839closednickmSpeed up parsing RSA onion keys from microdescriptorsCore Tor/Tor
#28851closednickmDo we really need to check our hardwired DH primes at startup?Core Tor/Tor
#28852closednickmRemove memset() from get_token_arguments.Core Tor/Tor
#28853closednickmparse_short_policy() could be fasterCore Tor/Tor
#28856closednickmDiscard strcmp_len()Core Tor/Tor

Attachments (4)

tor-init-callgraphs-20181212.tar.bz2 (32.3 KB) - added by nickm 7 months ago.
native.sh (1.2 KB) - added by akwizgran 7 months ago.
run-native.sh (629 bytes) - added by akwizgran 7 months ago.
startup-time-native.pl (723 bytes) - added by akwizgran 7 months ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 8 months ago by nickm

Milestone: Tor: 0.4.0.x-final

Thanks for the report! Could you also try on 0.3.5?

comment:2 Changed 8 months ago by akwizgran

0.3.5.5-alpha (patched for #28286) looks a bit better:

Tor version Min Max
0.3.5.5-alpha11.414.4

comment:3 Changed 7 months ago by akwizgran

Results on Linux x86_64 are similar, although the device is much faster so all the times are scaled down:

Tor version Min Max
0.2.9.160.290.51
0.2.9.170.300.44
0.3.0.1-alpha0.640.88
0.3.1.1-alpha0.730.95
0.3.2.1-alpha0.881.05
0.3.3.1-alpha0.851.37
0.3.4.1-alpha0.779.46 (outlier)
0.3.4.80.881.18
0.3.4.90.991.50
0.3.5.5-alpha0.540.75

There doesn't seem to be a corresponding change in CPU time (user or system). Any thoughts on what might be causing this, and whether we can expect it to get better or worse in future versions?

Changed 7 months ago by nickm

comment:4 Changed 7 months ago by nickm

Hm, interesting. That result suggests that the problem is indeed CPU-based, if it scales that cleanly with with a faster CPU. TBH that wasn't what I had expected; I had expected something far more complicated.

In this case, a CPU profile would probably be useful.

I've done a quick-and-dirty startup profile to try to track this down, using a hack branch called "exit_immediate" in my public repository. It seems that the biggest offender in 0.3.5 is stuff under "router_reload_consensus_networkstatus". It looks like we're spending our time re-parsing and re-validating stuff from storage.

Can you confirm that your results are significantly slow if Tor starts with cached-* files?

We already made some improvements here in 0.4.0, with #27225. I think we can do better, though.

I've made some callgraphs with perf, and I'm attaching them here. To make them I used my "exit_immediate" branch with:

perf record -g --call-graph dwarf ./src/app/tor -ExitImmediatly 1 
perf report -g --stdio > filename.callgraph

If my hunch is right, then we want to be looking for savings opportunities in the "with-cache" case for 0.4.0.

comment:5 Changed 7 months ago by nickm

To follow up: what would be most useful for me at this points is cpu profiling of tor on Android for this use case, if you can get that: I want to confirm that the hotspots on Android are the same as on my desktop.

comment:6 Changed 7 months ago by nickm

Parent ID: #26630

comment:7 Changed 7 months ago by nickm

Sponsor: Sponsor8-can

comment:8 Changed 7 months ago by akwizgran

I'll run some tests with and without cached files, and look into what kind of profiling I can do on Android.

In the meantime here's some more data from the tests I already ran, which seems to show there isn't a significant difference in CPU usage across versions (or at least, nothing big enough to explain the difference in startup time). This was measured on Linux x86_64 with bash's time builtin. Min and max over 5 runs per version, as before.

User time:

Tor version Min Max
0.2.9.163.554.02
0.2.9.173.503.88
0.3.0.1-alpha3.543.79
0.3.1.1-alpha3.563.87
0.3.2.1-alpha3.443.92
0.3.3.1-alpha3.493.86
0.3.4.1-alpha3.523.78
0.3.4.83.533.76
0.3.4.93.573.96
0.3.5.5-alpha3.433.78

System time:

Tor version Min Max
0.2.9.160.460.50
0.2.9.170.390.56
0.3.0.1-alpha0.440.56
0.3.1.1-alpha0.380.52
0.3.2.1-alpha0.440.54
0.3.3.1-alpha0.400.52
0.3.4.1-alpha0.400.57
0.3.4.80.420.57
0.3.4.90.390.46
0.3.5.5-alpha0.420.54

Real time:

Tor version Min Max
0.2.9.1630.3730.40
0.2.9.1730.3830.39
0.3.0.1-alpha30.3830.39
0.3.1.1-alpha30.3830.39
0.3.2.1-alpha30.3730.39
0.3.3.1-alpha30.3730.39
0.3.4.1-alpha30.3830.39
0.3.4.830.3830.39
0.3.4.930.3830.39
0.3.5.5-alpha30.3730.39

These are from the same runs as the startup times above, so we'd be looking for a difference of about 0.7 seconds to explain the difference in startup time. I'm not seeing anything on that scale, or any overall pattern. But you're right that the difference in scale between these results and the Android results suggests that some resource provided by the device is the bottleneck.

Could it be disk rather that CPU that's the bottleneck resource? That would fit with your theory about cached files. I'll get back to you when I have those results.

comment:9 Changed 7 months ago by nickm

Hm, that's quite confusing to me. I think that our measurements are different enough that we may be measuring the time between different points in the startup process. I'm measuring from the time Tor is started to the time that the initialization is done and we enter run_main_loop_until_done(). I had assumed that this would be the same as what you were measuring, but now I wonder.

Do you have a script or something I can use to reproduce the start/stop events that you are using? I don't have an easy script that lets me launch a control connection and send AUTHENTICATE as soon as the cookie file is created, but I'm guessing you do?

comment:10 Changed 7 months ago by akwizgran

I'm not at work today, but I'll post the scripts and build instructions on Monday. I'm afraid my script is based on a headless version of Briar with the Tor binary embedded in the JAR, so it will take some setup, but Gradle should download everything you need except the JDK.

I re-ran the experiments last night, deleting the cache-* files before each run. That reduces the min startup time by a factor of 10 and seems to eliminate the difference between Tor versions, or at least push it down into the noise. I haven't analysed the results properly yet though; I'll post them on Monday.

comment:11 Changed 7 months ago by akwizgran

Startup times for Linux x64_64 with cache-* files deleted:

Tor version Min Max
0.2.9.160.0510.27
0.2.9.170.0720.14
0.3.0.1-alpha0.0581.02 (outlier)
0.3.1.1-alpha0.0670.20
0.3.2.1-alpha0.0440.17
0.3.3.1-alpha0.0550.39
0.3.4.1-alpha0.0530.30
0.3.4.80.0630.38
0.3.4.90.0560.54
0.3.5.5-alpha0.0480.14

The times are much faster than before, and it looks like the trend across versions may have disappeared, though it's hard to judge given the amount of noise.

comment:12 Changed 7 months ago by akwizgran

I've reworked my scripts to remove the Briar dependency; the results are still the same. I'm attaching the following files:

native.sh - bash script that starts Tor, authenticates to the control port and stops Tor, printing the time (seconds and nanoseconds since the epoch) at each step. Specify the location of the Tor binary (and dynamic libs if required) with TOR_DIRECTORY. Requires GNU sleep, hexdump, expect, nc.

run-native.sh - bash script that runs native.sh with binaries for various Tor versions, 5 runs per version. Specify the location of the binaries with TOR_BINARIES_DIRECTORY.

startup-time-native.pl - Perl script that extracts elapsed times from a log produced by run-native.sh. Feed the log to stdin.

I used statically linked binaries built with tor-reproducer. native.sh should also work with Tor Browser binaries, but you'll need to copy the dynamic libs into the cwd before running run-native.sh.

Changed 7 months ago by akwizgran

Attachment: native.sh added

Changed 7 months ago by akwizgran

Attachment: run-native.sh added

Changed 7 months ago by akwizgran

Attachment: startup-time-native.pl added

comment:13 Changed 6 months ago by gaba

Sponsor: Sponsor8-can

comment:14 Changed 6 months ago by nickm

Parent ID: #26630

comment:15 Changed 5 months ago by nickm

Milestone: Tor: 0.4.0.x-finalTor: unspecified
Owner: set to nickm
Status: newaccepted

How did we do here? We merged a ton of patches in 0.4.0.1-alpha, with another patch or two to come in 0.4.1.x

comment:16 Changed 5 months ago by akwizgran

0.4.0 is looking much better on Linux x64_64:

Tor version Min Max
0.2.9.160.320.51
0.3.4.90.901.02
0.3.5.80.570.60
0.4.0.2-alpha0.280.30

As before, the differences between versions disappear into the noise when cached-* files are deleted before each run:

Tor version Min Max
0.2.9.160.0510.13
0.3.4.90.110.31
0.3.5.80.0140.041
0.4.0.2-alpha0.0130.067

comment:17 Changed 5 months ago by akwizgran

Things aren't looking quite as good on Android. 0.4.0 performs much better than 0.3.4, but not as well as 0.2.9:

Tor version Min Max
0.2.9.163.25.1
0.3.4.816.720.3
0.3.5.813.814.1
0.4.0.2-alpha11.312.4

comment:18 Changed 4 weeks ago by nickm

Owner: nickm deleted
Status: acceptedassigned

These tickets are not things I'm currently working on. They may be important, but they don't need to be done by me specifically. Un-assigning.

Note: See TracTickets for help on using tickets.