Opened 19 months ago

Last modified 12 months ago

#30162 new defect

Tor Browser bootstrap process got stuck after interrupting it

Reported by: gk
Owned by: tbb-team
Priority: Medium
Milestone:
Component: Applications/Tor Browser
Version:
Severity: Normal
Keywords: tbb-mobile, tbb-8.5, TorBrowserTeam201905
Cc: igt0, sisbell, sysrqb, hans@…, n8fr8
Actual Points:
Parent ID:
Points: 0.5
Reviewer:
Sponsor:


I am still trying to figure out good steps to reproduce this bug. If you click on the gear icon and configure bridges, then during bootstrap click on it again and either change the bridges or go back to starting without bridges, you end up with a broken bootstrap process saying:

SUCCESS connected to Tor control port.
Cookie Auth file not created
Unable to start Tor: Cookie Auth file not created: /data/user/0/org.torproject.torbrowser_alpha/app_torservice/lib/tor/control_auth_cookie, len = 0

There is usually no way to recover from that, and one has to start over again by killing the app.

That's with the TOPL changes landed.

Change History (23)

comment:1 Changed 19 months ago by gk

Priority changed from High to Very High
Summary changed from "Tor Browser bootsrap process got stuck after changing bridges sometimes" to "Tor Browser bootsrap process got stuck after interrupting it"

Okay, here are steps that work for me:

1) Open TBA and tap the start button
2) Wait for the "SUCCESS connected to Tor control port." line in the log, then tap the gear icon
3) Don't configure anything but go back to the bootstrap panel and start again.
4) You get the exception and bootstrap is broken

comment:2 Changed 19 months ago by gk

Summary changed from "Tor Browser bootsrap process got stuck after interrupting it" to "Tor Browser bootstrap process got stuck after interrupting it"

comment:3 Changed 19 months ago by sisbell

I've seen this message before when running the Java Tor client. It occurs (100% reproducible) when there is an existing tor control connection already running. I didn't think this condition would occur on Android.

First, I'll verify we aren't somehow starting Tor twice.

Second, I'll look into whether old processes aren't getting cleaned up. In that case, we can use takeownership, and the tor process should be cleaned up automatically when the Android app closes. I have an open issue for this:

comment:4 Changed 18 months ago by sisbell

I've been unable to reproduce a tor process not getting cleaned up.

We can detect multiple running processes with:

adb shell ps -ef | grep libTor

This will display something like

u0_a236      26768     1 95 07:53:19 ?    00:00:20 -f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc __OwningControllerProcess 26167

If there are two entries, then it's not cleaning up.
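
The check above could be scripted. The following is only a sketch in Python, assuming the `ps -ef` column layout shown in the sample output (UID first, PID second) and that every tor process line carries the `__OwningControllerProcess` argument. The function name `find_tor_pids` is hypothetical and not part of TOPL or tor-android-service.

```python
def find_tor_pids(ps_output, marker="__OwningControllerProcess"):
    """Return the PIDs of `ps -ef` lines that contain `marker`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Need at least the UID and PID columns, and the tor marker argument.
        if len(fields) < 2 or marker not in line:
            continue
        if fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


# Sample line copied from the output above.
SAMPLE = (
    "u0_a236      26768     1 95 07:53:19 ?    00:00:20 "
    "-f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc "
    "__OwningControllerProcess 26167\n"
)

pids = find_tor_pids(SAMPLE)
if len(pids) > 1:
    print("more than one tor process, cleanup is not working:", pids)
else:
    print("tor processes:", pids)
```

The output of `adb shell ps -ef` could be fed to the function in place of `SAMPLE`.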

The reason I suspect two processes are running on some devices is that a tor process will create a lock on some files while in use. A second tor process will block, unable to modify the file.

I think it will be easier to debug the Java version, where this is easily reproducible, and test the fix there.

comment:5 Changed 18 months ago by sisbell

A few comments:

1) we can call the method OnionProxyManager.startWithRepeat so that if the first startup fails, we at least try one more time.

2) There is another thing that may be useful to look at. OrbotService tries to detect if tor process is already running (it will not attempt restart). I think we can maybe try something similar to at least log and abort if there is already a tor process running.

3) we don't have a timeout for the connection getting stuck during bootstrap. I'll look into timing out if the bootstrap 100% complete event isn't fired within x seconds. x should be longish since we won't know the capabilities of the device.
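
Points 1) and 3) can be illustrated with a small sketch. It is written in Python for brevity and is not TOPL code: `BootstrapWatcher` and `start_with_retry` are hypothetical names, and the retry helper only mirrors the idea behind `OnionProxyManager.startWithRepeat` without reproducing its signature.

```python
import threading


class BootstrapWatcher:
    """Waits for a 'bootstrap complete' signal and gives up after a timeout."""

    def __init__(self):
        self._done = threading.Event()

    def on_bootstrap_complete(self):
        # To be called by whatever listens for the 100% bootstrap event.
        self._done.set()

    def wait(self, timeout_seconds):
        # True if bootstrap completed in time, False if the timeout expired.
        return self._done.wait(timeout_seconds)


def start_with_retry(start, attempts=2):
    """Call `start()` up to `attempts` times and stop at the first success."""
    for _ in range(attempts):
        if start():
            return True
    return False


watcher = BootstrapWatcher()
# Simulate the event arriving shortly after start.
threading.Timer(0.05, watcher.on_bootstrap_complete).start()
print("bootstrapped in time:", watcher.wait(5.0))
```

On a real device the timeout would need to be generous, as noted above, since the capabilities of the device are unknown.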

comment:6 Changed 18 months ago by sisbell

Opened 2 new issues for the TOPL project to address bootstrap:

Detect and log if another tor process is running

Timeout for Bootstrap

comment:7 Changed 18 months ago by sisbell

I followed the steps to reproduce. It takes me about a dozen times of configuring a new bridge and reconnecting to see the problem but it does show up.

pid 31900 is the currently starting (original) tor process. When I reconnected, I briefly see two new tor processes starting (not sure how we get two here): pid 32088 and 32092. Then a second later, those new processes die with the CookieAuth failure.

adb shell ps -ef | grep libTor
u0_a236      31900     1 11 09:52:50 ?    00:00:18 -f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc __OwningControllerProcess 31391
u0_a236      32088 31391 2 09:55:41 ?     00:00:00 -f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc __OwningControllerProcess 31391
u0_a236      32091 32088 0 09:55:41 ?     00:00:00 []
u0_a236      32092     1 0 09:55:41 ?     00:00:00 -f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc __OwningControllerProcess 31391

adb shell ps -ef | grep libTor
u0_a236      31900     1 10 09:52:50 ?    00:00:18 -f /data/user/0/org.torproject.torbrowser_alpha/app_torservice/torrc __OwningControllerProcess 31391

So we can see this is related to not cleaning up the existing tor process. I believe this can be handled with a check: before starting any tor process, clean up old processes (done either in TOPL or in tor-android-service).

comment:8 Changed 18 months ago by sisbell

Status changed from new to needs_review

I added the takeownership feature to TOPL. I verified with the Java client that the tor process is now being killed. I expect the same for Android but this will still need to be verified.

comment:9 Changed 18 months ago by gk

Keywords: TorBrowserTeam201904R added; TorBrowserTeam201904 removed

comment:10 Changed 18 months ago by sisbell

I went through the orbotservice code. It looks like it checks to see if the controlport file exists. If so it tries to connect to the tor control port. If the connection is successful, it does not attempt to restart Tor.

I'm going to implement something similar in TOPL. I'll add an additional check to make sure the pid of the calling app and the pid of the tor process match. I'll also add an additional step to reload the conf, just in case the user has made changes to the torrc file.
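
The file-then-connect check described above might look roughly like this. The sketch is in Python rather than the Java used by TOPL, and it assumes the control port file contains a `PORT=host:port` line; `read_control_port` and `tor_already_running` are hypothetical names. The pid comparison and the conf reload mentioned above are not covered.

```python
import os
import socket


def read_control_port(port_file):
    """Parse a 'PORT=host:port' line; return (host, port) or None."""
    if not os.path.isfile(port_file):
        return None
    with open(port_file, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("PORT="):
                host, _, port = line[len("PORT="):].rpartition(":")
                if host and port.isdigit():
                    return host, int(port)
    return None


def tor_already_running(port_file, timeout=1.0):
    """True if the port file exists and a connection to that port succeeds."""
    address = read_control_port(port_file)
    if address is None:
        return False
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False
```

A caller would skip launching a new tor process when `tor_already_running(...)` returns `True` and open a new control connection to the existing one instead.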

comment:11 Changed 18 months ago by gk

Keywords: TorBrowserTeam201904 added; TorBrowserTeam201904R removed
Status changed from needs_review to new

Thanks, sounds good. Let's wait with the review then.

comment:12 Changed 18 months ago by sisbell

Fixed in branch 0505


After fixes, I am unable to reproduce this bug.

I can, however, get it into a state where, if I hit the 'connect' button before tor has finished shutting down, it won't restart. The logs in the window correctly say that tor has shut down. If the user goes back into the settings and back out and then hits connect, tor starts correctly. This is better than the original bug in the sense that the logs are correct and the user doesn't need to exit the app.

To fix this, we can handle this at the UI level, perhaps with a way to disable the connect button during a shutdown phase. We do have a STOPPING event in the code but maybe something is not syncing between the service and the UI. The other option would be to try to handle a start queue in the service itself, which queues up start events during a shutdown.

I also noticed that the logs say that two control ports are starting up. This still works since I added the behavior to handle reusing an existing tor process with a new control connection.  But I'm wondering if maybe somewhere we are calling start twice and that this has something to do with the original problem of multiple tor processes starting. I'll need to go through some more investigation but this isn't a blocker anymore due to code changes in topl which handle multiple starts.

comment:13 Changed 18 months ago by sisbell

The default torrc file includes ControlPort auto. We also re-add another ControlPort auto when we save the config file. Having two entries will cause two control ports to open. It won't affect anything but I'll track this for a fix.
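
One way to avoid the duplicate entry is to append an option only when it is not already present. A minimal sketch in Python, assuming the config is handled as plain text and that comparing whitespace-normalized lines is good enough; `add_option_once` is a hypothetical helper, not an existing TOPL method.

```python
def add_option_once(torrc_text, option_line):
    """Append `option_line` unless an equivalent line is already present."""
    wanted = " ".join(option_line.split())
    lines = torrc_text.splitlines()
    for line in lines:
        # Compare with whitespace collapsed so "ControlPort   auto" matches too.
        if " ".join(line.split()) == wanted:
            return torrc_text  # already configured, leave the text unchanged
    lines.append(option_line)
    return "\n".join(lines) + "\n"


default_torrc = "ControlPort auto\n"
print(add_option_once(default_torrc, "ControlPort auto"), end="")
```

With the default torrc above, the text comes back unchanged, so only one control port would be opened.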

comment:14 Changed 18 months ago by sisbell

The behavior that causes the problem on restart (from branch 0505):

  1. Tor starts bootstrap
  2. User clicks icon and exits settings screen
  3. App tells TorService to shutdown
  4. TorService sends TERM to tor process and begins shutdown
  5. User clicks connect
  6. TorService reuses tor process (since it hasn't finished shutdown) and opens new connection
  7. TorService reloads, causing second tor bootstrap (the shutdown is also still in process)
  8. Tor shuts down and the connect button is not reset
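
The start-queue option from comment:12 would change step 6 of this sequence: instead of reusing the process that is shutting down, the service remembers the start request and acts on it once the process has exited. A toy state machine in Python to show the idea. Only the STOPPING state name comes from the discussion above; the class, the other state names and the method names are hypothetical.

```python
class TorServiceState:
    """Toy model: a start request made while stopping is deferred."""

    def __init__(self):
        self.state = "STOPPED"
        self.pending_start = False
        self.log = []

    def request_start(self):
        if self.state == "STOPPING":
            # Do not reuse the dying process; remember the request instead.
            self.pending_start = True
            self.log.append("start queued")
        elif self.state == "STOPPED":
            self._start()

    def request_stop(self):
        if self.state == "RUNNING":
            self.state = "STOPPING"
            self.log.append("stopping")

    def on_process_exited(self):
        # Called once the tor process has actually gone away.
        self.state = "STOPPED"
        self.log.append("stopped")
        if self.pending_start:
            self.pending_start = False
            self._start()

    def _start(self):
        self.state = "RUNNING"
        self.log.append("started")


service = TorServiceState()
service.request_start()      # step 1: bootstrap begins
service.request_stop()       # steps 3-4: shutdown requested
service.request_start()      # step 5: user clicks connect during shutdown
service.on_process_exited()  # the old process is gone, queued start runs
print(service.log)
```

The alternative mentioned in comment:12, disabling the connect button while in the STOPPING state, would prevent the second `request_start()` call at the UI level instead.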

comment:15 Changed 18 months ago by gk

Keywords: TorBrowserTeam201905R added; TorBrowserTeam201904 removed

comment:16 Changed 18 months ago by gk

Keywords: TorBrowserTeam201905 added; TorBrowserTeam201905R removed

Removing from review as this is bound to patches #30166 which changes state as well.

comment:17 Changed 18 months ago by gk

Status changed from new to needs_information

Okay, testing the 0505 branch I think the situation improved, so I picked this up for the 8.5 release. It seems we still have issues, though, right? (e.g. the one in comment:14) sisbell: Do you want to use this ticket for fixing those (as it seems the scenario in comment:14 is still impeding proper bootstrapping) or do you want to open a new one? (I am fine with either option)

comment:18 Changed 17 months ago by sisbell

Let's leave this issue open. After the latest fixes we still have an issue where if the user attempts to start tor during the middle of a shutdown request on the control connection, it will end in a shutdown state.

comment:19 Changed 17 months ago by gk

Keywords: tbb-8.5 added; tbb-8.5-must removed
Priority changed from Very High to Medium
Status changed from needs_information to new

comment:20 Changed 17 months ago by gk

Parent ID: #27609

comment:21 Changed 12 months ago by sysrqb

Points: 0.5

comment:22 Changed 12 months ago by sisbell

I'm not sure this is worth fixing right now, given the complexity of these race conditions. Once we use JNI for tor (in place of launching processes), this issue will go away.

comment:23 Changed 12 months ago by eighthave

I totally agree with sisbell. Now that I have the full stack prototype working for the new native Android TorService, it is clear to me that we all should be moving away from daemons as fast as possible. Managing them is really brittle and time consuming for devs.

For example, TorService right now uses a UNIX socket for the ControlSocket then uses Linux inotify to watch for the ControlSocket and control_auth_cookie. When those are present, then it is clear that tor has started. No unpacking assets, no timeouts, no port conflicts, etc.
