I am still trying to figure out good steps to reproduce this bug. If you click on the gear icon and configure bridges, and then during bootstrap click the gear icon again and either change the bridges or go back and start without bridges, you end up with a broken bootstrap process saying:
SUCCESS connected to Tor control port.
Cookie Auth file not created
Unable to start Tor: java.io.Exception: Cookie Auth file not created: /data/user/0/org.torproject.torbrowser_alpha/app_torservice/lib/tor/control_auth_cookie, len = 0
There is usually no way to recover from that, and one has to start over again by killing the app.
That's with the TOPL changes landed.
In the log, wait for the "SUCCESS connected to Tor control port." line, then tap the gear icon.
Don't configure anything; just go back to the bootstrap panel and start again.
You get the exception and bootstrap is broken.
Trac:
Priority: High to Very High
Summary: "Tor Browser bootsrap process got stuck after changing bridges sometimes" to "Tor Browser bootsrap process got stuck after interrupting it"
I've seen this message before when running the Java Tor client. It occurs (100% reproducible) when there is an existing tor control connection already running. I didn't think this condition would occur on Android.
First, I'll verify we aren't somehow starting Tor twice.
Second, I'll look into whether old processes are failing to get cleaned up. In this case, we can use takeownership and it should clean up automatically when the Android app closes. I have an open issue for this:
If there are two entries, then it's not cleaning up.
The reason I suspect two processes are running on some devices is that a tor process will create a lock on some files while in use. A second tor process will block, unable to modify the file.
I think it will be easier to proceed by debugging the Java version, where the problem is easily reproducible, and to test out the fix there.
We can call the method OnionProxyManager.startWithRepeat so that if the first startup fails, we at least try one more time.
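As a rough illustration of the retry idea only, here is a minimal sketch. It does not show how OnionProxyManager.startWithRepeat is actually implemented; the class StartRetry, the method startWithRetry and its parameters are hypothetical names made up for this example.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of TOPL. */
final class StartRetry {
    private StartRetry() {}

    /**
     * Runs startOnce up to maxAttempts times and returns true as soon as
     * one attempt reports success. Returns false if every attempt fails.
     */
    static boolean startWithRetry(BooleanSupplier startOnce, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (startOnce.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

With maxAttempts set to 2 this gives the "try one more time" behavior. A retry like this only helps if the failed first attempt does not leave a stale tor process behind.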
There is another thing that may be useful to look at. OrbotService tries to detect whether a tor process is already running (in which case it will not attempt a restart). I think we can try something similar, to at least log and abort if there is already a tor process running.
We don't have a timeout for the case where the connection gets stuck during bootstrap. I'll look into adding a timeout that fires if the bootstrap 100% complete event hasn't arrived within x seconds. x should be longish since we won't know the capabilities of the device.
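One way such a timeout could look is sketched below, assuming some listener is told when bootstrap reaches 100%. BootstrapWatchdog and its methods are hypothetical names for this example, not existing TOPL or tor-android-service classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog for illustration; not an existing class. */
final class BootstrapWatchdog {
    private final CountDownLatch done = new CountDownLatch(1);

    /** To be called by whatever code receives the bootstrap-complete event. */
    void onBootstrapComplete() {
        done.countDown();
    }

    /**
     * Waits up to the given timeout for bootstrap to complete.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    boolean awaitBootstrap(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

The caller would pick a generous timeout value and treat a false return as a stuck bootstrap, for example by logging it and offering the user a restart.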
I followed the steps to reproduce. It takes me about a dozen times of configuring a new bridge and reconnecting to see the problem but it does show up.
pid 31900 is the currently starting (original) tor process. When I reconnected, I briefly see two new tor processes starting (not sure how we get two here): pid 32088 and 32092. Then a second later, those new processes die with the CookieAuth failure.
So we can see this is related to not cleaning up the existing tor process. I believe this can be handled as a check: before starting any tor process, clean up old processes (either in TOPL or in tor-android-service).
I added the takeownership feature to TOPL. I verified with the Java client that the tor process is now being killed. I expect the same for Android but this will still need to be verified.
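For reference, the Tor control protocol has a TAKEOWNERSHIP command that tells tor to shut down when the control connection that sent it is closed. The sketch below shows only the wire exchange and assumes the connection has already been authenticated. The TakeOwnership class and its send method are hypothetical and are not the code that was added to TOPL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical helper for illustration; not the TOPL implementation. */
final class TakeOwnership {
    private TakeOwnership() {}

    /**
     * Sends TAKEOWNERSHIP over an already authenticated control connection
     * and returns true if the first reply line starts with status code 250.
     */
    static boolean send(Writer toTor, Reader fromTor) {
        try {
            toTor.write("TAKEOWNERSHIP\r\n");
            toTor.flush();
            String reply = new BufferedReader(fromTor).readLine();
            return reply != null && reply.startsWith("250");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real client the Writer and Reader would wrap the control socket's streams, and the same reader would be reused for later replies instead of being wrapped again.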
I went through the orbotservice code. It looks like it checks to see if the controlport file exists. If so it tries to connect to the tor control port. If the connection is successful, it does not attempt to restart Tor.
I'm going to implement something similar in TOPL. I'll add an additional check to make sure the pid of the calling app and the pid of the tor process match. I'll also add an additional step to reload the conf, just in case the user has made changes to the torrc file.
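A minimal sketch of the "does the control port file exist and can we connect" check follows. It assumes the file contains a single line of the form PORT=127.0.0.1:9051; that format, the ExistingTorCheck class and the controlPortReachable method are assumptions for illustration. The pid comparison and the config reload mentioned above are not covered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical check for illustration; not the OrbotService or TOPL code. */
final class ExistingTorCheck {
    private ExistingTorCheck() {}

    /**
     * Returns true if controlPortFile exists, parses as "PORT=host:port"
     * (assumed format), and a TCP connection to that address succeeds.
     */
    static boolean controlPortReachable(Path controlPortFile, int timeoutMillis) {
        if (!Files.isRegularFile(controlPortFile)) {
            return false;
        }
        try {
            String text = new String(
                    Files.readAllBytes(controlPortFile), StandardCharsets.UTF_8).trim();
            if (!text.startsWith("PORT=")) {
                return false;
            }
            String hostPort = text.substring("PORT=".length());
            int colon = hostPort.lastIndexOf(':');
            if (colon < 0) {
                return false;
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return true;
            }
        } catch (IOException | IllegalArgumentException e) {
            // Unreadable file, malformed port, or nothing listening.
            return false;
        }
    }
}
```

A successful TCP connect only shows that something is listening on that port. A fuller check would also authenticate on the control connection before deciding to reuse the process.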
I can, however, get it into a state where, if I hit the 'connect' button before tor has finished shutting down, it won't restart. The logs in the window correctly say that tor has shut down. If the user goes back into the settings and back out and then hits connect, tor starts correctly. This is better than the original bug in the sense that the logs are correct and the user doesn't need to exit the app.
To fix this, we can handle this at the UI level, perhaps with a way to disable the connect button during a shutdown phase. We do have a STOPPING event in the code but maybe something is not syncing between the service and the UI. The other option would be to try to handle a start queue in the service itself, which queues up start events during a shutdown.
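The second option, queueing a start request that arrives during shutdown, could be sketched roughly as follows. TorLifecycle, its State enum and its methods are hypothetical and only loosely modeled on the STOPPING event mentioned above; this is not the service's actual state handling.

```java
/** Hypothetical lifecycle tracker for illustration; not the service code. */
final class TorLifecycle {
    enum State { STOPPED, STARTING, RUNNING, STOPPING }

    private State state = State.STOPPED;
    private boolean startPending = false;

    synchronized State state() {
        return state;
    }

    /** Returns true if a start begins now; a start during STOPPING is queued. */
    synchronized boolean requestStart() {
        switch (state) {
            case STOPPED:
                state = State.STARTING;
                return true;
            case STOPPING:
                startPending = true; // remember the request until shutdown ends
                return false;
            default:
                return false; // already starting or running
        }
    }

    synchronized void onStarted() {
        if (state == State.STARTING) {
            state = State.RUNNING;
        }
    }

    synchronized void requestStop() {
        if (state == State.STARTING || state == State.RUNNING) {
            state = State.STOPPING;
        }
    }

    /** Called when shutdown has finished; returns true if a queued start begins. */
    synchronized boolean onStopped() {
        state = State.STOPPED;
        if (startPending) {
            startPending = false;
            state = State.STARTING;
            return true;
        }
        return false;
    }
}
```

The UI-level option could read the same state and keep the connect button disabled while state() returns STOPPING.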
I also noticed that the logs say that two control ports are starting up. This still works, since I added the behavior to handle reusing an existing tor process with a new control connection. But I'm wondering whether we are calling start twice somewhere, and whether this has something to do with the original problem of multiple tor processes starting. I'll need to investigate further, but this isn't a blocker anymore due to the code changes in TOPL which handle multiple starts.
The default torrc file includes ControlPort auto. We also re-add another ControlPort auto when we save the config file. Having two entries will cause two control ports to open. It won't affect anything but I'll track this for a fix.
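One possible fix is to drop exact duplicate directive lines when the config is saved. The sketch below is an assumption about how that could be done; TorrcDedup and dropDuplicateLines are hypothetical names, and the code treats two lines as duplicates only if they are identical after whitespace is normalized.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not the planned fix itself. */
final class TorrcDedup {
    private TorrcDedup() {}

    /**
     * Returns the lines in their original order, keeping only the first
     * occurrence of each directive line. Blank lines and comment lines
     * (starting with '#') are always kept.
     */
    static List<String> dropDuplicateLines(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String key = line.trim().replaceAll("\\s+", " ");
            if (key.isEmpty() || key.startsWith("#") || seen.add(key)) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Given the lines "ControlPort auto", "SocksPort auto" and a second "ControlPort auto", only the first ControlPort entry and the SocksPort entry remain.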
Okay, testing the 0505 branch I think the situation improved, so I picked this up for the 8.5 release. It seems we still have issues, though, right? (e.g. the one in comment:14) sisbell: Do you want to use this ticket for fixing those (as it seems the scenario in comment:14 is still impeding proper bootstrapping) or do you want to open a new one? (I am fine with either option)
Let's leave this issue open. After the latest fixes we still have an issue where if the user attempts to start tor during the middle of a shutdown request on the control connection, it will end in a shutdown state.