Opened 6 years ago

Closed 4 years ago

#9659 closed defect (fixed)

Optimistic Data SOCKS variant (patch for #3875) leads to loop on HTTP requests if no SOCKS response yet

Reported by: cypherpunks Owned by: tbb-team
Priority: Medium Milestone:
Component: Applications/Tor Browser Version:
Severity: Normal Keywords: tbb-usability, tbb-crash-hang, tbb-firefox-patch, GeorgKoppen201511R, TorBrowserTeam201512R
Cc: gk, starlight@…, brade, mcs Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Some change to tbb-firefox.exe after TBB for windows 2.3.25-6 has introduced a bad loop which throttles the CPU on every request sent through the proxy. This will go unnoticed if you are not using all of your CPU at the time. Running something like prime95 (www.mersenne.org) in the background will make the problem evident, as TBB versions after 2.3.25-6 start to hang the system as you click open links or have other traffic through the proxy while waiting for their connections to be made and complete. Something changed in how requests were handled in later versions which introduced the problem.

In the old versions, when you sent a request for a web page (lets say www.google.com), you could have the tor network map open with the view the network button and watch how it works. The browser status bar will say connecting to www.google.com and it will say that until a circuit is open in the network map (www.google.com appears in the list), at which point the browser status bar will change to say connected to www.google.com. It then sends the request, and the status bar changes to waiting for www.google.com, then the page loads. I can watch this happen in task manager and it uses hardly any CPU.

In the new versions, this behavior is different. When you request a web page, the browser immediately says waiting for www.google.com, even though there is no circuit open in tor, and this is where the bad loop is, as watching tbb-firefox.exe in task manager the CPU usage becomes abnormally high. This will cause the system to hang, becoming temporarily unresponsive until the loops finish, if the rest of the system is under heavy cpu load.

Child Tickets

Attachments (1)

socks5.py (62 bytes) - added by cypherpunks 4 years ago.
socks5 proxy, yes I'm lame to write snippets even.

Download all attachments as: .zip

Change History (35)

comment:1 Changed 6 years ago by cypherpunks

Tor Browser Bundle (2.3.25-7); suite=windows
* Firefox patch changes:
     - Apply font limits to @font-face local() fonts and disable fallback
       rendering for @font-face. (closes: #8455)
     - Use Optimistic Data SOCKS handshake (improves page load performance).
       (closes: #3875)
     - Honor the Windows theme for inverse text colors (without leaking those
       colors to content). (closes: #7920)
     - Increase pipeline randomization and try harder to batch pipelined
       requests together. (closes: #8470)
     - Fix an image cache isolation domain key misusage. May fix several image
       cache related crash bugs with New Identity, exit, and certain websites.
       (closes: #8628)

comment:2 Changed 6 years ago by gk

Cc: g.koppen@… added

comment:3 Changed 6 years ago by cypherpunks

Simple way to reproduce it, to use socks5 proxy server that behaves like normal socks5 proxy but never answer 'success' like it does Tor while waits circuit, and attaching stream.

Used simple socks5 proxy that written in Python, found in code google projects (https://code.google.com/p/python-socks5/). Simplified even more and modified it so it never connects to anything. Code license marked as GNU GPL v3, so I hope nothing violates with it. If any troubles then delete it.

It binds on 127.0.0.1:1080 and behave like socks5 proxy. You need to change proxy settings in Torbutton's preferences for test.

Attaching modified code.

comment:4 Changed 6 years ago by arma

I talked to skruffy on irc and he said this was due to the Optimistic data patch.

comment:5 Changed 5 years ago by cypherpunks

This bug optimistically burns my hardware. Why such bugs implemented without chance to turn it off?

comment:6 Changed 5 years ago by cypherpunks

Can you revert this patch? Most of sites supported by https-everywhere addon. 1.5 of plain http sites can't reduce user's experience but can to destroy cheap hardware with such optimistic cpu eater.

comment:7 Changed 5 years ago by cypherpunks

Priority: normalmajor

This is most bad to .onion URLs. Same bug? Lots of CPU waste while any .onion load.

comment:8 Changed 5 years ago by mikeperry

Keywords: tbb-usability tbb-crash-hang added; hang unresponsive throttle removed

comment:9 Changed 5 years ago by gk

Cc: gk added; g.koppen@… removed
Milestone: TorBrowserBundle 2.3.x-stable
Version: Tor: 0.2.3.25

comment:10 Changed 5 years ago by erinn

Keywords: tbb-firefox-patch added

comment:11 Changed 5 years ago by erinn

Component: Firefox Patch IssuesTor Browser
Owner: changed from mikeperry to tbb-team

comment:12 Changed 4 years ago by cypherpunks

Can you make #3875 configurable at least. Such option will split userbase but will save some hardware at least.

comment:13 Changed 4 years ago by cypherpunks

I wonder if everyone in world using lab networks. Have you tried to use real world network connections?

Changed 4 years ago by cypherpunks

Attachment: socks5.py added

socks5 proxy, yes I'm lame to write snippets even.

comment:14 Changed 4 years ago by cypherpunks

Summary: high cpu usage from tbb-firefox.exe on web requestshigh cpu usage on web requests

comment:15 Changed 4 years ago by cypherpunks

Summary: high cpu usage on web requestsOptimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on web requests if no success response yet

comment:16 Changed 4 years ago by cypherpunks

Summary: Optimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on web requests if no success response yetOptimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on web requests if no HTTP response yet

comment:17 Changed 4 years ago by cypherpunks

Summary: Optimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on web requests if no HTTP response yetOptimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on HTTP requests if no SOCKS response yet

comment:18 Changed 4 years ago by cypherpunks

Summary: Optimistic Data SOCKS variant (patch for #3875) leads to high cpu usage on HTTP requests if no SOCKS response yetOptimistic Data SOCKS variant (patch for #3875) leads to loop on HTTP requests if no SOCKS response yet

comment:19 Changed 4 years ago by cypherpunks

Priority: majornormal
Status: newneeds_review
--- nsSocketTransport2.cpp.orig
+++ nsSocketTransport2.cpp
@@ -1868,10 +1868,18 @@
             // If the connect is still not ready, then continue polling...
             //
             if ((PR_WOULD_BLOCK_ERROR == code) || (PR_IN_PROGRESS_ERROR == code)) {
-                // Set up the select flags for connect...
-                mPollFlags = (PR_POLL_EXCEPT | PR_POLL_WRITE);
-                // Update poll timeout in case it was changed
-                mPollTimeout = mTimeouts[TIMEOUT_CONNECT];
+                if (mState == STATE_SENTGET) { // stuff sent, nothing to write
+                    // Set up the select flags for response reading...
+                    mPollFlags = (PR_POLL_EXCEPT | PR_POLL_READ);
+                    // Update poll timeout in case it was changed
+                    mPollTimeout = mTimeouts[TIMEOUT_READ_WRITE];
+                }
+                else {
+                    // Set up the select flags for connect...
+                    mPollFlags = (PR_POLL_EXCEPT | PR_POLL_WRITE);
+                    // Update poll timeout in case it was changed
+                    mPollTimeout = mTimeouts[TIMEOUT_CONNECT];
+                }
             }
             //
             // The SOCKS proxy rejected our request. Find out why.

What do you think?

comment:20 Changed 4 years ago by cypherpunks

Or simply next one?

--- nsSocketTransport2.cpp.orig
+++ nsSocketTransport2.cpp
@@ -1844,6 +1844,7 @@
             mOutput.OnSocketReady(NS_OK);
         }
         mPollTimeout = mTimeouts[TIMEOUT_READ_WRITE];
+        mPollFlags = (PR_POLL_EXCEPT | PR_POLL_READ);
         mState = STATE_SENTGET;
     }

comment:21 Changed 4 years ago by gk

Keywords: TorBrowserTeam201509R GeorgKoppen201509R added

I think I've seen this, too. I'll take a look at it. We'll see if I can squeeze this into September.

comment:22 Changed 4 years ago by starlight

Cc: starlight@… added

comment:23 Changed 4 years ago by gk

Moving needs_review tickets to October 2015.

comment:24 Changed 4 years ago by gk

Keywords: TorBrowserTeam201510R added; TorBrowserTeam201509R removed

Batch modification for realz now.

comment:25 Changed 4 years ago by gk

Keywords: GeorgKoppen201510 added

Moving my tickets to October 2015

comment:26 Changed 4 years ago by gk

Keywords: GeorgKoppen201510R added; GeorgKoppen201509R GeorgKoppen201510 removed

comment:27 Changed 4 years ago by mcs

Cc: brade mcs added

comment:28 Changed 4 years ago by gk

Keywords: TorBrowserTeam201511R added; TorBrowserTeam201510R removed

comment:29 Changed 4 years ago by mcs

Severity: Normal

Kathy and I reviewed the patch from comment:20 and in general tried to understand the original patch and what might be going wrong. We are not sure that we reproduced the problem that other people experienced, but on Mac OS with nsSocketTransport logging enabled we did observe a lot of spinning.

We created a revised patch that includes improved comments and also avoids modifying the behavior of the socket state machine when SOCKS is not being used (with the patch from comment:20 alone, we observed an assertion failure inside the NSPR code for a non-SOCKS connection). It would be good to have another review:
https://gitweb.torproject.org/user/brade/tor-browser.git/commit/?h=bug9659-01&id=2cbc406dad55db718bd8a27db4abbd10c0f84bda

comment:30 Changed 4 years ago by gk

Keywords: GeorgKoppen201511R added; GeorgKoppen201510R removed

comment:31 in reply to:  29 ; Changed 4 years ago by cypherpunks

Replying to mcs:

It would be good to have another review:

It still need to handle case of partial socks answer to be generally complete, else code re-starts writing.

            //
            // If the connect is still not ready, then continue polling...
            //
            if ((PR_WOULD_BLOCK_ERROR == code) || (PR_IN_PROGRESS_ERROR == code)) {
                // Set up the select flags for connect...
                mPollFlags = (PR_POLL_EXCEPT | PR_POLL_WRITE);
                // Update poll timeout in case it was changed
                mPollTimeout = mTimeouts[TIMEOUT_CONNECT];
            }

comment:32 in reply to:  31 Changed 4 years ago by mcs

Replying to cypherpunks:

It still need to handle case of partial socks answer to be generally complete, else code re-starts writing.
...

Thanks. Kathy and I think it is enough to skip resetting mPollFlags in that case. Here is a revised patch:
https://gitweb.torproject.org/user/brade/tor-browser.git/commit/?h=bug9659-02&id=7ee43c7281e03636b2bab5384ce77073fd6a0aab

comment:33 Changed 4 years ago by mikeperry

Keywords: TorBrowserTeam201512R added; TorBrowserTeam201511R removed

comment:34 Changed 4 years ago by gk

Resolution: fixed
Status: needs_reviewclosed

Thanks, cherry-picked onto tor-browser-38.4.0esr-5.5-1 (commit bbd7c24a1019ecfac43aeaac959143b575cca07f).

Note: See TracTickets for help on using tickets.