Opened 6 years ago

Closed 4 years ago

Last modified 3 years ago

#8902 closed enhancement (fixed)

Rumors that hidden services have trouble scaling to 100 concurrent connections

Reported by: arma
Owned by:
Priority: Medium
Milestone:
Component: Core Tor/Tor
Version: Tor: 0.2.7
Severity: Normal
Keywords: tor-hs
Cc: asn, griffin@…
Actual Points:
Parent ID:
Points: large
Reviewer:
Sponsor: SponsorR

Description

tomaw from freenode/oftc tells us the freenode hidden service doesn't work well once there are 100 users on it.

This is a great example of something a high-coverage perhaps-chutney-based test network could test (and then regression-test).

Child Tickets

Attachments (1)

torhs-pyloris-nov9.tgz (8.0 KB) - added by anon 5 years ago.
Hidden service oriented PyLoris script for performance testing

Change History (28)

comment:1 Changed 6 years ago by nickm

I'd also love to see some oprofile/perf output from a seriously loaded hidden service.

comment:2 Changed 6 years ago by tomaw

A couple of log errors that may prove useful:
Jan 09 04:18:46.000 [warn] Giving up launching first hop of circuit to rendezvous point [scrubbed] for service p4fsi4ockecnea7l.
Jan 09 04:18:45.000 [warn] Couldn't relaunch rendezvous circuit to '[scrubbed]'.

There are lots of both of these in the logs, grouped in blocks of each kind.

comment:3 Changed 6 years ago by nickm

Milestone: Tor: 0.2.5.x-final → Tor: 0.2.???

comment:4 Changed 5 years ago by asn

Cc: asn added

comment:5 Changed 5 years ago by saint

Cc: griffin@… added

comment:6 Changed 5 years ago by arma

Keywords: SponsorR added

comment:7 Changed 5 years ago by anon

Can https://facebookcorewwwi.onion/ provide some insight? They are setting "Connection: keep-alive" in responses to clients, which should make their perspective more useful.
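
A toy sketch of the keep-alive point, assuming the PySocks library and a Tor SOCKS port on 127.0.0.1:9050, and using plain HTTP on port 80 for brevity (the real site would use TLS): several HTTP/1.1 requests reuse one socket, and therefore one Tor stream, instead of opening a new stream per request.

import socks  # pip install PySocks

s = socks.socksocket()
# rdns=True makes Tor resolve the hostname itself, which .onion needs.
s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
s.connect(("facebookcorewwwi.onion", 80))
req = (b"GET / HTTP/1.1\r\n"
       b"Host: facebookcorewwwi.onion\r\n"
       b"Connection: keep-alive\r\n\r\n")
for _ in range(3):
    s.sendall(req)
    s.recv(65536)  # naive read; a real client would parse Content-Length
s.close()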

Last edited 5 years ago by anon

Changed 5 years ago by anon

Attachment: torhs-pyloris-nov9.tgz added

Hidden service oriented PyLoris script for performance testing

comment:8 Changed 5 years ago by anon

I have attached a hidden-service-friendly PyLoris script with changes to handle .onion resolution properly and a "sendimmediate" option to test the baseline case of requests sent immediately by the client, instead of the client trickling request bytes out to the server slowly.

torhs-loris.py must be modified as indicated at the top of the file. Then run it!

Reproducible scenarios of connection failures or Tor crashes would be useful.
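
A minimal sketch of the two modes described above, assuming the PySocks library and a Tor SOCKS port on 127.0.0.1:9050; the onion address (taken from the log lines in comment:2), port, and request are placeholders, and the attached torhs-loris.py itself may differ.

import time
import socks  # pip install PySocks

ONION = "p4fsi4ockecnea7l.onion"  # placeholder target
REQUEST = (b"GET / HTTP/1.1\r\n"
           b"Host: " + ONION.encode() + b"\r\n\r\n")

def open_onion_socket():
    s = socks.socksocket()
    # rdns=True hands hostname resolution to Tor, required for .onion.
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
    s.connect((ONION, 80))
    return s

def run_client(send_immediate=True, trickle_delay=1.0):
    s = open_onion_socket()
    if send_immediate:
        s.sendall(REQUEST)  # baseline: the whole request at once
    else:
        for i in range(len(REQUEST)):  # slowloris-style: byte by byte
            s.send(REQUEST[i:i + 1])
            time.sleep(trickle_delay)
    return s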

comment:9 Changed 5 years ago by anon

It would be nice if these changes could be integrated upstream; also, the tor circuit switcher may be useful for managing circuits manually if the various SOCKS isolation options are insufficient for the desired behavior.

Finally, note that the random padding option in PyLoris adds a cookie by default, rather than altering the request URI as desired for a particular unnamed use case. Some proxies will "be smart" at the first CR/LF and trigger responses there, curtailing the ability to send a slow, large, random cookie.

comment:10 Changed 5 years ago by dgoulet

I've collected profiling data with perf; here are the results.

My use case is simple: I have an IRC server behind an HS on a remote server far away from me. I use torsocks with irctorture (git://git.breakpoint.cc/fw/irctorture.git) and hammer the server. Each torsocks connection uses a different user/password pair, putting every irctorture client on its own circuit. I spawned 100 irctorture tests for a duration of 10 minutes. I did multiple runs, and the perf data shown below is consistent across runs.
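
That isolation trick relies on Tor's IsolateSOCKSAuth behavior, which is on by default: streams presenting different SOCKS5 credentials are kept on different circuits. A minimal sketch of the same idea, assuming the PySocks library, a Tor SOCKS port on 127.0.0.1:9050, and a placeholder onion address and IRC port:

import socks  # pip install PySocks

def isolated_connection(i, onion="p4fsi4ockecnea7l.onion", port=6667):
    s = socks.socksocket()
    # Distinct credentials per client; Tor uses them only as an
    # isolation key, so any values will do.
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True,
                username="client%d" % i, password="x")
    s.connect((onion, port))
    return s

# 100 clients, each on its own circuit to the hidden service.
conns = [isolated_connection(i) for i in range(100)]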

You can find the perf data of the top 5 calls:

https://people.torproject.org/~dgoulet/tor-hs-perf-100-circ.png

You can't see it in the picture, but the next call is the following, which I thought was quite important to provide:

-   3.85%  tor  [.] compute_weighted_bandwidths
   - compute_weighted_bandwidths
      - 96.97% node_sl_choose_by_bandwidth
           router_choose_random_node
           circuit_establish_circuit
         - circuit_launch_by_extend_info
            - 79.17% rend_service_introduce
                 rend_process_relay_cell
                 connection_edge_process_relay_cell
                 circuit_receive_relay_cell
                 command_process_cell
                 channel_tls_handle_cell
                 connection_or_process_cells_from_inbuf
                 connection_handle_read_impl
                 conn_read_callback
                 event_base_loop
                 do_main_loop
                 tor_main
                 __libc_start_main
                 _start

Most of our time clearly seems to be spent in rend_service_introduce(). Also, node_get_prim_orport() is using 13.60% of the CPU! There may be something here that we can investigate for improvement.

Note that after a while the HS prints the following, so it is loaded but also seems limited by the guard.

Nov 08 06:51:37.000 [warn] Your Guard [scrubbed] is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 122/244. Use counts are 60/60. 212 circuits completed, 0 were unusable, 90 collapsed, and 5 timed out. For reference, your timeout cutoff is 60 seconds.

comment:11 Changed 5 years ago by nickm

Oh, interesting. Can we make a new ticket for reducing the number of those particular calls?

And can you upload a more complete (compressed) version of your perf output somewhere?

I'm a little confused to see that rend_service_introduce is so high, but the crypto isn't that high. It looks like we're spending most of our time picking out nodes to build circuits, and not actually in DH/RSA. That's strange!

comment:12 Changed 5 years ago by dgoulet

So for the optimization: #13739

Also, on the idea of bringing that HS crypto to worker: #13738

The crypto is actually the highest here at 19.81%. I'll try to make that data available somewhere so it's readable by anyone. I might need to extract a lot of debug data, especially for the kernel, but I'll see what's possible and update this ticket.

comment:13 Changed 5 years ago by nickm

Milestone: Tor: 0.2.??? → Tor: 0.2.7.x-final

These may be worth looking at for 0.2.7.

comment:14 Changed 5 years ago by nickm

Status: new → assigned

comment:15 Changed 4 years ago by nickm

Keywords: 027-triaged-1-in added

Marking some tickets as triaged-in for 0.2.7 based on early triage

comment:16 Changed 4 years ago by isabela

Keywords: SponsorU added
Points: unclear
Version: Tor: 0.2.7

comment:17 Changed 4 years ago by teor

I am working on the suggested chutney-based network, including performance testing in #14175 and configurable sent data size in #14174.

When I run with 60 clients and 1 hidden service, the hidden service is very janky - it hangs repeatedly for several seconds while transmitting data.
When I run with 1 client and 60 hidden services, the data transmission is smooth.

I expect this means that hidden services with high numbers of clients experience a lot of contention and blocking behaviour.

I should have branches with these changes to chutney ready over the next few days.

comment:18 Changed 4 years ago by teor

dgoulet and I spoke about this on IRC last week.

I can't seem to replicate the exact behavior described in the logs above. But I can bring down the hidden service ("the hidden service is not available") with 60 client connections, repeatedly connecting and sending a megabyte of data. (The chutney test sends random data from client to exit/hs and verifies it reaches the other end intact). Unfortunately, 60 tor instances is close to the capacity of my machine, and it tends to bring down applications, essential system services, or networking. So that makes it hard to confirm that the HS is going down because of the clients, and not just general overload on the machine.

One of these days, I'll set up a virtual machine or VPS, but until then, I'll try and get these chutney changes into a state where I can release them.
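
For reference, the shape of that verify step is roughly the following sketch, assuming the PySocks library, a Tor SOCKS port on 127.0.0.1:9050, and an echo server standing in for chutney's data-checking endpoint (chutney's actual implementation differs):

import os
import socks  # pip install PySocks

def verify_roundtrip(onion, port, nbytes=1024 * 1024):
    payload = os.urandom(nbytes)  # random data, as in the chutney test
    s = socks.socksocket()
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)
    s.connect((onion, port))
    s.sendall(payload)
    received = b""
    while len(received) < nbytes:  # read the echo back in chunks
        chunk = s.recv(65536)
        if not chunk:
            break
        received += chunk
    s.close()
    return received == payload  # data intact end-to-end?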

comment:19 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-final → Tor: 0.2.8.x-final

comment:20 Changed 4 years ago by nickm

Keywords: SponsorU removed
Sponsor: SponsorU

Bulk-replace SponsorU keyword with SponsorU field.

comment:21 Changed 4 years ago by dgoulet

Keywords: SponsorR removed
Sponsor: SponsorU → SponsorR

comment:22 Changed 4 years ago by dgoulet

Keywords: 027-triaged-1-in removed
Milestone: Tor: 0.2.8.x-final → Tor: 0.2.???
Points: unclear → large
Type: defect → enhancement

comment:23 Changed 4 years ago by nickm

Severity: Normal

With our scalability work this past year, can we call this one done?

comment:24 Changed 4 years ago by saint

I've tried to replicate this several times and have never been able to (using either a standard onion service or onionshare). Like teor, I found that making my service "go down" meant effectively maxing out the resources on the machine. Reports seemed intermittent at best. Given the Facebook move to .onion, this bug looks to be squashed.

comment:25 Changed 4 years ago by nickm

Resolution: fixed
Status: assignedclosed

Great. Let's reopen if we reproduce it.

comment:26 Changed 3 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:27 Changed 3 years ago by nickm

Milestone: Tor: 0.3.???

Milestone deleted
