Opened 7 years ago

Last modified 2 years ago

#4580 new defect

Some Tor clients go nuts requesting the consensus if there is no recent enough consensus

Reported by: arma Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Major Keywords: tor-client, tor-dos zombies deprecation
Cc: ln5 Actual Points:
Parent ID: Points: 4.5
Reviewer: Sponsor:

Description

Yesterday we failed to make a consensus for whatever reason.

Soon after, we (the directory authorities) started getting bombed by begindir requests. I'm not sure what they were requesting, but early analysis shows they were requesting a consensus. Most of the requests sent their begindir cell but never got their actual HTTP request through.

When I noticed, moria1 was at 2+ gigs of ram. I did a kill -USR1 and it spent an hour dumping 195000 conns, and I killed it at that point. No idea how many conns it had total. They were almost all linked conns, e.g.:

Nov 25 02:34:45.000 [info] Conn 195295 (socket -1) type 5 (Exit), state 3 (open), created 145 secs ago
Nov 25 02:34:45.000 [info] Conn 195295 is to [x]:1.
Nov 25 02:34:45.000 [info] Conn 195295: 0 bytes waiting on inbuf (len 0, last read 145 secs ago)
Nov 25 02:34:45.000 [info] Conn 195295: 0 bytes waiting on outbuf (len 0, last written 145 secs ago)
Nov 25 02:34:45.000 [info] Conn 195295 has Exit-ward circuit: circID 0 (other side 46154), state 3 (open), born 1322201975:
Nov 25 02:34:45.000 [info] Conn 195296 (socket -1) type 9 (Directory), state 5 (waiting for command), created 145 secs ago
Nov 25 02:34:45.000 [info] Conn 195296 is to [x]:0.
Nov 25 02:34:45.000 [info] Conn 195296: 0 bytes waiting on inbuf (len 0, last read 145 secs ago)
Nov 25 02:34:45.000 [info] Conn 195296: 0 bytes waiting on outbuf (len 0, last written 145 secs ago)

For this particular [x], there were 16665 conns open because of it.

But only 27 of them were OR conns, e.g.:

Nov 25 01:26:05.000 [info] Conn 121 (socket 371) type 4 (OR), state 8 (open), created 704 secs ago
Nov 25 01:26:05.000 [info] Conn 121 is to [x]:62067.
Nov 25 01:26:05.000 [info] Conn 121: 496 bytes waiting on inbuf (len 32736, last read 10 secs ago)
Nov 25 01:26:05.000 [info] Conn 121: 0 bytes waiting on outbuf (len 0, last written 10 secs ago)
Nov 25 01:26:05.000 [info] Conn 121: 0/18437 bytes used on OpenSSL read buffer; 0/18698 bytes used on write buffer.
Nov 25 01:26:05.000 [info] Conn 121 has App-ward circuit: circID 36152 (other side 0), state 3 (open), born 1322201666:
Nov 25 01:26:35.000 [info] Conn 1530 (socket 163) type 4 (OR), state 8 (open), created 208 secs ago
Nov 25 01:26:35.000 [info] Conn 1530 is to [x]:62767.
Nov 25 01:26:35.000 [info] Conn 1530: 0 bytes waiting on inbuf (len 0, last read 10 secs ago)
Nov 25 01:26:35.000 [info] Conn 1530: 0 bytes waiting on outbuf (len 0, last written 95 secs ago)
Nov 25 01:26:35.000 [info] Conn 1530: 0/18437 bytes used on OpenSSL read buffer; 0/18698 bytes used on write buffer.
Nov 25 01:26:35.000 [info] Conn 1530 has App-ward circuit: circID 53174 (other side 0), state 3 (open), born 1322202160:
Nov 25 01:27:25.000 [info] Conn 3762 (socket 480) type 4 (OR), state 8 (open), created 408 secs ago
Nov 25 01:27:25.000 [info] Conn 3762 is to [x]:62152.
Nov 25 01:27:25.000 [info] Conn 3762: 432 bytes waiting on inbuf (len 32736, last read 10 secs ago)
Nov 25 01:27:25.000 [info] Conn 3762: 32768 bytes waiting on outbuf (len 36576, last written 10 secs ago)
Nov 25 01:27:25.000 [info] Conn 3762: 0/18437 bytes used on OpenSSL read buffer; 0/18698 bytes used on write buffer.
Nov 25 01:27:25.000 [info] Conn 3762 has App-ward circuit: circID 57857 (other side 0), state 3 (open), born 1322201977:
Nov 25 01:28:39.000 [info] Conn 7024 (socket 192) type 4 (OR), state 8 (open), created 107 secs ago
Nov 25 01:28:39.000 [info] Conn 7024 is to [x]:62866.
Nov 25 01:28:39.000 [info] Conn 7024: 0 bytes waiting on inbuf (len 0, last read 98 secs ago)
Nov 25 01:28:39.000 [info] Conn 7024: 0 bytes waiting on outbuf (len 0, last written 99 secs ago)
Nov 25 01:28:39.000 [info] Conn 7024: 0/18437 bytes used on OpenSSL read buffer; 0/18698 bytes used on write buffer.
Nov 25 01:32:14.000 [info] Conn 17016 (socket 417) type 4 (OR), state 8 (open), created 202 secs ago
Nov 25 01:32:14.000 [info] Conn 17016 is to [x]:62772.
Nov 25 01:32:14.000 [info] Conn 17016: 480 bytes waiting on inbuf (len 32736, last read 10 secs ago)
Nov 25 01:32:14.000 [info] Conn 17016: 32768 bytes waiting on outbuf (len 36576, last written 10 secs ago)
Nov 25 01:32:14.000 [info] Conn 17016: 0/18437 bytes used on OpenSSL read buffer; 0/18698 bytes used on write buffer.
Nov 25 01:32:14.000 [info] Conn 17016 has App-ward circuit: circID 40388 (other side 0), state 3 (open), born 1322202167:

So my first thought is that moria1 just can't keep up with hearing all these new requests, and also pumping bytes out to the requestor, and things just queue up, which uses ram, which makes seeks slower, lather rinse repeat.

Is there a Tor client version out there that asks for a consensus without checking if it has already asked for a consensus?

Child Tickets

Change History (30)

comment:1 Changed 7 years ago by ln5

Cc: ln5 added

comment:2 Changed 7 years ago by nickm

Milestone: Tor: 0.2.2.x-finalTor: 0.2.4.x-final

comment:3 Changed 7 years ago by nickm

Keywords: tor-client added

comment:4 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:5 Changed 6 years ago by arma

To be clear, this is not a "when the consensus is out of date, lots of clients ask for a new one" ticket. This is a "when the consensus is out of date, clients behave very rudely when asking for a new one" ticket.

comment:6 Changed 6 years ago by arma

#4872 raises a theory.

comment:7 Changed 6 years ago by nickm

The theory of #4872 looks a little more plausible than the one here, given the test I just did on #5506. I think the next step on this one is to try a chutney network with 3 DAs, let it bootstrap and run for a while, then kill 2 of the DAs and see what the clients and servers do.

If the 0.2.4 clients and servers I try this with first behave nicely, it's time to experiment with adding on 0.2.3 and 0.2.2 and (gulp) 0.2.1 and earlier clients/servers to see if they freak out and explode.

Of course, if anybody else can try this too it would save me some brain cells.

comment:8 Changed 6 years ago by nickm

I started this with some nodes running master, configured as per chutney's "networks/basic", except that I tweaked one of the clients to use regular router descriptors.

I let everybody bootstrap, then I killed 2/3 descriptors just now. Let's let it simmer for a while and see what happens. Hope I remember to check on it before it runs out of disk.

comment:9 Changed 6 years ago by nickm

err, I killed 2/3 authorities on the testing network, not "2/3 descriptors." Anyways, let's see if fun ensues.

comment:10 Changed 6 years ago by nickm

Still running. The 0.2.4 nodes on this network are well-behaved. Next I guess I should try a 0.2.2/0.2.4 network.

One screwy thing: We reset our "download status" once an hour, which means that instead of downloading once an hour, we decide once an hour that it's time to download as if we had never downloaded before. That's a bug. Opening #8625 for that.

comment:11 Changed 6 years ago by nickm

Okay. Now it's a mixed network with 0.2.4 authorities and a mixed ecosystem of 0.2.1 and 0.2.4 nodes and clients. I've let it bootstrap, then killed 2/3 authorities. Will the fun begin now? We'll see in ~24 hours I guess

comment:12 Changed 6 years ago by arma

No fun?

comment:13 Changed 6 years ago by nickm

No _crazy_ fun. Here's a few hours worth of requests from an old 0.2.1 client:

Apr 12 08:32:56.790 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 08:43:06.405 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 09:13:36.248 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 09:27:50.189 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 09:28:51.251 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 09:33:56.558 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 10:04:26.398 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 10:28:50.874 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 10:29:51.937 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 10:34:56.244 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 10:45:06.861 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 11:28:49.524 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 11:28:49.524 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Apr 12 11:29:50.587 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.

And here's what a newer 0.2.4 client did:

Apr 12 08:28:36.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 08:29:37.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 08:39:47.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 09:28:35.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 09:29:36.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 09:39:46.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 10:28:34.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 10:29:35.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 10:39:45.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.
Apr 12 11:28:33.000 [info] void update_consensus_networkstatus_downloads(): Launching microdesc networkstatus consensus download.

Now, could this be enough to explain the badness on its own? Is 3-5 requests per client per hour enough to DoS an authority? I'm not sure.

comment:14 Changed 6 years ago by nickm

Keywords: 024-backport added
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

comment:15 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:16 Changed 4 years ago by nickm

Milestone: Tor: 0.2.???Tor: 0.2.8.x-final

comment:17 Changed 4 years ago by nickm

Points: medium/large

comment:18 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

Throw most 0.2.8 "NEW" tickets into 0.2.9. I expect that many of them will subsequently get triaged out.

comment:19 Changed 3 years ago by nickm

Keywords: tor-dos added
Severity: Normal
Sponsor: SponsorS-can

This speaks to the maintainability of the network and the liveness of the dirauths.

comment:20 Changed 3 years ago by isabela

Points: medium/large4.5

comment:21 Changed 3 years ago by andrea

Owner: set to andrea
Status: newassigned

Taking ownership for 0.2.9 triage

comment:22 Changed 3 years ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

Deferring some andrea-assigned items from 0.2.9. Andrea, please move any of these back if you disagree; this is just a first approximation.

comment:23 Changed 2 years ago by dgoulet

Keywords: triage-out-030-201612 added
Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

Triaged out on December 2016 from 030 to 031.

comment:24 Changed 2 years ago by nickm

Sponsor: SponsorS-can

Clear sponsorS and sponsorU from open tickets in 0.3.1

comment:25 Changed 2 years ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: unspecified
Owner: andrea deleted

comment:26 Changed 2 years ago by nickm

Keywords: triage-out-030-201612 removed

comment:27 Changed 2 years ago by nickm

Keywords: nickm-deferred-20160905 removed

comment:28 Changed 2 years ago by nickm

Keywords: 024-backport removed

None of these is ripe for backport to 0.2.4 even if it does get fixed.

comment:29 Changed 2 years ago by nickm

Status: assignednew

Change the status of all assigned/accepted Tor tickets with owner="" to "new".

comment:30 Changed 2 years ago by nickm

Keywords: zombies deprecation added
Severity: NormalMajor
Note: See TracTickets for help on using tickets.