Opened 22 months ago

Last modified 5 months ago

#25096 assigned task

Bump up NumNTorsPerTAP to squeeze out v2 onion service traffic?

Reported by: arma Owned by:
Priority: Medium Milestone:
Component: Core Tor/DirAuth Version:
Severity: Normal Keywords:
Cc: asn Actual Points:
Parent ID: #25955 Points:
Reviewer: Sponsor:

Description

Right now the consensus is voting NumNTorsPerTAP=100, i.e. relays will handle one tap handshake for every 100 ntor handshakes they handle. We put this feature into place during the 2013 botnet overload (#9574).
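
As a rough illustration of what the ratio means, here is a minimal Python sketch of a picker that serves at most `num_ntors_per_tap` ntor handshakes in a row while a TAP handshake is waiting. It is a simplified model and not tor's actual queue code; the function name `drain` and all variable names are made up for this example.

```python
from collections import deque


def drain(ntor_queue, tap_queue, num_ntors_per_tap, budget):
    """Serve up to `budget` handshakes and return the order they were served in.

    Simplified model (an assumption, not tor's real code): while both queues
    are non-empty, serve `num_ntors_per_tap` ntors, then one TAP, and repeat.
    If only one queue has entries, serve from that one.
    """
    served = []
    ntors_in_a_row = 0
    while len(served) < budget and (ntor_queue or tap_queue):
        if not ntor_queue:
            kind = "tap"
        elif not tap_queue:
            kind = "ntor"
        elif ntors_in_a_row < num_ntors_per_tap:
            kind = "ntor"
        else:
            kind = "tap"

        if kind == "ntor":
            ntor_queue.popleft()
            ntors_in_a_row += 1
        else:
            tap_queue.popleft()
            ntors_in_a_row = 0
        served.append(kind)
    return served


# Toy run with a ratio of 3 instead of 100 so the pattern is easy to see.
order = drain(deque(range(10)), deque(range(10)), num_ntors_per_tap=3, budget=8)
print(order)
```

With both queues backed up, raising the ratio only changes how long a queued TAP waits between turns; it never blocks TAP outright.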

TAP handshakes are used by obsolete clients (we don't know how many of these remain, but I think it might be quite few), and for v2 onion service clients reaching intro points, and for v2 onion services reaching rendezvous points.

With the recent overload that has to do with v2 onion services, the TAP frequency has gone up, e.g.

Jan 30 11:46:23.580 [notice] Circuit handshake stats since last time: 1350439/1350439 TAP, 68743431/68743431 NTor.
Jan 30 17:46:23.592 [notice] Circuit handshake stats since last time: 1183340/1183340 TAP, 71590118/71590118 NTor.
Jan 30 23:50:19.525 [notice] Circuit handshake stats since last time: 1069004/1069004 TAP, 72357977/72357977 NTor.

It's still low compared to the NTor frequency, but 1M TAP handshakes per 6 hours is about 46 per second to my relay.
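
For anyone who wants to redo that arithmetic on their own relay, here is a small Python sketch that pulls the counts out of the log lines quoted above. It assumes each line covers a 6-hour window (which matches the timestamps) and reads each `a/b` pair as handled/requested, which is my reading of the log format rather than something stated in this ticket. The helper name `handshake_rates` is made up.

```python
import re

# The three log lines quoted above, copied verbatim.
LOG_LINES = [
    "Jan 30 11:46:23.580 [notice] Circuit handshake stats since last time: "
    "1350439/1350439 TAP, 68743431/68743431 NTor.",
    "Jan 30 17:46:23.592 [notice] Circuit handshake stats since last time: "
    "1183340/1183340 TAP, 71590118/71590118 NTor.",
    "Jan 30 23:50:19.525 [notice] Circuit handshake stats since last time: "
    "1069004/1069004 TAP, 72357977/72357977 NTor.",
]

STATS = re.compile(r"(\d+)/(\d+) TAP, (\d+)/(\d+) NTor")


def handshake_rates(line, interval_seconds=6 * 3600):
    """Return TAP handshakes per second and the NTor:TAP ratio for one line.

    Assumption: the second number of each pair is the requested count and
    the line covers `interval_seconds` of wall-clock time.
    """
    match = STATS.search(line)
    if match is None:
        raise ValueError("not a circuit handshake stats line")
    _tap_handled, tap_requested, _ntor_handled, ntor_requested = map(
        int, match.groups()
    )
    return {
        "tap_per_second": tap_requested / interval_seconds,
        "ntors_per_tap": ntor_requested / tap_requested,
    }


for line in LOG_LINES:
    rates = handshake_rates(line)
    print(
        f"{rates['tap_per_second']:.1f} TAP/s, "
        f"1:{rates['ntors_per_tap']:.0f} TAP:NTor"
    )
```

The round figure of 1M per 6 hours gives about 46 per second; the exact counts in the three lines come out somewhat higher than that.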

(Also note that these log messages don't include stats from client connections, because we wanted to leave those out to be cautious about client privacy.)

The key realization here is that we can squeeze down v2 onion service usage, by squeezing down the prioritization for TAP handshakes.

Now, on my relay above, I'm able to handle all of both kinds, so changing the ratio will just change which cells get answered first -- and given that ntor cells are so much cheaper to answer than tap cells, there could be a moderate win there.

But for relays that can't handle the load, if they're similarly getting 1:70 ratios, we could potentially have a much bigger impact by cranking up the balance. If we got to the point where most of the ntors are handled and some of the taps are left unhandled, that seems like a fine balance.
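
To make the overloaded case concrete, here is a toy Python model of a relay that cannot answer everything in one period. The cost figures (1 unit per ntor, 10 per tap), the arrival counts, and the CPU budget are all hypothetical numbers picked for illustration, and the function `handled` is not anything in tor.

```python
def handled(ntor_arrivals, tap_arrivals, ratio, budget, ntor_cost=1, tap_cost=10):
    """Return (ntors_answered, taps_answered) for one overloaded period.

    Toy model: serve `ratio` ntors, then one tap, and repeat until the CPU
    budget runs out. Costs and budget are in arbitrary integer units and
    are assumptions, not measurements.
    """
    ntor_left, tap_left = ntor_arrivals, tap_arrivals
    ntor_done = tap_done = 0
    streak = 0  # ntors served since the last tap
    while ntor_left or tap_left:
        pick_ntor = ntor_left and (streak < ratio or not tap_left)
        cost = ntor_cost if pick_ntor else tap_cost
        if cost > budget:
            break  # out of CPU for this period
        budget -= cost
        if pick_ntor:
            ntor_left -= 1
            ntor_done += 1
            streak += 1
        else:
            tap_left -= 1
            tap_done += 1
            streak = 0
    return ntor_done, tap_done


# Hypothetical relay: 1:70 arrival mix, budget too small to answer everything.
for ratio in (100, 200, 1000):
    print(ratio, handled(7000, 100, ratio, budget=7000))
```

In this toy setup, going from 100 to 1000 moves the relay from answering 6370 ntors and 63 taps to answering 6940 ntors and 6 taps. How real relays behave depends on the actual cost gap and load, which this sketch does not measure.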

So: good idea, bad idea? And if good idea, what's a good new number? 500? 1000?

Change History (7)

comment:1 Changed 22 months ago by arma

Note that there are actually two parts to squeezing down the v2 onion service traffic with this approach: there's squeezing down the client circuits that are trying to reach the intro points (yay), and squeezing down the service circuits that are trying to reach the rend points (boo). I say yay for the first one because fewer introductions means less response traffic, and boo for the second one because by the time you're at that stage of the rendezvous, it sure would be best to just finish it.

But I don't know of an easy way to distinguish between the two, especially before we've processed the create cell, so I am willing to lump them together here.

comment:2 Changed 22 months ago by asn

I think we should probably not take any drastic measures here (e.g. increase it tenfold to 1k) before having some sort of evaluation mechanism to see if this actually helps relays. I'm also saying this because I'm sorta afraid that throttling rend circuits might backfire by having them time out, or users retrying them, which could cause more traffic on the network.

I think pumping it to 200 might be a reasonable thing to do at this point tho.

A way to evaluate the effects of this change from the client-side could be a tool that connects to a few hundred relays with TAP/ntor and checks the delays and success rates.
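
The reporting half of such a tool could look roughly like the Python sketch below. It only summarizes probe results; the probing itself (building circuits with each handshake type) is left out. The `ProbeResult` record, the `summarize` helper, and the sample values are all hypothetical placeholders, not real measurements.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class ProbeResult:
    relay: str                      # relay nickname or fingerprint
    handshake: str                  # "tap" or "ntor"
    delay_seconds: Optional[float]  # None means the handshake failed or timed out


def summarize(results):
    """Group probe results by handshake type and report success and delay."""
    summary = {}
    for kind in sorted({r.handshake for r in results}):
        subset = [r for r in results if r.handshake == kind]
        delays = [r.delay_seconds for r in subset if r.delay_seconds is not None]
        summary[kind] = {
            "attempts": len(subset),
            "success_rate": len(delays) / len(subset),
            "median_delay": median(delays) if delays else None,
        }
    return summary


# Made-up placeholder data, only to show the shape of the output.
sample = [
    ProbeResult("relayA", "tap", 0.5),
    ProbeResult("relayB", "tap", None),
    ProbeResult("relayA", "ntor", 0.25),
    ProbeResult("relayB", "ntor", 0.75),
]
print(summarize(sample))
```

Running the same probes before and after a parameter change would give a before/after comparison per handshake type.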

comment:3 Changed 22 months ago by nickm

I'm fine with this if you want to do it. 200 is the highest I'd go right now; it might be more reasonable to test 150 and see what effect that has.

comment:4 Changed 22 months ago by dgoulet

Cc: dgoulet removed
Component: Core Tor/Tor → Core Tor/DirAuth
Milestone: Tor: 0.3.3.x-final
Owner: set to dgoulet
Status: new → accepted
Type: defect → task

Ok so this is a change in the dirauth-conf repository, nothing on the tor side afaict.

Usually with a consensus parameter change like that, I like to inform the dirauth list about the rationale.

In the name of safety, what about ramping up to 200 by going through 150 first? We'll see how that goes with our relay stats, or if someone hacks a client to check delays on v2 onion vs v3 onion. And we can progressively ramp up more and more if we see that it helps. There is also an argument here to try to be louder publicly about v3 adoption, considering that we are starting to squeeze out v2...

If we have consensus, I can do the push and email.

comment:5 Changed 22 months ago by dgoulet

Status: accepted → needs_review

comment:6 Changed 15 months ago by teor

Parent ID: #25955
Status: needs_review → new

I think this ticket is obsolete.

But we should reconsider it as we deprecate v2 onion services, so that they don't take down older tor relay versions.

comment:7 Changed 5 months ago by gaba

Owner: dgoulet deleted
Status: new → assigned

Releasing some old tickets.
