Opened 6 years ago

Last modified 3 weeks ago

#7707 needs_information defect

Impose a minimum write size for TLS writes

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay, performance, bwbug, bandwidth sponsor8-maybe tls nagle, 032-unreached, sponsor8-removed
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Reported pseudonymously:

If our TokenBucketRefillInterval is very low, we'll frequently wind up with very small writes, which can be exceptionally bad with TLS. One answer is to say "don't do that then" and keep TokenBucketRefillInterfal to about 100msec or so. Another answer is to nagle our TLS writes, and never write less than the full amount in the output buffer, or one cell, whichever is smaller.

For non-TLS writes, the kernel should nagle for us, so we're probably fine, though it might be sensible to impose a write threshold there too.

Child Tickets

Change History (27)

comment:1 Changed 6 years ago by nickm

Status: newneeds_review

I've got a tentative implementation in nagle_tls_writes.

I'm not sure whether it's a safe patch as it is, though: since we can always read/write on edge connections, I worry that merging this patch as-is will make TLS connections stall because of comparatively low global_*_write_bucket values. Should I be concerned?

comment:2 Changed 6 years ago by nickm

Status: needs_reviewneeds_revision

Also, before concluding this works, check out connection_bucket_round_robin and its callers and their callers. This isn't quite going to work here, I think.

comment:3 in reply to:  description Changed 6 years ago by cypherpunks

Replying to nickm:

For non-TLS writes, the kernel should nagle for us, so we're probably fine, though it might be sensible to impose a write threshold there too.

Nagle can't magically decrease latency of Tor network, so most of the times exit relays rapids by data over internet.
Many Web servers can wrongly identify such behavior like slow HTTP attack made by Slowloris software (https://en.wikipedia.org/wiki/Slowloris). Most effective way to stop attack is to drop connection. Everyone can see such defence right now if using TBB. Tor surfing becomes very pain full.

comment:4 Changed 6 years ago by cypherpunks

This bug is not only about minimum write size for TLS writes, (or) just because TokenBucketRefillInterval is very low.

It's general bug about small data values operated by Tor. If exit relay allowed to read 1 byte from edge connection then resulted overhead per data cell will be more than 500%. And you can't fix it by limiting minimum of plain text per TLS record, you can just prevent even more overhead percentage with such limits.

comment:5 in reply to:  4 Changed 6 years ago by cypherpunks

Replying to cypherpunks:

more than 500%.

Man, bad math.
Infinite overhead. For such case we should count userbase*500, effective load over Tor network is 250M of users in some extreme case. Congrats, major number of states surfing over Tor. It's win, math win. Just wonder it's still working.

comment:6 Changed 6 years ago by nickm

Ouch.

I've opened a new #7743 for the not-so-full cells issue. There are a couple of possible solutions there: one easy, one almost-easy.

comment:7 Changed 6 years ago by nickm

Keywords: bwbug added

comment:8 Changed 6 years ago by cypherpunks

Resolution: user disappeared
Status: needs_revisionclosed

comment:9 Changed 6 years ago by aagbsn

Resolution: user disappeared
Status: closedreopened

comment:10 Changed 6 years ago by aagbsn

Status: reopenedneeds_revision

comment:11 Changed 6 years ago by nickm

Before I apply any fix here, it seems like a good idea to add some instrumentation like we did for #7743 to see whether this is actually happening for anyone, and if so, by how much. The obvious fix would be to compare total bytes sent over the net for TLS connections vs total bytes sent on TLS connections to see what the overhead is.

comment:12 Changed 6 years ago by nickm

Status: needs_revisionneeds_review

There's a diagnostic branch in branch "bug7707_diagnostic". Let's apply it and see what the numbers look like in practice.

comment:13 Changed 6 years ago by andrea

The bug7707_diagnostic branch looks okay to me.

comment:14 Changed 6 years ago by nickm

Keywords: 024-backport added
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final
Status: needs_reviewneeds_information

Merged it; deferring this to 0.2.5 with an 024-backport option and putting it in needs_information. Thanks!

comment:15 Changed 6 years ago by arma

What are we hoping to learn from the TLS write overhead stats? Some relay operators like Jens report stats like 13%. On moria5 run now it's 8%. On moria1 lately it's 4%.

It seems like 4% is a good number, but 13% is a high number.

Or to give a more useful question: what TLS write overhead do we expect if we're always putting each 512-byte cell in its own TLS record? Which SSL's do the "empty record" trick still, and I assume that adds to this per-cell overhead?

Does the TLS overhead we're counting here mush together both the overhead from normal TLS records and also the overhead from TLS handshakes?

And finally, once we know what overhead percentage range we're hoping for, we should ask relay operators to tell us if they see anything out of that range -- or even turn that into an LD_BUG or something.

comment:16 Changed 6 years ago by nickm

Hm. Assuming recent TLS versions (to ignore the empty record trick for now), we're looking at a 20 byte MAC, a 16 byte IV, 1-16 bytes of padding, and 5-10 bytes of headers if I'm reading this format right. That comes to something like 40-50 bytes of overhead per record, which makes it non-crazy to have ~10% overhead.

Somebody else should check my math and spec-fu.

comment:17 Changed 6 years ago by nickm

But just because 10% overhead is non-crazy, doesn't mean it's *good*. Trying to flush a couple cells at once when flushing cells is probably sensible.

comment:18 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:19 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:20 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:21 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:22 Changed 2 years ago by nickm

Keywords: 024-backport removed

None of these is ripe for backport to 0.2.4 even if it does get fixed.

comment:23 Changed 2 years ago by nickm

Keywords: bandwidth sponsor8-maybe tls nagle added
Severity: Normal

comment:24 Changed 23 months ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.2.x-final
Sponsor: Sponsor8-can

comment:25 Changed 20 months ago by nickm

Keywords: 032-unreached added
Milestone: Tor: 0.3.2.x-finalTor: unspecified

comment:26 Changed 4 months ago by gaba

Keywords: sponsor8-removed added
Sponsor: Sponsor8-can

comment:27 Changed 3 weeks ago by cypherpunks

nagling is not good for low latency OR interactive streams

Note: See TracTickets for help on using tickets.