Impose a minimum write size for TLS writes

changed milestone to %Tor: unspecified

added 032-unreached bandwidth bwbug component::core tor/tor milestone::Tor: unspecified nagle performance priority::medium severity::normal sponsor8-maybe sponsor8-removed status::needs-information tls tor-relay type::defect labels

I've got a tentative implementation in nagle_tls_writes.

I'm not sure whether it's a safe patch as it is, though: since we can always read/write on edge connections, I worry that merging this patch as-is will make TLS connections stall because of comparatively low global_*_write_bucket values. Should I be concerned?

Trac:
Status: new to needs_review

Also, before concluding this works, check out connection_bucket_round_robin and its callers and their callers. This isn't quite going to work here, I think.

Trac:
Status: needs_review to needs_revision

Replying to nickm:

For non-TLS writes, the kernel should nagle for us, so we're probably fine, though it might be sensible to impose a write threshold there too.

Nagle can't magically decrease the latency of the Tor network, so most of the time exit relays dribble data out over the internet in small pieces. Many web servers can wrongly identify such behavior as a slow HTTP attack of the kind made by the Slowloris software (https://en.wikipedia.org/wiki/Slowloris). The most effective way to stop such an attack is to drop the connection. Anyone using TBB can see this defence in action right now. Surfing over Tor becomes very painful.

This bug is not only about a minimum write size for TLS writes, nor is it just because TokenBucketRefillInterval is very low.

It's a general bug about the small amounts of data Tor operates on. If an exit relay is allowed to read 1 byte from an edge connection, then the resulting overhead per data cell will be more than 500%. And you can't fix that by imposing a minimum amount of plaintext per TLS record; such limits only keep the overhead percentage from getting even worse.
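To put rough numbers on the per-cell overhead described above, here is a minimal sketch. It assumes the fixed 512-byte cell discussed in this ticket and a maximum relay payload of 498 bytes per cell; it ignores TLS and TCP/IP framing entirely, and the function name is made up for illustration.

```python
# Assumed sizes: a fixed 512-byte cell, of which at most 498 bytes
# can be relay payload. TLS and TCP/IP framing are not counted here.
CELL_LEN = 512
RELAY_PAYLOAD_MAX = 498


def cell_overhead_percent(payload_bytes: int) -> float:
    """Non-payload bytes in one cell, as a percentage of the payload carried."""
    if not 1 <= payload_bytes <= RELAY_PAYLOAD_MAX:
        raise ValueError("payload must be between 1 and 498 bytes")
    return 100.0 * (CELL_LEN - payload_bytes) / payload_bytes


if __name__ == "__main__":
    for n in (1, 10, 100, 498):
        print(f"{n:3d} payload bytes -> {cell_overhead_percent(n):9.1f}% overhead")
```

The ratio falls quickly as the cell fills up, which is why the amount read from the edge connection matters more than the TLS record size in this particular case.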

Replying to cypherpunks:

more than 500%.

Man, bad math. Infinite overhead. For such a case we should count userbase*500: the effective load on the Tor network is that of 250M users in some extreme case. Congrats, a major number of states surfing over Tor. It's a win, a math win. I just wonder how it's still working.

Ouch.

I've opened a new ticket, #7743 (moved), for the not-so-full cells issue. There are a couple of possible solutions there: one easy, one almost-easy.

Trac:
Keywords: tor-relay performance deleted, tor-relay performance bwbug added

Trac:
Status: needs_revision to closed
Resolution: N/A to user disappeared

Trac:
Status: closed to reopened
Resolution: user disappeared to N/A

Trac:
Status: reopened to needs_revision

Before I apply any fix here, it seems like a good idea to add some instrumentation like we did for #7743 (moved) to see whether this is actually happening for anyone, and if so, by how much. The obvious approach would be to compare the total bytes actually sent over the net for TLS connections against the total bytes we wrote to TLS connections, to see what the overhead is.
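As a rough illustration of that comparison, the following sketch keeps two running totals and derives an overhead percentage from them. The class and method names are hypothetical and are not taken from Tor's code; the definition of overhead used here (extra wire bytes relative to plaintext bytes) is also an assumption.

```python
from typing import Optional


class TlsWriteStats:
    """Hypothetical counters for comparing plaintext and on-the-wire bytes."""

    def __init__(self) -> None:
        self.plaintext_bytes = 0  # bytes handed to the TLS layer
        self.wire_bytes = 0       # bytes the TLS layer wrote to the socket

    def record_write(self, plaintext: int, wire: int) -> None:
        """Account for one write: plaintext bytes in, wire bytes out."""
        self.plaintext_bytes += plaintext
        self.wire_bytes += wire

    def overhead_percent(self) -> Optional[float]:
        """Extra wire bytes as a percentage of plaintext; None if nothing was written."""
        if self.plaintext_bytes == 0:
            return None
        extra = self.wire_bytes - self.plaintext_bytes
        return 100.0 * extra / self.plaintext_bytes


if __name__ == "__main__":
    stats = TlsWriteStats()
    stats.record_write(plaintext=512, wire=565)
    stats.record_write(plaintext=512, wire=565)
    print(stats.overhead_percent())
```

Note that totals gathered this way would include handshake bytes unless those are counted separately.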

There's a diagnostic branch, "bug7707_diagnostic". Let's apply it and see what the numbers look like in practice.

Trac:
Status: needs_revision to needs_review

The bug7707_diagnostic branch looks okay to me.

Merged it; deferring this to 0.2.5 with an 024-backport option and putting it in needs_information. Thanks!

Trac:
Status: needs_review to needs_information
Milestone: Tor: 0.2.4.x-final to Tor: 0.2.5.x-final
Keywords: tor-relay performance bwbug deleted, tor-relay performance bwbug 024-backport added

What are we hoping to learn from the TLS write overhead stats? Some relay operators like Jens report stats like 13%. On moria5 right now it's 8%. On moria1 lately it's 4%.

It seems like 4% is a good number, but 13% is a high number.

Or to give a more useful question: what TLS write overhead do we expect if we're always putting each 512-byte cell in its own TLS record? Which SSL libraries still do the "empty record" trick? I assume that adds to this per-cell overhead.

Does the TLS overhead we're counting here mush together both the overhead from normal TLS records and also the overhead from TLS handshakes?

And finally, once we know what overhead percentage range we're hoping for, we should ask relay operators to tell us if they see anything out of that range -- or even turn that into an LD_BUG or something.

Hm. Assuming recent TLS versions (to ignore the empty record trick for now), we're looking at a 20 byte MAC, a 16 byte IV, 1-16 bytes of padding, and 5-10 bytes of headers if I'm reading this format right. That comes to something like 40-50 bytes of overhead per record, which makes it non-crazy to have ~10% overhead.

Somebody else should check my math and spec-fu.

But just because 10% overhead is non-crazy, doesn't mean it's good. Trying to flush a couple cells at once when flushing cells is probably sensible.
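For anyone who wants to check the arithmetic, here is a minimal sketch of the per-record estimate and of how it changes when several cells share one record. It assumes a CBC ciphersuite with a 20-byte HMAC-SHA1 MAC, a 16-byte explicit IV, minimal padding to a 16-byte block, and a 5-byte record header (the low end of the header range guessed above). Handshake bytes and the empty record trick are ignored, and the function names are made up for illustration.

```python
CELL_LEN = 512    # bytes per cell
MAC_LEN = 20      # HMAC-SHA1 (assumed ciphersuite)
IV_LEN = 16       # explicit per-record IV in CBC mode
BLOCK_LEN = 16    # AES block size
HEADER_LEN = 5    # TLS record header (assumed; low end of the 5-10 guess)
MAX_RECORD_PLAINTEXT = 16384  # TLS limit on plaintext per record


def record_overhead_bytes(plaintext_len: int) -> int:
    """Bytes added to one TLS record, assuming minimal CBC padding."""
    # Padding brings plaintext + MAC up to a block multiple and is always
    # at least 1 byte, so it ranges from 1 to 16 bytes.
    padding = BLOCK_LEN - (plaintext_len + MAC_LEN) % BLOCK_LEN
    return HEADER_LEN + IV_LEN + MAC_LEN + padding


def overhead_percent(cells_per_record: int) -> float:
    """Record overhead as a percentage of the cell bytes carried."""
    plaintext_len = cells_per_record * CELL_LEN
    if not 0 < plaintext_len <= MAX_RECORD_PLAINTEXT:
        raise ValueError("a record holds between 1 and 32 cells")
    return 100.0 * record_overhead_bytes(plaintext_len) / plaintext_len


if __name__ == "__main__":
    for cells in (1, 2, 4, 8):
        print(f"{cells} cell(s) per record -> {overhead_percent(cells):.2f}% overhead")
```

Under these assumptions one cell per record costs 53 extra bytes, a little over 10%, and the percentage roughly halves each time the number of cells per record doubles.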

Trac:
Milestone: Tor: 0.2.5.x-final to Tor: 0.2.???

Milestone renamed

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.3.???

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

Trac:
Keywords: tor-relay performance bwbug 024-backport deleted, 024-backport, tor-relay, performance, tor-03-unspecified-201612, bwbug added
Milestone: Tor: 0.3.??? to Tor: unspecified

Remove an old triaging keyword.

Trac:
Keywords: tor-03-unspecified-201612 deleted, N/A added

None of these is ripe for backport to 0.2.4 even if it does get fixed.

Trac:
Keywords: 024-backport deleted, N/A added

Trac:
Sponsor: N/A to N/A
Severity: N/A to Normal
Reviewer: N/A to N/A
Keywords: N/A deleted, bandwidth sponsor8-maybe tls nagle added

Trac:
Sponsor: N/A to Sponsor8-can
Milestone: Tor: unspecified to Tor: 0.3.2.x-final

Trac:
Keywords: N/A deleted, 032-unreached added
Milestone: Tor: 0.3.2.x-final to Tor: unspecified

Trac:
Sponsor: Sponsor8-can to N/A
Keywords: N/A deleted, sponsor8-removed added

nagling is not good for low latency OR interactive streams

mentioned in issue #7743 (moved)

moved to tpo/core/tor#7707

mentioned in issue tpo/core/tor#7743 (closed)
