If our TokenBucketRefillInterval is very low, we'll frequently wind up with very small writes, which can be exceptionally bad with TLS. One answer is to say "don't do that then" and keep TokenBucketRefillInterval at about 100 msec or so. Another answer is to nagle our TLS writes, and never write less than the full amount in the output buffer, or one cell, whichever is smaller.
For non-TLS writes, the kernel should nagle for us, so we're probably fine, though it might be sensible to impose a write threshold there too.
I've got a tentative implementation in nagle_tls_writes.
I'm not sure whether it's a safe patch as it is, though: since we can always read/write on edge connections, I worry that merging this patch as-is will make TLS connections stall because of comparatively low global_*_write_bucket values. Should I be concerned?
Also, before concluding this works, check out connection_bucket_round_robin and its callers and their callers. This isn't quite going to work here, I think.
> For non-TLS writes, the kernel should nagle for us, so we're probably fine, though it might be sensible to impose a write threshold there too.
Nagling can't magically decrease the latency of the Tor network, so most of the time exit relays trickle data out over the internet slowly. Many web servers wrongly identify such behavior as a slow HTTP attack of the kind made by the Slowloris software (https://en.wikipedia.org/wiki/Slowloris). The most effective way to stop that attack is to drop the connection, and anyone using TBB can see that defence in action right now. Surfing over Tor becomes very painful.
This bug is not only about a minimum write size for TLS writes, or about TokenBucketRefillInterval being very low. It's a general bug about Tor operating on small amounts of data. If an exit relay is allowed to read 1 byte from an edge connection, the resulting overhead per data cell will be more than 500%. And you can't fix that by limiting the minimum amount of plaintext per TLS record; such limits only prevent the overhead percentage from growing even higher.
Actually, that math is too kind: the overhead in that case is effectively unbounded. For such a case we should count userbase*500 — the effective load on the Tor network would be equivalent to 250M users in some extreme case, as if a major number of states were surfing over Tor. It's a wonder it still works.
Before I apply any fix here, it seems like a good idea to add some instrumentation like we did for #7743 (moved) to see whether this is actually happening for anyone, and if so, by how much. The obvious fix would be to compare total bytes sent over the net for TLS connections vs total bytes sent on TLS connections to see what the overhead is.
What are we hoping to learn from the TLS write overhead stats? Some relay operators like Jens report stats like 13%. On moria5 run now it's 8%. On moria1 lately it's 4%.
It seems like 4% is a good number, but 13% is a high number.
Or to ask a more useful question: what TLS write overhead do we expect if we're always putting each 512-byte cell in its own TLS record? Which SSL implementations still do the "empty record" trick, and does that add to this per-cell overhead, as I assume it does?
Does the TLS overhead we're counting here mush together both the overhead from normal TLS records and also the overhead from TLS handshakes?
And finally, once we know what overhead percentage range we're hoping for, we should ask relay operators to tell us if they see anything out of that range -- or even turn that into an LD_BUG or something.
Hm. Assuming recent TLS versions (to ignore the empty record trick for now), we're looking at a 20 byte MAC, a 16 byte IV, 1-16 bytes of padding, and 5-10 bytes of headers if I'm reading this format right. That comes to something like 40-50 bytes of overhead per record, which makes it non-crazy to have ~10% overhead.