"Currently tor_gzip_uncompress() starts with
an assumed 50% compression guess. Typical
consensus document compression is slightly
better than 65% and this assumption results
in at least one realloc() and sometimes
two realloc() calls during decompression.
Adjust the starting assumption to 75%
compression to eliminate the cost
of the realloc()."
This modifies the estimate for every gzip decompression performed by tor.
We could go two ways on this:
allow the caller to supply an estimate in a new tor_gzip_uncompress_estimate() (and leave existing callers as-is); see the sketch after this list
modify the estimate globally
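For the first option, a possible shape would be something like the declaration below (purely hypothetical: this function does not exist in Tor, and the parameter list is abbreviated relative to the real tor_gzip_uncompress()):

#include <stddef.h>

/* Hypothetical: like tor_gzip_uncompress(), but the caller supplies the
 * assumed compression ratio used for the initial output-buffer guess.
 * The existing entry point could become a thin wrapper that passes the
 * current default, leaving existing callers unchanged. */
int
tor_gzip_uncompress_estimate(char **out, size_t *out_len,
                             const char *in, size_t in_len,
                             double assumed_ratio);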
I will now check the range of compression ratios on Tor directory files.
I plan to use gzip -9 as a proxy for tor_gzip_compress. Note that the gzip command adds its own file header, whereas tor's compressed responses are wrapped in HTTP (and Tor/TCP/IP) headers instead. The difference is likely to be negligible for large files.
And it's the largest files that matter for decompression speed and memory usage.
The microdescriptor consensus and descriptors are downloaded by most Tor clients (and, therefore, by most Tor instances). The descriptors are downloaded individually, so their ratios may be slightly lower. (The "full" consensus and descriptors aren't used by most clients, so they can be ignored for the purposes of this analysis.)
I suggest we increase the expected ratio to 75%. We currently double the size of the buffer when we need to reallocate anyway, so we are already using that much RAM for every decompression that overruns the initial guess: we allocate at the 50% estimate, then reallocate up to the 75% estimate. (The exception is individual microdescriptors whose ratios fall under 50%; for those we only ever use the 50% buffer.)
Alternatively, we could increase the expected ratio to 70%, which in typical cases still avoids the reallocation while using about 5% less RAM than the 75% option. This would be a win for both performance and RAM usage.
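To make the trade-off concrete, here is a small sketch of the resulting buffer sizes, assuming the initial guess is computed as input_length / (1 - assumed_ratio) and assuming a microdescriptor consensus of roughly 0.42 MBytes compressed for 1.2 MBytes uncompressed (about a 65% ratio); both the formula and the compressed size are assumptions for illustration:

#include <stdio.h>

int
main(void)
{
  /* Assumed sizes for a microdescriptor consensus: ~65% compression,
   * 1.2 MBytes uncompressed, roughly 0.42 MBytes compressed. */
  const double compressed_mb = 0.42;
  const double uncompressed_mb = 1.2;
  const double ratios[] = { 0.50, 0.70, 0.75 };

  for (unsigned i = 0; i < 3; ++i) {
    double guess_mb = compressed_mb / (1.0 - ratios[i]);
    printf("%.0f%% guess: %.2f MBytes initial buffer, %s\n",
           ratios[i] * 100.0, guess_mb,
           guess_mb >= uncompressed_mb ? "no realloc needed"
                                       : "realloc (doubling) needed");
  }
  return 0;
}

Under these assumptions the 50% guess (0.84 MBytes) forces a doubling realloc, while both the 70% guess (1.40 MBytes) and the 75% guess (1.68 MBytes) hold the output on the first try; the difference between the latter two is only how much headroom is left if the ratio improves.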
Open questions:
What is the range of compression ratios on recent microdescriptor consensuses? Do they vary much?
The microdescriptor consensuses are 1.2 MBytes, and the combined size of all cached microdescriptors is 2.9 MBytes.
The 5% difference between 70% and 75% is 61 KBytes for the 1.2 MByte microdescriptor consensus, which is the largest individual file. This optimisation costs us nothing, except that if we cut it too close and the compression ratio improves, we get a memory doubling and a reallocation during decompression, which is exactly what we're trying to avoid. That would be most likely to happen when decompressing the full consensus (currently a 66% compression ratio on 1.4 MBytes), which is the case we care least about, because it's on the most powerful servers [citation needed].
Do we want to go with the 70% option to save both RAM and reallocation performance?
Or do we want to go with the 75% option to avoid reallocation, even if the ratio improves?
Move a few tickets out of 0.2.8. I would take a good patch for most of these if somebody writes one. (If you do, please mark the ticket needs_review and move it back into the maint-0.2.8 milestone. :) )
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.???