Opened 2 years ago
Closed 2 years ago
#22819 closed defect (wontfix)
Choice of compressors seems to be suboptimal
Reported by: | yurivict271 | Owned by: | ahf |
---|---|---|---|
Priority: | Medium | Milestone: | Tor: 0.3.1.x-final |
Component: | Core Tor/Tor | Version: | |
Severity: | Normal | Keywords: | |
Cc: | ahf | Actual Points: | |
Parent ID: | | Points: |
Reviewer: | | Sponsor: |
Description
The latest tor-devel uses two compression libraries: zstd and lzma.
Based on this graph https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/DCspeed5.png lzma only slightly exceeds zstd in a small range of values.
Why didn't you choose lz4? Based on this lz4 description https://github.com/lz4/lz4 it offers an advantage in a different range of values: towards the lower left corner of the first graph. lz4 compresses with a lower ratio but at much higher speed.
If you want to choose several libraries, doesn't it make sense to cover a wider range of values, rather than choose two libraries that cover a similar range?
zstd + lz4 seems to be a better choice.
Child Tickets
Change History (10)
comment:1 Changed 2 years ago by
Component: | - Select a component → Core Tor/Tor |
---|
comment:2 Changed 2 years ago by
comment:3 Changed 2 years ago by
The list of proposals ends with 279: https://gitweb.torproject.org/torspec.git/tree/proposals
comment:5 Changed 2 years ago by
The valid way to compare compression methods for a particular purpose is to take your data samples and run them through lzbench (https://github.com/inikep/lzbench).
The compression method should be a function of the current network speed. When the speed is above a certain threshold, no compression is needed. Below the threshold, the compression method is chosen based on the graph for all available methods, like the first link above. When the speed is only slightly below the threshold, fast, light compressors such as lz4 are appropriate. At slower speeds, deeper and slower compressors kick in, like zstd, and later lzma.
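The speed-threshold scheme described above could be sketched as follows. This is only an illustration, not anything Tor implements: the threshold values are hypothetical, and since lz4 has no Python stdlib binding, zlib at level 1 stands in for the "fast" compressor and lzma for the "deep" one.

```python
# Sketch of threshold-based compressor selection (illustrative only).
# Thresholds in Mbit/s are hypothetical; zlib level 1 stands in for a
# fast compressor like lz4, lzma for a slow, high-ratio one.
import lzma
import zlib

def choose_compressor(link_speed_mbps):
    """Return (name, compress_fn) appropriate for the given link speed."""
    if link_speed_mbps >= 100:   # fast link: compression just wastes CPU
        return "none", lambda data: data
    if link_speed_mbps >= 20:    # slightly constrained: fast and light
        return "fast", lambda data: zlib.compress(data, 1)
    if link_speed_mbps >= 2:     # constrained: balanced ratio vs. speed
        return "balanced", lambda data: zlib.compress(data, 9)
    # very slow link: spend CPU on the deepest available compressor
    return "deep", lambda data: lzma.compress(data)

if __name__ == "__main__":
    sample = b"structured directory text\n" * 1024
    for speed in (200, 50, 5, 0.5):
        name, fn = choose_compressor(speed)
        print("%6.1f Mbit/s -> %-8s (%d bytes)" % (speed, name, len(fn(sample))))
```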
I didn't see lz4 even considered in the references of proposal 278.
comment:6 Changed 2 years ago by
Milestone: | → Tor: unspecified |
---|
comment:7 Changed 2 years ago by
Cc: | ahf added |
---|---|
Milestone: | Tor: unspecified → Tor: 0.3.1.x-final |
ahf, do you have time to re-run the analysis on lz4 before 0.3.1 ships?
comment:8 Changed 2 years ago by
Owner: | set to ahf |
---|---|
Status: | new → assigned |
Yes. I will take a look at this.
comment:9 Changed 2 years ago by
I just added a simple LZ4 test to our own little benchmarking tool, which can be found at https://gitlab.com/ahf/tor-sponsor4-compression/tree/master/bench
I did not measure the memory patterns of LZ4, which was the reason we wrote this tool in the first place. LZ4 does look a lot faster than anything else we have (it even beats gzip at compression level 1), but its compression ratio is also worse than anything else we have for the cached-consensus document (~2.2 MB of structured text).
The results can be found here: https://docs.google.com/spreadsheets/d/1WzFXQGfH8yI4WCCRAEQBt1Z_sC-3hMkmE6HNKDaaSiE/edit#gid=0
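The ratio-vs-speed tradeoff being measured here can be reproduced in miniature with stdlib codecs. This is a hedged sketch, not the bench tool itself: the sample data is synthetic, lz4 is unavailable in the stdlib, and zlib at level 1 only approximates gzip -1 (both use DEFLATE), so the numbers illustrate the shape of the tradeoff rather than LZ4's actual results.

```python
# Rough illustration of the compression ratio vs. speed tradeoff on
# structured text, using stdlib codecs as stand-ins (synthetic data;
# zlib level 1 approximates gzip -1, lzma sits at the slow/deep end).
import lzma
import time
import zlib

def measure(name, fn, data):
    """Compress data with fn; return (name, ratio, elapsed seconds)."""
    start = time.perf_counter()
    out = fn(data)
    elapsed = time.perf_counter() - start
    return name, len(data) / len(out), elapsed

# Synthetic stand-in for a structured directory document.
data = b"".join(b"r relay%d 10.0.0.%d 9001 0\n" % (i, i % 255)
                for i in range(50000))

for row in (measure("zlib-1", lambda d: zlib.compress(d, 1), data),
            measure("zlib-9", lambda d: zlib.compress(d, 9), data),
            measure("lzma",   lzma.compress,                 data)):
    print("%-8s ratio=%6.1f time=%.3fs" % row)
```

On repetitive structured text like this, the fast codec finishes far sooner while the slow codec achieves the better ratio, which is the pattern the spreadsheet above shows for LZ4 vs. the others.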
If someone is able to improve the ratio in the bench tool, I'm open to rerunning these tests. The only tunable I could find in the lz4.h file was an acceleration value, which would let me make LZ4 even faster, but not get a better compression ratio.
Since we are currently batch-processing the compression of our "larger" files in the background, I don't think we would gain much by including LZ4 here.
comment:10 Changed 2 years ago by
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
I'm going to close this for now. Feel free to reopen if there is an issue with the way we analyse the LZ4 compression in our bench tool.
We analyzed the algorithms' performance on the actual data we send; see proposal 280 and its references for details.