Opened 5 months ago

Closed 4 months ago

#22819 closed defect (wontfix)

Choice of compressors seems to be suboptimal

Reported by: yurivict271
Owned by: ahf
Priority: Medium
Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords:
Cc: ahf
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The latest tor-devel uses 2 compression libraries: zstd and lzma.

Based on this graph https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/DCspeed5.png lzma only slightly exceeds zstd in a small range of values.

Why didn't you choose lz4? Based on this lz4 description https://github.com/lz4/lz4 it offers an advantage in a different range of values: towards the lower left corner of the first graph. lz4 can compress with lower ratio but with much higher speed.

If you want to choose several libraries, doesn't it make sense to cover a wider range of values, rather than choose two libraries that cover a similar range?

zstd + lz4 seems to be a better choice.

Change History (10)

comment:1 Changed 5 months ago by arma

Component: - Select a component → Core Tor/Tor

comment:2 Changed 5 months ago by nickm

We analyzed the algorithms' performance on the actual data we send; see proposal 280 and its references for details.

comment:3 Changed 5 months ago by yurivict271

comment:4 Changed 5 months ago by nickm

Whoops. That should have been 278.

comment:5 Changed 5 months ago by yurivict271

The valid way to compare compression methods for a particular purpose is to take your data samples and run them through lzbench (https://github.com/inikep/lzbench).
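The kind of measurement described above can be sketched in a few lines. This is only an illustration, not lzbench and not the Tor benchmarking tool: it uses the zlib and lzma codecs from the Python standard library because lz4 and zstd need third-party bindings, and the sample data is a made-up placeholder rather than a real consensus document. The function names `benchmark` and `run_benchmarks` are hypothetical.

```python
import lzma
import time
import zlib


def benchmark(name, compress, data):
    """Return (name, ratio, seconds) for one compressor on one sample."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    # Ratio is original size divided by compressed size (higher is better).
    return name, len(data) / len(compressed), elapsed


def run_benchmarks(data):
    """Run every candidate compressor over the same sample."""
    candidates = [
        ("zlib-1", lambda d: zlib.compress(d, 1)),
        ("zlib-9", lambda d: zlib.compress(d, 9)),
        ("lzma", lambda d: lzma.compress(d)),
    ]
    return [benchmark(name, fn, data) for name, fn in candidates]


if __name__ == "__main__":
    # Placeholder sample: repetitive structured text, NOT a real consensus.
    sample = b"r relay 127.0.0.1 9001 0\ns Fast Running Stable Valid\n" * 20000
    for name, ratio, seconds in run_benchmarks(sample):
        print(f"{name:8s} ratio={ratio:8.1f} time={seconds * 1000:8.2f} ms")
```

To get meaningful numbers, replace the placeholder with the documents that are actually sent, and repeat each run several times to smooth out timing noise.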

The compression method should be a function of the current network speed. When the speed is above a certain threshold, no compression is needed. Below the threshold, the method is chosen from a speed-versus-ratio graph covering all available methods, like the one in the first link above. When the speed is slightly below the threshold, fast, light compressors such as lz4 are appropriate. At slower speeds, deeper and slower compressors kick in: first zstd, then lzma.
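One way to make this proposal concrete is to estimate the total time to compress and send a document, then pick whichever option is fastest for the current link speed. The sketch below is only a model of the idea in this comment, not anything Tor implements. The `Codec` type, the `choose_codec` function, and every ratio and throughput figure are illustrative assumptions; real values would have to be measured on the actual data. Decompression time is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float          # original size / compressed size
    compress_mbps: float  # compression throughput in MB/s


# Illustrative numbers only (assumptions, not measurements).
CODECS = (
    Codec("lz4", ratio=2.0, compress_mbps=500.0),
    Codec("zstd", ratio=3.0, compress_mbps=100.0),
    Codec("lzma", ratio=4.0, compress_mbps=5.0),
)


def transfer_seconds(size_mb: float, link_mbps: float,
                     codec: Optional[Codec] = None) -> float:
    """Estimated time to (optionally) compress and then send size_mb."""
    if codec is None:
        return size_mb / link_mbps
    compress_time = size_mb / codec.compress_mbps
    send_time = (size_mb / codec.ratio) / link_mbps
    return compress_time + send_time


def choose_codec(size_mb: float, link_mbps: float,
                 codecs: Sequence[Codec] = CODECS) -> Optional[Codec]:
    """Return the codec with the lowest estimated time, or None to skip."""
    best, best_time = None, transfer_seconds(size_mb, link_mbps)
    for codec in codecs:
        candidate = transfer_seconds(size_mb, link_mbps, codec)
        if candidate < best_time:
            best, best_time = codec, candidate
    return best


if __name__ == "__main__":
    for speed in (1000.0, 100.0, 10.0, 0.1):
        chosen = choose_codec(1.0, speed)
        print(speed, chosen.name if chosen else "no compression")
```

With these made-up figures the thresholds fall out of the arithmetic rather than being hard-coded: a very fast link skips compression, and progressively slower links favour lz4, then zstd, then lzma. If compression is done ahead of time in the background, the compression term drops out and the best ratio always wins.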

I didn't see lz4 even being considered in the references for proposal 278.

comment:6 Changed 5 months ago by dgoulet

Milestone: Tor: unspecified

comment:7 Changed 4 months ago by nickm

Cc: ahf added
Milestone: Tor: unspecified → Tor: 0.3.1.x-final

ahf, do you have time to re-run the analysis on lz4 before 0.3.1 ships?

comment:8 Changed 4 months ago by ahf

Owner: set to ahf
Status: new → assigned

Yes. I will take a look at this.

comment:9 Changed 4 months ago by ahf

I just added a simple LZ4 test to our own little benchmarking tool which can be found at https://gitlab.com/ahf/tor-sponsor4-compression/tree/master/bench

I did not measure the memory patterns of LZ4, which was the reason we wrote this tool in the first place. LZ4 does look a lot faster than anything else we have (it even beats gzip at compression level 1), but its compression ratio is also worse than anything else we have for the cached-consensus document (~2.2 MB of structured text).

The results can be found here: https://docs.google.com/spreadsheets/d/1WzFXQGfH8yI4WCCRAEQBt1Z_sC-3hMkmE6HNKDaaSiE/edit#gid=0

If someone is able to improve the ratio in the bench tool, I'm open to rerunning these tests. The only tunable I could find in the lz4.h file is an acceleration value, which would let me make LZ4 even faster but would not give a better compression ratio.

Since we currently batch-process the compression of our "larger" files in the background, I don't think we would gain much by including LZ4 here.

comment:10 Changed 4 months ago by ahf

Resolution: wontfix
Status: assigned → closed

I'm going to close this for now. Feel free to reopen if there is an issue with the way we analyse the LZ4 compression in our bench tool.
