Opened 2 months ago

Last modified 3 weeks ago

#24368 new defect

Tune zstd parameters to decrease memory usage during streaming

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.1.1-alpha
Severity: Normal
Keywords: regression, compression, zstd, tor-dir, 032-backport
Cc: ahf
Actual Points:
Parent ID:
Points: 1
Reviewer:
Sponsor:

Description (last modified by teor)

Using the cached-microdesc-consensus that is:

valid-after 2017-11-21 11:00:00

I get the following results:

$ zstd cached-microdesc-consensus
$ gzip -9 cached-microdesc-consensus
$ du -h cached-microdesc-consensus*
1.9M	cached-microdesc-consensus
564K	cached-microdesc-consensus.gz
576K	cached-microdesc-consensus.zst

The zstd file is only 2% larger than the gzip one, but I thought zstd was meant to produce smaller consensuses than gzip.
Or did I get the compression settings wrong?

Change History (4)

comment:1 Changed 2 months ago by teor

Description: modified (diff)
Summary: changed from "A zstd-compressed cached-microdesc-consensus is 1.5% larger than a gzipped one" to "A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one"

Oops, that's 2% larger.

comment:2 Changed 2 months ago by nickm

But if you use -9 with both of them:

$ gzip -9 -c ~/.tor/cached-microdesc-consensus | wc -c
583762
$ zstd -9 -c ~/.tor/cached-microdesc-consensus | wc -c
554019

And in fact, even zstd -5 comes out slightly smaller than gzip -9:

$ zstd -5 -c ~/.tor/cached-microdesc-consensus  | wc -c
579944

In practice, we use these settings:

compression_level_t  zlib 'level'            zlib memLevel  zlib windowBits  zstd 'preset'
BEST                 Z_BEST_COMPRESSION (9)  9              15               9
HIGH                 9                       8              15               9
MEDIUM               9                       7              13               8
LOW                  9                       6              11               7
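
To make the zlib columns concrete, here is a minimal sketch that applies the same (level, memLevel, windowBits) triples through Python's standard zlib module. It is not Tor's code: the names ZLIB_SETTINGS and zlib_compress are made up for illustration, and the output sizes need not match the ones measured in Tor.

```python
import zlib

# (level, memLevel, windowBits) copied from the settings table above.
# The dictionary and function names are hypothetical, not Tor identifiers.
ZLIB_SETTINGS = {
    "BEST":   (9, 9, 15),
    "HIGH":   (9, 8, 15),
    "MEDIUM": (9, 7, 13),
    "LOW":    (9, 6, 11),
}

def zlib_compress(data: bytes, level_name: str) -> bytes:
    """Compress data with the zlib parameters listed for level_name."""
    level, mem_level, window_bits = ZLIB_SETTINGS[level_name]
    comp = zlib.compressobj(level, zlib.DEFLATED, window_bits, mem_level)
    return comp.compress(data) + comp.flush()

def sizes_by_level(data: bytes) -> dict:
    """Return the compressed size in bytes for every level in the table."""
    return {name: len(zlib_compress(data, name)) for name in ZLIB_SETTINGS}
```

Calling sizes_by_level() on the bytes of a cached-microdesc-consensus file would give a rough idea of how much the smaller window and memLevel cost in output size.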

These settings give us the following memory usage for compression, assuming that the calculations in our source files are approximately right.

compression_level_t  zlib KB (approx)  zstd KB (approx)
BEST                 386               10880
HIGH                 258               10880
MEDIUM               98                9856
LOW                  42                8832
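
The zlib column can be reproduced from the deflate memory formula documented in zlib's zconf.h, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes "plus a few kilobytes for small objects". The sketch below assumes that overhead is 2 KiB, which is a guess that happens to match the figures above. The zstd column is not reproduced here because this ticket does not show how it was calculated.

```python
# Assumed overhead for zlib's "few kilobytes for small objects".
# 2 KiB is a guess that reproduces the zlib column of the table above.
A_FEW_KILOBYTES = 2048

def zlib_deflate_bytes(window_bits: int, mem_level: int) -> int:
    """Approximate deflate state size, per the formula in zlib's zconf.h."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9)) + A_FEW_KILOBYTES

# (windowBits, memLevel) pairs taken from the settings table in this comment.
SETTINGS = {"BEST": (15, 9), "HIGH": (15, 8), "MEDIUM": (13, 7), "LOW": (11, 6)}

# Convert to whole KiB, as in the table.
ZLIB_KB = {
    name: zlib_deflate_bytes(wbits, mem) // 1024
    for name, (wbits, mem) in SETTINGS.items()
}

for name, kb in ZLIB_KB.items():
    print(f"{name:6s} {kb} KB")
```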

and this compressed output size (measured in a hacked Tor):

compression_level_t  zlib consensus size (bytes)  zstd consensus size (bytes)
BEST                 525841                       492916
HIGH                 526470                       492916
MEDIUM               578218                       495020
LOW                  663334                       496860

Hm. If our numbers are right, zstd is far more memory-hungry than gzip: roughly 28 times the memory at BEST and about 210 times at LOW. That's fine for precompression, but for streaming usage, we should probably tune our zstd parameter choices.
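
To see the tradeoff per level, the two tables can be combined into a memory ratio and a size saving. This is only a sketch over the figures quoted in this comment. The names MEMORY_KB, SIZE_BYTES and tradeoff are illustrative, and the result is only as accurate as the approximate memory estimates.

```python
# (zlib, zstd) figures copied from the memory and output-size tables above.
MEMORY_KB = {
    "BEST": (386, 10880),
    "HIGH": (258, 10880),
    "MEDIUM": (98, 9856),
    "LOW": (42, 8832),
}
SIZE_BYTES = {
    "BEST": (525841, 492916),
    "HIGH": (526470, 492916),
    "MEDIUM": (578218, 495020),
    "LOW": (663334, 496860),
}

def tradeoff(level: str) -> tuple:
    """Return (zstd/zlib memory ratio, percent size saved by zstd) for a level."""
    zlib_kb, zstd_kb = MEMORY_KB[level]
    zlib_size, zstd_size = SIZE_BYTES[level]
    memory_ratio = zstd_kb / zlib_kb
    size_saving_pct = 100.0 * (zlib_size - zstd_size) / zlib_size
    return memory_ratio, size_saving_pct

for level in MEMORY_KB:
    ratio, saving = tradeoff(level)
    print(f"{level:6s} zstd memory {ratio:6.1f}x zlib, output {saving:4.1f}% smaller")
```

zstd's size advantage grows as the zlib settings get cheaper, but so does the memory gap, which is the case that matters for streaming.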

comment:3 Changed 8 weeks ago by teor

Summary: changed from "A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one" to "Tune zstd parameters to decrease memory usage during streaming"

Rename ticket for new objective

comment:4 Changed 3 weeks ago by nickm

Keywords: 032-backport added