Opened 11 months ago

Last modified 7 months ago

#24368 assigned defect

Tune zstd parameters to decrease memory usage during streaming

Reported by: teor
Owned by: ahf
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.3.1.1-alpha
Severity: Normal
Keywords: regression, compression, zstd, tor-dir, 032-backport, 034-triage-20180328, 034-removed-20180328
Cc: ahf
Actual Points:
Parent ID:
Points: 1
Reviewer:
Sponsor:

Description (last modified by teor)

Using the cached-microdesc-consensus that is:

valid-after 2017-11-21 11:00:00

I get the following results:

$ zstd cached-microdesc-consensus
$ gzip -9 cached-microdesc-consensus
$ du -h cached-microdesc-consensus*
1.9M	cached-microdesc-consensus
564K	cached-microdesc-consensus.gz
576K	cached-microdesc-consensus.zst

It's only 2% larger, but I thought zstd was meant to produce smaller consensuses than gzip?
Or did I get the compression settings wrong?
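
As a quick check of the "2%" figure, the arithmetic can be sketched as follows. It uses the rounded sizes that du -h printed above, so the result is only approximate; the helper name percent_larger is made up for this illustration.

```python
def percent_larger(size, baseline):
    """Return how much larger `size` is than `baseline`, in percent."""
    return (size - baseline) / baseline * 100.0

gz_k = 564   # cached-microdesc-consensus.gz, as printed by du -h
zst_k = 576  # cached-microdesc-consensus.zst, as printed by du -h

print(f"zstd output is {percent_larger(zst_k, gz_k):.1f}% larger than the gzip -9 output")
```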

Child Tickets

Attachments (1)

zstd_mem_menchmarks.py (2.3 KB) - added by nickm 9 months ago.

Change History (10)

comment:1 Changed 11 months ago by teor

Description: modified (diff)
Summary: changed from "A zstd-compressed cached-microdesc-consensus is 1.5% larger than a gzipped one" to "A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one"

Oops, that's 2% larger.

comment:2 Changed 11 months ago by nickm

But if you use -9 with both of them:

$ gzip -9 -c ~/.tor/cached-microdesc-consensus | wc -c
583762
$ zstd -9 -c ~/.tor/cached-microdesc-consensus | wc -c
554019

And in fact:

$ zstd -5 -c ~/.tor/cached-microdesc-consensus  | wc -c
579944

In practice, we use these settings:

compression_level_t  zlib 'level'            zlib memLevel setting  zlib windowBits setting  zstd 'preset' setting
BEST                 Z_BEST_COMPRESSION (9)  9                      15                       9
HIGH                 9                       8                      15                       9
MEDIUM               9                       7                      13                       8
LOW                  9                       6                      11                       7
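
For readers who want to try the zlib columns of this table, here is a minimal sketch using Python's standard zlib module, whose compressobj() exposes the same level, memLevel, and windowBits knobs. It is not Tor's code (Tor calls the C library directly), and the sample input and the name stream_compress are made up for illustration.

```python
import zlib

# (zlib level, memLevel, windowBits) per compression_level_t, copied from the table.
ZLIB_SETTINGS = {
    "BEST": (9, 9, 15),
    "HIGH": (9, 8, 15),
    "MEDIUM": (9, 7, 13),
    "LOW": (9, 6, 11),
}

def stream_compress(chunks, level_name):
    """Compress an iterable of byte chunks incrementally with the named settings."""
    level, mem_level, window_bits = ZLIB_SETTINGS[level_name]
    comp = zlib.compressobj(level=level, method=zlib.DEFLATED,
                            wbits=window_bits, memLevel=mem_level)
    out = [comp.compress(chunk) for chunk in chunks]
    out.append(comp.flush())
    return b"".join(out)

# Hypothetical stand-in input; a real comparison would read a
# cached-microdesc-consensus file in chunks instead.
sample = [b"r example-relay-%d 2017-11-21 11:00:00\n" % i for i in range(2000)]

for name in ZLIB_SETTINGS:
    compressed = stream_compress(sample, name)
    # The default decompressor window (15 bits) also accepts smaller windows.
    assert zlib.decompress(compressed) == b"".join(sample)
    print(name, len(compressed))
```

The sizes printed for this synthetic input say nothing about real consensus documents; the point is only how the three zlib parameters are passed.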

These settings give us the following memory usage for compression, assuming that the calculations in our files are approximately right:

compression_level_t  zlib KB usage (approx)  zstd KB usage (approx)
BEST                 386                     10880
HIGH                 258                     10880
MEDIUM               98                      9856
LOW                  42                      8832

and this compressed output size (measured in a hacked Tor):

compression_level_t  zlib consensus size  zstd consensus size
BEST                 525841               492916
HIGH                 526470               492916
MEDIUM               578218               495020
LOW                  663334               496860

Hm. It looks like, if our numbers are right, zstd is far more memory-hungry than gzip is. That's fine for precompression, but for streaming usage, we should probably tune our zstd parameter choices.
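
To put numbers on "far more memory-hungry", the sketch below recomputes the zlib column from the deflate memory estimate documented in zlib's zconf.h, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, and then derives the zstd/zlib memory ratio and the output-size saving from the tables above. How Tor's own calculation arrives at its figures is not shown in this ticket; the formula is used here only as a cross-check, and the zstd figures are copied rather than recomputed.

```python
def zlib_deflate_kib(window_bits, mem_level):
    """zlib's documented deflate memory estimate (zconf.h), in KiB."""
    return ((1 << (window_bits + 2)) + (1 << (mem_level + 9))) / 1024

# level: (windowBits, memLevel, zlib KB, zstd KB, zlib size, zstd size),
# all copied from the three tables in this comment.
ROWS = {
    "BEST":   (15, 9, 386, 10880, 525841, 492916),
    "HIGH":   (15, 8, 258, 10880, 526470, 492916),
    "MEDIUM": (13, 7,  98,  9856, 578218, 495020),
    "LOW":    (11, 6,  42,  8832, 663334, 496860),
}

for name, (wbits, mem, zlib_kb, zstd_kb, zlib_size, zstd_size) in ROWS.items():
    formula_kb = zlib_deflate_kib(wbits, mem)
    mem_ratio = zstd_kb / zlib_kb                 # zstd memory relative to zlib
    saving = (1 - zstd_size / zlib_size) * 100    # how much smaller zstd output is
    print(f"{name:6} zlib formula {formula_kb:.0f} KB (table {zlib_kb}), "
          f"zstd memory {mem_ratio:.0f}x zlib, output {saving:.1f}% smaller")
```

The formula lands 2 KB below the table value on every row, which is consistent with a small fixed overhead being added on top. By these figures zstd needs roughly 28 to 210 times the memory of zlib, in exchange for output that is about 6% to 25% smaller.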

comment:3 Changed 11 months ago by teor

Summary: changed from "A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one" to "Tune zstd parameters to decrease memory usage during streaming"

Rename ticket for new objective

comment:4 Changed 10 months ago by nickm

Keywords: 032-backport added

comment:5 Changed 9 months ago by ahf

Owner: set to ahf
Status: changed from new to assigned

comment:6 Changed 9 months ago by nickm

Milestone: changed from Tor: 0.3.3.x-final to Tor: 0.3.4.x-final

I've run some initial experimentation, and here's what I found:

Adjusting the preset values shouldn't be necessary if we instead tell zstd to build its own parameters (using ZSTD_getCParams() or ZSTD_getParams()), passing the estimatedSrcSize argument to tell zstd how big we expect the input to be.

I also think that our current estimates are higher than zstd actually uses, which is a good thing. I'm attaching a python script that I used for these tests; it requires the "zstandard" package.
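
The attached script is not reproduced on this page. As a separate, minimal sketch of the same idea, the snippet below asks the "zstandard" Python package for parameters derived from a level plus an expected source size, and prints the library's own estimate of the compression context size. It assumes that ZstdCompressionParameters.from_level() and estimated_compression_context_size() behave as described in that package's documentation; the function name estimated_context_bytes and the 1.9 MB figure (taken from the du output in the description) are illustrative.

```python
try:
    import zstandard  # third-party "zstandard" package
except ImportError:
    zstandard = None

def estimated_context_bytes(level, source_size=0):
    """Estimated zstd compression context size for a level and expected input size.

    Returns None when the zstandard package is not installed.
    source_size=0 means the input size is unknown.
    """
    if zstandard is None:
        return None
    params = zstandard.ZstdCompressionParameters.from_level(
        level, source_size=source_size)
    return params.estimated_compression_context_size()

# Roughly the 1.9M consensus size shown in the ticket description.
CONSENSUS_BYTES = 1_900_000

for level in (7, 8, 9):
    unknown = estimated_context_bytes(level)
    hinted = estimated_context_bytes(level, CONSENSUS_BYTES)
    print(f"level {level}: size unknown -> {unknown}, size hinted -> {hinted}")
```

These are the library's estimates, not measurements of what Tor allocates, so they should be treated as a rough guide only.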

Now the catch here is that we can't actually adjust the parameters to anything besides the presets unless we use the "advanced" (a.k.a. "static-only") zstd APIs. I've opened ticket #25162 about doing that safely. But the complexity is enough that I think we should call this an 0.3.4.x ticket: it is more than simply tweaking a couple of numbers.

Changed 9 months ago by nickm

Attachment: zstd_mem_menchmarks.py added

comment:7 Changed 7 months ago by nickm

Keywords: 034-triage-20180328 added

comment:8 Changed 7 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:9 Changed 7 months ago by nickm

Milestone: changed from Tor: 0.3.4.x-final to Tor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.
