It's only 2% larger, but I thought zstd was meant to produce smaller consensuses than gzip? Or did I get the compression settings wrong?
Trac: Summary: A zstd-compressed cached-microdesc-consensus is 1.5% larger than a gzipped one to A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one
Assuming that the calculations in our files are approximately right, this gives us the following memory usage for compression:
||= compression_level_t =||= zlib KB usage (approx) =||= zstd KB usage (approx) =||
|| BEST || 386 || 10880 ||
|| HIGH || 258 || 10880 ||
|| MEDIUM || 98 || 9856 ||
|| LOW || 42 || 8832 ||
and the following compressed output sizes (measured in a hacked Tor):
||= compression_level_t =||= zlib consensus size (bytes) =||= zstd consensus size (bytes) =||
|| BEST || 525841 || 492916 ||
|| HIGH || 526470 || 492916 ||
|| MEDIUM || 578218 || 495020 ||
|| LOW || 663334 || 496860 ||
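For anybody who wants to reproduce the size comparison without a hacked Tor, a rough standalone sketch like the one below should give numbers in the same ballpark: it compresses a cached-microdesc-consensus file once with zlib's compress2() and once with ZSTD_compress() at whatever levels you pass on the command line. Note that this is one-shot, zlib-format compression rather than Tor's actual streaming gzip path, so the byte counts will differ slightly; the file path and levels are whatever you supply.
{{{
/* Rough sketch: compare zlib vs. zstd output size for one input file.
 * Not Tor code; one-shot compression only. */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>
#include <zstd.h>

int main(int argc, char **argv)
{
  if (argc != 4) {
    fprintf(stderr, "usage: %s <consensus-file> <zlib-level> <zstd-level>\n",
            argv[0]);
    return 1;
  }
  FILE *f = fopen(argv[1], "rb");
  if (!f)
    return 1;
  fseek(f, 0, SEEK_END);
  long len = ftell(f);
  fseek(f, 0, SEEK_SET);
  char *buf = malloc((size_t)len);
  if (!buf || fread(buf, 1, (size_t)len, f) != (size_t)len)
    return 1;
  fclose(f);

  /* zlib, one-shot (zlib format, not gzip framing). */
  uLongf zlen = compressBound((uLong)len);
  Bytef *zout = malloc(zlen);
  if (!zout ||
      compress2(zout, &zlen, (const Bytef *)buf, (uLong)len,
                atoi(argv[2])) != Z_OK)
    return 1;

  /* zstd, one-shot. */
  size_t scap = ZSTD_compressBound((size_t)len);
  void *sout = malloc(scap);
  size_t slen = ZSTD_compress(sout, scap, buf, (size_t)len, atoi(argv[3]));
  if (!sout || ZSTD_isError(slen))
    return 1;

  printf("input %ld bytes, zlib %lu bytes, zstd %zu bytes\n",
         len, (unsigned long)zlen, slen);
  return 0;
}
}}}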
Hm. It looks like, if our numbers are right, zstd is far more memory-hungry than gzip is. That's fine for precompression, but for streaming usage, we should probably tune our zstd parameter choices.
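As a quick sanity check on the memory table above, zstd can report its own estimate of how much state a streaming compressor needs at a given preset level. Here's a minimal sketch, assuming a zstd new enough to ship the estimator functions (they live behind ZSTD_STATIC_LINKING_ONLY); the levels listed are arbitrary samples, not necessarily the levels our presets map to:
{{{
/* Sketch: print zstd's own streaming-memory estimate per compression level. */
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stdio.h>

int main(void)
{
  const int levels[] = { 1, 3, 7, 9 };  /* sample levels, not Tor's mapping */
  for (size_t i = 0; i < sizeof(levels) / sizeof(levels[0]); ++i) {
    size_t est = ZSTD_estimateCStreamSize(levels[i]);
    printf("level %d: ~%zu KB of streaming compressor state\n",
           levels[i], est / 1024);
  }
  return 0;
}
}}}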
Trac: Summary: A zstd-compressed cached-microdesc-consensus is 2% larger than a gzipped one to Tune zstd parameters to decrease memory usage during streaming
I've done some initial experimentation, and here's what I found:
Adjusting the pre-set values shouldn't be necessary if instead we tell zstd to build its own parameters (using ZSTD_getCParams() or ZSTD_getParams()), with the estimatedSrcSize argument to tell zstd how big we expect the input to be.
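Roughly, the idea is something like the sketch below (not actual Tor code): ask ZSTD_getCParams() for parameters given a hint about the input size, then feed the result to the estimator to see how much memory a stream built with those parameters would need. The ~2 MB size hint and level 9 are just illustrative values; both functions are in the static-only section of zstd.h.
{{{
/* Sketch: let zstd derive size-aware parameters instead of bare presets. */
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stdio.h>

int main(void)
{
  const unsigned long long estimated_src_size = 2 * 1024 * 1024; /* illustrative */
  const int level = 9;                                           /* illustrative */

  ZSTD_compressionParameters cparams =
    ZSTD_getCParams(level, estimated_src_size, /*dictSize=*/0);
  size_t mem = ZSTD_estimateCStreamSize_usingCParams(cparams);

  printf("level %d, src hint %llu bytes: windowLog=%u, ~%zu KB of state\n",
         level, estimated_src_size, cparams.windowLog, mem / 1024);
  return 0;
}
}}}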
I also think that our current estimates are higher than zstd actually uses, which is a good thing. I'm attaching a python script that I used for these tests; it requires the "zstandard" package.
Now the catch here is that we can't actually adjust the parameters to anything besides the presets unless we use the "advanced" (a.k.a. "static-only") zstd APIs. I've opened ticket #25162 (moved) about doing that safely. But the complexity is enough that I think we should call this an 0.3.4.x ticket: it is more than simply tweaking a couple of numbers.
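To make the constraint concrete, here's the shape of the call the advanced path would need: take the size-aware parameters from ZSTD_getParams() and hand them to ZSTD_initCStream_advanced(), both of which sit behind ZSTD_STATIC_LINKING_ONLY, which is exactly the compatibility question #25162 (moved) is about. This is only an illustration of the API surface involved, not a proposal for the final code; the helper name is made up.
{{{
/* Sketch only: build a streaming compressor tuned for inputs of roughly
 * `estimated_src_size` bytes, using the static-only zstd APIs. */
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stddef.h>

/* Hypothetical helper; returns NULL on failure. */
static ZSTD_CStream *
new_tuned_cstream(int level, unsigned long long estimated_src_size)
{
  ZSTD_CStream *zcs = ZSTD_createCStream();
  if (!zcs)
    return NULL;

  ZSTD_parameters params =
    ZSTD_getParams(level, estimated_src_size, /*dictSize=*/0);

  /* The size hint shapes the parameters; we don't pledge an exact source
   * size, since it is only an estimate. */
  size_t rc = ZSTD_initCStream_advanced(zcs, NULL, 0, params,
                                        ZSTD_CONTENTSIZE_UNKNOWN);
  if (ZSTD_isError(rc)) {
    ZSTD_freeCStream(zcs);
    return NULL;
  }
  return zcs;
}
}}}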
Trac: Milestone: Tor: 0.3.3.x-final to Tor: 0.3.4.x-final