Opened 5 years ago

Closed 5 years ago

#11791 closed defect (implemented)

Our zlib deflate objects require lots of memory per directory connection

Reported by: nickm
Owned by:
Priority: Medium
Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-relay, zlib, 025-backport, 026-triaged-1, nickm-patch
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Compressing descriptors on the fly requires a lot of memory with zlib's default settings. According to zconf.h:

/* The memory requirements for deflate are (in bytes):
            (1 << (windowBits+2)) +  (1 << (memLevel+9))
 that is: 128K for windowBits=15  +  128K for memLevel = 8  (default values)
 plus a few kilobytes for small objects.

We are using windowBits == 15 and memLevel == 8, which might explain in part why busy directories are so memory-hungry. In addition to our investigations for #11787, we should see whether we can safely reduce this to 32K or so with windowBits == 12 and memLevel == 5. Preliminary investigation suggests that this compression level would not hurt compression very much (0-5% for microdescs, 3-7% for server descriptors).
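A minimal sketch (not part of the ticket), assuming only the formula quoted above from zconf.h, showing where the 256K default and the proposed 32K figures come from:

import sys

# Evaluate the per-object deflate memory estimate from zconf.h for the
# default and the proposed settings.  Small-object overhead ("a few
# kilobytes") is not included.
def deflate_memory_bytes(window_bits, mem_level):
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))

for wbits, memlevel in [(15, 8), (12, 5)]:
    sys.stdout.write("windowBits=%d memLevel=%d -> ~%d KB per deflate object\n"
                     % (wbits, memlevel,
                        deflate_memory_bytes(wbits, memlevel) // 1024))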

We could also choose our window size and memLevel based on current memory availability.

Child Tickets

Attachments (1)

zlibtest2.py (1.8 KB) - added by nickm 5 years ago.


Change History (10)

Changed 5 years ago by nickm

Attachment: zlibtest2.py added

comment:1 Changed 5 years ago by nickm

Keywords: 026-triaged-1 added

comment:2 Changed 5 years ago by Sebastian

Hrm. My first idea was to store them compressed, so we wouldn't need to do on-the-fly deflation, but that'd mean they don't compress too well. Do we have other options? Can we change the setting dynamically if we've had to invoke the oomkiller before?

comment:3 Changed 5 years ago by nickm

> My first idea was to store them compressed, so we wouldn't need to do on-the-fly deflation, but that'd mean they don't compress too well.

Maybe we should look for designs where a bunch of clients will wind up requesting the same sets of descriptors? That might help. Could be tricky, though, and it would represent a big divergence from our current design.

> Can we change the setting dynamically if we've had to invoke the oomkiller before?

Or we could change the settings if we are close to the memory limit. But I don't know whether that's such a win: we'd be using more bandwidth that way.

And maybe we could retain the current settings, but limit the number of descriptor downloads that we support concurrently.
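As a hedged sketch of the "change the settings when we are close to the memory limit" idea: the function names, the 75% threshold, and the use of Python's zlib.compressobj below are illustrative assumptions, not part of any Tor patch.

import zlib

# Hypothetical policy: fall back to smaller windowBits/memLevel when the
# process is close to its memory limit, trading compression ratio for RAM.
def choose_deflate_params(bytes_in_use, max_mem_bytes):
    if bytes_in_use >= 0.75 * max_mem_bytes:  # threshold is an assumption
        return 12, 5   # ~32 KB per deflate object (zconf.h formula)
    return 15, 8       # zlib defaults, ~256 KB per deflate object

def new_descriptor_compressor(bytes_in_use, max_mem_bytes):
    wbits, memlevel = choose_deflate_params(bytes_in_use, max_mem_bytes)
    return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                            wbits, memlevel)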

comment:4 Changed 5 years ago by nickm

We could also fiddle with the memLevel parameter to deflateInit2:

The documentation says:

The memLevel parameter specifies how much memory should be allocated
for the internal compression state. memLevel=1 uses minimum memory but is
slow and reduces compression ratio; memLevel=9 uses maximum memory for
optimal speed. The default value is 8. See zconf.h for total memory usage
as a function of windowBits and memLevel.
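For quick experiments outside of Tor, the same memLevel knob is exposed by Python's zlib module (zlib.compressobj accepts a memLevel argument); Tor itself passes memLevel to deflateInit2() in C. A minimal illustration:

import zlib

# Deflate object with an explicit memLevel of 5; compression level, method,
# and windowBits are left at their defaults.  Illustration only.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                              zlib.MAX_WBITS, 5)
data = compressor.compress(b"some descriptor bytes") + compressor.flush()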

comment:5 Changed 5 years ago by nickm

I tried some experiments on microdescriptors, and found that, for those, we seem to be able to lower windowBits without much real damage to our compression ratio. We could probably take windowBits down to 11 and not lose more than a few percent. More tests are needed, though.

As for the performance hit of lowering memLevel: taking it down to 7 didn't seem too bad, but at 6 we took a 10% hit. (At windowBits == 13, input size == 65536, level == 8.)

Another possibility is to investigate the average number of descriptors or microdescriptors we serve per request. I'd note that for a 1024-byte input, lowering memLevel doesn't actually hurt performance at all, and lowering the compression ratio doesn't really waste that many bytes.
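A sketch of this kind of measurement, presumably similar in spirit to the attached zlibtest2.py (whose contents aren't reproduced here); the input file name is a placeholder:

import zlib

# Compare compressed sizes across a few (windowBits, memLevel) settings.
# "microdescs.txt" is a placeholder; substitute real descriptor data.
with open("microdescs.txt", "rb") as f:
    data = f.read()

LEVEL = 8  # compression level, matching the experiment described above

for wbits in (15, 13, 12, 11):
    for memlevel in (8, 7, 6, 5):
        c = zlib.compressobj(LEVEL, zlib.DEFLATED, wbits, memlevel)
        out = c.compress(data) + c.flush()
        print("windowBits=%2d memLevel=%d: %d -> %d bytes (ratio %.3f)"
              % (wbits, memlevel, len(data), len(out),
                 len(out) / float(len(data))))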

comment:6 Changed 5 years ago by nickm

Status: new → needs_review

I've done an implementation in branch feature11791 in my public repository. Please review?

comment:7 Changed 5 years ago by nickm

Keywords: nickm-patch added

comment:8 Changed 5 years ago by nickm

rl1987 says this looks good, but suggests that we could bikeshed harder on the return values of get_memlevel().

This patch still looks okay to me 7 weeks later.

comment:9 Changed 5 years ago by nickm

Resolution: implemented
Status: needs_review → closed

Merged! And thanks for the review.
