Opened 7 months ago

Closed 7 months ago

#25667 closed defect (worksforme)

LZMA/ZSTD descriptor compression support

Reported by: atagar
Owned by:
Priority: Medium
Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: regression? 033-must needs-analysis
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Hi lovely core tor folks. I've been working on Stem support for spec 278 which was merged in tor 0.3.1.1 but I'm struggling to find an example of it working in practice...

https://gitweb.torproject.org/torspec.git/commit/?id=1cb56af
https://gitweb.torproject.org/user/atagar/stem.git/commit/?h=compression

Moria1 is running tor 0.3.4.0 so it definitely should have lzma and zstd compression support, but when I query its dirport only the identity and deflate headers seem to work...

% curl --header "Accept-Encoding: identity" 128.31.0.39:9131/tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31
router moria1 128.31.0.34 9101 0 9131
identity-ed25519
-----BEGIN ED25519 CERT-----
AQQABnxNAQS9ja600v/ZodOUiu7NepTkbPIOrFPgEVQE+03rGBtPAQAgBADKnR/C
2nhpr9UzJkkbPy83sqbfNh63VgFnCpkSTULAcq52z8xM7raRDCiTJTu/FK/BJGgE
dJcFQ8MgZJOuYgFKcMVyQ6j2FGbhDI0zQTK1+TAPNRG4ixiF7h7wqDT9Ugw=
-----END ED25519 CERT-----
master-key-ed25519 yp0fwtp4aa/VMyZJGz8vN7Km3zYet1YBZwqZEk1CwHI
platform Tor 0.3.4.0-alpha-dev on Linux
...
% curl --header "Accept-Encoding: deflate" 128.31.0.39:9131/tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31
[ compressed data ]
% curl --header "Accept-Encoding: x-zstd" 128.31.0.39:9131/tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31
[ uncompressed data, same as 'identity' ]

% curl --header "Accept-Encoding: x-tor-lzma" 128.31.0.39:9131/tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31
[ uncompressed data, same as 'identity' ]
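
The same probe can be scripted. This is a minimal Python sketch that assumes only the standard library. `build_request` and `probe` are hypothetical helper names, not part of Stem or tor, and treating a missing Content-Encoding header as identity is an assumption of the sketch.

```python
import urllib.request


def build_request(address, port, fingerprint, encoding):
    """Build a DirPort descriptor request that advertises a single encoding."""
    url = "http://%s:%d/tor/server/fp/%s" % (address, port, fingerprint)
    return urllib.request.Request(url, headers={"Accept-Encoding": encoding})


def probe(address, port, fingerprint, encoding, timeout=10):
    """Return the Content-Encoding the relay reports (needs network access)."""
    request = build_request(address, port, fingerprint, encoding)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: a reply without the header is treated as uncompressed.
        return response.headers.get("Content-Encoding", "identity")
```

Calling `probe("128.31.0.39", 9131, "9695DFC35FFEB861329B9F1AB04C46397020CE31", "x-zstd")` reports which encoding the relay chose, which is the piece of information the curl calls above leave implicit.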

Change History (13)

comment:1 Changed 7 months ago by nickm

Keywords: regression? 033-must needs-analysis added
Milestone: Tor: 0.3.3.x-final

We should figure out if we broke compression. If not, this might not be 033-must, but it would be a shame to ship with a bug like this.

First off -- just because Moria is 0.3.4, doesn't mean it will necessarily support zstd and/or lzma2. It only supports those if it detects them at compile time, and I think Roger builds his own tors from source. So that's something to check on.

One way to learn more about what's going on here might be to use the -v option with curl, so it will also log all the sent/received headers to stderr.

comment:2 Changed 7 months ago by atagar

Thanks Nick!

If not, this might not be 033-must, but it would be a shame to ship with a bug like this.

Agreed. To be clear, I don't think this is particularly important. Lzma and zstd support are nice additions, but if they don't work it's not the end of the world. :)

First off -- just because Moria is 0.3.4, doesn't mean it will necessarily support zstd and/or lzma2. It only supports those if it detects them at compile time

Ahhh. My first thought when I read this was "how is this a useful feature?", but in reading proposal 278 I see now that the caller is supposed to provide *all* compression schemes they're willing to accept.

I think we need to change the spec in a few ways...

  1. The spec should say that relays 'MUST' support plaintext and deflate/gzip (ie. callers can rely on those) but that relays only 'SHOULD' support lzma and zstd.
  2. Callers should advertise all compression schemes they support in 'Accept-Encoding'. The relay will then pick the best it supports from among those (falling back to identity if none are supported). The compression scheme used in the response is indicated in the reply's 'Content-Encoding' header.
  3. The spec should say how we pick among the compression. That is to say, lzma > zstd > gzip > identity (or whatever the actual behavior is).

If this sounds good and you can tell me how tor actually picks for #3 I'll send ya a spec patch.

Cheers! -Damian

comment:3 Changed 7 months ago by atagar

Hi Nick. Just tried requesting lzma and zstd compressed descriptors from all present dirauths but they all fell back to plaintext responses as moria1 did. Do we know if this feature is exercised anywhere in practice?

comment:4 Changed 7 months ago by teor

The spec should say how we pick among the compression. That is to say, lzma > zstd > gzip > identity (or whatever the actual behavior is).

This is the required behaviour:

Relays MUST compress all directory documents with gzip. If lzma or zstd are available, each compression method MAY be used to compress some types of directory documents.

This is how the current implementation works, but clients shouldn't rely on the exact details:

Typically, compression methods are used for the documents that provide the best compression/CPU/RAM tradeoffs. Some compression methods are used to compress long-lived documents, then those documents are cached. Other methods are used for streaming compression when documents are requested.

This is the required behaviour:

When a directory mirror receives a request for compressed data, it MUST serve a format that is available and supported by the client.

This is how the current implementation works, but clients shouldn't rely on the exact details:

If multiple methods are supported, tor chooses the compression method in this order:

Client requests contain all supported methods in this order:
lzma > zstd > zlib > gzip > identity
https://gitweb.torproject.org/tor.git/tree/src/or/directory.c#n3627

Directory responses choose a common, supported method in this order:
Precompressed, cached data: lzma > zstd > zlib > gzip > identity
Streamed data: zstd > zlib > gzip > identity
Anonymous connections (e.g. HSDirs): zlib > gzip > identity
https://gitweb.torproject.org/tor.git/tree/src/or/directory.c#n3575
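
The orderings above can be sketched as a small negotiation routine. This is an illustration in Python, not tor's actual C implementation. The function names and the "cached" / "streamed" / "anonymous" labels are hypothetical, and using the `deflate` header token for zlib is an assumption based on the curl results in this ticket.

```python
# Header tokens, best first. "deflate" stands in for zlib (assumption).
CLIENT_ORDER = ["x-tor-lzma", "x-zstd", "deflate", "gzip", "identity"]

# Server-side preference per kind of response, copied from the lists above.
SERVER_ORDER = {
    "cached": ["x-tor-lzma", "x-zstd", "deflate", "gzip", "identity"],
    "streamed": ["x-zstd", "deflate", "gzip", "identity"],
    "anonymous": ["deflate", "gzip", "identity"],
}


def build_accept_encoding(supported):
    """Client side: advertise every supported method, in preference order."""
    return ", ".join(m for m in CLIENT_ORDER if m in supported)


def pick_method(accept_encoding, available, kind):
    """Server side: first preferred method that both ends support."""
    requested = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    for method in SERVER_ORDER[kind]:
        if method in requested and method in available:
            return method
    # Nothing in common: fall back to an uncompressed reply.
    return "identity"
```

Under these assumptions a request that advertises only `x-tor-lzma` gets an identity reply for streamed documents, because lzma is not in the streamed list.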

comment:5 Changed 7 months ago by atagar

Thanks Tim! I went ahead and pushed Stem support for compression headers. Once I have an example of lzma and zstd working in the live network I'll add integ coverage and whip up a spec patch.

comment:6 in reply to:  3 Changed 7 months ago by teor

Replying to atagar:

Hi Nick. Just tried requesting lzma and zstd compressed descriptors from all present dirauths but they all fell back to plaintext responses as moria1 did. Do we know if this feature is exercised anywhere in practice?

You can test lzma against any of these relays:
https://metrics.torproject.org/rs.html#search/radia

Unfortunately, they don't have zstd support, because it's not in the Debian release they are using.
If you ask on tor-relays, you can probably find a few operators who have only enabled zstd, or who have enabled both lzma and zstd.
The relevant configure options are --enable-lzma and --enable-zstd.

Tor also logs the supported compression methods on startup:
Tor %s running on %s with Libevent %s, OpenSSL %s, Zlib %s, Liblzma %s, and Libzstd %s.
https://gitweb.torproject.org/tor.git/tree/src/or/main.c#n3323

comment:7 Changed 7 months ago by teor

Hmm, I am having pkg-config issues on radia. No lzma support yet.

comment:8 Changed 7 months ago by teor

You can now test lzma and zstd against any of these relays:
https://metrics.torproject.org/rs.html#search/radia

[notice] Tor 0.3.2.10-dev (git-57b69160a4a50f38) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.2l, Zlib 1.2.8, Liblzma 5.1.0alpha, Libzstd 1.3.2, and PrivCount 3.0.0.

(The log line is slightly different because they are running a research version of Tor. That shouldn't matter for compression.)
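
For anyone checking a relay's log automatically, the library versions can be pulled out of that startup line. This is a rough Python sketch with hypothetical function names. It assumes a library that was not compiled in is reported as "N/A", which this ticket does not show.

```python
import re

# Matches "<Library> <version>" pairs in tor's startup notice.
LIBRARY = re.compile(r"\b(Libevent|OpenSSL|Zlib|Liblzma|Libzstd) ([^\s,]+)")


def library_versions(log_line):
    """Map each library named in the startup line to its version string."""
    # The last version in the stock format is followed by a period.
    return {name: version.rstrip(".")
            for name, version in LIBRARY.findall(log_line)}


def has_library(versions, name):
    """True if the library is listed with a real version (assumes 'N/A' otherwise)."""
    return versions.get(name, "N/A") != "N/A"
```

Feeding it the notice line above yields entries for Liblzma and Libzstd, so `has_library` returns True for both.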

comment:9 Changed 7 months ago by atagar

Thanks Tim, but no luck. Perhaps I'm doing something wrong? The following should fetch an lzma compressed descriptor from radia0...

curl --verbose --header "Accept-Encoding: x-tor-lzma" 91.121.230.208:9030/tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31 > /tmp/desc
*   Trying 91.121.230.208...
* Connected to 91.121.230.208 (91.121.230.208) port 9030 (#0)
> GET /tor/server/fp/9695DFC35FFEB861329B9F1AB04C46397020CE31 HTTP/1.1
> Host: 91.121.230.208:9030
> User-Agent: curl/7.47.0
> Accept: */*
> Accept-Encoding: x-tor-lzma
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Thu, 29 Mar 2018 20:10:25 GMT
< Content-Type: text/plain
< X-Your-Address-Is: 208.113.130.116
< Content-Encoding: identity
< Expires: Sat, 31 Mar 2018 20:10:25 GMT
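
The Content-Encoding reply header is what tells the caller how to decode the body. This is a hedged Python sketch of that step using only standard library modules. `decode_body` is a hypothetical name, not Stem's API, and zstd is left out to keep the sketch dependency-free.

```python
import lzma
import zlib


def decode_body(content_encoding, body):
    """Decode a directory response body according to its Content-Encoding."""
    encoding = content_encoding.strip().lower()
    if encoding == "identity":
        return body
    if encoding in ("deflate", "gzip"):
        # MAX_WBITS | 32 lets zlib auto-detect a zlib or gzip header, so a
        # zlib stream labelled gzip (or the reverse) still decodes.
        return zlib.decompress(body, zlib.MAX_WBITS | 32)
    if encoding == "x-tor-lzma":
        # The default FORMAT_AUTO accepts both .xz and legacy .lzma streams.
        return lzma.decompress(body)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

With the reply above, `decode_body("identity", body)` returns the body unchanged, which confirms the relay ignored the lzma request instead of sending a corrupt stream.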

comment:10 in reply to:  9 Changed 7 months ago by teor

Replying to atagar:

Thanks Tim, but no luck. Perhaps I'm doing something wrong? The following should fetch an lzma compressed descriptor from radia0...

Descriptors are not available in the lzma encoding, because they are streamed:

Replying to teor:

Directory responses choose a common, supported method in this order:
Precompressed, cached data: lzma > zstd > zlib > gzip > identity
Streamed data: zstd > zlib > gzip > identity
Anonymous connections (e.g. HSDirs): zlib > gzip > identity
https://gitweb.torproject.org/tor.git/tree/src/or/directory.c#n3575

This is what I get:

$ for encoding in x-tor-lzma x-zstd deflate gzip identity; do
    for doc in server/authority status-vote/current/consensus; do
        echo "Accept-Encoding: $encoding"
        echo "Requested: $doc"
        curl -s -O --header "Accept-Encoding: $encoding" \
            91.121.230.208:9030/tor/$doc && \
        file authority
    done
done
Accept-Encoding: x-tor-lzma
Requested: server/authority
authority: ASCII text
Accept-Encoding: x-tor-lzma
Requested: status-vote/current/consensus
authority: ASCII text
Accept-Encoding: x-zstd
Requested: server/authority
authority: Zstandard compressed data (v0.8+), Dictionary ID: None
Accept-Encoding: x-zstd
Requested: status-vote/current/consensus
authority: Zstandard compressed data (v0.8+), Dictionary ID: None
Accept-Encoding: deflate
Requested: server/authority
authority: zlib compressed data
Accept-Encoding: deflate
Requested: status-vote/current/consensus
authority: zlib compressed data
Accept-Encoding: gzip
Requested: server/authority
authority: gzip compressed data, max compression, from Unix
Accept-Encoding: gzip
Requested: status-vote/current/consensus
authority: gzip compressed data, max compression, from Unix
Accept-Encoding: identity
Requested: server/authority
authority: ASCII text
Accept-Encoding: identity
Requested: status-vote/current/consensus
authority: ASCII text

I think that the consensus might not be available in lzma because it was cached on disk by a previous version of tor. Maybe wait an hour or two for a new consensus to arrive, and try lzma again?

comment:11 Changed 7 months ago by teor

Oh, it helps if the script checks the consensus file after downloading consensuses. Now lzma works:

$ for encoding in x-tor-lzma x-zstd deflate gzip identity; do
    for doc in server/authority status-vote/current/consensus; do
        echo
        echo "Accept-Encoding: $encoding"
        echo "Requested: $doc"
        curl -s -O --header "Accept-Encoding: $encoding" \
            91.121.230.208:9030/tor/$doc && \
            file `basename $doc`
    done
done

Accept-Encoding: x-tor-lzma
Requested: server/authority
basename $doc
authority: ASCII text

Accept-Encoding: x-tor-lzma
Requested: status-vote/current/consensus
basename $doc
consensus: LZMA compressed data, streamed

Accept-Encoding: x-zstd
Requested: server/authority
basename $doc
authority: Zstandard compressed data (v0.8+), Dictionary ID: None

Accept-Encoding: x-zstd
Requested: status-vote/current/consensus
basename $doc
consensus: Zstandard compressed data (v0.8+), Dictionary ID: None

Accept-Encoding: deflate
Requested: server/authority
basename $doc
authority: zlib compressed data

Accept-Encoding: deflate
Requested: status-vote/current/consensus
basename $doc
consensus: zlib compressed data

Accept-Encoding: gzip
Requested: server/authority
basename $doc
authority: gzip compressed data, max compression, from Unix

Accept-Encoding: gzip
Requested: status-vote/current/consensus
basename $doc
consensus: zlib compressed data

Accept-Encoding: identity
Requested: server/authority
basename $doc
authority: ASCII text

Accept-Encoding: identity
Requested: status-vote/current/consensus
basename $doc
consensus: ASCII text, with very long lines

I opened #25676 for zlib consensuses in response to gzip requests:

Accept-Encoding: gzip
Requested: status-vote/current/consensus
basename $doc
consensus: zlib compressed data
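
The `file` checks above can be approximated in Python by looking at the leading magic bytes, which makes a mismatch like the gzip/zlib one easy to spot in a test. This is a heuristic sketch with a hypothetical function name. The LZMA and zlib checks in particular are guesses based on common header values and can misfire on arbitrary data.

```python
def sniff(data):
    """Guess the compression format of a downloaded document from its header."""
    if data[:4] == b"\x28\xb5\x2f\xfd":
        return "zstd"   # Zstandard frame magic number
    if data[:2] == b"\x1f\x8b":
        return "gzip"   # gzip member header
    if data[:3] == b"\x5d\x00\x00":
        return "lzma"   # usual start of a legacy .lzma stream (heuristic)
    if (len(data) >= 2 and data[0] & 0x0F == 8
            and ((data[0] << 8) | data[1]) % 31 == 0):
        return "zlib"   # deflate method plus a valid zlib header checksum
    return "unknown"    # most likely plain text
```

A test can then compare `sniff(body)` with the encoding that was requested and flag any difference.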

comment:12 Changed 7 months ago by atagar

Great, thanks Tim! I'll give this a whirl tomorrow.

comment:13 Changed 7 months ago by atagar

Resolution: worksforme
Status: new → closed

Stem now has working support for zstd and lzma. Thanks for the help, Tim!
