Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#22206 closed defect (fixed)

Fetching compressed consensus gives it uncompressed if Accept-Encoding header is specified

Reported by: Sebastian Owned by: ahf
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: regression
Cc: ahf Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

This morning, we had this fun notice from doctor:

ERROR: Unable to retrieve the consensus from moria1 (http://128.31.0.39:9131/tor/status-vote/current/consensus.z): Error -3 while decompressing data: incorrect header check
ERROR: Unable to retrieve the vote from moria1 (http://128.31.0.39:9131/tor/status-vote/current/authority.z): Error -3 while decompressing data: incorrect header check

The file from that hour is available at http://seul.org/~arma/consensus-20170508.z.

The next weird thing is that using three different wget versions from different platforms, using

wget http://128.31.0.39:9131/tor/status-vote/current/consensus.z

to fetch a consensus from moria gives an uncompressed file. yawning and I can reproduce that, arma can not. Something fishy seems to be going on?

Child Tickets

Change History (21)

comment:1 Changed 2 years ago by arma

git show 1bc21111 has all the keywords I would expect to search for.

comment:2 Changed 2 years ago by arma

<Yawning> Content-Encoding: identity
<Yawning> network-status-version 3
<Yawning> vote-status consensus
<Yawning> consensus-method 25Y
<Yawning> that does not look like the identity encoding of a .z file to me
<Yawning> so tor is spitting back uncompressed data at me
<Yawning> wget is setting `Accept-Encoding: identity`

comment:3 Changed 2 years ago by arma

I've left moria1 running the shiny new code (git commit 2e4f3b36), so others can make progress on the ticket.

comment:4 Changed 2 years ago by arma

My wget on jessie (the one that behaves ok) is sending these headers:

GET /tor/status-vote/current/consensus.z HTTP/1.1
User-Agent: Wget/1.16 (linux-gnu)
Accept: */*
Host: 128.31.0.39:9131
Connection: Keep-Alive

I assume the other wgets (the ones that misbehave) are setting an Accept-Encoding header.

comment:5 Changed 2 years ago by Sebastian

Note that the headers aren't necessarily relevant, my jessie's headers are these (according to a wget to a nc -l arma set up for me):

GET /foo.z HTTP/1.1
User-Agent: Wget/1.16 (linux-gnu)
Accept: */*
Host: 128.31.0.47:50000
Cache-Control: max-age=259200
Connection: keep-alive

yet that wget fetches an uncompressed file.

comment:6 in reply to:  1 Changed 2 years ago by yawning

or/directory.c:directory_handle_command_get()

  if ((header = http_get_header(headers, "Accept-Encoding: "))) {
    compression_methods_supported = parse_accept_encoding_header(header);
    tor_free(header);
  } else {
    compression_methods_supported = (1u << NO_METHOD);
    if (zlib_compressed_in_url)
      compression_methods_supported |= (1u << ZLIB_METHOD);
  }

That's at least one of the problems. If zlib_compressed_in_url is set, the data to be transferred before Accept-Encoding: is taken into account should be zlib compressed payload.

comment:7 Changed 2 years ago by yawning

The code MUST respond to .z GETs/Accept-Encoding: identity requests with a response consisting of zlib compressed payload, because the identity transform of a .z file is a .z file, not plaintext.

The code MAY responmd to .z GETs/Accept-Encoding: that does not include identity requests by doing an extra compression pass on the .z payload. Alternatively, it could return an error (406 Not Acceptable seems like the right one?).

comment:8 Changed 2 years ago by Sebastian

Ok, my jessie stuff sat behind a proxy doing who knows what. So the behaviour I'm seeing on jessie is the same as arma, at least.

comment:10 Changed 2 years ago by teor

Cc: ahf added

comment:11 Changed 2 years ago by Sebastian

Summary: Fetching compressed consensus behaves unpredictably strangeFetching compressed consensus gives it uncompressed if Accept-Encoding header is specified

comment:12 Changed 2 years ago by ahf

Owner: set to ahf
Status: newassigned

comment:13 Changed 2 years ago by nickm

Keywords: regression added

comment:14 Changed 2 years ago by ahf

Status: assignedneeds_review

comment:15 Changed 2 years ago by nickm

Why this approach in particular, and not something more like:

diff --git a/src/or/directory.c b/src/or/directory.c
index 67c28e1f3e22e4..c1db40ef22cfc2 100644
--- a/src/or/directory.c
+++ b/src/or/directory.c
@@ -3516,8 +3516,9 @@ directory_handle_command_get,(dir_connection_t *conn, const char *headers,
     tor_free(header);
   } else {
     compression_methods_supported = (1u << NO_METHOD);
-    if (zlib_compressed_in_url)
-      compression_methods_supported |= (1u << ZLIB_METHOD);
+  }
+  if (zlib_compressed_in_url) {
+    compression_methods_supported |= (1u << ZLIB_METHOD);
   }
 
   /* Remove all methods that we don't both support. */

(This is a question, not a suggestion -- I'm trying to understand what the pros and cons of the two approaches are.)

comment:16 Changed 2 years ago by ahf

My initial patch was more or less like that, except that I toggled the ZLIB_METHOD in the branch where the Accept-Encoding header value was read.

I added the flag to the parse_accept_encoding_header() in case we wanted to do ".zstd", ".gz", and ".lzma" at some point in the future, which should make that more trivial to do. I'm OK with either solutions though.

comment:17 Changed 2 years ago by nickm

Let's go with the smaller one for now? I think that, by using Accept-Encoding, we're moving to deprecate the use of .z and other extensions.

comment:18 Changed 2 years ago by ahf

Works for me.

comment:19 Changed 2 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Fixed in 8266d193a608

comment:20 in reply to:  19 Changed 2 years ago by yawning

Replying to nickm:

Fixed in 8266d193a608

I'm not going to re-open this, but the behavior is still wrong in certain cases.

Specifically, an error (or a double compressed payload) should be returned when the .z request contains an Accept-Encoding header that specifies anything other than identity/deflate. This may be contrived and not worth worrying about, but the correct behavior (especially in the former case) isn't that hard.

comment:21 Changed 2 years ago by nickm

Added #22233 to track that and related discussion; probably missed something significant in my summary.

Note: See TracTickets for help on using tickets.