Opened 3 years ago

Closed 3 years ago

Last modified 15 months ago

#11648 closed defect (fixed)

Problem parsing .z-compressed descriptors fetched via DirPort

Reported by: karsten
Owned by:
Priority: Medium
Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-relay, 2016-bug-retrospective
Cc: atagar, wfn
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I'm having trouble parsing compressed descriptors fetched via tor's DirPort. I don't know enough about .z compression to track down the problem though. Anyway, here's what I found:

Download consensus and all server descriptors from turtles:

curl http://76.73.17.194:9030/tor/status-vote/current/consensus.z > turtles-consensus.z
curl http://76.73.17.194:9030/tor/server/all.z > turtles-server-all.z

Attempt to parse compressed consensus using Python's zlib:

Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> print len(zlib.decompress(open('turtles-consensus.z', 'rb').read()))
1085611
>>> print len(zlib.decompress(open('turtles-server-all.z', 'rb').read()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -5 while decompressing data: incomplete or truncated stream
>>> print len(zlib.decompressobj().decompress(open('turtles-server-all.z', 'rb').read()))
8932838

Decompressing the consensus works fine using zlib.decompress(), but decompressing the server descriptors does not. The only workaround I found was to explicitly create a decompressor using zlib.decompressobj(). AIUI, the difference between the two approaches is that the latter can handle partial content (cf. http://stackoverflow.com/questions/20620374/how-to-inflate-a-partial-zlib-file/20625078#20625078).
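
For what it's worth, here's a minimal sketch of that workaround (same files as above; decompressobj's flush() drains whatever buffered output remains):

import zlib

def inflate_lenient(path):
    # decompressobj() tolerates a stream that was never properly
    # finished and returns whatever it could inflate so far.
    d = zlib.decompressobj()
    out = d.decompress(open(path, 'rb').read())
    out += d.flush()  # drain any remaining buffered output
    return out

print len(inflate_lenient('turtles-consensus.z'))   # works, as before
print len(inflate_lenient('turtles-server-all.z'))  # now works, too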

Does that mean tor sends partial content?

Cc'ing atagar and wfn, because we discussed this problem a year ago: https://lists.torproject.org/pipermail/tor-dev/2013-May/004924.html

Child Tickets

Change History (35)

comment:1 Changed 3 years ago by wfn

We might need to look at or/directory.c for this. It might be that tor is not returning a proper zlib image (not properly 'terminated', or whatever; I don't know how the zlib format works, at least not yet).

Maybe take a look at the part of the code that does the actual 'send zlib-compressed data in chunks' (in directory_handle_command_get() which starts on line ~2524):

2786     if (smartlist_len(items)) {
2787       if (compressed) {
2788         conn->zlib_state = tor_zlib_new(1, ZLIB_METHOD);
2789         SMARTLIST_FOREACH(items, const char *, c,
2790                  connection_write_to_buf_zlib(c, strlen(c), conn, 0));
2791         connection_write_to_buf_zlib("", 0, conn, 1);
2792       } else {
             /* ... */

Could it be that the way tor terminates that zlib stream (adding 'empty compressed data' to the zlib stream) is not a proper way to close a zlib archive? So, for example, this could mean that tor does indeed assume that it's okay to send 'partial content' (in the sense of an unclosed zlib archive, zlib-as-stream; not in the sense of failing to send some data)?

Can't look more into this right now, but maybe this pointer will help somehow.

Last edited 3 years ago by wfn

comment:2 Changed 3 years ago by wfn

Last mystery comment for now:

  1. It seems that the zlib header should include the length of the zlib image. When sending in multiple chunks, tor is of course unable to provide the right length in the header.
  2. (this is the "try this magical pill" part:) does using zlib.decompress(data, 15 + 32) on python's side help, perchance (can't test right now)? (See: http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib) (note: this will probably not work, because the problem on SE was actually somewhat different; but it's probably easy to try out.)
Last edited 3 years ago by wfn

comment:3 in reply to: ↑ 1 Changed 3 years ago by karsten

Replying to wfn:

Could it be that the way tor terminates that zlib stream (adding 'empty compressed data' to the zlib stream) is not a proper way to close a zlib archive? So, for example, this could mean that tor does indeed assume that it's okay to send 'partial content' (in the sense of an unclosed zlib archive, zlib-as-stream; not in the sense of failing to send some data)?

I didn't look very closely, but it seems that tor doesn't simply add empty compressed data, but that it also sets done to 1.

comment:4 in reply to: ↑ 2 Changed 3 years ago by karsten

Replying to wfn:

Last mystery comment for now:

  1. It seems that the zlib header should include the length of the zlib image. When sending in multiple chunks, tor is of course unable to provide the right length in the header.

Well, tor could go through chunks twice, once to sum up their lengths and write that to the header, and another time to actually send chunks.

But I don't know the code enough to say if this would solve the problem.

  2. (this is the "try this magical pill" part:) does using zlib.decompress(data, 15 + 32) on python's side help, perchance (can't test right now)? (See: http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib) (note: this will probably not work, because the problem on SE was actually somewhat different; but it's probably easy to try out.)

The magical pill didn't work.

But here's some more information from curl -vv: it seems that tor uses "deflate" to compress descriptors. AFAIK, there's also "zlib" and "gzip", which tor doesn't seem to use. Though Python's zlib should support all of them. Hmm.

comment:5 in reply to: ↑ 3 Changed 3 years ago by wfn

Replying to karsten:

I didn't look very closely, but it seems that tor doesn't simply add empty compressed data, but that it also sets done to 1.

You're right, I think. Someone could try using wireshark or inserting a dump-this-data-in-hex-to-log call, and comparing the bytes with zlib flush modes[1], or something. Fun pastime! :)

Python's zlib should indeed support deflate. Hrmgh.

Here's a more pragmatic angle, which does not help reduce anxiety at all:

curl http://76.73.17.194:9030/tor/server/all.z > turtles-server-all.z
curl http://76.73.17.194:9030/tor/server/all > turtles-server-all-not-compressed
python
Python 2.7.3 (default, Mar 13 2014, 11:03:55) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> with open('turtles-server-all', 'wb') as f:
...   f.write(zlib.decompressobj().decompress(open('turtles-server-all.z', 'rb').read()))
... 
>>> ^D
diff turtles-server-all turtles-server-all-not-compressed > turtles-diff

Here's one of those outputs: http://ravinesmp.com/volatile/turtles-diff

I also tried sorting the outputs before diffing, but that didn't help much, I think. Those same outputs sorted and then diffed: http://ravinesmp.com/volatile/sorted-turtles-diff (I just used sort naively.)

I did the two curls right one after the other, so maybe it's a matter of the dir authority continuously updating its descriptors, and thus it is natural for two successive all requests to return different results? (_edit_: some of the differences can definitely be explained by this: at least one relay updated its descriptor to a newer one; and the bandwidth numbers differ a bit; this is OK.)

I'm not sure what to make of this. diffing the two outputs using vimdiff does help to reduce confusion a bit, I'd recommend trying vimdiff here. It seems that the bandwidth-numbers-related differences are simply there because of the two successive GETs; but the other differences, I'm not sure about.

If we can make sure the differences are only because of the continuously updated directory data, then we can pragmatically conclude that "every tool using compressed descriptors from directory authorities should assume it is receiving zlib stream data", and maybe "this assumption (zlib stream) should be mentioned in some appropriate place to avoid confusion."

If we can't make sure of that, then we have a spooky rabbit hole.

[1]: http://www.bolet.org/~pornin/deflate-flush-en.html

Last edited 3 years ago by wfn

comment:6 in reply to: ↑ 4 Changed 3 years ago by wfn

Replying to karsten:

Replying to wfn:

Last mystery comment for now:

  1. It seems that the zlib header should include the length of the zlib image. When sending in multiple chunks, tor is of course unable to provide the right length in the header.

Well, tor could go through chunks twice, once to sum up their lengths and write that to the header, and another time to actually send chunks.

I suppose. The overall logic would have to be changed, as it's currently "make a chunk and send it out right now."

But I don't know the code enough to say if this would solve the problem.

Yeah, maybe this is not a very productive angle anyway.

[...]
But here's some more information from curl -vv: it seems that tor uses "deflate" to compress descriptors. AFAIK, there's also "zlib" and "gzip", which tor doesn't seem to use. Though Python's zlib should support all of them. Hmm.

I'm reeeally not sure of those terms myself, but I *think* that zlib encapsulates DEFLATE either way; it adds a (zlib or gzip) header to the DEFLATE-compressed data (DEFLATE is a compression algorithm). So it *should* be "the data is DEFLATEd; the only question is whether the header is in zlib or gzip format." But I'm not sure.
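
For what it's worth, the three framings are easy to demonstrate with Python's zlib, where the wbits parameter selects the wrapper (a sketch; the payload is made up):

import zlib

payload = 'router example 203.0.113.1 9001 0 9030\n' * 10

def deflate_with(wbits):
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()

raw = deflate_with(-15)      # bare DEFLATE: no header, no checksum
zlb = deflate_with(15)       # zlib wrapper: 2-byte header + Adler-32 trailer
gzp = deflate_with(15 + 16)  # gzip wrapper: 10-byte header + CRC-32 trailer

# The same DEFLATE data sits inside each; only the framing differs.
print zlib.decompress(zlb) == payload            # zlib framing (the default)
print zlib.decompress(gzp, 15 + 16) == payload   # gzip framing
print zlib.decompress(raw, -15) == payload       # raw DEFLATE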

comment:7 Changed 3 years ago by karsten

Here's a theory why compressed and non-compressed results differ: the compressed result is cached on the directory, whereas the non-compressed result is put together on-the-fly. I didn't find the place in the code where this is done, but it seems at least plausible to me.

So, assuming we don't have this particular rabbit hole, can you rephrase your other insights? Is there anything wrong with the way directories compress strings, or is this just a matter of clients handling compressed responses in the right way? Thanks!

comment:8 in reply to: ↑ 7 Changed 3 years ago by wfn

Replying to karsten:

Here's a theory why compressed and non-compressed results differ: the compressed result is cached on the directory, whereas the non-compressed result is put together on-the-fly. I didn't find the place in the code where this is done, but it seems at least plausible to me.

Good idea. (fwiw, I could not find such a place, and from looking at the function write_to_evbuffer_zlib() (called by connection_write_to_buf_impl_(), which is the 'implementation' function for the connection_write_to_buf_zlib() noted above) in or/buffers.c (around line 2549), it would appear no caching is taking place. But I know nothing about bufferevents, and maybe one of those *write*() functions further down the callstack is doing something (haven't looked; didn't find anything, though.))

So, assuming we don't have this particular rabbit hole, can you rephrase your other insights? Is there anything wrong with the way directories compress strings, or is this just a matter of clients handling compressed responses in the right way? Thanks!

Let's keep this open for at least another day if that's OK, but for now, the tl;dr would be "the clients should assume the compressed data is a zlib stream." This should automatically imply that 1) all is well in the way tor handles zlib data (directory-server-side and client-side), and that 2) your python solution with the decompressobj() is the correct way to handle this (from that SO thread it looks like other people do the same when having to handle zlib streams) as far as python is concerned.

If it's not the correct way, it's (only) a matter of finding the proper way of handling zlib streams {in python | in general}.

Main caveat: I don't know nearly enough about zlib to be able to make such claims without frowning. :) But, (assuming the diffs can be explained away) I think that this was just a mismatch in assumptions about zlib data. This is how it should be.

I might still recommend noting this behaviour down somewhere, i.e. include a "clients should assume the compressed data is a zlib stream" (vs. archive) note, but I'm not sure what the best place for that is, if any.

Last edited 3 years ago by wfn

comment:9 Changed 3 years ago by wfn

TL;DR of my thoughts right now:

  • I don't think any zlib-compressed-data caching happens (can't find any in buffers or torgzip, etc.)
  • Nevertheless, the diffs should really be just because directories' view of relays and their descriptors is always changing (it would be nice to confirm this in a definite manner, of course.) Also, some caching could be happening "somewhere else" (i.e.: I don't know.)
  • all.z is a zlib stream, and as far as zlib streams go, everything is OK with it.
  • I'm not really qualified to make the above statements, sorry. i.e.: someone else should take a look at this, too.

comment:10 in reply to: ↑ 9 Changed 3 years ago by karsten

Replying to wfn:

TL;DR of my thoughts right now:

Thanks for including a tl;dr. :)

  • I don't think any zlib-compressed-data caching happens (can't find any in buffers or torgzip, etc.)
  • Nevertheless, the diffs should really be just because directories' view of relays and their descriptors is always changing (it would be nice to confirm this in a definite manner, of course.) Also, some caching could be happening "somewhere else" (i.e.: I don't know.)

I'm not too worried about this, but I can't say for sure that there's no bug here. But should we move this to a separate ticket? The two possible issues seem unrelated. Would you want to create that ticket?

  • all.z is a zlib stream, and as far as zlib streams go, everything is OK with it.

I don't buy that yet. Why would zlib.decompress fail if this is a valid zlib stream?

(Also note that the zlib.decompressobj solution doesn't work for me, because I'm really looking for a way to make java.util.zip.InflaterInputStream work. I just wrote the test cases in Python, because I was hoping to get more feedback on that. And look how this worked just fine! But I really suspect that the problem is in tor's use of zlib.)

So, I looked at turtles-server-all.z using xxd with RFC 1950 opened in a second window. The first bytes (0x78da) look like a valid header (78 = "deflate with 32k window size", da= "max compression, no dict"), but I'm not sure about the last bytes (0x0000ffff). These last bytes are supposed to be the Adler-32 checksum, but these bytes don't look like a valid checksum to me. Maybe something is not initialized or not flushed correctly?
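
(For reference, both observations can be checked from Python; a sketch, reusing the file from the description:)

import zlib

data = open('turtles-server-all.z', 'rb').read()

# RFC 1950 header: 78 = deflate with 32k window, da = max compression, no dict
print data[:2].encode('hex')

# A properly finished stream ends with the big-endian Adler-32 of the
# plaintext; one that only got a Z_SYNC_FLUSH ends with 00 00 ff ff.
plain = zlib.decompressobj().decompress(data)
print 'tail:    %s' % data[-4:].encode('hex')
print 'adler32: %08x' % (zlib.adler32(plain) & 0xffffffff)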

  • I'm not really qualified to make the above statements, sorry. i.e.: someone else should take a look at this, too.

Same here! Thanks for trying!

comment:11 Changed 3 years ago by wfn

Hmm, it seems that something must be wrong with that zlib stream data, yes, because according to http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check/22310760#22310760, one of those options (one of the "window sizes") should work. I tried them all, and kept getting "incomplete or truncated stream" or "invalid stored block lengths".

<detail>

common/torgzip.c:

 87 /** Return the 'bits' value to tell zlib to use <b>method</b>.*/
 88 static INLINE int
 89 method_bits(compress_method_t method)
 90 {
 91   /* Bits+16 means "use gzip" in zlib >= 1.2 */
 92   return method == GZIP_METHOD ? 15+16 : 15;
 93 }

https://docs.python.org/2/library/zlib.html:

zlib.decompress(string[, wbits[, bufsize]])

[...]

wbits must not be smaller than the size originally used to compress the stream; using a too-small value will result in an exception.
The default value is [...] 15.

I tried all the window sizes that people try (including zlib.MAX_WBITS|32 for automatic header detection); nothing worked.
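
(Roughly, the attempt looked like this; a sketch:)

import zlib

data = open('turtles-server-all.z', 'rb').read()
# 8..15 = zlib framing with various window sizes, 15+16 = gzip,
# MAX_WBITS|32 = auto-detect zlib/gzip, -MAX_WBITS = raw DEFLATE
for wbits in range(8, 16) + [15 + 16, zlib.MAX_WBITS | 32, -zlib.MAX_WBITS]:
    try:
        print wbits, len(zlib.decompress(data, wbits))
    except zlib.error as e:
        print wbits, e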

I don't know why decompressobj().decompress() does work. I looked at python's packages briefly, in particular, at local/lib/python2.7/encodings/zlib_codec.py, but didn't find anything useful there. Fun times.

</detail>

Last edited 3 years ago by wfn

comment:12 in reply to: ↑ 10 Changed 3 years ago by wfn

Replying to karsten:

Replying to wfn:

[...]

  • I don't think any zlib-compressed-data caching happens (can't find any in buffers or torgzip, etc.)
  • Nevertheless, the diffs should really be just because directories' view of relays and their descriptors is always changing (it would be nice to confirm this in a definite manner, of course.) Also, some caching could be happening "somewhere else" (i.e.: I don't know.)

I'm not too worried about this, but I can't say for sure that there's no bug here. But should we move this to a separate ticket? The two possible issues seem unrelated. Would you want to create that ticket?

Could you please clarify/confirm what you had in mind? By the two issues, did you mean

  1. .z-compressed descriptors can't be parsed / the zlib data returned by tor is strange (this ticket)
  2. tor/server/all and tor/server/all.z (possibly) differ (new ticket)

Just say "yes, this", and I'll create a ticket for "2."

So, I looked at turtles-server-all.z using xxd with RFC 1950 opened in a second window. The first bytes (0x78da) look like a valid header (78 = "deflate with 32k window size", da= "max compression, no dict"), but I'm not sure about the last bytes (0x0000ffff). These last bytes are supposed to be the Adler-32 checksum, but these bytes don't look like a valid checksum to me. Maybe something is not initialized or not flushed correctly?

Good catch!

Potentially directly related: there are multiple zlib flush modes. Here's what zlib's manual has to say about Z_SYNC_FLUSH (emphasis mine):

If the parameter flush is set to Z_SYNC_FLUSH, all pending output is flushed to the output buffer and the output is aligned on a byte boundary, so that the decompressor can get all input data available so far. (In particular avail_in is zero after the call if enough output space has been provided before the call.) Flushing may degrade compression for some compression algorithms and so it should be used only when necessary. This completes the current deflate block and follows it with an empty stored block that is three bits plus filler bits to the next byte, followed by four bytes (00 00 ff ff).

This is probably related. (To clarify for someone else who might be reading this (ha, ha): those four bytes are at the very end of the archived .z stream.)

Regarding the flush modes themselves, python's zlib mentions

Z_SYNC_FLUSH, Z_FULL_FLUSH, Z_FINISH.

tor's gzip module only uses Z_SYNC_FLUSH and Z_FINISH.

(There are more modes in zlib.)

Could it be that the two sides are making different assumptions about these flush modes?

Or that tor does not do a Z_FINISH at the end (though maybe it does; it has the code for that. Whether it's called at the end of (the writing of) the .z stream, I'm not sure.)
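
(The difference between the two flush modes is easy to demonstrate with Python's compressobj; a sketch:)

import zlib

c = zlib.compressobj()
body = c.compress('some descriptor text\n')

# Z_SYNC_FLUSH byte-aligns the output with an empty stored block
# (trailing 00 00 ff ff) but does not terminate the stream;
# Z_FINISH writes the final block plus the Adler-32 trailer.
synced = body + c.flush(zlib.Z_SYNC_FLUSH)
finished = synced + c.flush(zlib.Z_FINISH)

print synced[-4:].encode('hex')       # 0000ffff
print len(zlib.decompress(finished))  # fine: stream was Z_FINISHed
# zlib.decompress(synced) would raise Error -5 (incomplete or truncated)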

Responding inline to less important points below (i.e.: can probably skip them.)

  • all.z is a zlib stream, and as far as zlib streams go, everything is OK with it.

I don't buy that yet. Why would zlib.decompress fail if this is a valid zlib stream?

(This paragraph doesn't help much, but it explains my confused thought process ->) When I wrote that, I was operating under the assumption that there are zlib archives and streams, and that python expects an archive. fwiw, there does seem to be some kind of assumption that python makes (see comments on http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check/22310760#22310760 maybe), but, I've little clue. Maybe zlib data is a zlib stream and that's that, and it should just work.

(Also note that the zlib.decompressobj solution doesn't work for me, because I'm really looking for a way to make java.util.zip.InflaterInputStream work. I just wrote the test cases in Python, because I was hoping to get more feedback on that. And look how this worked just fine! But I really suspect that the problem is in tor's use of zlib.)

Aha. :)

http://docs.oracle.com/javase/7/docs/api/java/util/zip/InflaterInputStream.html says that DEFLATE is expected. OK.

Any meaningful error messages on java's part here?

Also, by the way: the tor_gzip_compress() in torgzip.c looks like a rather hairy function. It doesn't operate at a "quick hack" level, but it does seem to make some specific assumptions. Either way, the comments included there do not reassure me that tor_gzip_compress() does everything right. :)

comment:13 Changed 3 years ago by atagar

Does that mean tor sends partial content?

Hi Karsten, isn't this the same issue as #9379? Nine months back I noticed that the DirPort's compressed responses were very frequently truncated. Retries could address it, but the failure rate was so high it was actually better for DocTor to opt for uncompressed requests...

https://gitweb.torproject.org/doctor.git/commit/c7bd07015a5e7fa52be542cda05bb12fca070c4d

comment:14 in reply to: ↑ 13 Changed 3 years ago by wfn

Replying to atagar:

Does that mean tor sends partial content?

Hi Karsten, isn't this the same issue as #9379? Nine months back I noticed that the DirPort's compressed responses were very frequently truncated. Retries could address it, but the failure rate was so high it was actually better for DocTor to opt for uncompressed requests...

https://gitweb.torproject.org/doctor.git/commit/c7bd07015a5e7fa52be542cda05bb12fca070c4d

Ah, so there was a ticket for this. :) Yes, AFAICT this is the same issue. Same exception in python's traceback, etc.

There's a useful comment there,

In theory, you could use your script to download from a directory mirror, right? Can you find a directory mirror that logs on debug level that you then request server or extra-info descriptors from?

I now remember actually trying to download tor/server/all[.z] from a (very-small-time) exit relay which is also a directory mirror (with the intention of hacking "dump descriptor data (for the GET request) as it comes to debug log / to file" functionality into it; switching to tor git-master would be easy anyway), and getting an empty response for both tor/server/all and tor/server/all.z:

But:

comment:15 Changed 3 years ago by nickm

My guess would be that the stream is ending before it writes out data from Z_FINISH. But when I tried to reproduce this behavior with chutney, I couldn't make it happen.

One possible thing to blame from dirserv.c:

      /* XXXX024 This 'last' business should actually happen on the last
       * routerinfo, not on the last fingerprint. */

If that's the case, this will happen sometimes but not always. It looks like it would only happen when the identity of the last router added to the big key list turns out not to be present... but I'd expect that to be pretty rare. Strange that this would be happening so consistently.

To actually debug this, one good first step would be to see if you can get it to repeat reliably on a test network.

With respect to the other issues:

My guess is that the difference between all and the decompressed all.z is an artifact of changed information between the first and second requests. Probably a different issue, if it's an issue at all.

My guess is that the low-bandwidth router is refusing to answer because of:

      if (global_write_bucket_low(TO_CONN(conn), dlen, 2)) {
        log_info(LD_DIRSERV,
                 "Client asked for server descriptors, but we've been "
                 "writing too many bytes lately. Sending 503 Dir busy.");
        write_http_status_line(conn, 503, "Directory busy, try again later");
        conn->dir_spool_src = DIR_SPOOL_NONE;
        goto done;
      }

comment:16 Changed 3 years ago by nickm

(I just added a unit test for write_to_buf_zlib to see if there might be something low-level going on there; I didn't find anything. Commit was ed0e2ecaa797696b88a388e4f82091d6aa)

comment:17 Changed 3 years ago by nickm

It might be interesting to know whether this also happens:

  • on attempts to download a few descriptors by descriptor digest and/or fingerprint
  • on attempts to download many descriptors by descriptor digest and/or fingerprint
  • on attempts to download microdescriptors.

comment:18 Changed 3 years ago by atagar

I'm not sure about reproing this for a test network, but see #9379 for a small script to repro this. You can target it against a particular authority or directory mirror to see what it's doing on the server side.

comment:19 Changed 3 years ago by wfn

95.85.10.71 does indeed return a 503 (should have seen this myself, whoops.) Upped the limit, and at least the compressed descriptors archive is now being returned.

The .z it returns is successfully parsed by python's import zlib; print len(zlib.decompress(open('all.z', 'rb').read())), and the archive ends with a different four-byte tail (seems unique etc., assuming it's the adler32 checksum). Re-downloaded later: another, different tail.

(Re: other things that Nick said, will try and see to them, but won't happen before ~tomorrow)

comment:20 Changed 3 years ago by wfn

turtles seems to consistently produce a bogus all.z (which at this point can be identified by the trailing 0000ffff, which seems to be the way a Z_SYNC_FLUSH (not Z_FINISH) ends.) Other authorities are inconclusive, but in any case turtles seems to be unique in its failure rate (subsequent requests do not help.)

I wonder what the last entry in its conn->fingerprint_stack (when it (well, dir_fps) gets filled) can be? Can we infer it, etc.? Can there be a dangling node on conn->fingerprint_stack which fails at descriptor lookup?

comment:21 Changed 3 years ago by karsten

Without looking at dirserv.c, it looks like missing descriptors might be causing this problem. Here's an easy way to reproduce:

Fetch an existing descriptor by digest:

$ curl http://128.31.0.39:9131/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d.z > moria1-f5.z
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   874    0   874    0     0    300      0 --:--:--  0:00:02 --:--:--   300

Attempt to fetch a non-existing descriptor by digest:

$ curl http://128.31.0.39:9131/tor/server/d/1917db60e6c769567fc364e9dbb6fe8aa5feeb16.z > moria1-19.z
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

So far so good, now fetch both at once:

$ curl http://128.31.0.39:9131/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+1917db60e6c769567fc364e9dbb6fe8aa5feeb16.z > moria1-f5+19.z
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   875    0   875    0     0   2523      0 --:--:-- --:--:-- --:--:--  2528

Note in the curl output how the second download is 1 byte larger than the first.

Look at the binary diff using xxd:

$ diff <(xxd moria1-f5.z) <(xxd moria1-f5+19.z)
1c1
< 0000000: 78da 9593 c9ae a338 1486 f73c 85d7 15e5  x......8...<....
---
> 0000000: 78da 9493 c9ae a338 1486 f73c 85d7 15e5  x......8...<....
55c55
< 0000360: 9b13 ff28 e307 212c 71aa                 ...(..!,q.
---
> 0000360: 9b13 ff28 e307 0000 00ff ff              ...(.......

There's one difference in the first line, but more importantly, the last line differs. The second download ends with the 0x0000ffff we saw before. It seems this was supposed to be the separator between the first and second descriptor. But then tor didn't find a second descriptor and just closed the stream.

Now, what's a compressed result of two descriptors supposed to look like?

Let's fetch another existing descriptor:

$ curl http://128.31.0.39:9131/tor/server/d/6d44fd5b8e5b7382089b02614eddb64b290eed42.z > moria1-6d.z
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   817    0   817    0     0    598      0 --:--:--  0:00:01 --:--:--   598

And now we're fetching both existing descriptors at once:

$ curl http://128.31.0.39:9131/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+6d44fd5b8e5b7382089b02614eddb64b290eed42.z > moria1-f5+6d.z
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1497    0  1497    0     0   1257      0 --:--:--  0:00:01 --:--:--  1257

Here are the first line, all lines containing 0x0000ffff, and the last line:

0000000: 78da 9493 c9ae a338 1486 f73c 85d7 15e5  x......8...<....
...
0000360: 9b13 ff28 e307 0000 00ff ffb5 d45b 8f9a  ...(.........[..
...
00005d0: 967f 07e8 279b 30cd 28                   ....'.0.(
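
In other words, the broken responses can be spotted from the tail alone. A minimal Python sketch (file names as in the transcripts above):

# Heuristic from this ticket: a .z response ending in the Z_SYNC_FLUSH
# marker 00 00 ff ff was never Z_FINISHed, so zlib.decompress() rejects it.
def looks_sync_flushed(path):
    return open(path, 'rb').read()[-4:] == '\x00\x00\xff\xff'

print looks_sync_flushed('moria1-f5+19.z')  # True: missing descriptor last
print looks_sync_flushed('moria1-f5+6d.z')  # False: properly finished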

comment:22 in reply to: ↑ 12 Changed 3 years ago by karsten

Replying to wfn:

Could you please clarify/confirm what you had in mind? By the two issues, did you mean

  1. .z-compressed descriptors can't be parsed / the zlib data returned by tor is strange (this ticket)
  2. tor/server/all and tor/server/all.z (possibly) differ (new ticket)

Just say "yes, this", and I'll create a ticket for "2."

Yes, this. :) See also nickm's later comment for a possible explanation of 2 ("My guess is that the difference between all and the decompressed all.z is an artifact of changed information between the first and second requests. Probably a different issue, if it's an issue at all."). Thanks!

comment:23 Changed 3 years ago by nickm

  • Keywords 024-backport added
  • Milestone set to Tor: 0.2.5.x-final
  • Status changed from new to needs_review

In that case, I have a theory that my branch "bug11648" will fix this. I'm still curious where the fingerprints for the missing relays are coming from in the all.z case, though.

comment:24 Changed 3 years ago by nickm

(That branch is 100% untested, so please don't run it on a directory authority or anything.)

comment:25 Changed 3 years ago by nickm

Totally untested patch in branch "bug11787" in my public repository. Whoops, wrong ticket.

Last edited 3 years ago by nickm

comment:26 Changed 3 years ago by nickm

  • Keywords 024-backport removed

comment:27 in reply to: ↑ 23 Changed 3 years ago by wfn

Replying to nickm:

In that case, I have a theory that my branch "bug11648" will fix this. I'm still curious where the fingerprints for the missing relays are coming from in the all.z case, though.

TL;DR: the patch works, and seems to work in the very way that we'd assumed it should. At this point I don't think there's any doubt about how the strange .zs are produced: if the {microdescriptor, descriptor} for the last digest on a list of digests to be processed cannot be fetched, the call needed to close the zlib stream is skipped. The question (maybe) remains why a digest list for all.z contains a non-fetchable entry at the end so frequently (this is empirical, and unclear.)

As far as the patch is concerned, IMO it's good. (See below for details.)

  • Can't (yet) reproduce bogus (with 0000ffff in the end) all.z on a directory mirror
  • Can reproduce bogus "unfetchable descriptor at the end" archive:
    • curl -vv http://95.85.10.71/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+1917db60e6c769567fc364e9dbb6fe8aa5feeb16.z | xxd | tail
  • This archive is not bogus (parsed by python's zlib; no 0000ffff (Z_SYNC_FLUSH end)) when running Nick's patch - on a different router:
    • curl -vv http://5.135.188.202:995/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+1917db60e6c769567fc364e9dbb6fe8aa5feeb16.z | xxd | tail
  • Whereas existing+existing.z good for both:
    • curl -vv http://95.85.10.71/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+8f6ff348d3a6dd5ce1fc93d030a4c82acb842c6e.z > vicare_good.z
    • curl -vv http://5.135.188.202:995/tor/server/d/f53bcf37a5233f4bf91373a62f92692a526e866d+8f6ff348d3a6dd5ce1fc93d030a4c82acb842c6e.z > ravine_good.z
    • diff vicare_good.z ravine_good.z => identical (and good)
    • (note that descriptors may expire etc.; later on, you may need to supply different digests)
  • Inserted a couple of `log_info()`s (branch "bug11648_dirserv_log_bogus_desc") (the "log_info() if descriptor or microdescriptor lookup fails" might as well eventually go into master, because IMO it's log_info()-worthy. Unless logging requested router digests at INFO level is not a good idea?)
    • "processed last digest" not logged when last digest lookup fails (confirms one of the assumptions)
    • "processed last digest" logged when last digest lookup succeeds
    • "failed to look up descriptor" logged as intended

Directory mirror 5.135.188.202:995 running bug11648_dirserv_log_bogus_desc (which is on top of bug11648), directory mirror 95.85.10.71:80 running vanilla 0.2.4.21.

  • would be nice to see what kinds of dangling non-fetchable digests are on turtles' all. It would be possible to do that in a simple manner by running those log_info()s, enabling INFO-level logging, and grepping for "last digest on the stack".
  • still curious why turtles' (and maybe in general) all.z contains nonexistent digests at the end so often.


Last edited 3 years ago by wfn

comment:28 Changed 3 years ago by nickm

still curious why turtles' (and maybe in general) all.z contains nonexistent digests at the end so often.

Note how the processing for "all" works in dirserv_get_routerdescs(): it takes all the routers in the routerlist, then excludes any that can't be sent unencrypted. It takes their identity fingerprints, not their descriptor digests.

Later, connection_dirserv_add_servers_to_outbuf() tries to get a descriptor using signed_descriptor_get_by_fp(). But look at how the publish_cutoff field works: it only actually serves a descriptor if it was published within the last ROUTER_MAX_AGE_TO_PUBLISH seconds. Otherwise, it treats the server as having no descriptor. I think that is likely the major cause of what's going on here.
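
(A schematic of that lookup logic, for readers following along; an illustrative Python sketch, not tor's actual code, and the constant's value here is an assumption:)

import time

ROUTER_MAX_AGE_TO_PUBLISH = 20 * 60 * 60  # assumed for the sketch

def signed_descriptor_get_by_fp(routerlist, fp, publish_cutoff):
    # routerlist: identity fingerprint -> (publication time, descriptor)
    entry = routerlist.get(fp)
    if entry is None:
        return None
    published_on, descriptor = entry
    # A descriptor older than the cutoff is treated as if the relay had
    # none at all, so its fingerprint is left dangling on the spool list.
    if published_on < publish_cutoff:
        return None
    return descriptor

publish_cutoff = time.time() - ROUTER_MAX_AGE_TO_PUBLISH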

comment:29 Changed 3 years ago by nickm

  • Resolution set to fixed
  • Status changed from needs_review to closed

Based on those tests, it seems like a good idea to merge the patch into 0.2.5. Because of our use of Z_SYNC_FLUSH, we aren't actually losing any real data here, though, so I say this is not a backport candidate for 0.2.4.

Merging patch, closing ticket. Please open new ticket(s) as needed if there are lingering mysteries to be chased down.

comment:30 Changed 3 years ago by nickm

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:31 Changed 3 years ago by nickm

We aren't done here. See bug #311 -- that's the reason we introduced the old behavior in 475eb5d6. I think there's an easy fix, though.

comment:32 Changed 3 years ago by nickm

  • Resolution set to fixed
  • Status changed from reopened to closed

And in fact, I think we really fixed bug #311 with 05eff35ac6d64b. Added a unit test as 891d239e012a. Opening a new ticket for a related issue though.

comment:33 Changed 15 months ago by nickm

  • Keywords 2016-bug-retrospective added

Mark bugs for 2016 bug retrospective based on hand-examination of changelogs for 0.2.5 onwards.

