Opened 2 months ago

Last modified 5 weeks ago

#31683 assigned defect

md: Bad family line in cached-microdescs.new

Reported by: dgoulet Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: microdesc, 042-should
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Noticed these warnings this morning on my client (Tor version 0.4.2.0-alpha-dev (git-796a9b37ea346f41)):

Sep 10 07:50:22.482 [warn] Bad element "$7143512F@last-listed" while parsing a node family.
Sep 10 07:50:22.482 [warn] Bad element "2019-09-09" while parsing a node family.
Sep 10 07:50:22.482 [warn] Bad element "12:46:57" while parsing a node family.

I'm attaching the cached-microdescs.new file to the ticket. See line 15641:

ntor-onion-key tYqPErP1c49tLfeLe+PXR9t0USyps9C8vhxBdAFC+j4=
family $7143512F@last-listed 2019-09-09 12:46:57

I would be very surprised that dirauth published this so I'm guessing we badly wrote on disk?

Child Tickets

Change History (6)

comment:1 Changed 2 months ago by nickm

Keywords: 042-should added

comment:2 Changed 2 months ago by nickm

I think your diagnosis is correct. Is it possible that your Tor client shut down abruptly while writing the microdescs?

I believe we've seen this bug before, maybe #28223?

comment:3 Changed 6 weeks ago by ahf

Owner: set to nickm
Status: newassigned

Distributing 0.4.2 tickets between network team members.

comment:4 Changed 5 weeks ago by nickm

Musing aloud:

I'm trying to make sense of this and of #31364, which I strongly suspect have the same underlying cause.

In both cases, it's microdescriptors, and it's the journal, not the cache.

In both cases, it looks like we have started writing at the wrong point in the file: here, it seems we've started in the middle of another microdescriptor, and in #31364, it looks like we've seeked to a point after the end of the file.

This makes me kind of suspect that our fake O_APPEND implementation in start_writing_to_file() might have something to do with this. Or maybe something is doing concurrent modifications on the microdescriptors journal.

Next to investigate: why do we not observe this with routerdescs? Are we more tolerant of errors in cached_descriptors.new, or is there an important difference in the code? If this is a concurrent modification problem, what could be causing it?

comment:5 Changed 5 weeks ago by nickm

Hm. There is a signed/unsigned bug in the handling of dump_microdescriptor()'s return value. I wonder if that could account for this. dump_microdescriptor() returns -1 on error, but we store its return value in an (unsigned) size_t, and then check whether it is less than zero.

Last edited 5 weeks ago by nickm (previous) (diff)

comment:6 Changed 5 weeks ago by nickm

Oh, that's not true. annotation_len is a size_t, but "size" is a ssize_t, as it should be.

Note: See TracTickets for help on using tickets.