Opened 3 months ago

Last modified 6 days ago

#31683 new defect

md: Bad family line in cached-microdescs.new

Reported by: dgoulet Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: microdesc, 042-should
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by teor)

Noticed these warnings this morning on my client (Tor version 0.4.2.0-alpha-dev (git-796a9b37ea346f41)):

Sep 10 07:50:22.482 [warn] Bad element "$7143512F@last-listed" while parsing a node family.
Sep 10 07:50:22.482 [warn] Bad element "2019-09-09" while parsing a node family.
Sep 10 07:50:22.482 [warn] Bad element "12:46:57" while parsing a node family.

I'm attaching the cached-microdescs.new file to the ticket. See line 15641:

ntor-onion-key tYqPErP1c49tLfeLe+PXR9t0USyps9C8vhxBdAFC+j4=
family $7143512F@last-listed 2019-09-09 12:46:57

I would be very surprised that dirauth published this ^ so I'm guessing we badly wrote on disk?

Child Tickets

Change History (10)

comment:1 Changed 3 months ago by nickm

Keywords: 042-should added

comment:2 Changed 3 months ago by nickm

I think your diagnosis is correct. Is it possible that your Tor client shut down abruptly while writing the microdescs?

I believe we've seen this bug before, maybe #28223?

comment:3 Changed 2 months ago by ahf

Owner: set to nickm
Status: newassigned

Distributing 0.4.2 tickets between network team members.

comment:4 Changed 2 months ago by nickm

Musing aloud:

I'm trying to make sense of this and of #31364, which I strongly suspect have the same underlying cause.

In both cases, it's microdescriptors, and it's the journal, not the cache.

In both cases, it looks like we have started writing at the wrong point in the file: here, it seems we've started in the middle of another microdescriptor, and in #31364, it looks like we've seeked to a point after the end of the file.

This makes me kind of suspect that our fake O_APPEND implementation in start_writing_to_file() might have something to do with this. Or maybe something is doing concurrent modifications on the microdescriptors journal.

Next to investigate: why do we not observe this with routerdescs? Are we more tolerant of errors in cached_descriptors.new, or is there an important difference in the code? If this is a concurrent modification problem, what could be causing it?

comment:5 Changed 2 months ago by nickm

Hm. There is a signed/unsigned bug in the handling of dump_microdescriptor()'s return value. I wonder if that could account for this. dump_microdescriptor() returns -1 on error, but we store its return value in an (unsigned) size_t, and then check whether it is less than zero.

Version 0, edited 2 months ago by nickm (next)

comment:6 Changed 2 months ago by nickm

Oh, that's not true. annotation_len is a size_t, but "size" is a ssize_t, as it should be.

comment:7 Changed 3 weeks ago by nickm

Milestone: Tor: 0.4.2.x-finalTor: unspecified

None of my hypotheses here were supported by the code; I don't see a fix happening here in 0.4.2.

comment:8 Changed 2 weeks ago by nickm

Owner: nickm deleted

comment:9 Changed 2 weeks ago by nickm

Status: assignednew

comment:10 Changed 6 days ago by teor

Description: modified (diff)
Note: See TracTickets for help on using tickets.