Opened 3 years ago

Closed 3 years ago

#18322 closed defect (implemented)

Log unparseable votes so they can be analysed

Reported by: teor Owned by: andrea
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-dos, TorCoreTeam201606, review-group-4
Cc: Actual Points: 3
Parent ID: #17293 Points: 2
Reviewer: Sponsor: SponsorU-can

Description

When we log unparseable desc stuff for our vote, we proceed to overwrite it with the invalid consensus we produced. The vote gets logged at log level notice, but only in truncated form not allowing one to analyze this bug.

Resolving #18321 may fix this as well.

Child Tickets

Change History (26)

comment:1 Changed 3 years ago by teor

Parent ID: #17668

comment:2 Changed 3 years ago by teor

Milestone: Tor: 0.2.7.x-finalTor: 0.2.9.x-final

comment:3 Changed 3 years ago by isabela

Sponsor: SponsorU-can

comment:4 Changed 3 years ago by nickm

Parent ID: #17668
Points: small/medium

comment:5 Changed 3 years ago by nickm

Keywords: TorCoreTeam201602 removed

comment:6 Changed 3 years ago by nickm

Keywords: tor-dos added

comment:7 Changed 3 years ago by isabela

Points: small/medium2

comment:8 Changed 3 years ago by nickm

Parent ID: #17293

comment:9 Changed 3 years ago by andrea

Owner: set to andrea
Status: newassigned

Taking ownership for 0.2.9 triage

comment:10 Changed 3 years ago by nickm

Keywords: TorCoreTeam201606 added

comment:11 Changed 3 years ago by andrea

I think the simplest solution here is to modify dump_desc() in routerparse.c to include the hash of the offending descriptor in the filename and then mention the hash in the log message about the parse failure. This potentially allows someone to exhaust disk space by uploading a lot of bogus descriptors, though, so we'll need to either have a way to clean up old logged descriptors or otherwise limit that, or restrict it only to cases where the unparseable material can't have come from an arbitrary source.

comment:12 Changed 3 years ago by andrea

Status: assignedneeds_review

Proposed fix in my bug18322 branch; this introduces a new configuration option for the maximum size of dumped descriptors. When this option is non-zero, the filename for dumped descriptors includes their hex-encoded SHA-256, which is referenced in the log message at the time of the dump.

comment:13 Changed 3 years ago by andrea

Actual Points: 3

comment:14 Changed 3 years ago by nickm

Keywords: review-group-4 added

comment:15 in reply to:  12 Changed 3 years ago by asn

Status: needs_reviewneeds_revision

Replying to andrea:

Proposed fix in my bug18322 branch; this introduces a new configuration option for the maximum size of dumped descriptors. When this option is non-zero, the filename for dumped descriptors includes their hex-encoded SHA-256, which is referenced in the log message at the time of the dump.

Hello. An initial review follows. I actually expected this feature to require less code, but it seems like fears of DoS made it more complicated. My current main concerns are the UX of this change, and the lack of tests.

  • I think this feature is disabled by default. Why is that? Is there any way the old behavior is better than this feature with appropriate defaults?

Also, when this feature is disabled, I think we fallback to the old behavior. Why would we ever want to do that? Why not enable this by default, with a small size allowance (e.g. 10MB) and remove all the old behavior code?

  • The torrc option is called DetailedLogForUnparseableDescriptors but it actually requires a size input to work. With such a name, I would expect it to be a boolean option that enables or disables the feature. If we enable this feature by default as suggested above, maybe we can repurpose the torrc option to something that simply controls the maximum size?
  • I feel that this FIFO functionality is quite complex (lots of code added, asserts and length calculations) without any tests whatsoever. The general logic is also undocumented.

I wonder if there is something simpler we could do instead of FIFO. Like just refuse to log more descriptors if the maximum size has been reached, and log this to the operator. Or maybe enforce limits based on the number of descriptors instead of the exact size.

Alternatively, if we like the FIFO approach, maybe some tests would help to bring more confidence here.

  • if (emit_prefix) filelen = 50 + strlen(type) + len; What is 50 here?
  • if (debugfile_base != NULL) tor_free(debugfile_base);

tor_free() should be able to handle NULL as input, so the if check is not necessary. Also, maybe initializing debugfile and debugfile_base to NULL could be more defensive.

Setting this to needs_revision for now. I'd like to do some more review and testing here eventually.

comment:16 Changed 3 years ago by andrea

The torrc option is called DetailedLogForUnparseableDescriptors but it actually requires a size input to work. With such a name, I would expect it to be a boolean option that enables or disables the feature. If we enable this feature by default as suggested above, maybe we can repurpose the torrc option to something that simply controls the maximum size?

Yeah, it can be renamed to MaxUnparseableDescSizeToLog or something like that.

I think this feature is disabled by default. Why is that? Is there any way the old behavior is better than this feature with appropriate defaults?

Also, when this feature is disabled, I think we fallback to the old behavior. Why would we ever want to do that? Why not enable this by default, with a small size allowance (e.g. 10MB) and remove all the old behavior code?

Okay, sure.

I wonder if there is something simpler we could do instead of FIFO. Like just refuse to log more descriptors if the maximum size has been reached, and log this to the operator. Or maybe enforce limits based on the number of descriptors instead of the exact size.

The former option would be acceptable; the latter would be a bad move - one of the descriptor types that dump_desc() can be called on is a new router descriptor uploaded to a dirauth. I believe counting number rather than size lets anyone force a dirauth to consume as much disk space as it can allocate memory by uploading a huge, unparseable descriptor. We really do want to watch how much we log here.

if (emit_prefix) filelen = 50 + strlen(type) + len; What is 50 here?

It's the maximum length of the file header naming the type of descriptor; it's ugly, but it's also been there since 2009: https://gitweb.torproject.org/tor.git/commit/?id=889c07f1fc54b5bd60c7919f8a5fc784eed2d57a

That'll go if we get rid of the old code, of course.

comment:17 Changed 3 years ago by andrea

Status: needs_revisionneeds_review

See bug18322_v2 branch, which removes the fallback to the old-style logging, renames the config variable and checks for repeated identical unparseable descriptors when logging. Current testing methodology is to run a private network and curl --data-binary @test-desc.txt -v http://127.0.0.1:19030/tor/ with a suitably permuted bogus router descriptor in test-desc.txt and one dirauth's dirport at 127.0.0.1:19030. I'm working on some unit tests for this.

comment:18 Changed 3 years ago by andrea

Rebased and with unit tests added in my bug18322_v2_tests branch

comment:19 Changed 3 years ago by asn

Status: needs_reviewneeds_revision

Hello, feature looking more robust now IMO! I did some chutney testing and it seems to work fine as well!

BTW, on the changes file you don't need to write bugfix on ??? since this is a feature. You can just say Closes ticket 18322 and be done with it.

I'm gonna turn this to needs_revision just for the changes file fix. After this feel free to turn this into merge_ready from my part.

comment:20 Changed 3 years ago by nickm

Status: needs_revisionmerge_ready

I'm actually going to call it merge_ready (pending my review) -- since fixing up changes stuff is no big deal for me, and otherwise I'll forget this. :)

comment:21 Changed 3 years ago by nickm

Status: merge_readyneeds_revision
  • Don't hate me for this, but: we can't use size_t for the total size of a bunch of these files, I think. On 32-bit systems, size_t is 4 bytes, but file sizes can often be 8 bytes.
  • We don't have to say "if (foo) tor_free(foo);". Just say "tor_free(foo)."
  • It looks like the dumped descriptor limit is only applied over a number
  • Should these files go into a subdirectory of the datadir? That would give sysadmins the ability to stick it on another volume, give it different permissions, etc.
  • What do we do about NUL characters?
  • Shouldn't we scan the unparseable-desc files at startup, and count how big they are and when they were last modified? Otherwise we will still up the disk eventually if Tor stops and starts enough times.
  • This approach will break with the linux seccomp2 sandbox, since we can't predict the filenames in advance. But IMO it'll be fine to just say "if the sandbox is on, don't store these to disk."

comment:22 Changed 3 years ago by andrea

Don't hate me for this, but: we can't use size_t for the total size of a bunch of these files, I think. On 32-bit systems, size_t is 4 bytes, but file sizes can often be 8 bytes.

Files we generate here can't be, since they had to fit in process memory to get written, but I guess you can make up some weird case where we also do the scanning for existing files thing, and the user switches between running a 32-bit and a 64-bit Tor on the same directory. I'll switch it to uint64_t, though, just to be safe.

We don't have to say "if (foo) tor_free(foo);". Just say "tor_free(foo)."

Okay.

It looks like the dumped descriptor limit is only applied over a number

What does this mean?

Should these files go into a subdirectory of the datadir? That would give sysadmins the ability to stick it on another volume, give it different permissions, etc.

Okay.

What do we do about NUL characters?

If we decided something with a NUL in the middle of it constitutes a descriptor, we broke somewhere before we ever got to dump_desc(), since that function takes the bad descriptor as a NUL-terminated string. I'd call that worth checking out what does happen, but orthogonal to this ticket. If it can happen, we'd handle it by passing a length arg into dump_desc() and using write_mem_to_file() instead of write_str_to_file() for output, I suppose.

Shouldn't we scan the unparseable-desc files at startup, and count how big they are and when they were last modified? Otherwise we will still up the disk eventually if Tor stops and starts enough times.

Okay.

This approach will break with the linux seccomp2 sandbox, since we can't predict the filenames in advance. But IMO it'll be fine to just say "if the sandbox is on, don't store these to disk."

I'll add that check.

comment:23 Changed 3 years ago by nickm

Files we generate here can't be, since they had to fit in process memory to get written, but I guess you can make up some weird case where we also do the scanning for existing files thing, and the user switches between running a 32-bit and a 64-bit Tor on the same directory. I'll switch it to uint64_t, though, just to be safe.

Also there's the case where we get some number of objects that fit into address space one at a time, but when you add them up, the sum overflows.

What does this mean?

I have no idea. I'll let you know if I remember. Sometimes I start writing sentences and they just

wrt all other stuff, I agree with you. :) Thanks!

comment:24 Changed 3 years ago by andrea

On a closer look, I see no occurrences of if (foo) tor_free(foo) in this branch.

comment:25 Changed 3 years ago by andrea

Status: needs_revisionneeds_review

Updated version implements subdirectory and reconstructing FIFO from it on startup, together with unit tests for FIFO reconstruction code, in my bug18322_v3_squashed branch.

comment:26 Changed 3 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

M E R G E D !

Note: See TracTickets for help on using tickets.