Opened 7 weeks ago

Last modified 5 weeks ago

#31594 merge_ready defect

Close all the log fds before aborting

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: diagnostics, consider-backport-if-needed, consider-backport-after-042-stable, 042-should, android, macos, 040-backport-maybe, 041-backport-maybe, BugSmashFund
Cc: gaba Actual Points: 0.9
Parent ID: Points: 0.3
Reviewer: nickm Sponsor:

Description

We should close the sigsafe_log_fds before abort(), so that we are less likely to lose log lines in the fd buffers.

Here are the abort() users:

  • tor_abort_()
  • crash_handler()
  • format_number_sigsafe()
  • raw_assert()
  • raw_assert_unreached_msg()
  • trunnel_abort()

We could:

  • define a raw_abort() function that closes the fds
  • use it instead of abort()
  • #define trunnel_abort() as raw_abort()
  • update our C linter to require raw_abort() instead of abort()

Child Tickets

TicketStatusOwnerSummaryComponent
#31613closedAvoid a race condition where log and err both try to close the sigsafe fdsCore Tor/Tor
#31642closedShould trunnel_abort() be tor_abort_() or tor_raw_abort_() ?Core Tor/Tor

Change History (18)

comment:1 Changed 6 weeks ago by teor

Actual Points: 0.5
Reviewer: nickm
Status: newneeds_review

There were a lot of really tricky related bugs here. And I'm not sure if I got them right. So I made a separate child ticket for each one.

Here is a draft PR that I think fixes all those bugs:

Here's what I still need to do:

  • changes files

Here's what I don't know:

  • when we clear the list of error fds, should we use -1 or 0 as the placeholder value?
  • are any of these bugs serious? Do they need a backport?
  • should I split this PR up into multiple PRs?
    • the subsys changes are independent
    • the backtrace changes are independent
    • the torerr changes need to be merged before the log changes

comment:2 Changed 6 weeks ago by teor

Hmm, it looks like I forgot a header that Windows needs. I'll fix that when I fix everything else.

comment:3 in reply to:  2 Changed 6 weeks ago by teor

Replying to teor:

Hmm, it looks like I forgot a header that Windows needs. I'll fix that when I fix everything else.

fsync() doesn't do what we want here: I should delete most of the last commit, and just keep one of the comment changes.

comment:4 Changed 6 weeks ago by nickm

I'll review everything but the fsync commit, and try to answer your questions.

comment:5 Changed 6 weeks ago by nickm

Status: needs_reviewneeds_revision

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.

when we clear the list of error fds, should we use -1 or 0 as the placeholder value?

If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.

That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)

are any of these bugs serious? Do they need a backport?

IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.

should I split this PR up into multiple PRs?

I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)

comment:6 in reply to:  5 ; Changed 6 weeks ago by teor

Replying to nickm:

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

I didn't modify subsystem_list.c, I'll fix it when I revise the branch.
The subsystem levels vs subsystem_list.c order could be a unit test?
I'll see if I can make that happen.

Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.

when we clear the list of error fds, should we use -1 or 0 as the placeholder value?

If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.

That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)

I opened #31635 for follow up. I wonder if we should do it on this branch, so we don't end up with backport conflicts, if we decide to backport.

are any of these bugs serious? Do they need a backport?

IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.

I'll do them on 0.3.5, mark them as "test in 0.4.2-stable before backport", and mark them as a "maybe-not" backport.

should I split this PR up into multiple PRs?

I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)

I think I want a clean_up_backtrace_handler() / subsystem / log split. These sets of changes are pretty independent, so backporting them independently should be ok.

comment:7 in reply to:  6 Changed 6 weeks ago by teor

Replying to teor:

Replying to nickm:

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

I didn't modify subsystem_list.c, I'll fix it when I revise the branch.
The subsystem levels vs subsystem_list.c order could be a unit test?
I'll see if I can make that happen.

They are already a test on tor startup:
https://trac.torproject.org/projects/tor/ticket/31634#comment:3

So our CI won't pass if we mess this order up. (Any check that launches tor should fail, including keys, zero-length files, rebind, chutney and stem.)

comment:8 Changed 6 weeks ago by teor

Keywords: BugSmash added

Fixing this bug also helps us smash other bugs.

comment:9 Changed 6 weeks ago by teor

Actual Points: 0.50.7

comment:10 Changed 6 weeks ago by teor

  • changes files
  • split up PR into clean_up_backtrace_handler() / subsystem / log split
  • list of changes in PR
  • list of changes in child tickets

comment:11 Changed 6 weeks ago by teor

Actual Points: 0.70.9
Keywords: consider-backport-if-needed consider-backport-after-042-stable 040-backport-maybe 041-backport-maybe added; 035-backport 040-backport 041-backport removed
Status: needs_revisionneeds_review

I split #31614 and #31615 into their own PRs, the code is pretty independent. And the backport versions are different.

I updated the PR with some fixups and a changes file:

I don't know if we need to do #31635.
If we do, we should do it on master, after all these other branches merge.

comment:12 Changed 6 weeks ago by teor

I also squashed the branch and did a backport to 0.4.0.
(The backport to 0.3.5 was too complex.)

Here is the PR to merge:

The merge forward was clean.

Here are the test branches for merging forwards:

I don't think we should backport this fix, unless the bug is actually causing issues in older versions.

comment:13 Changed 6 weeks ago by teor

Owner: set to teor
Status: needs_reviewassigned

comment:14 Changed 6 weeks ago by teor

Status: assignedneeds_review

comment:15 Changed 6 weeks ago by teor

Keywords: BugSmashFund added; BugSmash removed

Fix bug smash fund spelling

comment:16 Changed 6 weeks ago by nickm

Keywords: asn-merge added
Status: needs_reviewmerge_ready

This looks plausible to me. Let's try it in 0.4.2!

comment:17 Changed 6 weeks ago by teor

Parent ID: #31571

This ticket is independent of its parent,

comment:18 Changed 5 weeks ago by asn

Keywords: asn-merge removed
Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final

Merged to master! Moving to 041 for possible backports.

Note: See TracTickets for help on using tickets.