Close all the log fds before aborting

changed milestone to %Tor: 0.4.1.x-final

Trac:
Parent Ticket: #31614 (moved)
Child Ticket(s): #31642 (moved), #31613 (moved)

added 041-backport 042-should BugSmashFund actualpoints::0.9 android component::core tor/tor consider-backport-after-042-stable diagnostics macos milestone::Tor: 0.4.1.x-final owner::teor parent::31614 points::0.3 priority::medium resolution::fixed reviewer::nickm severity::normal status::closed type::defect labels

There were a lot of really tricky related bugs here, and I'm not sure if I got them right, so I made a separate child ticket for each one.

Here is a draft PR that I think fixes all those bugs:

master: https://github.com/torproject/tor/pull/1289

Here's what I still need to do:

changes files

Here's what I don't know:

- when we clear the list of error fds, should we use -1 or 0 as the placeholder value?
- are any of these bugs serious? Do they need a backport?
- should I split this PR up into multiple PRs?
  - the subsys changes are independent
  - the backtrace changes are independent
  - the torerr changes need to be merged before the log changes

Trac:
Actualpoints: N/A to 0.5
Status: new to needs_review
Reviewer: N/A to nickm

Hmm, it looks like I forgot a header that Windows needs. I'll fix that when I fix everything else.

Replying to teor:

Hmm, it looks like I forgot a header that Windows needs. I'll fix that when I fix everything else.

fsync() doesn't do what we want here: I should delete most of the last commit, and just keep one of the comment changes.

I'll review everything but the fsync commit, and try to answer your questions.

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.

when we clear the list of error fds, should we use -1 or 0 as the placeholder value?

If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.

That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)
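To make the placeholder question concrete, here is a minimal sketch of closing and clearing a list of log fds. It is not the code from the PR: every name in it (example_log_fds, example_n_log_fds, EXAMPLE_MAX_FDS, EXAMPLE_INVALID_FD, and both functions) is hypothetical, and it assumes a POSIX system. It shows why the placeholder does not matter while the count is zero, and what -1 would look like as the "not a valid FD" value.

```c
#include <unistd.h>

/* All names below are hypothetical; they are not tor's identifiers. */
#define EXAMPLE_MAX_FDS 8
#define EXAMPLE_INVALID_FD (-1) /* the "-1" option from the question */

static int example_log_fds[EXAMPLE_MAX_FDS];
static int example_n_log_fds = 0;

/* Store up to EXAMPLE_MAX_FDS fds and fill the unused slots with the
 * placeholder. Returns the number of fds actually stored. */
int
example_set_log_fds(const int *fds, int n)
{
  int i;
  if (n < 0)
    n = 0;
  if (n > EXAMPLE_MAX_FDS)
    n = EXAMPLE_MAX_FDS;
  for (i = 0; i < n; ++i)
    example_log_fds[i] = fds[i];
  for (i = n; i < EXAMPLE_MAX_FDS; ++i)
    example_log_fds[i] = EXAMPLE_INVALID_FD;
  example_n_log_fds = n;
  return n;
}

/* Close every listed fd and clear the list. This only calls close(),
 * which POSIX lists as async-signal-safe. */
void
example_close_log_fds(void)
{
  int i;
  int n = example_n_log_fds;
  /* Zero the count first: code that loops up to the count never looks
   * at the slots, so the placeholder value is irrelevant to it. */
  example_n_log_fds = 0;
  for (i = 0; i < n; ++i) {
    if (example_log_fds[i] >= 0)
      close(example_log_fds[i]);
    example_log_fds[i] = EXAMPLE_INVALID_FD;
  }
}
```

With 0 as the placeholder, a caller that ignored the count would touch stdin; with -1, the same mistake fails harmlessly with EBADF. That is the main practical argument for -1 in this sketch.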

are any of these bugs serious? Do they need a backport?

IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.

should I split this PR up into multiple PRs?

I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)

Trac:
Status: needs_review to needs_revision

Replying to nickm:

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

I didn't modify subsystem_list.c; I'll fix it when I revise the branch. The subsystem levels vs subsystem_list.c order could be a unit test? I'll see if I can make that happen.

Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.

when we clear the list of error fds, should we use -1 or 0 as the placeholder value?

If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.

That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)

I opened #31635 (moved) for follow up. I wonder if we should do it on this branch, so we don't end up with backport conflicts, if we decide to backport.

are any of these bugs serious? Do they need a backport?

IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.

I'll do them on 0.3.5, mark them as "test in 0.4.2-stable before backport", and mark them as a "maybe-not" backport.

should I split this PR up into multiple PRs?

I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)

I think I want a clean_up_backtrace_handler() / subsystem / log split. These sets of changes are pretty independent, so backporting them independently should be ok.

Replying to teor:

Replying to nickm:

I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.

I didn't modify subsystem_list.c; I'll fix it when I revise the branch. The subsystem levels vs subsystem_list.c order could be a unit test? I'll see if I can make that happen.

The order is already checked on tor startup: https://trac.torproject.org/projects/tor/ticket/31634#comment:3

So our CI won't pass if we mess this order up. (Any check that launches tor should fail, including keys, zero-length files, rebind, chutney and stem.)
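A startup check of this kind could look roughly like the sketch below. This is an illustration, not the code in tor: the struct, the function name, and the level values used with it are all hypothetical. It assumes the rule is that levels must never decrease along the list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a subsystem entry; not tor's struct. */
typedef struct example_subsys_t {
  const char *name;
  int level; /* lower levels are initialized earlier */
} example_subsys_t;

/* Return the index of the first entry whose level is lower than the
 * level of the entry before it, or -1 if the list is in order. */
int
example_first_out_of_order(const example_subsys_t *list, size_t n)
{
  size_t i;
  for (i = 1; i < n; ++i) {
    if (list[i].level < list[i - 1].level)
      return (int)i;
  }
  return -1;
}
```

A program that runs this at startup and aborts on any result other than -1 fails every check that launches it, which is why a separate unit test adds little.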

Fixing this bug also helps us smash other bugs.

Trac:
Keywords: N/A deleted, BugSmash added

Trac:
Actualpoints: 0.5 to 0.7

changes files
split up PR into clean_up_backtrace_handler() / subsystem / log PRs
list of changes in PR
list of changes in child tickets

I split #31614 (moved) and #31615 (moved) into their own PRs, because the code is pretty independent and the backport versions are different.

I updated the PR with some fixups and a changes file:

fixups on master: https://github.com/torproject/tor/pull/1289

I don't know if we need to do #31635 (moved). If we do, we should do it on master, after all these other branches merge.

Trac:
Keywords: 041-backport, 040-backport, 035-backport deleted, 040-backport-maybe, 041-backport-maybe, consider-backport-after-042-stable, consider-backport-if-needed added
Actualpoints: 0.7 to 0.9
Status: needs_revision to needs_review

I also squashed the branch and did a backport to 0.4.0. (The backport to 0.3.5 was too complex.)

Here is the PR to merge:

0.4.0: https://github.com/torproject/tor/pull/1303

The merge forward was clean.

Here are the test branches for merging forwards:

https://github.com/teor2345/tor/branches/all?query=bug31594_

I don't think we should backport this fix, unless the bug is actually causing issues in older versions.

Trac:
Status: needs_review to assigned
Owner: N/A to teor

Trac:
Status: assigned to needs_review

Fix the spelling of the BugSmashFund keyword.

Trac:
Keywords: BugSmash deleted, BugSmashFund added

This looks plausible to me. Let's try it in 0.4.2!

Trac:
Keywords: N/A deleted, asn-merge added
Status: needs_review to merge_ready

This ticket is independent of its parent.

Trac:
Parent: #31571 (moved) to N/A

Merged to master! Moving to 041 for possible backports.

Trac:
Keywords: asn-merge deleted, N/A added
Milestone: Tor: 0.4.2.x-final to Tor: 0.4.1.x-final

Merged to 0.4.1; marking for further possible backport.

Trac:
Milestone: Tor: 0.4.1.x-final to Tor: 0.4.0.x-final

I'm leaning towards "no backport" on this ticket, unless we discover a specific bug. Leaving open, so we check again after 042-stable.

This change doesn't seem to make much of a difference, still thinking "no backport".

Trac:
Parent: N/A to #31614 (moved)

This change caused issues for debugging using LeakSanitizer and AddressSanitizer in some contexts, so we should not backport it any further. See #33087 (moved) for more details.

Trac:
Milestone: Tor: 0.4.0.x-final to Tor: 0.4.1.x-final
Status: merge_ready to closed
Keywords: 040-backport-maybe, 041-backport-maybe, consider-backport-if-needed deleted, 041-backport added
Resolution: N/A to fixed

closed

changed time estimate to 2h 24m

added 7h 12m of time spent

mentioned in issue #31613 (moved)

mentioned in issue #31614 (moved)

mentioned in issue #31615 (moved)

mentioned in issue #31635 (moved)

mentioned in issue #31642 (moved)

mentioned in issue #31734 (moved)

mentioned in issue #32835 (moved)

mentioned in issue #33087 (moved)

mentioned in issue #33850 (moved)

moved to tpo/core/tor#31594 (closed)

mentioned in issue tpo/core/tor#31613 (closed)

mentioned in issue tpo/core/tor#31614 (closed)
