I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.
Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.
when we clear the list of error fds, should we use -1 or 0 as the placeholder value?
If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.
That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)
are any of these bugs serious? Do they need a backport?
IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.
should I split this PR up into multiple PRs?
I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)
I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.
I didn't modify subsystem_list.c, I'll fix it when I revise the branch.
The subsystem levels vs subsystem_list.c order could be a unit test?
I'll see if I can make that happen.
Here are my current thoughts on your questions, but for all of these cases, I'll defer to your judgment.
when we clear the list of error fds, should we use -1 or 0 as the placeholder value?
If the n_sigsafe_log_fds value is zero, it should not matter what the empty entries contain.
That said, -1 is more commonly used in our code for "not a valid FD." (That said, we already use 0 here, and it might be better to leave that unchanged in this branch.)
I opened #31635 (moved) for follow up. I wonder if we should do it on this branch, so we don't end up with backport conflicts, if we decide to backport.
are any of these bugs serious? Do they need a backport?
IMO they don't currently warrant a backport, but they might warrant a backport some day. They strike me as the kind of issue that we might change our mind about and really wish we had backported at some point in the future. On the other hand, they also strike me as subtle enough to warrant extensive testing before we think of a backport.
I'll do them on 0.3.5, mark them as "test in 0.4.2-stable before backport", and mark them as a "maybe-not" backport.
should I split this PR up into multiple PRs?
I don't think so, unless you want to. Maybe. (At first I thought that if we are considering a backport, we might want to backport only part of this branch. But on the other hand, if we backport only part of this branch, we risk backporting something unstable that has not had testing.)
I think I want a clean_up_backtrace_handler() / subsystem / log split. These sets of changes are pretty independent, so backporting them independently should be ok.
I've left a couple of comments on the review. I've not reviewed the fsync commit, and I haven't checked that the new list of levels on the subsystems matches their dependency order or their order in subsystem_list.c.
I didn't modify subsystem_list.c, I'll fix it when I revise the branch.
The subsystem levels vs subsystem_list.c order could be a unit test?
I'll see if I can make that happen.
This change caused issues for debugging using LeakSanitizer and AddressSanitizer in some contexts, so we should not backport it any further. See #33087 (moved) for more details.