Opened 9 months ago

Last modified 6 months ago

#33087 merge_ready defect

closing stdio fds on exit can interfere with LeakSanitizer, etc

Reported by: catalyst Owned by: teor
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.6
Severity: Normal Keywords: tor-log, 043-should, consider-backport-after-0433, 041-backport, 042-backport
Cc: teor Actual Points: 0.5
Parent ID: Points: 0.2
Reviewer: dgoulet Sponsor:

Description

At tor exit time, tor_log_close_sigsafe_err_fds() winds up closing stderr or stdout if they're being log destinations. This can close stderr before tools like LeakSanitizer can output useful information, e.g., if there's a config setting like Log notice stderr. (valgrind doesn't seem to have this problem, maybe because it runs a separate process?)

Is it correct for us to close a default fd like stderr or stdout if we're not running as a daemon? Other tools that do runtime interception might also be affected. If we think we have a good reason to explicitly close these fds, we probably should document how tor's closing of these fds can interfere with debugging tools.

nickm thinks this behavior might be new in 0.4.2 with the signal safety work. I haven't had time to investigate further.

Child Tickets

TicketStatusOwnerSummaryComponent
#33850newlog rotation for /var/log/tor/debug.log did not close handle to old file after compressionCore Tor/Tor
#33924closedTor beginning with 0.4.1 seems to ignore SIGNAL SHUTDOWN on iOSCore Tor/Tor

Change History (19)

comment:1 Changed 9 months ago by nickm

Cc: teor added

Adding teor to cc since they wrote the tor_log_close_sigsafe_err_fds() code.

comment:2 Changed 9 months ago by teor

Keywords: 043-should consider-backport-after-0433 041-backport 042-backport added
Milestone: Tor: unspecifiedTor: 0.4.3.x-final

I added tor_log_close_sigsafe_err_fds() in #31594, it was merged to 0.4.2, and later backported to 0.4.1. I've closed #31594, so we don't backport it any further.

My original motivation for the change was that some OSes seem to lose the last few lines of tor log output, when it crashes.

So I suggest we change tor_log_close_sigsafe_err_fds() to flush those file descriptors, rather than closing them. (And change its name as well.)

We'll need to backport those changes to 0.4.1 and later.

comment:3 Changed 9 months ago by nickm

I don't think there's such a concept in posix as flushing an fd. You can flush a FILE*, but we aren't using those. Perhaps we could mark some fds (stdout, stderr) as not to be closed by this function?

comment:4 in reply to:  3 Changed 9 months ago by catalyst

Replying to nickm:

I don't think there's such a concept in posix as flushing an fd. You can flush a FILE*, but we aren't using those. Perhaps we could mark some fds (stdout, stderr) as not to be closed by this function?

I think there's fsync() but that's largely to provide robustness against kernel or hardware failures? I thought usually the semantic was that synchronous writes queued in-kernel complete at process termination, though I'm having trouble finding anything definitive in POSIX about that. (POSIX does say that some asynchronous I/O operations may be canceled at process termination.)

comment:5 Changed 9 months ago by nickm

Ah, right, fsync(). It wouldn't be terrible to try that. ISTR that on some platforms (OSX?) fsync might do a bit more than it does elsewhere, but I could be wrong about that. It can't hurt to true it. (If we do, let's remember to list it in the sandbox.c table this time)

comment:6 Changed 9 months ago by teor

fsync() seems to be pretty standard on recent macOS:

     fsync() causes all modified data and attributes of fildes to be moved to
     a permanent storage device.  This normally results in all in-core modi-
     fied copies of buffers for the associated file to be written to a disk.

There are other macOS functions for a complete flush to disk, but they are specifically for disks and power failures:

     For applications that require tighter guarantees about the integrity of
     their data, Mac OS X provides the F_FULLFSYNC fcntl.  The F_FULLFSYNC
     fcntl asks the drive to flush all buffered data to permanent storage.
     Applications, such as databases, that require a strict ordering of writes
     should use F_FULLFSYNC to ensure that their data is written in the order
     they expect.  Please see fcntl(2) for more detail.

If fsync() is interrupt-safe, we should use it.

If it's not, maybe we should do nothing?
(The other related tickets to#31594 already write an extra newline to all of the error fds. So that should flush any line-oriented buffers.)

comment:7 Changed 8 months ago by teor

Status: newneeds_review
Version: Tor: 0.4.1.6

fsync() is required to be async signal safe by POSIX, see:
http://manpages.ubuntu.com/manpages/bionic/man7/signal-safety.7.html

See my PRs:

All the test branches are here:

comment:8 Changed 8 months ago by teor

Owner: set to teor
Status: needs_reviewassigned

comment:9 Changed 8 months ago by teor

Actual Points: 0.2
Points: 0.2
Status: assignedneeds_review

comment:10 Changed 8 months ago by teor

Actual Points: 0.20.3

fsync() doesn't exist on Windows, so I made the flush functions depend on HAVE_UNISTD_H.

comment:11 in reply to:  10 Changed 8 months ago by teor

Actual Points: 0.30.4

Replying to teor:

fsync() doesn't exist on Windows, so I made the flush functions depend on HAVE_UNISTD_H.

Windows doesn't have fsync(), but it does have unistd.h. So I just deleted the close functions for the backport branches, and cleaned up the log fd duplication on master.

Do you think it's worth checking for fsync() at configure time, and flushing the files if it exists? I'd only make this change in master.

comment:12 Changed 8 months ago by nickm

I think it couldn't hurt to use fsync() if present.

comment:13 in reply to:  12 Changed 8 months ago by nickm

Replying to nickm:

I think it couldn't hurt to use fsync() if present.

Rationale: The kind of problems that we might miss if we _don't_ use fsync would be very tricky to notice or debug otherwise.

comment:14 Changed 8 months ago by teor

Status: needs_reviewneeds_revision

In fact, they'd be almost exactly like the issues we created by closing stderr early :-)

Ok, I'll add fsync to configure, and use it when it's available. I think we should probably merge the fsync() fix to 0.4.3-alpha and later. fsync() is a very low risk function call.

comment:15 Changed 8 months ago by teor

Actual Points: 0.40.5
Status: needs_revisionneeds_review

See my PRs:

Delete close():

Replace close() with fsync():

All the test branches are here:

comment:16 Changed 8 months ago by dgoulet

Reviewer: dgoulet

comment:17 Changed 7 months ago by dgoulet

Status: needs_reviewmerge_ready

lgtm. Merged PR 1723 to master.

Leaving in 043 milestone for backport consideration.

comment:18 Changed 6 months ago by nickm

Milestone: Tor: 0.4.3.x-finalTor: 0.4.2.x-final

I've merged this to 0.4.3, and added a small tweak as c4da0a5094e21241db8ce0d8b12c2e4272fa49ef. (The tweak is to add fsync to the list of syscalls that our sandbox permits. Let's not duplicate the drama we had with the dup syscall.)

Marking for backport. No need to backport the tweak, since the version of this branch targeting 0.4.1 doesn't use fsync.

comment:19 Changed 6 months ago by arma

See also #33850, where they find that Tor (0.4.1.7 and 0.4.2.x) doesn't release its handles on log files during HUP, which means that in the deb package, logrotate (a) accumulates more and more file handles inside Tor, and (b) fills up your disk because nothing ever gets deleted. They narrowed it down to the fix in commit 8a23393eda which is part of this ticket. So: that would seem to be another reason to backport at the appropriate time.

Note: See TracTickets for help on using tickets.