Opened 7 months ago

Closed 7 months ago

Last modified 4 months ago

#21654 closed defect (fixed)

Don't use fgets() when we might get EAGAIN

Reported by: nickm Owned by: ahf
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: TorCoreTeam201703
Cc: ahf Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Our tor_fgets() code, added in #20988, tries to make fgets() behave as we expect with nonblocking sockets. But really, fgets() is not such a great choice for nonblocking sockets in the first place! Let's use direct fd-level IO for those instead.

Child Tickets

Change History (14)

comment:1 Changed 7 months ago by nickm

Cc: ahf added

comment:2 Changed 7 months ago by ahf

Owner: set to ahf
Status: newassigned

comment:3 Changed 7 months ago by ahf

Status: assignedneeds_review

I've made a patch on branch 'bugs/21654' in my Gitlab Tor repository:

https://gitlab.com/ahf/tor/commits/bugs/21654

This patch should hopefully make the race condition go away for good.

I've tested it on Linux (glibc), FreeBSD, HardenedBSD, OS X, and OpenBSD. Thanks to rubiate in #tor-dev for providing me access to an OpenBSD host.

comment:4 Changed 7 months ago by nickm

Hm. I wonder if we should propagate this switch from FILE to fileno even higher in the read_all_handle code. That is, on anything where we call tor_read_all_handle, we shouldn't be using fgets() at all: It's not safe to mix buffered IO (like FILE) does and non-buffered IO.

Alternatively, for an easier alternative, we can try turning off buffering on these particular FILEs. I forget how to do that. setvbuf?

comment:5 Changed 7 months ago by nickm

Status: needs_reviewneeds_revision

comment:6 Changed 7 months ago by ahf

Status: needs_revisionneeds_review

I've updated https://gitlab.com/ahf/tor/commits/bugs/21654 to add an additional patch that refactors out our usage of FILE* in the Unix paths in the utility functions around process_handle_t.

I've also opened #21678 which is a refactoring patch which is related to this issue, but I do not have access to a Windows machine to test it on currently.

This additional patch passes make check on OS X, Linux (glibc), OpenBSD, FreeBSD, and HardenedBSD.

comment:7 Changed 7 months ago by nickm

Status: needs_reviewneeds_revision

DANGER, DANGER!

read() does not ensure that its output is NUL-terminated: you can't do strlen() on something you just got from read()! get_string_from_pipe needs to change.

comment:8 Changed 7 months ago by ahf

Status: needs_revisionneeds_review

Yikes, good catch. Should be fixed in the branch now.

Additionally I've added a test case for get_string_from_pipe() which passes on FreeBSD, Linux, OS X and HardenedBSD. Did not test on OpenBSD.

comment:9 Changed 7 months ago by nickm

Status: needs_reviewneeds_revision

Closer and closer!

  • The string that goes into 'buf' in get_string_from_pipe() is not actually nul-terminated. The current code only sets a nul there when the last character read is a newline.
  • It doesn't make sense to me to return EAGAIN when the buffer starts with a NUL. If we read a NUL, then we read a NUL -- tat's all it means.
  • I'd suggest that we change tor_read_all_handle to do "while numread < count", so that if we have some bug that makes it go over count, we don't keep reading forever.
  • This will need a changes file.
  • This pipe() test needs to be made non-Windows-only: Windows doesn't have a pipe().

comment:10 Changed 7 months ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.1.x-final

comment:11 Changed 7 months ago by ahf

Status: needs_revisionneeds_review

I've updated my branch again.

  1. Fixed by setting buf[ret] = '\0' in the else-branch in get_string_from_pipe().
  2. Removed the code that did that - was an artifact from the fgets(3) implementation.
  3. This is done in a separate commit for the 3 places in the utility module that uses while (numread != count) instead of while (numread < count): https://gitlab.com/ahf/tor/commit/02ef06516e64c1559b24123d7c7d164b76110c9a
  4. Added changes file.
  5. Protected the implementation within an #ifndef _WIN32/#endif block and marked the test with UTIL_TEST_NO_WIN().

Additionally I've added a patch that removes the tor_fgets() wrapper since this is no longer needed. The only places left in the code that calls fgets(3) is in dirserv and the geoip module - both places where it should be safe to use fgets(3) directly. The test-case for the very specific EAGAIN checks in fgets(3) have been removed in the same commit.

comment:12 Changed 7 months ago by nickm

Keywords: TorCoreTeam201703 added
Status: needs_reviewassigned

Thanks; merged! It's good to see this get resolved

comment:13 Changed 7 months ago by nickm

Resolution: fixed
Status: assignedclosed

comment:14 Changed 4 months ago by ahf

A test was added for the EAGAIN behaviour with fgets() on non-blocking I/O in commit 8ee559bd on Fri Oct 8 21:31:15 2010.

It seems the behaviour was mentioned during the code review by Nick in bug #1903 and also mentioned in #2045, so this have been an issue on and off since at least tor-0.2.3.1-alpha.

Note: See TracTickets for help on using tickets.