Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#2046 closed task (fixed)

Port Tor code for starting a background process to Windows

Reported by: sjmurdoch Owned by: sjmurdoch
Priority: High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version:
Severity: Keywords:
Cc: nickm Actual Points:
Parent ID: #1983 Points:
Reviewer: Sponsor:

Description

Once tor-fw-helper works on Windows, we need to be able to launch it and read its output. This works on Linux/BSD/MacOS X, but Windows does process control very differently.

Child Tickets

Change History (19)

comment:1 Changed 7 years ago by nickm

I would suggest reading (though not copying, of course) the Python "subprocess" module for a survey of one way to do this.

comment:2 Changed 7 years ago by nickm

Component: - Select a componentTor Client

comment:3 Changed 6 years ago by nickm

Milestone: Deliverable-Mar2011Tor: 0.2.3.x-final
Priority: normalmajor

comment:4 Changed 6 years ago by sjmurdoch

Cc: nickm added
Status: newneeds_review

My branch bug2046 in git://git.torproject.org/sjm217/tor.git (1da5081) should now be ready for review. I haven't rebased it yet because I was assuming you will just compare it to origin/master, but let me know if this is not the case.

comment:5 Changed 6 years ago by ioerror

I've read over the diff here:
https://gist.github.com/1170726

It seems sanely written and defensively coded. I wonder about setting process security attributes and primary thread security attributes to NULL in the child process. Is that simply because the parent already has everything we need and it is inherited?

Overall, I think I'd like to test and compile it but I don't currently have a win32 dev machine setup.

comment:6 Changed 6 years ago by nickm

From IRC:

20:19 < nickm> sjmurdoch: Initial request, pending more detailed review: Can we 
               split the C string processing stuff out of functions like 
               log_from_handle, so that we can do a unit test for it?
20:21 < nickm> sjmurdoch: Also, elsewhere thoughout the code, we return structs 
               on the heap, not by value. Any reason to make an exception for 
               tor_spawn_background?
20:21 < sjmurdoch> nickm: Yes, that sounds like a good idea
20:21 < nickm> (If there's other string processing goop in the spawn code, same 
               logic applies)
20:21 < sjmurdoch> nickm: Not particularly
20:22 < sjmurdoch> nickm: You mean pass in a pointer which the function will 
                   modify?
20:22 < sjmurdoch> or the function allocates the struct and returns a pointer?
20:22 < nickm> Either one would be fine IMO.

20:22 < sjmurdoch> nickm: OK
20:24 < nickm> If the caller will generally just have it on the stack, passing 
               a pointer is fine.
20:24 < nickm> If the caller will generally want to have the struct stick 
               around for a while, having the function allocate and return it 
               is better.  You'd probably want to add a process_handle_t 
               function somewhere as well.
20:25 < nickm> sjmurdoch: (Also, is it okay if I copy-and-paste all of this 
               onto the 2046 ticket when we're done?)
20:25 < nickm> err, a process_handle_free function
20:26 < sjmurdoch> Currently the process_handle_t instance for the helper is a 
                   static variable in tor_check_port_forwarding
20:26 < sjmurdoch> So I could pass in a pointer from tor_check_port_forwarding 
                   to tor_spawn_background?
20:27 < nickm> fine by me
20:27 < sjmurdoch> nickm: And yes, feel free to paste into 2046
20:28 < nickm> ok
20:29 < nickm> so, what happens if some argv entry contains a space?
20:29 < sjmurdoch> You're looking at the join code?
20:30 < nickm> (Yeah.) That'll happen for sure if we ever need to run a program 
               that lives under C:\Program Files\ .
20:30 < sjmurdoch> For Windows, it would have to be put in quotes
20:30 < nickm> sjmurdoch: I think we should have a separate, unit-tested 
               function, that formats a command line properly for windows.
20:31 < sjmurdoch> OK, I can refactor that out
20:31 < nickm> great
20:32 < nickm> also it should handle internal quotes; there have been way too 
               many shell-based snafus in the history of programming for us to 
               be blase about anything other than stricly correct command lines.
20:32 < sjmurdoch> I don't fully understand how command lines are parsed on 
                   Windows. I do know that it is the process which is called 
                   that does it, but is it their libc?

20:35 < nickm> I'm afraid I don't fully know.  But look at Python's 
               subprocess.list2cmdline and at 
               http://msdn.microsoft.com/en-us/library/ms880421
20:35 < nickm> I think it's actually done in the equivalent of crt.o
20:36 < sjmurdoch> Thanks, I'll take a look
20:41 < nickm> For tor_get_exit_code: Let's return a define or an enum rather 
               than a magic tristate.
20:42 < sjmurdoch> Sounds sensible
20:42 < nickm> nitpick: Should "log_from_handle" be called "log_to_handle" ?
20:43 < sjmurdoch> I think log_from_handle because it reads from a handle and 
                   logs the output
20:43 < nickm> Ah!
20:44 < nickm> makes sense
20:44 < nickm> In tor_read_all_from_handle, I'd be more comfortable if it did 
               an tor_assert(byte_count + numread <= count);

20:47 < nickm> sjmurdoch: In tor_log_from_handle: Do we care about strings with 
               embedded NULs?
20:48 < sjmurdoch> nickm: Hmm, possibly
20:48 < sjmurdoch> nickm: tor-fw-helper won't generate them but other processes 
                   might
20:49 < nickm> probably something to note; not necessarily something to fix 
               right now.
20:49 < sjmurdoch> OK
20:49 < sjmurdoch> Related to this, I've assumed I'm working with byte strings 
                   everywhere. Is Windows Unicode support going to bite me?
20:50 < sjmurdoch> (another bit I don't understand well).
20:50 < nickm> Not sure.  I think we're probably okay here.
20:51 < nickm> Hm.  log_from_handle will do badly if the subprocess has mroe 
               than 255 bytes to say.
20:52 < sjmurdoch> I think it will output partial lines in that case
20:52 < nickm> yeah.
20:52 < nickm> It might do bogus newlines if the \r\n is split
20:53 < sjmurdoch> Hmm, yes 
20:54 < sjmurdoch> I'm not sure what to do here. Code to handle partial lines 
                   will be tricky. On *nix I got away with using fgets but 
                   fgets on Windows can't ready from a handle
20:55 < nickm> I'd say for now, document it as a known limitation
20:55 < nickm> For later, you'd need to leave any partial line in a buffer 
               associated with the handle
20:55 < nickm> and make the whole thing more stateful
20:56 < sjmurdoch> OK
20:57 < sjmurdoch> nickm: There is already a ticket related to this: 
                   https://trac.torproject.org/projects/tor/ticket/2045
20:57 < nickm> sjmurdoch: .... and with that, I'm done with the initial review, 
               I think.

comment:7 in reply to:  6 ; Changed 6 years ago by sjmurdoch

Status: needs_reviewaccepted

For my benefit, and perhaps others, here's the IRC log above as a TODO list:

Fix:

Split the C string processing stuff out of functions like log_from_handle, so
that we can do a unit test for it. (If there's other string processing goop in
the spawn code, same logic applies).

Pass pointer to process_handle_t into tor_spawn_background, rather than it
return a process_handle_t.

Have a separate, unit-tested function, that formats a command line properly for
windows. It should handle filenames with a space (e.g. C:\Program Files\);
also it should handle internal quotes. Look at Python's subprocess.list2cmdline
and at http://msdn.microsoft.com/en-us/library/ms880421

For tor_get_exit_code: Let's return a define or an enum rather than a magic
tristate.

In tor_read_all_from_handle, I'd be more comfortable if it did an
tor_assert(byte_count + numread <= count);

Document and possibly fix:

In tor_log_from_handle: handle strings with embedded NULs; handle strings > 255 bytes (without splitting lines or outputting bogus newlines). See bug #2045.

comment:8 in reply to:  7 Changed 6 years ago by sjmurdoch

Status: acceptedneeds_review

OK, I believe I have addressed all of these issues, and the code is ready for review. FYI, the code you previously reviewed was 1da5081. Changes since then are:

Fix:

Split the C string processing stuff out of functions like log_from_handle, so
that we can do a unit test for it. (If there's other string processing goop in
the spawn code, same logic applies).

https://gitweb.torproject.org/sjm217/tor.git/commit/da34360952c0fbbd8effc2789ed72b86c8045531

Pass pointer to process_handle_t into tor_spawn_background, rather than it
return a process_handle_t.

https://gitweb.torproject.org/sjm217/tor.git/commitdiff/3f0a197aad3cca6634e4eb63e8441e5507a6b77f

Have a separate, unit-tested function, that formats a command line properly for
windows. It should handle filenames with a space (e.g. C:\Program Files\);
also it should handle internal quotes. Look at Python's subprocess.list2cmdline
and at http://msdn.microsoft.com/en-us/library/ms880421

https://gitweb.torproject.org/sjm217/tor.git/commit/bc97f410802d5b9c66bfba6aebeae1ecd70f8857

For tor_get_exit_code: Let's return a define or an enum rather than a magic
tristate.

https://gitweb.torproject.org/sjm217/tor.git/commit/f1ff65dfad800ed89e5b01c1c5b4b77c10a438b8

In tor_read_all_from_handle, I'd be more comfortable if it did an
tor_assert(byte_count + numread <= count);

https://gitweb.torproject.org/sjm217/tor.git/commit/93792b5aa6806646674795504419f3e97862685c

Document and possibly fix:

In tor_log_from_handle: handle strings with embedded NULs; handle strings > 255 bytes (without splitting lines or outputting bogus newlines). See bug #2045.

https://gitweb.torproject.org/sjm217/tor.git/commit/d1dd9991cd636bafe7543aea4dbb18de69f26202

comment:9 Changed 6 years ago by nickm

Looks good; merging!

Minor issue we should fix:

  • I think the need_quotes check in format_commandline_argument might be inadequate: Shouldn't it also check for \ and " ?

Thing we could fix later:

  • In format_commandline_argument(), it would be neat to skip the business of building a smartlist of char*, and just pre-allocate a guaranteed-to-be-long-enough string -- I think that strlen(arg)*2 + 3 would be long enough, since each character in the input turns into at most 2 characters in the output, and there are at most 3 characters of overhead: 1 for the nul-terminator, 2 for quotes.

Let me know if you'd like me to open tickets for those.

Stuff I tweaked:

  • I think the tor_join_cmdline() function and format_commandline_argument() functions ought to have "win" somewhere in their names. (Fixed this.)
  • In documentation for tor_split_lines, we should refer to a NUL character. NULL is a pointer. (Pedantically, NUL is a null character, and null-terminated strings are also NUL-terminated, but never NULL-terminated.) (Fixed this.)

comment:10 in reply to:  9 Changed 6 years ago by sjmurdoch

Replying to nickm:

Looks good; merging!

Thanks!

Minor issue we should fix:

  • I think the need_quotes check in format_commandline_argument might be inadequate: Shouldn't it also check for \ and " ?

I believe not, because it is acceptable to escape \ and " outside of a quoted argument (as per http://msdn.microsoft.com/en-us/library/ms880421 and list2cmdline in http://hg.python.org/cpython/file/3102951cc1ce/Lib/subprocess.py#l530).

e.g.

C:\w\tor>src\tools\tor-fw-helper\tor-fw-helper.exe -T "foo bar" foo\bar foo\"bar
 foo\\\"bar
ARG: 0: src\tools\tor-fw-helper\tor-fw-helper.exe
ARG: 1: -T
ARG: 2: foo bar
ARG: 3: foo\bar
ARG: 4: foo"bar
ARG: 5: foo\"bar

Thing we could fix later:

  • In format_commandline_argument(), it would be neat to skip the business of building a smartlist of char*, and just pre-allocate a guaranteed-to-be-long-enough string -- I think that strlen(arg)*2 + 3 would be long enough, since each character in the input turns into at most 2 characters in the output, and there are at most 3 characters of overhead: 1 for the nul-terminator, 2 for quotes.

Yes, I was considering that. An alternative would be to have a smartlist of ranges within the string, since most of the time output strings will consist of contiguous areas of input string. That would however lead to rather tricky code, so I think your approach sounds good.

Let me know if you'd like me to open tickets for those.

Yes please.

Stuff I tweaked:

  • I think the tor_join_cmdline() function and format_commandline_argument() functions ought to have "win" somewhere in their names. (Fixed this.)
  • In documentation for tor_split_lines, we should refer to a NUL character. NULL is a pointer. (Pedantically, NUL is a null character, and null-terminated strings are also NUL-terminated, but never NULL-terminated.) (Fixed this.)

Thanks.

comment:11 Changed 6 years ago by nickm

Okay, added the issue as #3876. One more thing -- this needs a changes file.

comment:12 Changed 6 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Added an entry to the draft changelog for #2046; closing this ticket at long last. Thanks for the hard work, Steven!

comment:13 Changed 6 years ago by sjmurdoch

Resolution: implemented
Status: closedreopened

Sorry, there was bug in the test suite which triggered a double-free on some platforms, including the buildbot. This should be fixed in cfa9ee5fe708539965f46a31b5d2abe4950179af of git://git.torproject.org/sjm217/tor.git (branch bug2046).

comment:14 Changed 6 years ago by sjmurdoch

Status: reopenedneeds_review

comment:15 Changed 6 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Looks ok by me. Merging.

comment:16 Changed 6 years ago by sjmurdoch

Resolution: fixed
Status: closedreopened

Gah, buildbot is unhappy again because the test made some assumptions about how the EOF flag would be delivered. For some reason it works differently between Ubuntu and MacOS, compared to the Debian machine. I've now fixed the test case in f5df96c94f0c40e4bf55e4b4b843df85e9d90a71 of git://git.torproject.org/sjm217/tor.git (branch bug2046).

comment:17 Changed 6 years ago by sjmurdoch

Status: reopenedneeds_review

comment:18 Changed 6 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

looks ok; merging.

comment:19 Changed 5 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.