Opened 10 years ago

Closed 9 years ago

Last modified 8 years ago

# Port Tor code for starting a background process to Windows

Reported by: Owned by: sjmurdoch sjmurdoch High Tor: 0.2.3.x-final Core Tor/Tor nickm #1983

### Description

Once tor-fw-helper works on Windows, we need to be able to launch it and read its output. This works on Linux/BSD/MacOS X, but Windows does process control very differently.

## Child Tickets

### comment:1 Changed 10 years ago by nickm

I would suggest reading (though not copying, of course) the Python "subprocess" module for a survey of one way to do this.

### comment:2 Changed 10 years ago by nickm

Component: - Select a component → Tor Client

### comment:3 Changed 10 years ago by nickm

Milestone: Deliverable-Mar2011 → Tor: 0.2.3.x-final normal → major

### comment:4 Changed 9 years ago by sjmurdoch

Cc: nickm added new → needs_review

My branch bug2046 in git://git.torproject.org/sjm217/tor.git (1da5081) should now be ready for review. I haven't rebased it yet because I was assuming you will just compare it to origin/master, but let me know if this is not the case.

### comment:5 Changed 9 years ago by ioerror

I've read over the diff here:
https://gist.github.com/1170726

It seems sanely written and defensively coded. I wonder about setting process security attributes and primary thread security attributes to NULL in the child process. Is that simply because the parent already has everything we need and it is inherited?

Overall, I think I'd like to test and compile it but I don't currently have a win32 dev machine setup.

### comment:6 follow-up:  7 Changed 9 years ago by nickm

From IRC:

20:19 < nickm> sjmurdoch: Initial request, pending more detailed review: Can we
split the C string processing stuff out of functions like
log_from_handle, so that we can do a unit test for it?
20:21 < nickm> sjmurdoch: Also, elsewhere thoughout the code, we return structs
on the heap, not by value. Any reason to make an exception for
tor_spawn_background?
20:21 < sjmurdoch> nickm: Yes, that sounds like a good idea
20:21 < nickm> (If there's other string processing goop in the spawn code, same
logic applies)
20:21 < sjmurdoch> nickm: Not particularly
20:22 < sjmurdoch> nickm: You mean pass in a pointer which the function will
modify?
20:22 < sjmurdoch> or the function allocates the struct and returns a pointer?
20:22 < nickm> Either one would be fine IMO.

20:22 < sjmurdoch> nickm: OK
20:24 < nickm> If the caller will generally just have it on the stack, passing
a pointer is fine.
20:24 < nickm> If the caller will generally want to have the struct stick
around for a while, having the function allocate and return it
is better.  You'd probably want to add a process_handle_t
function somewhere as well.
20:25 < nickm> sjmurdoch: (Also, is it okay if I copy-and-paste all of this
onto the 2046 ticket when we're done?)
20:25 < nickm> err, a process_handle_free function
20:26 < sjmurdoch> Currently the process_handle_t instance for the helper is a
static variable in tor_check_port_forwarding
20:26 < sjmurdoch> So I could pass in a pointer from tor_check_port_forwarding
to tor_spawn_background?
20:27 < nickm> fine by me
20:27 < sjmurdoch> nickm: And yes, feel free to paste into 2046
20:28 < nickm> ok
20:29 < nickm> so, what happens if some argv entry contains a space?
20:29 < sjmurdoch> You're looking at the join code?
20:30 < nickm> (Yeah.) That'll happen for sure if we ever need to run a program
that lives under C:\Program Files\ .
20:30 < sjmurdoch> For Windows, it would have to be put in quotes
20:30 < nickm> sjmurdoch: I think we should have a separate, unit-tested
function, that formats a command line properly for windows.
20:31 < sjmurdoch> OK, I can refactor that out
20:31 < nickm> great
20:32 < nickm> also it should handle internal quotes; there have been way too
many shell-based snafus in the history of programming for us to
be blase about anything other than stricly correct command lines.
20:32 < sjmurdoch> I don't fully understand how command lines are parsed on
Windows. I do know that it is the process which is called
that does it, but is it their libc?

20:35 < nickm> I'm afraid I don't fully know.  But look at Python's
subprocess.list2cmdline and at
http://msdn.microsoft.com/en-us/library/ms880421
20:35 < nickm> I think it's actually done in the equivalent of crt.o
20:36 < sjmurdoch> Thanks, I'll take a look
20:41 < nickm> For tor_get_exit_code: Let's return a define or an enum rather
than a magic tristate.
20:42 < sjmurdoch> Sounds sensible
20:42 < nickm> nitpick: Should "log_from_handle" be called "log_to_handle" ?
20:43 < sjmurdoch> I think log_from_handle because it reads from a handle and
logs the output
20:43 < nickm> Ah!
20:44 < nickm> makes sense
20:44 < nickm> In tor_read_all_from_handle, I'd be more comfortable if it did
an tor_assert(byte_count + numread <= count);

20:47 < nickm> sjmurdoch: In tor_log_from_handle: Do we care about strings with
embedded NULs?
20:48 < sjmurdoch> nickm: Hmm, possibly
20:48 < sjmurdoch> nickm: tor-fw-helper won't generate them but other processes
might
20:49 < nickm> probably something to note; not necessarily something to fix
right now.
20:49 < sjmurdoch> OK
20:49 < sjmurdoch> Related to this, I've assumed I'm working with byte strings
everywhere. Is Windows Unicode support going to bite me?
20:50 < sjmurdoch> (another bit I don't understand well).
20:50 < nickm> Not sure.  I think we're probably okay here.
20:51 < nickm> Hm.  log_from_handle will do badly if the subprocess has mroe
than 255 bytes to say.
20:52 < sjmurdoch> I think it will output partial lines in that case
20:52 < nickm> yeah.
20:52 < nickm> It might do bogus newlines if the \r\n is split
20:53 < sjmurdoch> Hmm, yes
20:54 < sjmurdoch> I'm not sure what to do here. Code to handle partial lines
will be tricky. On *nix I got away with using fgets but
fgets on Windows can't ready from a handle
20:55 < nickm> I'd say for now, document it as a known limitation
20:55 < nickm> For later, you'd need to leave any partial line in a buffer
associated with the handle
20:55 < nickm> and make the whole thing more stateful
20:56 < sjmurdoch> OK
20:57 < sjmurdoch> nickm: There is already a ticket related to this:
https://trac.torproject.org/projects/tor/ticket/2045
20:57 < nickm> sjmurdoch: .... and with that, I'm done with the initial review,
I think.


### comment:7 in reply to:  6 ; follow-up:  8 Changed 9 years ago by sjmurdoch

Status: needs_review → accepted

For my benefit, and perhaps others, here's the IRC log above as a TODO list:

Fix:

Split the C string processing stuff out of functions like log_from_handle, so
that we can do a unit test for it. (If there's other string processing goop in
the spawn code, same logic applies).

Pass pointer to process_handle_t into tor_spawn_background, rather than it
return a process_handle_t.

Have a separate, unit-tested function, that formats a command line properly for
windows. It should handle filenames with a space (e.g. C:\Program Files\);
also it should handle internal quotes. Look at Python's subprocess.list2cmdline
and at http://msdn.microsoft.com/en-us/library/ms880421

For tor_get_exit_code: Let's return a define or an enum rather than a magic
tristate.

In tor_read_all_from_handle, I'd be more comfortable if it did an
tor_assert(byte_count + numread <= count);

Document and possibly fix:

In tor_log_from_handle: handle strings with embedded NULs; handle strings > 255 bytes (without splitting lines or outputting bogus newlines). See bug #2045.

### comment:8 in reply to:  7 Changed 9 years ago by sjmurdoch

Status: accepted → needs_review

OK, I believe I have addressed all of these issues, and the code is ready for review. FYI, the code you previously reviewed was 1da5081. Changes since then are:

Fix:

Split the C string processing stuff out of functions like log_from_handle, so
that we can do a unit test for it. (If there's other string processing goop in
the spawn code, same logic applies).

Pass pointer to process_handle_t into tor_spawn_background, rather than it
return a process_handle_t.

Have a separate, unit-tested function, that formats a command line properly for
windows. It should handle filenames with a space (e.g. C:\Program Files\);
also it should handle internal quotes. Look at Python's subprocess.list2cmdline
and at http://msdn.microsoft.com/en-us/library/ms880421

For tor_get_exit_code: Let's return a define or an enum rather than a magic
tristate.

In tor_read_all_from_handle, I'd be more comfortable if it did an
tor_assert(byte_count + numread <= count);

Document and possibly fix:

In tor_log_from_handle: handle strings with embedded NULs; handle strings > 255 bytes (without splitting lines or outputting bogus newlines). See bug #2045.

### comment:9 follow-up:  10 Changed 9 years ago by nickm

Looks good; merging!

Minor issue we should fix:

• I think the need_quotes check in format_commandline_argument might be inadequate: Shouldn't it also check for \ and " ?

Thing we could fix later:

• In format_commandline_argument(), it would be neat to skip the business of building a smartlist of char*, and just pre-allocate a guaranteed-to-be-long-enough string -- I think that strlen(arg)*2 + 3 would be long enough, since each character in the input turns into at most 2 characters in the output, and there are at most 3 characters of overhead: 1 for the nul-terminator, 2 for quotes.

Let me know if you'd like me to open tickets for those.

Stuff I tweaked:

• I think the tor_join_cmdline() function and format_commandline_argument() functions ought to have "win" somewhere in their names. (Fixed this.)
• In documentation for tor_split_lines, we should refer to a NUL character. NULL is a pointer. (Pedantically, NUL is a null character, and null-terminated strings are also NUL-terminated, but never NULL-terminated.) (Fixed this.)

### comment:10 in reply to:  9 Changed 9 years ago by sjmurdoch

Replying to nickm:

Looks good; merging!

Thanks!

Minor issue we should fix:

• I think the need_quotes check in format_commandline_argument might be inadequate: Shouldn't it also check for \ and " ?

I believe not, because it is acceptable to escape \ and " outside of a quoted argument (as per http://msdn.microsoft.com/en-us/library/ms880421 and list2cmdline in http://hg.python.org/cpython/file/3102951cc1ce/Lib/subprocess.py#l530).

e.g.

C:\w\tor>src\tools\tor-fw-helper\tor-fw-helper.exe -T "foo bar" foo\bar foo\"bar
foo\\\"bar
ARG: 0: src\tools\tor-fw-helper\tor-fw-helper.exe
ARG: 1: -T
ARG: 2: foo bar
ARG: 3: foo\bar
ARG: 4: foo"bar
ARG: 5: foo\"bar


Thing we could fix later:

• In format_commandline_argument(), it would be neat to skip the business of building a smartlist of char*, and just pre-allocate a guaranteed-to-be-long-enough string -- I think that strlen(arg)*2 + 3 would be long enough, since each character in the input turns into at most 2 characters in the output, and there are at most 3 characters of overhead: 1 for the nul-terminator, 2 for quotes.

Yes, I was considering that. An alternative would be to have a smartlist of ranges within the string, since most of the time output strings will consist of contiguous areas of input string. That would however lead to rather tricky code, so I think your approach sounds good.

Let me know if you'd like me to open tickets for those.

Yes please.

Stuff I tweaked:

• I think the tor_join_cmdline() function and format_commandline_argument() functions ought to have "win" somewhere in their names. (Fixed this.)
• In documentation for tor_split_lines, we should refer to a NUL character. NULL is a pointer. (Pedantically, NUL is a null character, and null-terminated strings are also NUL-terminated, but never NULL-terminated.) (Fixed this.)

Thanks.

### comment:11 Changed 9 years ago by nickm

Okay, added the issue as #3876. One more thing -- this needs a changes file.

### comment:12 Changed 9 years ago by nickm

Resolution: → implemented needs_review → closed

Added an entry to the draft changelog for #2046; closing this ticket at long last. Thanks for the hard work, Steven!

### comment:13 Changed 9 years ago by sjmurdoch

Resolution: implemented closed → reopened

Sorry, there was bug in the test suite which triggered a double-free on some platforms, including the buildbot. This should be fixed in cfa9ee5fe708539965f46a31b5d2abe4950179af of git://git.torproject.org/sjm217/tor.git (branch bug2046).

### comment:14 Changed 9 years ago by sjmurdoch

Status: reopened → needs_review

### comment:15 Changed 9 years ago by nickm

Resolution: → fixed needs_review → closed

Looks ok by me. Merging.

### comment:16 Changed 9 years ago by sjmurdoch

Resolution: fixed closed → reopened

Gah, buildbot is unhappy again because the test made some assumptions about how the EOF flag would be delivered. For some reason it works differently between Ubuntu and MacOS, compared to the Debian machine. I've now fixed the test case in f5df96c94f0c40e4bf55e4b4b843df85e9d90a71 of git://git.torproject.org/sjm217/tor.git (branch bug2046).

### comment:17 Changed 9 years ago by sjmurdoch

Status: reopened → needs_review

### comment:18 Changed 9 years ago by nickm

Resolution: → fixed needs_review → closed

looks ok; merging.

### comment:19 Changed 8 years ago by nickm

Component: Tor Client → Tor
Note: See TracTickets for help on using tickets.