Opened 9 years ago

Last modified 5 weeks ago

#2077 needs_information defect

Overwriting files can fail on Windows

Reported by: joebt Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: windows tor-client
Cc: special@…, shondoit@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Using Firefox 3.6.10, Vista x64 hm prem, SP2.

Crashes w/ older Tor versions were rare for me. Since installing alpha ver listed, several crashes in 2 days. Doing nothing diff (except later ver of Firefox). No warning / signs of things going bad before get crash notice (Tor has stopped working).

These seem to be main warnings from log:

Vidala log errors:

Oct 15 12:59:21.741 [Warning] Error replacing "C:\Users\<user name>\AppData\Roaming\tor\state": File exists
Oct 15 12:59:21.743 [Warning] Unable to write state to file "C:\Users\<user name>\AppData\Roaming\tor\state"

Note: the user\AppData path is not protected - the acct I was running in Vista at time of Tor crash has full access to the paths \ files mentioned above.
The file: state.tmp DOES exist in the location mentioned.
I can open / read - state.tmp - (after Tor crash & Tor is shut down). Has no obvious errors msgs in the file.

There's another file: "lock" (w/ 0 bytes) in same folder as state.tmp (after crash & Tor / Vidalia are shut down).
Properties of the "lock" file show was created yesterday 10/14, & last mod 10/15. Its file attribs are "AN" - archived, not indexed? Other than that, no real info on it.

Err Rpt from Windows Vista x64 prepared to send to MS:

Problem signature:

Problem Event Name: APPCRASH
Application Name: tor.exe
Application Version: 0.0.0.0
Application Timestamp: 4ca65e57
Fault Module Name: tor.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4ca65e57
Exception Code: c0000005
Exception Offset: 000c3bf9
OS Version: 6.0.6002.2.2.0.768.3
Locale ID: 1033
Additional Information 1: fd00
Additional Information 2: ea6f5fe8924aaa756324d57f87834160
Additional Information 3: fd00
Additional Information 4: ea6f5fe8924aaa756324d57f87834160

Child Tickets

Attachments (3)

VidaliaLog_Tor Crash2_10.15.2010.txt (36.9 KB) - added by joebt 9 years ago.
Tor crash2_10-15-2010.txt (1.1 KB) - added by joebt 9 years ago.
0001-Correct-the-logic-from-f14754fbd-for-tor_gmtime_r.patch (1.2 KB) - added by special 9 years ago.
Patch to correct gmtime_null (f1475fbd)

Download all attachments as: .zip

Change History (53)

Changed 9 years ago by joebt

Changed 9 years ago by joebt

Attachment: Tor crash2_10-15-2010.txt added

comment:1 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

We should fix if we can figure out how to diagnose it. This could be a variant of #1125, which is rumored to be especially prevalent on Win64. I don't think the warning bug about the state file is necessarily related, though it could be. Does the warning about the state file occur right before every crash, or only sometimes?

comment:2 Changed 9 years ago by arma

0.2.2.19-alpha is out, and fixes a number of crash bugs since 0.2.2.17-alpha. Can you try again?

comment:3 Changed 9 years ago by nickm

Status: newneeds_review

boboper traced this to a possible NULL return from gmtime. If that's the right diagnosis (and it's sure possible on Windows), then my branch gmtime_null may be a fix here. Moving this to "needs_review".

comment:4 Changed 9 years ago by Sebastian

Looks good to me.

comment:6 Changed 9 years ago by Sebastian

Do you have any kind of advice what exactly to look for there or what it is actually supposed to tell us?

comment:7 in reply to:  6 Changed 9 years ago by rransom

Replying to Sebastian:

Do you have any kind of advice what exactly to look for there or what it is actually supposed to tell us?

That bug report describes a similar problem (a program tries to replace a file and fails), and it may have the same cause, possibly a big in Windows Vista and Windows 7.

The workaround for this bug is probably to change replace_file (in src/common/compat.c) to use MoveFileEx on Windows, but I'll need to check to make sure that function is available on older versions of Windows.

comment:8 Changed 9 years ago by nickm

Hm. Could be at least two bugs at work here then: the one responsible for the crash (still see my branch gmtime_null), and the one responsible for making the state file not get replaced.

rransom: I'm trying to understand the nature of the windows issue, and not getting it. Is it that file_status() fails and so we never try to unlink the old file, or is it that unlink() fails, or is it that even if we unlink the old file, the rename() might still fail?

comment:9 Changed 9 years ago by nickm

See branch 2077_movefileex (as well as gmtime_null) in my public repository. It needs testing; it has gotten precisely none from me. It should apply cleanly to 0.2.1.

comment:10 Changed 9 years ago by nickm

doors sayss:

19:40 < doorss> for record: branch gmtime_null failed.

Perhaps someday he will say more.

comment:11 Changed 9 years ago by arma

See also #2344 for further hints (or maybe just the same hint)

Changed 9 years ago by special

Patch to correct gmtime_null (f1475fbd)

comment:12 Changed 9 years ago by arma

See #2346 for another bug uncovered while looking at this one.

comment:13 Changed 9 years ago by arma

The problem here appears to be in the line

    global_state->LastWritten = -1;

added by git 29e0d708. special's backtrace is

or_state_save -> config_dump -> get_assigned_option
(key="LastWritten") -> format_iso_time -> tor_gmtime_r -> crash

My first thought was "Mike meant to set LastWritten to 0".

My second thought was "wait, why are we using LastWritten to remember whether the last write had failed?"

My third thought was "The behavior of LastWritten in maint-0.2.1 is to put time(NULL) into the state file, so the state file tells you when it was written. The behavior of LastWritten in maint-0.2.2 is now to tell you when the *previous* state file was written. We should reconcile these. I rather like the behavior in maint-0.2.1."

If Mike wants to turn off circuit building when we're failing to write to our state file, it seems to me that we should have some wrapper functions like last_state_write_worked() and note_state_file_written() that he can call.

comment:14 Changed 9 years ago by special

Cc: special@… added

I can confirm that nickm/gmtime_null, after applying the attached patch, stops the crash. nickm/2077_movefileex (dd7ef9ac) builds, works correctly, and fails correctly, but I won't be able to confirm that it solves this bug without running it for awhile.

There is a small memory leak in 2077_movefileex:

diff --git a/src/or/routerlist.c b/src/or/routerlist.c
index f85db08..0fa2a8f 100644
--- a/src/or/routerlist.c
+++ b/src/or/routerlist.c
@@ -747,6 +747,7 @@ router_rebuild_store(int flags, desc_store_t *store)
     char *errmsg=NULL;
     if (replace_file(fname_tmp, fname, &errmsg)<0) {
       log_warn(LD_FS, "Error replacing old router store: %s", errmsg);
+      tor_free(errmsg);
       goto done;
     }
     tor_free(errmsg);

That resolves everything other than arma's notes about the value of LastWritten. Note that with gmtime_null, LastWritten will be 1/1/1970 in the first successful state file after a write failure.

comment:15 Changed 9 years ago by special

2077_movefileex does not appear to fix the bug - I just received the "Access is denied." error, and the next write (one second later) succeeded. The random file replace failures are still a problem.

comment:16 Changed 9 years ago by nickm

doorss has been doing some interesting research here. There seem to be well-known issues related to bugs in Windows Vista and later:

http://social.answers.microsoft.com/Forums/en-US/w7files/thread/
381e9f29-50a5-4fa3-b9f2-7b3389293063

http://stackoverflow.com/questions/153257/random-movefileex-fail
ures-on-vista-access-denied-looks-like-caused-by-search-i

17:57 < doorss> I think it's about http://msdn.microsoft.com/en-us/library/aa363
858(v=vs.85).aspx
17:58 < doorss> missed FILE_SHARE_DELETE for CreateFile, thats how indexer or an
other process can "lock" renaming of Tor's files.

comment:17 Changed 8 years ago by rransom

See bug2077-not-indexed ( ssh://mob@repo.or.cz/srv/git/tor/rransom.git bug2077-not-indexed ) for a (not-yet-tested) fix for the 'access denied' problem.

comment:18 Changed 8 years ago by rransom

I just pushed a fix by special to bug2077-not-indexed; he is testing that version now.

comment:19 Changed 8 years ago by special

Unfortunately, this doesn't seem like the right fix. I've verified that %APPDATA%\Tor and all of the files within have the NOT_CONTENT_INDEXED attribute set, but I've still gotten several more "Error replacing 'xxx\state': Access is denied." errors.

It's worth noting that the error occurs on the file being replaced (the old state file), rather than the newly created state.tmp. I'm out of ideas here.

comment:20 Changed 8 years ago by Sebastian

So what's the status here? nickm/gmtime_null still works to fix the issue?

comment:21 Changed 8 years ago by special

nickm/gmtime_null resolves the crash, but not the failed state file writes and the subsequent (but harmless) errors.

The changes in arma's comment regarding the LastWritten value may be worth making as well.

The underlying cause of the state file write failing is unknown.

comment:22 Changed 8 years ago by cypherpunks

Anyone can check behavior for disabled search service, for sure?

comment:23 Changed 8 years ago by nickm

Priority: normalmajor

Looks like there are multiple bugs all going on here at once. We shouldn't wait on fixing all of them to merge the fixes we know about for some. The known good ideas are:

  • merge the (fixed) gmtime fix
  • Tolerate failure to write to files better.
  • The LastWritten fix would be neat.

The unknown stuff to investigate more is:

  • look into a more complicated incant to open files on windows

comment:24 Changed 8 years ago by nickm

Also: investigate whether disabling the search service entirely makes this problem go away.

comment:25 Changed 8 years ago by nickm

See also #1523.

comment:26 Changed 8 years ago by nickm

rebased and merged gmtime_null with special's fix.

comment:27 in reply to:  13 Changed 8 years ago by arma

Replying to arma:

The problem here appears to be in the line

global_state->LastWritten = -1;

Added my bug as #3039.

comment:28 Changed 8 years ago by nickm

Owner: set to nickm
Status: needs_reviewaccepted

comment:29 Changed 8 years ago by Sebastian

Do we still want this for 0.2.2.x?

comment:30 Changed 8 years ago by nickm

Well, we still want it: the underlying issue is still present. But I think we fixed the issues that made the underlying issue result in a crash. If that is so, then this can wait till 0.2.3.x.

comment:31 Changed 8 years ago by Shondoit

Cc: shondoit@… added

comment:32 Changed 8 years ago by nickm

Milestone: Tor: 0.2.2.x-finalTor: 0.2.3.x-final

comment:33 Changed 8 years ago by nickm

Keywords: windows added; Tor removed

comment:34 Changed 7 years ago by nickm

Priority: majornormal
Summary: Tor crashes in Vidalia 0.2.2.17a-0.2.10Overwriting files can fail on Windows

comment:35 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: unspecified

moving to unspecified. It's an annoyance but it's no longer a crash.

comment:36 Changed 7 years ago by nickm

Keywords: tor-client added

comment:37 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:38 Changed 6 years ago by nickm

Keywords: 023-backport 024-deferrable added
Milestone: Tor: unspecifiedTor: 0.2.4.x-final

Based on a comment of rransom's on #8795, on the following parenthetical on MSDN, and some simple experiments, I think that a fix here could be as simple as using FILE_SHARE_DELETE when opening a Windows file for memory mapping. I added 4f4fc63fea0589a4fa03f3859dc27860cdde75af to maint-0.2.4 to test this theory.

comment:39 Changed 6 years ago by nickm

Ah, I see doorss caught FILE_SHARE_DELETE above too. I guess I didn't understand what he was saying. :/

comment:40 Changed 6 years ago by nickm

Status: acceptedneeds_review

Branch "bug2077_share_delete" in my public repository . Roger and Andrea said that the basic idea seemed sane to them (modulo our mutual lack of window fu), so I'm going to call this "already reviewed" and merge it into 0.2.4 to see whether it makes that unit test pass again on windows.

comment:41 Changed 6 years ago by nickm

Status: needs_reviewneeds_revision

Interesting. That didn't work.

comment:42 Changed 6 years ago by nickm

(By which I mean, the new unit test on Windows 7 did not begin to pass.)

comment:43 Changed 6 years ago by nickm

Okay, a little more testing shows that FILE_SHARE_DELETE is not the answer, or at least not a sufficient answer. On the other hand, unmap-before-delete totally works. So let's try that...

comment:44 Changed 6 years ago by nickm

Keywords: 023-backport 024-deferrable removed

And indeed, 0.2.3 currently does do unmap-before-delete. So the problem is in 0.2.4 now...

comment:45 Changed 6 years ago by nickm

The problem occurred previously and we fixed it as bdff7e3299d78cd2f7aa4a31e5f57c1aeef5ffa1.

comment:46 Changed 6 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: unspecified
Status: needs_revisionneeds_information

The problem was reintroduced in 0.2.4.12-alpha by 6905c1f6. I'm calling it its own separate bug and reopening one of the tickets that we closed as a dup of this one.

comment:47 Changed 6 years ago by slacka

I am encountering this issue with:
tor-browser-2.4.12-alpha-2_en-US
in Windows 7 64 bit

I was unable to reconnect to tor, so I checked the log and saw:
[Warning] Error replacing "C:/Tor Browser/Data/Tor\cached-microdescs"

So I deleted the file, and was able to restart tor. The next day, tor stopped and I had to repeat the procedure.

comment:48 Changed 20 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:49 Changed 5 weeks ago by nickm

Owner: nickm deleted
Status: needs_informationassigned

I'm listed as the owner of these needs_information tickets, but I'm not currently doing anything with them. Reassigning to nobody, then putting back in needs_information.

comment:50 Changed 5 weeks ago by nickm

Status: assignedneeds_information
Note: See TracTickets for help on using tickets.