Opened 11 years ago

Closed 9 years ago

Last modified 7 years ago

#863 closed defect (fixed)

Relay crashes OSX 10.3.9

Reported by: downie
Owned by:
Priority: High
Milestone: 0.2.1.x-final
Component: Core Tor/Tor
Version: 0.2.0.31
Severity:
Keywords:
Cc: downie, nickm, phobos, arma, Sebastian
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by phobos)

Tor v0.2.0.31 (r16744). Mac OS X 10.3.9, 500 MHz, 640 MB RAM, Libevent 1.4.7, ORPort 9001.
Running a Tor relay consistently crashes the machine after a few hours (black screen of death).
No other desktop software is running apart from the memory-resident anti-virus and PGP Desktop programs.
Little Snitch is also running; it is set to allow Tor any network connection.
The torrc is the bare minimum, apart from an Address line to avoid a bug with dynamic IP not being detected correctly (submitted separately). Bandwidth is limited for 512k upload. Logging is currently at Notice level.
Tell me what to log and how please!

[Automatically added by flyspray2trac: Operating System: OSX 10.3 Panther]

Attachments (3)

tor.log.txt (128.1 KB) - added by downie 11 years ago.
Last 100K of the log after a crash
panic.log (1.5 KB) - added by downie 11 years ago.
Kernel log after crash
panic.log.txt (1.5 KB) - added by downie 11 years ago.
Kernel log after crash

Change History (37)

comment:1 Changed 11 years ago by nickm

Can you get a stack trace to figure out what's failing? More info on how to do that here:

https://wiki.torproject.org/noreply/TheOnionRouter/TorFAQ#ReportBug

comment:2 Changed 11 years ago by downie

That bit of the FAQ mentions a 'core dump' but not a 'stack trace'. I don't see "core" in my home directory nor in .tor (the Data Directory referred to, I presume).
I don't have a 'ulimit' command. There is 'dmesg' though - is that of any use after the machine has crashed, or only before?
I'm willing to log at debug level and hope it crashes quickly before the logfile grows enormous - once I've finished for the day.

comment:3 Changed 11 years ago by downie

From a ps run shortly before the crash:
PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
1100 3.0 2.8 48780 18272 ?? S 6:46AM 2:47.91 tor
1092 1.8 4.3 153592 28304 ?? S 6:44AM 31:28.06 vidalia

I tried to attach 2.8Mb of Tor logs but I got an error - too much?
The logs appear to end a while before the actual crash (how do I find the exact time?).
Is it possible that it's Vidalia rather than Tor which is crashing - and stopping logging beforehand?
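
For reference, the ps listing above can be reduced to one resident-memory figure per process. This is only a sketch: it assumes the BSD-style column order shown in this comment (RSS, in kilobytes, as the fifth field and the command as the last), and `rss_by_command` is a hypothetical helper name, not part of Tor or Vidalia.

```shell
#!/bin/bash
# Print "<command> <RSS in KB>" for each process line of BSD-style ps output.
# Assumption: columns are PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND,
# so RSS is field 5 and the command is the last field.
rss_by_command() {
    # NR > 1 skips the header line; NF >= 10 skips short or blank lines.
    awk 'NR > 1 && NF >= 10 { print $NF, $5 }'
}

# The two process lines quoted in this comment.
rss_by_command <<'EOF'
PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
1100 3.0 2.8 48780 18272 ?? S 6:46AM 2:47.91 tor
1092 1.8 4.3 153592 28304 ?? S 6:44AM 31:28.06 vidalia
EOF
```

Piping live `ps` output through the same function at intervals, and appending the result to a file, would leave a record of the last sample taken before a crash.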

comment:4 Changed 11 years ago by downie

I've tried running the relay from a command line now - it crashes even without Vidalia.
I think it may not like other programs running.
The first crash happened after I opened Adobe Reader and printed a page - a few minutes later the machine crashed.
The second happened after I opened Firefox and did some browsing.
There was a lot of traffic from the telltales on my router,
so maybe I'm running out of resources and it's not failing gracefully?

comment:5 Changed 11 years ago by nickm

Try "unlimit coredumpsize" if you are using a shell without ulimit; that might work instead.

(My current guess is that you're running out of RAM.)

comment:6 Changed 11 years ago by downie

Still no core dump I can see: here's the script I used to run Tor
#!/bin/tcsh
unlimit coredumpsize
/usr/bin/tor --quiet -f /Users/<home>/.vidalia/torrelay DataDirectory /Users/<home>/.tor/ ControlPort 9051 HashedControlPassword 16:<.............> CookieAuthentication 0 RunAsDaemon 1 Address xx.yy.zz.106

Thanks for your help.

Changed 11 years ago by downie

Attachment: tor.log.txt added

Last 100K of the log after a crash

comment:7 Changed 11 years ago by downie

Debug level log extract attached

comment:8 Changed 11 years ago by downie

From 'man core':
"This memory image is written to a file named by default core.pid in the /cores directory; provided the terminated process had write permission in the directory, and the directory existed."

So permissions need to be set for all write access?
"The maximum size of a core file is limited by setrlimit(2). Files which would be larger than the limit are not created."

Can't find setrlimit in bash or tcsh - I assume unlimit does the trick?
"Core dumps are disabled by default under Darwin/Mac OS X. To re-enable core dumps, a privlaged user must edit /etc/hostconfig to contain the line: COREDUMPS=-YES- "

That needed doing as well - trying again.
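
A rough sketch of the checks described in this comment, written for bash rather than tcsh. It raises the core-file size limit for the current shell and reports whether the dump directory exists and is writable. The `/cores` path comes from the man page quoted above; `core_dir_status` and the `CORE_DIR` override are hypothetical names used only for illustration, and the script does not touch /etc/hostconfig.

```shell
#!/bin/bash
# Sketch: raise the core-file size limit, then report whether the dump
# directory named in core(5) exists and is writable by the current user.
# CORE_DIR defaults to /cores (from the man page); override it to test elsewhere.
CORE_DIR="${CORE_DIR:-/cores}"

core_dir_status() {
    # Prints one of: missing, not-writable, ok
    if [ ! -d "$1" ]; then
        echo "missing"
    elif [ ! -w "$1" ]; then
        echo "not-writable"
    else
        echo "ok"
    fi
}

# bash/sh counterpart of tcsh's "unlimit coredumpsize".
# This can fail if the hard limit is lower, so report instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

echo "core limit: $(ulimit -c)"
echo "core dir ($CORE_DIR): $(core_dir_status "$CORE_DIR")"
```

The limit only applies to processes started from the same shell, so Tor would have to be launched from this script (or the shell that sourced it) for the setting to take effect.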

comment:9 Changed 11 years ago by Benzyl

I would like to run TOR as a relay but am getting an identical problem to this. After about an hour the error screen saying restart your machine pops up with an error 'PMU FORCED SHUTDOWN -122' in the logs. Nothing else I run does this and TOR in standard mode works fine. G4 450 dual, 512 Mb memory OSX 10.3.9, running network through a WRT54G. It works perfectly up until the point it terminates totally.

comment:10 Changed 11 years ago by downie

Panic.log attached - it is timestamped at the time I rebooted.

Changed 11 years ago by downie

Attachment: panic.log added

Kernel log after crash

comment:11 Changed 11 years ago by downie

Any hope of progress on this?
Will Tor 0.2.0.32 be coming out for OSX10.3.9 any time soon for us to try?

comment:12 Changed 11 years ago by phobos

I think you're running out of memory, but I'm still testing on my 10.3.9 ppc mac. My ppc mac has 2GB of ram, so I may
not be able to recreate the problem.

comment:13 Changed 11 years ago by nickm

If it _is_ an out-of-memory problem, you might want to try 0.2.1.7-alpha, or whatever the latest in the 0.2.1.x series
is when you have a chance. They use less RAM than the stable 0.2.0.x series.

Another option to consider is adding more RAM to your mac if you can. If it's an older computer, RAM should be pretty
cheap for it.

comment:14 Changed 11 years ago by downie

It seems this old Imac can't take more than 640Mb :(
As to running a development version, I'm not a programmer and I'm *really* loath to get into compiling code when I've never used a compiler before. Just another piece of software to know just enough about to get into trouble.
I guess I'll have to wait.
Thanks.

comment:15 Changed 11 years ago by downie

I can confirm that Tor 0.2.0.32 does the same thing.
Panic.log attached

Changed 11 years ago by downie

Attachment: panic.log.txt added

Kernel log after crash

comment:16 Changed 11 years ago by phobos

You could try setting this in Terminal, and then starting Tor:

export EVENT_NOKQUEUE=1

comment:17 Changed 11 years ago by downie

That seems to work! I added it to my startup script, making sure it used bash shell (no export command in tcsh - is there an equivalent?)
So presumably that stops 'method kqueue' (whatever that is). What effect will that have on how Tor runs?
Thanks,
downie

comment:18 Changed 11 years ago by nickm

In tcsh, you'd say "setenv EVENT_NOKQUEUE 1"

It'll slow down the Tor process a little by making it use a libevent backend that's slower, but apparently less
crash-prone on your system.

Also, you should probably try 0.2.1.9-alpha or whatever the latest development version is. They use far less
RAM.

comment:19 Changed 11 years ago by phobos

This NOKQUEUE is set automatically in the startup script for osx 10.3.x. It won't affect the performance of tor much at all.

It's related to an old bug in OS X, see http://archives.seul.org/or/talk/May-2005/msg00074.html for more info.

Glad it works for you now.
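
As an illustration only (this is not the startup script shipped with the bundle), a wrapper could decide whether to set the variable from the kernel release string. It assumes that OS X 10.3.x reports a Darwin 7.x release from `uname -r`, which matches the "Darwin Kernel Version 7.9.0" line in the attached panic log. `needs_nokqueue` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical wrapper: disable libevent's kqueue backend on Darwin 7.x
# (OS X 10.3.x) before starting Tor.

needs_nokqueue() {
    # $1 = kernel name (uname -s), $2 = kernel release (uname -r)
    case "$1:$2" in
        Darwin:7.*) return 0 ;;   # OS X 10.3.x
        *)          return 1 ;;   # anything else keeps the default backend
    esac
}

if needs_nokqueue "$(uname -s)" "$(uname -r)"; then
    EVENT_NOKQUEUE=1
    export EVENT_NOKQUEUE
fi

# The launch line is left commented out; the binary path and config file
# are assumptions based on the script posted earlier in this ticket.
# exec /usr/bin/tor -f "$HOME/.vidalia/torrelay"
```

Keeping the check in a function that takes the two strings as arguments means it can be exercised on any machine, not only on a 10.3.x system.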

comment:20 Changed 11 years ago by nickm

Reopening. I'd like to get the fix out of the startup script and into Tor if we can. Do you know a way to detect
the Apple version from C?

comment:21 Changed 11 years ago by nickm

Oy. I found one, but it isn't too pretty, but I think it'll work. Implemented in r18450.

comment:22 Changed 10 years ago by downie

Should 0.2.1.14-rc contain the fix?
The first time I ran it without 'export EVENT_NOKQUEUE=1' I had the same type of crash.
Since I restored that command it has run perfectly.

comment:23 Changed 10 years ago by downie

No difference in the Info-level logs either way though - no reference to kqueue or libevent.

comment:24 Changed 10 years ago by downie

Same result with 0.2.1.18 - the 'export EVENT_NOKQUEUE=1' has to be in the startup script.

comment:25 Changed 10 years ago by downie

Sun Jul 26 16:11:16 2009

Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x0000000000000014 PC=0x000000000020D250
Latest crash info for cpu 0:

Exception state (sv=0x2A3A9C80)

PC=0x0020D250; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000; LR=0x0020D15C; R1=0x0F41BC20; XCP=0x0000000C (0x300 - Data access)
Backtrace:

0x00000000 0x0020CC8C 0x00246D84 0x000941C0 0x00000000

Proceeding back via exception chain:

Exception state (sv=0x2A3A9C80)

previously dumped as "Latest" state. skipping...

Exception state (sv=0x2229B500)

PC=0x9002E7AC; MSR=0x0000F030; DAR=0xE1803000; DSISR=0x40000000; LR=0x000BC474; R1=0xBFFFFB50; XCP=0x00000030 (0xC00 - System call)

Kernel version:
Darwin Kernel Version 7.9.0:
Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC

panic(cpu 0): 0x300 - Data access
Latest stack backtrace for cpu 0:

Backtrace:

0x00083498 0x0008397C 0x0001EDA4 0x00090C38 0x0009402C

Proceeding back via exception chain:

Exception state (sv=0x2A3A9C80)

PC=0x0020D250; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000; LR=0x0020D15C; R1=0x0F41BC20; XCP=0x0000000C (0x300 - Data access)
Backtrace:

0x00000000 0x0020CC8C 0x00246D84 0x000941C0 0x00000000

Exception state (sv=0x2229B500)

PC=0x9002E7AC; MSR=0x0000F030; DAR=0xE1803000; DSISR=0x40000000; LR=0x000BC474; R1=0xBFFFFB50; XCP=0x00000030 (0xC00 - System call)

Kernel version:
Darwin Kernel Version 7.9.0:
Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC

comment:26 Changed 10 years ago by downie

Are we expecting 0.2.1.19 to work? If so I'll try it. Otherwise no point crashing the machine again.

comment:27 Changed 10 years ago by arma

Every Tor from 0.2.1.13-alpha onward should have the fix.

So yes, it should be in 0.2.1.19.

comment:28 Changed 10 years ago by phobos

Someone wanted this re-opened, unsure why since it appears to have been fixed in October.

comment:29 Changed 10 years ago by downie

Your closing it reminded me that I had yet to test 0.2.1.19
Removing the 'NO_KQUEUE' line from my torrc caused a crash (after 12 hours this time).
Unless there is something special about my OS setup, tor isn't detecting 10.3 and disabling KQUEUE itself.

comment:30 Changed 10 years ago by arma

downie: does the latest tor stable (currently 0.2.1.24) still crash
for you? Can you explain your exact set-up?

Nobody else has experienced this problem since 0.2.1.13-alpha, as far
as I understand it.

(But then, we do have very few Tor relays on os x...)

comment:31 Changed 10 years ago by downie

Roger: This may be a problem of my understanding of the fix: I'm expecting the Tor binary itself to do the detection, whereas you are doing it in the Startup script - which at one point I was bypassing.

I can still try 0.2.1.22 if you wish - as you may recall that is the most recent not to have the 'imaginary IP address every few seconds' dynamic address bug.

comment:32 Changed 9 years ago by phobos

Description: modified (diff)

The relay, powerpc4life, is running the provided ppc vidalia-bundle on torproject.org/download. It seems to be working well. Is this still a problem?

comment:33 Changed 9 years ago by phobos

Resolution: None → fixed
Status: new → closed

active relay runs fine for me. assuming problem solved.

comment:34 Changed 7 years ago by nickm

Component: Tor Relay → Tor