Opened 11 years ago

Last modified 7 years ago

#691 closed defect (Fixed)

Tor relay fails on startup if network not up yet

Reported by: HANtwister
Owned by: nickm
Priority: High
Milestone: 0.2.1.x-final
Component: Core Tor/Tor
Version: 0.2.0.25-rc
Severity:
Keywords:
Cc: HANtwister, nickm, arma
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

This applies to 0.2.0.26.

On Windows XP Home Edition, if the Tor Service is started while no network adapters are physically connected,
Tor will immediately crash.

szAppName: tor.exe
szModName: unknown
offset: 0022e7d0

Crash Dump Attached.

[Automatically added by flyspray2trac: Operating System: All]

Attachments (2)

tor-mdmp-noconnection.7z (104.7 KB) - added by HANtwister 11 years ago.
Tor Memory Dump when No Network Adapter Connected
patch-691.txt (7.0 KB) - added by nickm 11 years ago.
First cut at patch for bug

Change History (15)

Changed 11 years ago by HANtwister

Attachment: tor-mdmp-noconnection.7z added

Tor Memory Dump when No Network Adapter Connected

comment:1 Changed 11 years ago by arma

Unfortunately we don't have any idea how to read Windows crash dumps.

I wonder if there is a way to get any sort of stack trace or useful
error messages or anything out of this?

comment:2 Changed 11 years ago by phobos

I can't reproduce this in winxp home. I remove the network adapter and Tor starts up and just waits for the network
to appear forever.

comment:3 Changed 11 years ago by HANtwister

I think this only applies to Tor Routers, given the text I was able to get by launching it from the command line when
the network cable was disconnected:

Jun 29 16:33:04.568 [notice] Tor v0.2.1.2-alpha (r15383). This is experimental software. Do not rely on it for strong
anonymity. (Running on Windows XP Service Pack 2 [workstation] {personal} {terminal services, single user})
Jun 29 16:33:04.584 [notice] Initialized libevent version 1.4.4-stable using method win32. Good.
Jun 29 16:33:04.584 [notice] Opening OR listener on 0.0.0.0:443
Jun 29 16:33:04.584 [notice] Opening Directory listener on 0.0.0.0:80
Jun 29 16:33:04.584 [notice] Opening Socks listener on 127.0.0.1:9050
Jun 29 16:33:04.584 [notice] Opening Control listener on 127.0.0.1:9051
Jun 29 16:33:04.662 [warn] eventdns: Unable to add nameserver 68.87.74.162: error 2
Jun 29 16:33:04.662 [warn] eventdns: Didn't find any nameservers.
Jun 29 16:33:04.662 [warn] Could not config nameservers.
Jun 29 16:33:04.662 [err] Error initializing dns subsystem; exiting

comment:4 Changed 11 years ago by HANtwister

Let me refine this: the computer is behind a router (Private IP), it has a Static IP and uses Static DNS information.
Could Tor be crashing because it's given a list of nameservers, but no connection to them?

Also, if the network cable is unplugged after Tor starts, and you attempt to change the settings via the Control Port
(for example, with Vidalia), you'll see this, which looks identical to the problem it's encountering above:

Jul 24 18:39:01.930 [Warning] eventdns: Unable to add nameserver 68.87.74.162: error 2
Jul 24 18:39:01.930 [Warning] eventdns: Didn't find any nameservers.
Jul 24 18:39:01.930 [Warning] Could not config nameservers.
Jul 24 18:39:01.930 [Error] set_options(): Bug: Acting on config options left us in a broken state. Dying.

comment:5 Changed 11 years ago by arma

I just reproduced on Linux. This is also a bug with 0.1.2.x.

The issue is that Tor relays want working nameservers when they start up, and
if the network isn't up, the nameservers don't work.

I'm going to bump up the priority of the bug, and assign it to Nick so we make
sure to get his input. Since it's broken in 0.1.2.x also, this seems like more
of a bug to tackle for 0.2.1.x than to try to backport.

It seems that the best fix is to remember that initializing the nameservers
didn't work, and try again when we believe our network to be working. Then if
they continue to be broken, opt not to be an exit relay.
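
The idea in the paragraph above (record the failure instead of exiting, retry once the network looks usable, and stay out of the exit role until nameservers work) can be sketched roughly as follows. This is an illustrative Python model, not Tor's C code; `NameserverState`, `try_configure`, `on_network_up`, and `may_act_as_exit` are hypothetical names invented for this sketch.

```python
from typing import Callable


class NameserverState:
    """Hypothetical model: remembers whether nameserver setup succeeded."""

    def __init__(self, configure: Callable[[], bool]) -> None:
        # `configure` stands in for whatever sets up the resolver;
        # it returns True when at least one nameserver is usable.
        self._configure = configure
        self.working = False

    def try_configure(self) -> bool:
        """Attempt setup and record the outcome instead of exiting."""
        self.working = self._configure()
        return self.working

    def on_network_up(self) -> None:
        """Retry only when the previous attempt failed."""
        if not self.working:
            self.try_configure()

    def may_act_as_exit(self) -> bool:
        """A relay without working nameservers should not act as an exit."""
        return self.working
```

In this model a failed first attempt leaves the process running with `may_act_as_exit()` returning False, and a later `on_network_up()` call gives the nameservers another chance.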

comment:6 Changed 11 years ago by nickm

Roger: your diagnosis seems correct. The fix should get into 0.2.1.x; we can decide how far back to backport once
we see how complex the fix turns out to be.

comment:7 Changed 11 years ago by arma

Just to put another spin on it: if your Vidalia is configured to make you
a relay, and you start Vidalia when your network is down, Tor exits immediately
(so Vidalia can't get things started up right).

comment:8 Changed 11 years ago by nickm

Hm. At first I liked Roger's proposal (note that nameserver init didn't work; wait; try again later; don't claim
to be an exit till we have nameservers). But on further thought, the case where we have no network at all is way
easier to fix:

Check for the network on startup. If it is not there, stall until it is.

We could stall in two ways: either by allowing control connections and nothing else, or by allowing no connections
at all.

How does this grab folks?
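
A rough sketch of the first stalling variant (serve control connections and nothing else while waiting) might look like this. It is an illustrative Python model, not Tor code: `stall_until_network`, `network_is_up`, and `handle_control` are hypothetical names, and the one-second poll interval and the `max_polls` escape hatch are assumptions added for the example.

```python
import time
from typing import Callable, Optional


def stall_until_network(
    network_is_up: Callable[[], bool],
    handle_control: Callable[[], None],
    poll_interval: float = 1.0,  # assumed value, not taken from Tor
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,  # None means wait indefinitely
) -> bool:
    """Block until the network is up, serving only control requests.

    Returns True once the network is detected, or False if `max_polls`
    checks passed without the network appearing.
    """
    polls = 0
    while not network_is_up():
        if max_polls is not None and polls >= max_polls:
            return False
        handle_control()       # keep the controller (e.g. Vidalia) answered
        sleep(poll_interval)   # then wait before checking again
        polls += 1
    return True
```

The second variant (allow no connections at all) would be the same loop without the `handle_control()` call.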

comment:9 Changed 11 years ago by nickm

19:43 < armadev> re 691, we should want to have a control port open at least, or vidalia will think it didn't start tor right.
19:44 < armadev> how do we propose to stall? sleep and check each second, and answer control queries in the meantime?
19:45 < armadev> that is a better approach than our current approach. it seems that being a client would be more expected behavior. though i'll grant that if our network isn't working, we won't be a very good client either.
19:46 < armadev> if we really plan to sleep until the network comes back, there are some other "booting up" activities we should delay until the network arrives. like, launching directory fetches.
19:48 < nickm> Right. There are two ways we could stall.
19:48 < nickm> We could sleep, retry, sleep, retry, sleep, retry,...
19:49 < nickm> or we could run as Tor, but not do a whole lot of things tor usually does, and keep retrying the network periodically, and if it ever comes up, restart trying to do all the things Tor does.
19:49 < nickm> The second approach seems more "serious", but I'm not sure I see the advantage.
19:50 < armadev> so you're thinking really just a function that sleeps and waits.
19:50 < armadev> we do want to answer control port queries though, or we aren't tor at all
19:51 < nickm> If we want to do that, we'll take the second approach.
19:51 < nickm> Another option is to exit with a useful error message and indicate somehow to whatever launched us that there is no network yet.
19:52 < armadev> i'm not sure i like that one either. i try to demo vidalia on my laptop when i'm doing talks or talking to people,
19:52 < armadev> and if i accidentally left myself as a relay last time i shut down, vidalia spits up a 'tor crashed' warning
19:53 < armadev> then i manually go edit my torrc and vidaliarc, and restart. i can't do it with vidalia because it can't keep tor up long enough to setconf.
19:58 < nickm> Hm.
19:58 < nickm> Is it only the listeners that fail here?
19:58 < nickm> (opening them, that is.)
19:58 < armadev> no, opening the listeners works fine. we test our nameservers on start and they fail. so tor exits because no nameservers work.
19:59 < nickm> Hm.
19:59 < nickm> Is that it?
19:59 < nickm> Everything else works?
19:59 < armadev> yes.
19:59 < nickm> We could get smart here somehow, I guess.
19:59 < armadev> i think
19:59 < armadev> the smart i had in mind was to set a flag that said 'no nameservers yet'
19:59 < armadev> and go about our business, not being an exit relay, and seeing if that's changed every so often
20:00 < armadev> it does cause dir fetches to fail immediately, and other things like that to fail immediately. but they already handle their failures. it's only nameserver checks that don't handle.
20:01 -!- nemysis [~nemysis@…] has quit [Remote host closed the connection]
20:02 -!- nemysis [~nemysis@…] has joined #tor-dev
20:04 < nickm> Hm. I had overestimated the difficulty of the original approach, I guess.
20:05 < armadev> we could choose not to be any sort of relay, if that's easier
20:05 < armadev> but i bet it won't be easier
20:05 < nickm> Agreed.

Changed 11 years ago by nickm

Attachment: patch-691.txt added

First cut at patch for bug

comment:10 Changed 11 years ago by nickm

I've attached a plausible patch. It does a better job of advertising that ServerDNSAllowBrokenResolvConf exists on
failure, turns it on by default, retries broken dns_init()s every 10 min if they fail, replaces the router's exit
policy with reject *:* if the most recent dns_init() failed, and marks the descriptor dirty when the DNS-failed status
changes.
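
To make the behaviour described above concrete, here is a small Python model of it. It is only a sketch and does not reproduce the contents of patch-691.txt: `dns_init` here is a stand-in callable, not Tor's C function, and `RelayDnsModel`, `run_dns_init`, `tick`, and `effective_exit_policy` are hypothetical names. The 10-minute retry interval and the `reject *:*` fallback policy come from the comment.

```python
from typing import Callable, List, Optional

RETRY_INTERVAL_SECONDS = 10 * 60  # "every 10 min", per the comment above


class RelayDnsModel:
    """Hypothetical model of a relay that tolerates failed DNS setup."""

    def __init__(self, dns_init: Callable[[], bool],
                 configured_policy: List[str]) -> None:
        self._dns_init = dns_init  # stand-in: returns True on success
        self._configured_policy = list(configured_policy)
        self.dns_failed = False
        self.descriptor_dirty = False
        self.next_retry_at: Optional[float] = None

    def run_dns_init(self, now: float) -> None:
        """Run DNS setup, record the result, and schedule a retry on failure."""
        failed = not self._dns_init()
        if failed != self.dns_failed:
            # The advertised exit policy changes, so the descriptor
            # needs to be regenerated.
            self.descriptor_dirty = True
        self.dns_failed = failed
        self.next_retry_at = now + RETRY_INTERVAL_SECONDS if failed else None

    def tick(self, now: float) -> None:
        """Periodic callback: retry DNS setup once the interval has elapsed."""
        if self.next_retry_at is not None and now >= self.next_retry_at:
            self.run_dns_init(now)

    def effective_exit_policy(self) -> List[str]:
        """Refuse all exit traffic while the most recent DNS setup failed."""
        if self.dns_failed:
            return ["reject *:*"]
        return list(self._configured_policy)
```

The descriptor is marked dirty only on a transition (working to failed, or failed to working), so repeated failures in a row do not trigger repeated descriptor rebuilds in this model.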

comment:11 Changed 11 years ago by nickm

applying, with small changes.

comment:12 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:13 Changed 7 years ago by nickm

Component: Tor Relay → Tor