Opened 11 years ago

Last modified 7 years ago

#885 closed defect (Fixed)

Can't run more than 9 copies of TOR

Reported by: somename
Owned by:
Priority: Low
Milestone:
Component: Core Tor/Tor
Version: 0.2.0.31
Severity:
Keywords:
Cc: somename, nickm, karsten, arma
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The first 9 copies of tor run correctly (every copy has its own torrc file and TCP ports). But the 10th copy (and every later one) reports an error:
... [warn] Error replacing old router store: Permission denied
... [err] Bug: routerlist.c:2093: signed_descriptor_get_body_impl: Assertion r failed; aborting.
routerlist.c:2093 signed_descriptor_get_body_impl: Assertion r failed; aborting.

[Automatically added by flyspray2trac: Operating System: All]

Attachments (1)

patch885.txt (1.7 KB) - added by karsten 11 years ago.

Change History (14)

comment:1 Changed 11 years ago by somename

And sometimes TOR starts and connects to the directory, but can't transfer any data.
(Sorry for my very bad English)

comment:2 Changed 11 years ago by karsten

Alex, did you try assigning distinct data directories to your Tor
processes? I currently have changing sets of around 15 parallel Tor
processes running without problems.

Of course, that bug should be fixed anyway.

Ah, it's also "Tor", not "TOR".

comment:3 Changed 11 years ago by somename

Are distinct data directories assigned with the DataDirectory parameter? Does each one need an identical copy of the data directory?

comment:4 Changed 11 years ago by karsten

So, I think this bug is the result of running multiple Tor processes in the
same data directory. We could backport use of a lockfile from 0.2.1.6-alpha
(r16722, r17244, more?) or hunt down this bug. The lockfile solution sounds
more sane to me, because there are probably further problems with running
two instances in the same data directory.
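
For illustration, the lockfile idea can be sketched as follows. This is not the code from r16722 or r17244; it is a minimal, Unix-only sketch that assumes an advisory flock() lock on a file inside the data directory. The function name and the file name "lock" are assumptions made for the example.

```python
import fcntl
import os
import tempfile


def try_lock_data_directory(data_dir):
    """Try to take an exclusive, non-blocking lock on <data_dir>/lock.

    Returns the open file descriptor on success (the caller keeps it open
    for as long as it uses the directory), or None if the lock is already
    held through another open of the same file.
    """
    lock_path = os.path.join(data_dir, "lock")  # file name is an assumption
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Someone else holds the lock: refuse to use this directory.
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        first = try_lock_data_directory(data_dir)
        second = try_lock_data_directory(data_dir)
        print("first got lock:", first is not None)
        print("second refused:", second is None)
        os.close(first)
```

With a check like this at startup, a second instance pointed at the same directory would stop with a clear message before it touches any cache file.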

comment:5 Changed 11 years ago by karsten

Alex: Yes, use the DataDirectory option to specify where Tor stores its
state, cached descriptors, and so on. Data directories of your Tor
processes need to be distinct. You might also put your torrc's in distinct
directories and omit the DataDirectory option, so that Tor uses the current
directory as data directory.
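
As a rough sketch of that setup, the helper below generates torrc text with a distinct SocksPort and DataDirectory per instance. SocksPort and DataDirectory are real torrc options; the helper name, the directory layout, and the port numbering are assumptions chosen for the example.

```python
import os


def make_torrc(index, base_dir, first_socks_port=9050):
    """Return torrc text for instance number `index` (0-based).

    Every instance gets its own SOCKS port and its own data directory,
    so no two processes share cached descriptor files.
    """
    data_dir = os.path.join(base_dir, "tor%02d" % index)  # hypothetical layout
    socks_port = first_socks_port + index                 # hypothetical numbering
    return "SocksPort %d\nDataDirectory %s\n" % (socks_port, data_dir)


if __name__ == "__main__":
    for i in range(12):
        print("# torrc for instance %d" % i)
        print(make_torrc(i, "/tmp/tor-instances"))
```

Each generated file would then be passed to its own process with `tor -f <path>`.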

comment:6 Changed 11 years ago by nickm

I'm not so comfortable backporting the lockfile stuff; it is potentially destabilizing, and we try not to
do anything but bugfixes in the stable series. Still, we shouldn't expect sane behavior in this case.

It would be good to track the bug down, though. Assertion failures shouldn't be this easy to trigger.

comment:7 Changed 11 years ago by nickm

Oh! Also, it would be a neat feature to allow multiple Tor instances to share a single set of directory cache files,
with just one Tor updating them. Unfortunately, despite being a _neat_ feature, it is probably not a feature lots of
people want.

comment:8 Changed 11 years ago by karsten

The assertion is triggered in the SMARTLIST_FOREACH macro when calling
signed_descriptor_get_body() in routerlist.c line 695.

This is my theory (I haven't fully verified it, so it might be wrong):

Two or more Tor processes are writing to cached-descriptors.tmp
concurrently. Apparently, they do not open the file once, write their
descriptors, and close it; they rather open it for every descriptor, write
the descriptor, and close the file. The result is that the file contains a
mix of (correct) descriptors from different Tor processes. When one Tor
process is done, it renames cached-descriptors.tmp to cached-descriptors
and loads it back to memory. Then it reconstructs the pointers in its
cache.

And here comes the problem: In router_rebuild_store(), Tor uses the local
data structure signed_descriptors to iterate over the cache and calculates
offsets itself to find descriptors in the cache. But offsets do not match
anymore. The result is that the second or third call of
signed_descriptor_get_body() triggers the assertion.
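
The theory can be modelled in a few lines. The sketch below is a simplified simulation, not Tor's code: several writers append descriptors one at a time to a shared buffer while each tracks offsets as if it were the only writer. All names and the descriptor contents are made up for the example.

```python
def rebuild_store(writers):
    """Interleave per-descriptor appends from several writers.

    `writers` maps a writer name to its list of descriptor bodies (bytes).
    Returns (shared, offsets): the resulting file contents and, per writer,
    the (offset, body) pairs that writer believes it wrote.
    """
    shared = bytearray()
    offsets = {name: [] for name in writers}
    own_pos = {name: 0 for name in writers}
    longest = max(len(bodies) for bodies in writers.values())
    for i in range(longest):
        for name, bodies in writers.items():
            if i < len(bodies):
                body = bodies[i]
                shared += body  # models: open file, append one descriptor, close
                offsets[name].append((own_pos[name], body))
                own_pos[name] += len(body)  # offset computed locally
    return bytes(shared), offsets


def stale_entries(shared, recorded):
    """Return the recorded (offset, body) pairs that do not match the file."""
    return [(off, body) for off, body in recorded
            if shared[off:off + len(body)] != body]


if __name__ == "__main__":
    a = [b"router a1\n", b"router a2\n"]
    b = [b"router b1\n", b"router b2\n"]
    shared, offsets = rebuild_store({"A": a, "B": b})
    print(stale_entries(shared, offsets["A"]))
```

In this model the first lookup by writer A still succeeds and the second one finds another writer's bytes, which is consistent with the assertion firing on a later call rather than the first.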

A possible fix is to change the assertions in signed_descriptor_get_body()
into error log statements saying that the user is probably running multiple
Tor processes in the same data directory which confuses our internal data
structures and that we need to exit.
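
A sketch of what such a check could look like, written in Python rather than C and not taken from the attached patch: the lookup validates the bytes at the recorded offset and reports a readable error instead of asserting. The exception class, the function name, and the b"router " prefix test are assumptions for illustration.

```python
import logging

log = logging.getLogger("routerlist-sketch")


class CorruptStoreError(Exception):
    """The cache does not hold a descriptor at the recorded offset."""


def get_body(store, offset, length):
    """Return the descriptor at `offset`, or raise CorruptStoreError.

    The prefix test stands in for the kind of sanity check an assertion
    would perform; here a failure produces an explanatory log message.
    """
    body = store[offset:offset + length]
    if len(body) != length or not body.startswith(b"router "):
        log.error("Cached descriptor at offset %d looks wrong. Is another "
                  "Tor process running in the same data directory? Exiting.",
                  offset)
        raise CorruptStoreError(offset)
    return body
```

The caller would catch CorruptStoreError at the top level and exit cleanly, so the user sees the likely cause rather than a bare assertion failure.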

comment:9 Changed 11 years ago by karsten

See attached patch.

Changed 11 years ago by karsten

Attachment: patch885.txt added

comment:10 Changed 11 years ago by nickm

The part with the memcmp() check looks right. But the check for r is probably too late: the only way that r can be 0
is if the mmap() failed. I'll move the check to that case specifically, so that we still catch the error when
with_annotations is set.

comment:11 Changed 11 years ago by nickm

Ok; applied with tweaks.

comment:12 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:13 Changed 7 years ago by nickm

Component: Tor Client → Tor