Opened 2 months ago

Last modified 6 weeks ago

#29743 needs_information defect

Long-running tor instances fail to keep up-to-date directory information

Reported by: karsten
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor:
Severity: Normal
Keywords: needs-insight usability scalability
Cc: karsten, gk, starlight@…, gaba
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


We have a small number of long-running tor instances in our OnionPerf setups that run 24/7. In the past, some of these tor instances got into a state where their directory information was no longer up-to-date enough to build circuits. In some cases they recovered after hours, days, or even weeks, but in other cases we had to restart the tor processes.

I'm attaching a graph that shows the number of open circuits as reported in heartbeat log messages. That number is relatively stable most of the time, depending on whether we're using the tor instance for making requests or for providing an onion service. But in some cases the number drops to zero, which coincides with the log message:

[notice] Our directory information is no longer up-to-date enough to build circuits: [...]

The graph also shows that sometimes the number magically goes up again. Those times coincide with the following log message:

[notice] We now have enough directory information to build circuits.
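
To locate these periods in a log without reading the graph, the two notices above can be paired into outage intervals. The following is a minimal sketch, not something used to produce the attached graph. It assumes log lines of the form "Mar 12 10:00:00.000 [notice] message" and, because such timestamps carry no year, takes the year as a parameter. The names `outage_intervals`, `LOST`, and `REGAINED` are made up for this example.

```python
import re
from datetime import datetime

# Assumed log line layout: "Mar 12 10:00:00.000 [notice] <message>".
# Month names are parsed with %b, which depends on the current locale.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2})\.\d+ \[notice\] (.*)$")

# Message prefixes quoted in this ticket.
LOST = "Our directory information is no longer up-to-date enough to build circuits"
REGAINED = "We now have enough directory information to build circuits"


def outage_intervals(lines, year):
    """Return (lost_at, regained_at) pairs; regained_at is None if still open."""
    intervals = []
    lost_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        message = m.group(2)
        if message.startswith(LOST):
            # Keep the first "lost" notice if several appear in a row.
            if lost_at is None:
                lost_at = stamp
        elif message.startswith(REGAINED) and lost_at is not None:
            intervals.append((lost_at, stamp))
            lost_at = None
    if lost_at is not None:
        # The log ended while directory information was still insufficient.
        intervals.append((lost_at, None))
    return intervals
```

Note that a freshly started tor also logs the "We now have enough directory information" notice, so an interval closed by that notice may reflect a manual restart rather than a recovery.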

The purple dashed lines show when we restarted tor processes manually. Some of these restarts are unrelated to the number of open circuits. But some restarts happened explicitly because the tor instance was not working anymore for our measurements.
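
For anyone who wants to reproduce the open-circuit series from their own logs, here is a rough sketch of how the counts could be pulled out of heartbeat messages. It assumes the heartbeat notice contains the phrase "with N circuits open" and that log lines start with a timestamp like "Mar 12 10:00:00.000". Both are assumptions about the log format, and the function names are made up for this example.

```python
import re

# Assumed heartbeat wording, for example:
#   "Mar 12 10:00:00.000 [notice] Heartbeat: Tor's uptime is 6:00 hours,
#    with 12 circuits open. ..."
HEARTBEAT_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2})\.\d+ \[notice\] Heartbeat: "
    r".*?with (\d+) circuits? open"
)


def open_circuit_counts(lines):
    """Return (timestamp string, open circuit count) for each heartbeat line."""
    counts = []
    for line in lines:
        m = HEARTBEAT_RE.match(line.strip())
        if m:
            counts.append((m.group(1), int(m.group(2))))
    return counts


def zero_circuit_heartbeats(lines):
    """Return timestamps of heartbeats that report no open circuits."""
    return [stamp for stamp, count in open_circuit_counts(lines) if count == 0]
```

Heartbeats reporting zero open circuits are the points where the graph drops to the bottom, so they are a quick way to find the days worth a closer look.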

By the way, the op-nl instance shown in the middle was running, whereas the op-us and op-hk instances were running. It may be a coincidence, but the older op-nl did not run out of up-to-date directory information, whereas the newer op-us and op-hk did. Was this issue maybe introduced in 0.3.0.x?

I have tor logs available for all these tor instances. I can easily provide them, either as a big tarball or for specific days and instances as a smaller tarball. Just let me know.

Attachments (1)

onionperf-open-circuits.pdf (29.1 KB) - added by karsten 2 months ago.

Change History (8)

Changed 2 months ago by karsten

Attachment: onionperf-open-circuits.pdf added

comment:1 Changed 2 months ago by gk

Cc: gk added

comment:2 Changed 2 months ago by nickm

Do you know whether the bug also happens in supported versions? We fixed a _lot_ of bugs between and now...

comment:3 Changed 2 months ago by nickm

Keywords: needs-insight usability added
Milestone: Tor: unspecified

comment:4 Changed 2 months ago by karsten

I'll also look at op-ab logs which should be running a more recent tor version, and we're going to update the tor versions on the three OnionPerf instances mentioned above. This is going to take a few days if the op-ab logs contain something useful or a few weeks if we first need to update tor versions and make new measurements.

comment:5 Changed 2 months ago by karsten

Status: new → needs_information

I looked at op-ab's logs, which is running by the way, and I didn't spot any case like this happening in the past two months. I'd say let's put this ticket on hold and see whether one of the updated instances produces a case like the one described in the coming weeks or months.

comment:6 Changed 2 months ago by starlight

Cc: starlight@… added

comment:7 Changed 6 weeks ago by gaba

Cc: gaba added
Keywords: scalability added