Opened 7 years ago

Last modified 5 months ago

#2003 assigned enhancement

Hibernation Soft and Hard Limits Reached Simultaneously

Reported by: BarkerJr Owned by: Sebastian
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: tor-relay bandwidth accounting hibernate
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

0.2.2.17

AccountingMax 50GB

BarkerJrGarden1:
Oct 03 04:04:29.672 [notice] Bandwidth soft limit reached; commencing hibernation.
Oct 03 04:04:29.672 [notice] Going dormant. Blowing away remaining connections.

BarkerJrGarden2:
Oct 03 03:13:23.971 [notice] Bandwidth soft limit reached; commencing hibernation.
Oct 03 03:13:23.971 [notice] Going dormant. Blowing away remaining connections.

Child Tickets

Change History (42)

comment:1 Changed 7 years ago by Sebastian

I guess this happens because we had a very small expected usage recorded last month (because we included the soft hibernation time when calculating it) so that we didn't expect to finish the remaining amount of bytes in the next second (this can happen with barkerjr's setting of 50GB/month, that's just an average of 20kB/s).

The question is whether we actually want to do something about it. I think the answer is probably no, because this bug won't affect many people (only those that have huge pipes, small traffic/month accounting, have a bw estimate from the past, this bw estimate from the past was generated with an older Tor version).

comment:2 Changed 7 years ago by arma

What exactly are the parameters where we go into soft and hard hibernation at the same time? If it's an edge case that can be easily handled by another if or something, that sounds worth looking into.

comment:3 Changed 7 years ago by nickm

I think it's a low priority, but still worth fixing if somebody wants to fix it. The fix is probably something like, "If we notice that our observed bandwidth has been over 200% higher than our accounting-predicted bandwidth over the first 50% of our our bandwidth usage, then recalculate the soft limit."

comment:4 Changed 7 years ago by Sebastian

Ok. I was just going to do "if the first two conditions are true (we've used more than 95% and less than 500mb remain to be transfered), just use the bw we've averaged in this interval to decide when to stop. That seems relatively trivial to implement. Are there any downsides I'm unaware of?

comment:5 Changed 7 years ago by Sebastian

(An additional advantage is that it would work on the first run where bw accounting is enabled, too)

comment:6 Changed 7 years ago by nickm

That's fine by me.

comment:7 Changed 7 years ago by BarkerJr

Here are logs from BarkerJrGarden1 when the month started:

Oct 01 00:00:00.455 [warn] Mismatched accounting interval: moved by 96.77%. Starting a fresh one.
Oct 01 00:00:00.456 [notice] Configured hibernation. This interval began at 2010-10-01 00:00:00; the scheduled wake-up time was 2010-10-01 00:00:00; we expect to exhaust our quota for this interval around 2010-11-01 00:00:00; the next interval begins at 2010-11-01 00:00:00 (all times local)

comment:8 Changed 7 years ago by nickm

Milestone: Tor: 0.2.2.x-final

I think we did something like this already for 0.2.2.x, but if we didn't we should do it for 0.2.2.x or 0.2.3.x.

comment:9 Changed 7 years ago by Sebastian

Looks like this was addressed in bc081c254afaac42 and the three prior commits?

comment:10 Changed 7 years ago by BarkerJr

This doesn't seem fixed to me. Both are 0.2.2.22 (as of Jan 29)

BarkerJrAtom1
---
Feb 01 00:00:00.631 [info] configure_accounting(): Accounting interval elapsed; starting a new one
Feb 01 00:00:00.677 [info] reset_accounting(): Starting new accounting interval.
Feb 01 00:00:00.677 [notice] Configured hibernation. This interval began at 2011-02-01 00:00:00; the scheduled wake-up time was 2011-02-01 00:00:00; we expect to exhaust our quota for this interval around 2011-03-01 00:00:00; the next interval begins at 2011-03-01 00:00:00 (all times local)
Feb 01 00:00:00.677 [notice] Hibernation period ended. Resuming normal activity.
...
Feb 11 00:38:16.109 [notice] Bandwidth soft limit reached; commencing hibernation.
Feb 11 00:38:16.109 [notice] Going dormant. Blowing away remaining connections.
Mar 01 00:00:00.023 [warn] Mismatched accounting interval: moved by 90.32%. Starting a fresh one.
Mar 01 00:00:00.189 [info] reset_accounting(): Starting new accounting interval.
Mar 01 00:00:00.189 [notice] Configured hibernation. This interval began at 2011-03-01 00:00:00; the scheduled wake-up time was 2011-03-01 00:00:00; we expect to exhaust our quota for this interval around 2011-04-01 00:00:00; the next interval begins at 2011-04-01 00:00:00 (all times local)
Mar 01 00:00:00.189 [notice] Hibernation period ended. Resuming normal activity.
...
Mar 03 05:19:18.022 [notice] Bandwidth soft limit reached; commencing hibernation.
Mar 03 05:19:18.055 [notice] Going dormant. Blowing away remaining connections.

BarkerJrAtom2
---
Feb 01 00:00:00.385 [info] configure_accounting(): Accounting interval elapsed; starting a new one
Feb 01 00:00:00.398 [info] reset_accounting(): Starting new accounting interval.
Feb 01 00:00:00.398 [notice] Configured hibernation. This interval began at 2011-02-01 00:00:00; the scheduled wake-up time was 2011-02-01 00:00:00; we expect to exhaust our quota for this interval around 2011-03-01 00:00:00; the next interval begins at 2011-03-01 00:00:00 (all times local)
Feb 01 00:00:00.398 [notice] Hibernation period ended. Resuming normal activity.
...
Feb 11 00:38:46.594 [notice] Bandwidth soft limit reached; commencing hibernation.
Feb 11 00:38:46.594 [notice] Going dormant. Blowing away remaining connections.
Mar 01 00:00:00.644 [warn] Mismatched accounting interval: moved by 90.32%. Starting a fresh one.
Mar 01 00:00:00.658 [info] reset_accounting(): Starting new accounting interval.
Mar 01 00:00:00.658 [notice] Configured hibernation. This interval began at 2011-03-01 00:00:00; the scheduled wake-up time was 2011-03-01 00:00:00; we expect to exhaust our quota for this interval around 2011-04-01 00:00:00; the next interval begins at 2011-04-01 00:00:00 (all times local)
Mar 01 00:00:00.658 [notice] Hibernation period ended. Resuming normal activity.
...
Mar 03 20:54:17.756 [notice] Bandwidth soft limit reached; commencing hibernation.
Mar 03 20:54:18.756 [notice] Going dormant. Blowing away remaining connections.

comment:11 Changed 6 years ago by Sebastian

part of the report above is influenced by #3020. Can we get a new report for this with a more recent version of Tor?

comment:12 Changed 6 years ago by Sebastian

Status: newneeds_information

comment:13 Changed 6 years ago by BarkerJr

BarkerJrAtom2

Jul 19 11:28:16.705 [notice] Tor 0.2.2.28-beta (git-e084c6fd14caba95) opening log file.
Jul 19 11:28:16.705 [info] configure_accounting(): Continuing accounting interval.
Jul 19 11:28:16.705 [notice] Configured hibernation. This interval began at 2011-07-01 00:00:00; the scheduled wake-up time was 2011-07-01 00:00:00; we expect to exhaust our quota for this interval around 2011-08-01 00:00:00; the next interval begins at 2011-08-01 00:00:00 (all times local)
Aug 01 00:00:00.302 [info] configure_accounting(): Accounting interval elapsed; starting a new one
Aug 01 00:00:00.326 [info] reset_accounting(): Starting new accounting interval.
Aug 01 00:00:00.326 [notice] Configured hibernation. This interval began at 2011-08-01 00:00:00; the scheduled wake-up time was 2011-08-01 00:00:00; we expect to exhaust our quota for this interval around 2011-09-01 00:00:00; the next interval begins at 2011-09-01 00:00:00 (all times local)
Aug 01 00:00:00.326 [notice] Hibernation period ended. Resuming normal activity.
Aug 01 00:00:00.326 [notice] Opening OR listener on 0.0.0.0:0
Aug 01 00:00:00.326 [notice] OR listener listening on port 55198.
Aug 01 00:00:45.451 [notice] I learned some more directory information, but not enough to build a circuit: We have only 96/2495 usable descriptors.
Aug 01 00:00:45.572 [notice] I learned some more directory information, but not enough to build a circuit: We have only 192/2495 usable descriptors.
Aug 01 00:00:45.639 [notice] I learned some more directory information, but not enough to build a circuit: We have only 288/2495 usable descriptors.
Aug 01 00:00:45.856 [notice] I learned some more directory information, but not enough to build a circuit: We have only 383/2495 usable descriptors.
Aug 01 00:00:45.925 [notice] I learned some more directory information, but not enough to build a circuit: We have only 479/2495 usable descriptors.
Aug 01 00:00:45.951 [notice] I learned some more directory information, but not enough to build a circuit: We have only 575/2495 usable descriptors.
Aug 01 00:00:46.010 [notice] We now have enough directory information to build circuits.
Aug 01 01:09:55.756 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Aug 01 01:09:57.695 [notice] Performing bandwidth self-test...done.
Aug 06 00:45:22.711 [notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Aug 06 00:45:23.712 [notice] Going dormant. Blowing away remaining connections.

comment:14 Changed 6 years ago by BarkerJr

Here are the state files from yesterday (August 31) and today (September 1)

===========================================================

BarkerJrAtom2

AccountingBytesReadInInterval 5447680
AccountingBytesWrittenInInterval 5067776
AccountingExpectedUsage 1194
AccountingIntervalStart 2011-09-01 00:00:00
AccountingSecondsActive 7383
...
TorVersion Tor 0.2.2.31-rc (git-dd18fcbbae84a9ac)
...
LastWritten 2011-09-01 02:04:03

AccountingBytesReadInInterval 159062830080
AccountingBytesWrittenInInterval 161070214144
AccountingExpectedUsage 4826
AccountingIntervalStart 2011-08-01 00:00:00
AccountingSecondsActive 2606867
AccountingSecondsToReachSoftLimit 2234867
AccountingSoftLimitHitAt 2011-08-27 00:03:03
AccountingBytesAtSoftLimit 161068877824
...
TorVersion Tor 0.2.2.31-rc (git-dd18fcbbae84a9ac)
...
LastWritten 2011-08-31 07:24:03

=========================================================

BarkerJrAtom1

AccountingBytesReadInInterval 2039808
AccountingBytesWrittenInInterval 1733632
AccountingExpectedUsage 1194
AccountingIntervalStart 2011-09-01 00:00:00
AccountingSecondsActive 7367
...
TorVersion Tor 0.2.2.31-rc (git-dd18fcbbae84a9ac)
...
LastWritten 2011-09-01 02:03:47

AccountingBytesReadInInterval 159278124032
AccountingBytesWrittenInInterval 161066458112
AccountingExpectedUsage 7745
AccountingIntervalStart 2011-08-01 00:00:00
AccountingSecondsActive 2606860
AccountingSecondsToReachSoftLimit 2234859
AccountingSoftLimitHitAt 2011-08-27 00:02:46
AccountingBytesAtSoftLimit 161064982528
...
TorVersion Tor 0.2.2.31-rc (git-dd18fcbbae84a9ac)
...
LastWritten 2011-08-31 07:23:47

comment:15 Changed 6 years ago by Sebastian

ok here we go. When Tor is restarted, the AccountingSoftLimitHitAt line gets set to the startup time. That means that after a month we think it took almost the whole month, even tho it only took a few days in reality. thanks for BarkerJr for noticing this!

comment:16 Changed 6 years ago by BarkerJr

Here are the august logs:

Aug 01 00:00:00.302 [info] configure_accounting(): Accounting interval elapsed; starting a new one
Aug 01 00:00:00.326 [info] reset_accounting(): Starting new accounting interval.
Aug 01 00:00:00.326 [notice] Configured hibernation. This interval began at 2011-08-01 00:00:00; the scheduled wake-up time was 2011-08-01 00:00:00; we expect to exhaust our quota for this interval around 2011-09-01 00:00:00; the next interval begins at 2011-09-01 00:00:00 (all times local)
Aug 01 00:00:00.326 [notice] Hibernation period ended. Resuming normal activity.
Aug 01 00:00:00.326 [notice] Opening OR listener on 0.0.0.0:0
Aug 01 00:00:00.326 [notice] OR listener listening on port 55198.
Aug 01 00:00:45.451 [notice] I learned some more directory information, but not enough to build a circuit: We have only 96/2495 usable descriptors.
Aug 01 00:00:45.572 [notice] I learned some more directory information, but not enough to build a circuit: We have only 192/2495 usable descriptors.
Aug 01 00:00:45.639 [notice] I learned some more directory information, but not enough to build a circuit: We have only 288/2495 usable descriptors.
Aug 01 00:00:45.856 [notice] I learned some more directory information, but not enough to build a circuit: We have only 383/2495 usable descriptors.
Aug 01 00:00:45.925 [notice] I learned some more directory information, but not enough to build a circuit: We have only 479/2495 usable descriptors.
Aug 01 00:00:45.951 [notice] I learned some more directory information, but not enough to build a circuit: We have only 575/2495 usable descriptors.
Aug 01 00:00:46.010 [notice] We now have enough directory information to build circuits.
Aug 01 01:09:55.756 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Aug 01 01:09:57.695 [notice] Performing bandwidth self-test...done.
Aug 06 00:45:22.711 [notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Aug 06 00:45:23.712 [notice] Going dormant. Blowing away remaining connections.
Aug 07 15:12:44.888 [notice] Tor 0.2.2.30-rc (git-085c9754ccae6cae) opening log file.
Aug 07 15:12:45.005 [notice] Based on 1000 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 9 seconds.
Aug 07 15:12:45.005 [info] read_bandwidth_usage(): Reading bandwidth accounting data from state file
Aug 07 15:12:45.005 [info] read_bandwidth_usage(): Successfully read bandwidth accounting info from state written at 2011-08-07 12:56:20 for interval starting at 2011-08-01 00:00:00. We have been active for 564920 seconds in this interval. At the start of the interval, we expected to use about 82363 KB per second. (158892689408 bytes read so far, 161061789696 bytes written so far)
Aug 07 15:12:45.005 [info] configure_accounting(): Continuing accounting interval.
Aug 07 15:12:45.542 [notice] OpenSSL OpenSSL 1.0.0d 8 Feb 2011 looks like version 0.9.8m or later; I will try SSL_OP to enable renegotiation
Aug 07 15:12:45.871 [notice] Your Tor server's identity key fingerprint is 'BarkerJrAtom2 94E35409D98D1E43663999B9BF30A4982CD3A40E'
Aug 07 15:12:45.871 [notice] Configured hibernation. This interval began at 2011-08-01 00:00:00; the scheduled wake-up time was 2011-08-01 00:00:00; we expect to exhaust our quota for this interval around 2011-09-01 00:00:00; the next interval begins at 2011-09-01 00:00:00 (all times local)
Aug 07 15:12:46.105 [notice] Parsing GEOIP file /usr/local/share/tor/geoip.
Aug 07 15:13:00.688 [notice] Bootstrapped 45%: Asking for relay descriptors.
Aug 07 15:13:00.688 [notice] I learned some more directory information, but not enough to build a circuit: We have only 0/2607 usable descriptors.
Aug 07 15:13:00.896 [notice] Bootstrapped 50%: Loading relay descriptors.
Aug 07 15:13:00.896 [notice] I learned some more directory information, but not enough to build a circuit: We have only 14/2607 usable descriptors.
Aug 07 15:13:00.924 [notice] Bootstrapped 54%: Loading relay descriptors.
Aug 07 15:13:00.924 [notice] I learned some more directory information, but not enough to build a circuit: We have only 110/2607 usable descriptors.
Aug 07 15:13:00.951 [notice] Bootstrapped 59%: Loading relay descriptors.
Aug 07 15:13:00.951 [notice] I learned some more directory information, but not enough to build a circuit: We have only 205/2607 usable descriptors.
Aug 07 15:13:01.074 [notice] Bootstrapped 63%: Loading relay descriptors.
Aug 07 15:13:01.074 [notice] I learned some more directory information, but not enough to build a circuit: We have only 301/2607 usable descriptors.
Aug 07 15:13:01.240 [notice] Bootstrapped 67%: Loading relay descriptors.
Aug 07 15:13:01.240 [notice] I learned some more directory information, but not enough to build a circuit: We have only 397/2606 usable descriptors.
Aug 07 15:13:01.384 [notice] Bootstrapped 71%: Loading relay descriptors.
Aug 07 15:13:01.384 [notice] I learned some more directory information, but not enough to build a circuit: We have only 490/2606 usable descriptors.
Aug 07 15:13:01.710 [notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Aug 07 15:13:01.710 [notice] Going dormant. Blowing away remaining connections.
Aug 07 15:13:01.953 [notice] Bootstrapped 76%: Loading relay descriptors.
Aug 07 15:13:01.953 [notice] I learned some more directory information, but not enough to build a circuit: We have only 585/2604 usable descriptors.
Aug 07 15:13:01.975 [notice] We now have enough directory information to build circuits.
Aug 07 15:13:01.975 [notice] Bootstrapped 80%: Connecting to the Tor network.
Aug 16 23:59:45.153 [notice] Tor 0.2.2.30-rc (git-085c9754ccae6cae) opening log file.
Aug 16 23:59:45.442 [notice] Based on 1000 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 9 seconds.
Aug 16 23:59:45.442 [info] read_bandwidth_usage(): Reading bandwidth accounting data from state file
Aug 16 23:59:45.442 [info] read_bandwidth_usage(): Successfully read bandwidth accounting info from state written at 2011-08-16 23:04:01 for interval starting at 2011-08-01 00:00:00. We have been active for 1370720 seconds in this interval. At the start of the interval, we expected to use about 82363 KB per second. (158963741696 bytes read so far, 161065438208 bytes written so far)
Aug 16 23:59:45.442 [info] configure_accounting(): Continuing accounting interval.
Aug 16 23:59:45.981 [notice] OpenSSL OpenSSL 1.0.0d 8 Feb 2011 looks like version 0.9.8m or later; I will try SSL_OP to enable renegotiation
Aug 16 23:59:46.067 [notice] Your Tor server's identity key fingerprint is 'BarkerJrAtom2 94E35409D98D1E43663999B9BF30A4982CD3A40E'
Aug 16 23:59:46.067 [notice] Configured hibernation. This interval began at 2011-08-01 00:00:00; the scheduled wake-up time was 2011-08-01 00:00:00; we expect to exhaust our quota for this interval around 2011-09-01 00:00:00; the next interval begins at 2011-09-01 00:00:00 (all times local)
Aug 16 23:59:46.282 [notice] Parsing GEOIP file /usr/local/share/tor/geoip.
Aug 16 23:59:55.332 [notice] We now have enough directory information to build circuits.
Aug 16 23:59:55.332 [notice] Bootstrapped 80%: Connecting to the Tor network.
Aug 16 23:59:56.332 [notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Aug 16 23:59:56.333 [notice] Going dormant. Blowing away remaining connections.
Aug 27 00:02:23.848 [notice] SIGINT received while hibernating; exiting now.
Aug 27 00:03:01.595 [notice] Tor 0.2.2.31-rc (git-dd18fcbbae84a9ac) opening log file.
Aug 27 00:03:01.601 [notice] Based on 1000 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 9 seconds.
Aug 27 00:03:01.601 [info] read_bandwidth_usage(): Reading bandwidth accounting data from state file
Aug 27 00:03:01.601 [info] read_bandwidth_usage(): Successfully read bandwidth accounting info from state written at 2011-08-27 00:02:23 for interval starting at 2011-08-01 00:00:00. We have been active for 2234867 seconds in this interval. At the start of the interval, we expected to use about 82363 KB per second. (159034652672 bytes read so far, 161068877824 bytes written so far)
Aug 27 00:03:01.601 [info] configure_accounting(): Continuing accounting interval.
Aug 27 00:03:01.676 [notice] OpenSSL OpenSSL 1.0.0d 8 Feb 2011 looks like version 0.9.8m or later; I will try SSL_OP to enable renegotiation
Aug 27 00:03:01.752 [notice] Your Tor server's identity key fingerprint is 'BarkerJrAtom2 94E35409D98D1E43663999B9BF30A4982CD3A40E'
Aug 27 00:03:01.752 [notice] Configured hibernation. This interval began at 2011-08-01 00:00:00; the scheduled wake-up time was 2011-08-01 00:00:00; we expect to exhaust our quota for this interval around 2011-09-01 00:00:00; the next interval begins at 2011-09-01 00:00:00 (all times local)
Aug 27 00:03:01.752 [notice] Parsing GEOIP file /usr/local/share/tor/geoip.
Aug 27 00:03:02.793 [notice] We now have enough directory information to build circuits.
Aug 27 00:03:02.793 [notice] Bootstrapped 80%: Connecting to the Tor network.
Aug 27 00:03:03.794 [notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Aug 27 00:03:03.794 [notice] Going dormant. Blowing away remaining connections.
Sep 01 00:00:00.600 [info] configure_accounting(): Accounting interval elapsed; starting a new one
Sep 01 00:00:00.621 [info] reset_accounting(): Starting new accounting interval.
Sep 01 00:00:00.621 [notice] Configured hibernation. This interval began at 2011-09-01 00:00:00; the scheduled wake-up time was 2011-09-01 00:00:00; we expect to exhaust our quota for this interval around 2011-10-01 00:00:00; the next interval begins at 2011-10-01 00:00:00 (all times local)
Sep 01 00:00:00.621 [notice] Hibernation period ended. Resuming normal activity.
Sep 01 00:00:00.621 [notice] Opening OR listener on 0.0.0.0:0
Sep 01 00:00:00.621 [notice] OR listener listening on port 38837.
Sep 01 00:00:00.836 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Sep 01 00:00:02.172 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Sep 01 00:00:05.446 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Sep 01 00:00:05.446 [notice] Bootstrapped 100%: Done.
Sep 01 00:00:05.446 [notice] Now checking whether ORPort 74.207.254.217:38837 is reachable... (this may take up to 20 minutes -- look for log messages indicating success)
Sep 01 00:00:08.442 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Sep 01 00:00:09.660 [notice] We're missing a certificate from authority with signing key F7C7B9191C74C0BA07363C84D37BBAD3A8A6C6D8: launching request.
Sep 01 00:00:09.660 [notice] We're missing a certificate from authority with signing key 604834622B54F2D9BA39B34AC53924546733AA60: launching request.
Sep 01 00:00:13.703 [notice] Performing bandwidth self-test...done.

comment:17 Changed 6 years ago by Sebastian

Owner: set to Sebastian
Status: needs_informationassigned

Fix in branch bug2003. Think this is a backport candidate, it's a regression against 0.2.1.x.

Also, lol trac. it doesn't allow me to put this into needs_review.

comment:18 Changed 6 years ago by Sebastian

Status: assignedneeds_review

comment:19 Changed 6 years ago by nickm

Okay, this is tricky stuff. The code *looks* ok to me, so here's what I think we should do.

First off, we should rebase this onto maint-0.2.2. Also, I think DEFAULT should get renamed to "STARTING" or "INITIAL" or something.

Second, we should review it again, to make sure there are no remaining checks for LIVE/DORMANT that shouldn't be extended to allow "DEFAULT".

Third, we should merge it into master, and try it out in 0.2.3.x for at least one or two relases. If that works out, we should merge it into maint-0.2.2 as well.

Does that sound plausible?

comment:20 Changed 6 years ago by Sebastian

Yes, that sounds ok. I will get to the rebasing and renaming. I'll go with INITIAL. I tested this in the following scenarios (only on clients, never on relays):

  • First start, empty state file, no accounting configured. No problems at startup.
  • First start, empty state file, accounting configured. No problems at startup, accounting enabled log message, state file gets updated.
  • Second start after not exhausting bandwidth, startup with expected log messages, traffic allowed.
  • Ran until soft limit was reached, hibernation commenced as expected
  • started after soft limit was reached, expected log messages, state file NOT updated with latest time - all good.

The last scenario made me realize this fixes another bug: We used to open ports etc even if we would then go on to close them immediately afterwards, because we had reached our quota already.

comment:21 Changed 6 years ago by Sebastian

(We should get BarkerJr to run 0.2.3.x once this is merged. Nobody else ever reported this, I don't expect them to suddenly report it in 0.2.3.x even if it is still broken)

Updated branch in bug2003_rebased in my repo. Merges into master without conflicts.

comment:22 Changed 6 years ago by Sebastian

Alternatively we could get BarkerJr to just try the branch on 0.2.2.x. That'd work too.

comment:23 Changed 6 years ago by nickm

Okay. Since 0.2.3.3-alpha is supposed to come out today, let's merge this one into 0.2.3.4-alpha once today's release is tagged.

Thanks again!

Also, the idea about asking BarkerJr to try the branch on 0.2.2.x is a good one IMO.

comment:24 Changed 6 years ago by Sebastian

grmpf, this breaks the unit tests. Not sure how I didn't notice. BarkerJr did.

comment:25 Changed 6 years ago by Sebastian

The problem is that we_are_hibernating() returns true until we're out of HIBERNATE_STATE_INITIAL. I think this is the *right* behavior for normal operation. But I don't really like the idea to introduce special behavior for the unit tests. Hrm. Ideas?

comment:26 Changed 6 years ago by nickm

Have the unit tests -- possibly somewhere in main() -- manually exit hibernate_state_initial ?

comment:27 in reply to:  26 Changed 6 years ago by nickm

Replying to nickm:

Have the unit tests -- possibly somewhere in main() -- manually exit hibernate_state_initial ?

err, have the unit tests manually _set_ the state to somehing other than hibernate_state_initial. Perhaps only do this in places where the fact that we aren't hibernating matters.

comment:28 Changed 6 years ago by nickm

Unit tests are fixed in branch "bug2003_nm" in my public repository. Could use another set of eyes on 'em.

comment:29 Changed 6 years ago by nickm

Merged "bug2003_nm" to master; leaving this open for merge to 0.2.2 once this has been tested in an alpha.

comment:30 Changed 6 years ago by nickm

It turned out to break stuff; see bug #4217 . We should only merge to 0.2.2 it if we then immediately backport b4bd836f46549b6

comment:31 Changed 6 years ago by Sebastian

Actually, it seems quite possible that b4bd836f46549b6 is the only commit we'd need to backport. It's a much simpler fix for this bug than my original patch

comment:32 Changed 6 years ago by Sebastian

This is all a bit sad. b4bd836f46549b6 (the #4217 fix) broke this again. The problem is that we don't parse accounting stuff and set it up until we've dropped permissions (we currently do it in options_act(), and we open listeners in options_act_reversible() which gets called before options_act(). We will definitely need to change this order, otherwise we will always have situations where we open ports needlessly. Once we have that order changed we can keep b4bd836f46549b6, and revert all the stuff we did for this bug so far. bleh.

comment:33 Changed 6 years ago by Sebastian

Status: needs_reviewassigned

comment:34 Changed 6 years ago by Sebastian

yuck yuck yuck.

We have options_act_reversible() and options_act(), the former is called before the latter. Inside options_act_reversible() we drop privileges, and inside options_act() we configure_accounting(). This is bad, because we have to have accounting configured before we can decide whether to open listeners, and we have to do that before we drop privileges. But to configure_accounting(), we need to have our state file parsed, and we currently do that after dropping privs for a good reason...

Any smart ideas?

comment:35 Changed 6 years ago by nickm

Sounds like we might need to rejigger options_act vs options_act_reversible? Or Hm.

We never actually did a good job of supporting the case where we have accounting set and we start as root in order to bind low ports. If we admit that we don't allow that, does stuff get easier?

comment:36 Changed 6 years ago by Sebastian

Milestone: Tor: 0.2.2.x-finalTor: 0.2.3.x-final
Priority: normalmajor

Reassigning this to 0.2.3.x. This has too many tricky failure modes that start to slowly trickle in, but we should try to really fix it this time. I'll provide a revert of what we committed to far, and a new approach.

comment:37 Changed 6 years ago by Sebastian

Just chatted with nick about this. The consensus seemed to be that we should be smart about opening ports when we're root and will later drop privileges. I'm identifying these cases:

We don't have accounting enabled: Open ports right away.

We have accounting configured and (we're not root, or we're not dropping privileges, or we're not trying to open a low port): Let's delay opening ports, because we can still do it when we're ready to parse the state file and see if we should've been hibernating or not.

We have accounting configured and we're root and we want to drop privileges and we're opening a low port: Open ports now. The user already has a warning in their log telling them that if we ever go into hibernation, their relay will likely not be able to reopen ports. In this case, we will want a mechanism in place that doesn't clobber the state file with information that we just started hibernating when we've successfully parsed the state file and learned that we should've been hibernating all along, so that when the user fixes their configuration they still have a sane state file.

comment:38 Changed 6 years ago by Sebastian

Oh. And we all think that it would be great if we had some posix capabilities support. But that is for bug #918 and out of scope here imo.

comment:39 Changed 6 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: unspecified
Type: defectenhancement

I am calling this an enhancement for later. The "right" behavior that Sebastian describes above would be weird and fiddly, and would IMO risk destabilizing current Tors more than it would benefit. We should do something here, but not in 0.2.3.x imo.

comment:40 Changed 5 years ago by nickm

Keywords: tor-relay added

comment:41 Changed 5 years ago by nickm

Component: Tor RelayTor

comment:42 Changed 5 months ago by nickm

Keywords: bandwidth accounting hibernate added
Priority: HighMedium
Severity: Normal
Note: See TracTickets for help on using tickets.