AccountingStart defaults to 0:00 local time. This results in many relays waking up from hibernation at the same second.
What about randomizing the default value on first start with an AccountingMax config?
(write the randomized time to disk and read that file the next time tor starts)
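To make the proposal concrete, here is a rough sketch of "pick once, persist, reuse". It is not Tor code: the function name and the state file are made up for illustration, and the only thing taken from Tor is that the result has the HH:MM shape that AccountingStart accepts.

```python
import os
import random


def load_or_create_start_time(path):
    """Return a persisted 'HH:MM' start time, creating a random one on first use.

    `path` is a hypothetical state file; Tor does not have such a file today.
    """
    if os.path.exists(path):
        # Later starts: reuse whatever was chosen the first time.
        with open(path) as f:
            return f.read().strip()
    # First start: pick a random minute of the day and remember it.
    minute_of_day = random.SystemRandom().randrange(24 * 60)
    value = "%02d:%02d" % divmod(minute_of_day, 60)
    with open(path, "w") as f:
        f.write(value + "\n")
    return value
```

The value would then stand in for the 00:00 default, as if the operator had written "AccountingStart day HH:MM" themselves.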
Here are some numbers (from relays probably using a daily quota):
AccountingStart is the time at which the period starts, not the time at which relays wake up. The wakeup time is determined by estimating our bandwidth, and trying to pick a random start point that will still allow us to consume all our accountingbytes.
Is there a place where the documentation explains this badly?
The calculation is done in accounting_set_wakeup_time(). For more information, see the big comment near the start of hibernate.c.
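For readers who do not want to dig through hibernate.c, the idea described above can be sketched roughly like this. This is a simplified illustration and not what accounting_set_wakeup_time() actually does; the function name, the parameters, and the choice to wake at interval start when there is no usable estimate are all assumptions of the sketch.

```python
import random


def pick_wakeup_offset(interval_secs, quota_bytes, est_bytes_per_sec, rng=random):
    """Return how many seconds after interval start the relay should wake up."""
    if est_bytes_per_sec <= 0:
        # Assumption: with no bandwidth estimate, wake at interval start.
        return 0
    secs_to_exhaust = quota_bytes / est_bytes_per_sec
    slack = interval_secs - secs_to_exhaust
    if slack <= 0:
        # The quota is expected to last the whole interval, so there is
        # no room to delay the wakeup.
        return 0
    # Any start point in [0, slack] still leaves enough time to use the quota.
    return rng.uniform(0, slack)
```

With a 24 hour interval and an estimate of 4 hours to use the quota, the slack is 20 hours, so the wakeup could land anywhere up to 20 hours after interval start.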
Is the calculation not working correctly for these relays?
AccountingStart is the time at which the period starts, not the time at which relays wake up.
Let's make sure we mean the same thing when saying 'wake up':
wake up = relay starts to relay traffic again/publishes a new descriptor where the hibernate flag is not set
I understood AccountingStart as the time when the relay starts to relay traffic based on onionoo data.
Maybe I'm wrong, but I'll explain how I came to that conclusion.
Let's have a look at yesterday's data.
There were 59 relays restarting at 2015-08-03 22:00:00 UTC.
By 2015-08-04 02:00:00 (last_seen, hibernate=1), 12 relays were already hibernating (I assume they had used up their AccountingMax by then: 2 relays somewhere between 0:00 and 1:00, and 10 relays somewhere between 1:00 and 2:00).
@nick: If I understood you correctly, a relay should wake up around ~20:00 UTC if its calculations say that it will take 4 hours to eat up the AccountingMax traffic (and its AccountingStart is at 00:00 UTC).
Onionoo data says otherwise, or am I misinterpreting it?
@teor: onionoo's 'last_restarted' field has second granularity (unlike first_seen and last_seen which are consensus timestamps with 1 hour granularity)
I wanted to know the ratio between hibernating relays where interval_wakeup_time = interval_start_time (identified by a last_restarted time with MM:SS = 00:00) and relays where the wakeup time differs from the start time.
For 2015-08-04 it is:
43 (wakeup_time != start_time) vs 71 (wakeup_time=start_time)
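The split above boils down to checking whether a restart timestamp falls exactly on the hour. A minimal sketch of that check, assuming timestamps in the form "YYYY-MM-DD hh:mm:ss"; the function name is made up and the sample values in the usage note are invented, not the relays counted above.

```python
from datetime import datetime


def split_by_wakeup(last_restarted):
    """Split restart timestamps into (on the hour, everything else)."""
    on_the_hour = []
    other = []
    for ts in last_restarted:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if (t.minute, t.second) == (0, 0):
            # MM:SS = 00:00, treated as wakeup_time == start_time.
            on_the_hour.append(ts)
        else:
            other.append(ts)
    return on_the_hour, other
```

For example, split_by_wakeup(["2015-08-03 22:00:00", "2015-08-03 17:42:10"]) puts the first value in the on-the-hour list and the second in the other list. Note that a relay restarted by hand exactly on the hour would be miscounted, so the numbers are only approximate.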
A relay operator running 4 of these hibernating relays provided me with the output of
grep 'Configured hibernation.' /var/log/tor/log
All 4 relays say:
Aug 04 06:25:09.000 [notice] Configured hibernation. This interval began at 2015-08-04 00:00:00; the scheduled wake-up time was 2015-08-04 00:00:00; we expect to exhaust our quota for this interval around 2015-08-05 00:00:00; the next interval begins at 2015-08-05 00:00:00 (all times local)
In reality it took them between 9 and 15 hours to exhaust the quota.
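Other operators who want to check their own relays can pull the four timestamps out of that notice. A small sketch, assuming the message keeps exactly the wording quoted above; the function and key names are made up for illustration.

```python
import re
from datetime import datetime

# Matches the "YYYY-MM-DD hh:mm:ss" stamps inside the message, but not the
# "Aug 04 06:25:09.000" prefix written by the logger.
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

LINE = ("Aug 04 06:25:09.000 [notice] Configured hibernation. This interval "
        "began at 2015-08-04 00:00:00; the scheduled wake-up time was "
        "2015-08-04 00:00:00; we expect to exhaust our quota for this interval "
        "around 2015-08-05 00:00:00; the next interval begins at "
        "2015-08-05 00:00:00 (all times local)")


def parse_hibernation_notice(line):
    """Return the four timestamps of a 'Configured hibernation.' notice."""
    stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in TS.findall(line)]
    if len(stamps) != 4:
        raise ValueError("unexpected log format")
    keys = ("interval_start", "wakeup", "expected_exhaust", "next_interval")
    return dict(zip(keys, stamps))


info = parse_hibernation_notice(LINE)
# True here: the relay scheduled its wakeup for the very start of the interval.
woke_at_start = info["wakeup"] == info["interval_start"]
```

For the line above, the wakeup equals the interval start and the expected exhaustion time equals the start of the next interval, i.e. the relay predicted it would need the full 24 hours.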
So the approach to prevent them all from starting at the same time would be to randomize the interval start time? In that case we would distribute wakeups more evenly even if relays are bad at estimating their bandwidth usage.
If the estimate is inaccurate, why not try to fix the estimate, at least as a first step?
Have we confirmed that the estimate is inaccurate on a consistent basis?
Given that the bandwidth authorities are currently thrashing about, that could be causing the inaccuracy at the moment.
I agree that randomising the lower-order components of the period would mitigate the thundering herd wake issue, but 100/5000 relays is not really a herd.
So we'd have to decide whether the unpredictable behaviour would be worthwhile, and outweigh the existing assumption of a 00:00 interval start time.
When I configured hibernation, I depended on the fact that the changeover time was 00:00, as that was the time that the VPS' free quota was reset.
Changing the behaviour for existing configs would be a really bad idea, if it led to people exceeding their quotas due to unpredictable interval start times, where those start times overlapped poorly with the charging intervals on the VPS.
(For example, if 11:39 was chosen at random, I could have had almost two periods' worth of usage in the one charging period, if the wake time was late one day, and early the next. This would have been expensive for me.)
If the estimate is inaccurate, why not try to fix the estimate, at least as a first step?
I just assumed that having accurate estimates is harder.
Have we confirmed that the estimate is inaccurate on a consistent basis?
Depends on what accuracy you are aiming at.
Currently, out of 59 relays, 50 exhausted their quota more than 3 hours before the interval start time.
Given that the bandwidth authorities are currently thrashing about, that could be causing the inaccuracy at the moment.
I agree that randomising the lower-order components of the period would mitigate the thundering herd wake issue, but 100/5000 relays is not really a herd.
So we'd have to decide whether the unpredictable behaviour would be worthwhile, and outweigh the existing assumption of a 00:00 interval start time.
When I configured hibernation, I depended on the fact that the changeover time was 00:00, as that was the time that the VPS' free quota was reset.
Changing the behaviour for existing configs would be a really bad idea, if it led to people exceeding their quotas due to unpredictable interval start times, where those start times overlapped poorly with the charging intervals on the VPS.
(For example, if 11:39 was chosen at random, I could have had almost two periods' worth of usage in the one charging period, if the wake time was late one day, and early the next. This would have been expensive for me.)
I've no strong opinion on how and if that behavior gets changed. We can also simply send an email to tor-relays asking operators to change their AccountingStart time if they wish to distribute restarts more evenly.
Changing the behaviour for existing configs would be a really bad idea, if it led to people exceeding their quotas due to unpredictable interval start times, where those start times overlapped poorly with the charging intervals on the VPS.
(For example, if 11:39 was chosen at random, I could have had almost two periods' worth of usage in the one charging period, if the wake time was late one day, and early the next. This would have been expensive for me.)
When using the suggested method (random value generated by the relay once) then this problem does not occur, right?
Anyway I'll just write a short email to tor-relays and we can close this ticket.
When using the suggested method (random value generated by the relay once) then this problem does not occur, right?
No, I think the problem can still occur. If my provider charges me for going over my quota on a given day (midnight to midnight), but from Tor's perspective there are two intervals that overlap with that day, then I could end up spending most of my bandwidth in the later half of the first interval, and most of it in the early half of the second interval, and now I spent twice as much as I wanted to on the day.
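To put numbers on that, here is a small worked example. All values are invented for illustration: the interval starts at 11:39, the quota is 240 units, and the relay sends 1 unit per minute while awake. Times are minutes since midnight of day 0, so the provider's billing day 1 is the window [1440, 2880).

```python
def units_in_window(usage_spans, rate, win_start, win_end):
    """Units sent inside [win_start, win_end), given (start, end) awake spans."""
    total = 0
    for start, end in usage_spans:
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap * rate
    return total


QUOTA = 240                 # hypothetical per-interval quota
INTERVAL_START = 11 * 60 + 39   # 11:39 on day 0, i.e. minute 699

# Interval 1 runs from 11:39 day 0 to 11:39 day 1. The relay wakes late
# and uses its whole quota between 06:00 and 10:00 on day 1.
late_span = (1440 + 6 * 60, 1440 + 10 * 60)
# Interval 2 starts at 11:39 on day 1. The relay wakes immediately and
# uses its whole quota by 15:39 on day 1.
early_span = (1440 + INTERVAL_START, 1440 + INTERVAL_START + QUOTA)

day1_total = units_in_window([late_span, early_span], 1, 1440, 2880)
```

Here day1_total comes out at 480, which is twice the per-interval quota, all inside a single midnight-to-midnight billing day.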
In sum, letting operators know that they can change the 00:00 is a fine thought, if they do it with knowledge of what's going on inside Tor.
Seems to me that the better answer is to make Tor better at predicting how much bandwidth it will take on a day, so it can start up at more random times.
Do we think there are bugs in the current prediction algorithm, or is it just the case that relays often don't have any data from the previous day?