tor can overrun its accountingmax if it enters soft hibernation first

Closed (moved) created Oct 16, 2019 by Roger Dingledine

I'll put the punchline first: second_elapsed_callback(), which is where we check if it's time to go dormant for hibernation, is no longer called when we've entered soft hibernation.

I assume this is because of the new "periodic event flag" feature, where we try to avoid calling callbacks when we're in a state that won't need them. See the "online and active" note here:

  /* This is a legacy catch-all callback that runs once per second if
   * we are online and active. */
  CALLBACK(second_elapsed, NET_PARTICIPANT,
           FL(NEED_NET)|FL(RUN_ON_DISABLE)),

The impact is limited, since we stop accepting new connections and new circuits when we enter soft hibernation, but it can still be bad: existing connections and circuits can last for a long time and use a lot of bandwidth.

A secondary impact is that accounting_run_housekeeping() never gets called, which means that the state file never gets updated after we've entered soft hibernation, which means these bandwidth overspends are never recorded to disk.

I think the bug went in during commit 4bf79fa4f which is part of Tor 0.4.0.1-alpha.

The PERIODIC_EVENT_FLAG_NEED_NET flag (what FL(NEED_NET) expands into) checks net_is_disabled(), but there is another function right after net_is_disabled in netstatus.c called net_is_completely_disabled(), and the only difference is that it checks we_are_fully_hibernating() vs we_are_hibernating().

I confirmed the overall bug happens in practice: there's a relay operator in #tor who hit soft hibernation, and then saw his tor proceed to use more bytes than expected. I had him do a 'gdb attach' to his tor and break on 'second_elapsed_callback' and the function never got called.

It seems like the immediate fix, and best backport plan, would be to resume calling second_elapsed_callback even when net_is_disabled(). The longer term plan can be to audit our calls to net_is_disabled() and net_is_completely_disabled() and we_are_hibernating(), with an eye towards "should we be doing this behavior while soft hibernating", and see what other bugs we find.

Activity

Please register or sign in to reply

tor can overrun its accountingmax if it enters soft hibernation first

Assignees

Labels

Milestone

Dates

Time tracking

Activity

tor can overrun its accountingmax if it enters soft hibernation first

Attributes

Assignees

Labels

Milestone

Dates

Time tracking

Activity