The circuit timeout prediction is not working properly

During prop224 service testing (#20657 (moved)), I've encountered a weird behavior. Tor will start closing circuits with circuit_expire_old_circuits_clientside() and then open new ones for internal use (prediction) but set their timeout to 1 sec which leads to closing the circuit 30 sec later (NewCircuitPeriod defaults to 30 sec).

Condensed log info:

Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 320 that has been unused for 27997 msec.
Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 318 that has been unused for 29997 msec.
Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 319 that has been unused for 28979 msec.
Aug 03 19:59:30.000 [info] circuit_predict_and_launch_new(): Have 0 clean circs (0 internal), need another internal circ for my hidden service.
Aug 03 19:59:30.000 [info] origin_circuit_new(): Circuit 321 chose an idle timeout of 1 based on 0 seconds of predictive building remaining.
...
Aug 03 20:00:01.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 321 that has been unused for 30991 msec.

So notice the circuit 321 with a timeout of 1 sec and then being closed 30 sec later... Basically, it's looping like that non stop. What I think is happening is this:

predicted_ports_prediction_time_remaining() returns 0 so the following computation always results in 1 sec (in origin_circuit_new()):

    int prediction_time_remaining =
      predicted_ports_prediction_time_remaining(time(NULL));
    circ->circuit_idle_timeout = prediction_time_remaining+1+
        crypto_rand_int(1+prediction_time_remaining/20);

We call rep_hist_note_used_internal() in the hs subsystem when we launch intro point and rendezvous circuits. That function sets last_prediction_add_time = now. Which is what we want because then predicted_ports_prediction_time_remaining() computes a idle_delta that is below the prediction_timeout (set between 30 and 60 mins by default, see channelpadding_get_circuits_available_timeout()).

Which means that after let's say 61 min for a prediction_timeout = 60min, idle_delta becomes bigger than prediction_timeout and thus returning 0 everytime ultimately making every new circuit timeout to 1 sec. See why below:

int
predicted_ports_prediction_time_remaining(time_t now)
{
  time_t idle_delta = now - last_prediction_add_time;

  /* Protect against overflow of return value. This can happen if the clock
   * jumps backwards in time. Update the last prediction time (aka last
   * active time) to prevent it. This update is preferable to using monotonic
   * time because it prevents clock jumps into the past from simply causing
   * very long idle timeouts while the monotonic time stands still. */
  if (last_prediction_add_time > now) {
    last_prediction_add_time = now;
    idle_delta = 0;
  }

  /* Protect against underflow of the return value. This can happen for very
   * large periods of inactivity/system sleep. */
  if (idle_delta > prediction_timeout)
    return 0;
    [RIGHT HERE, we return 0]
...

This is pretty bad because it means that every HS that sees no client for at least 30 to 60 min, will enter this loop of create/close circuits raising flags at the Guard but also putting pressure on the network.

Seems that this was introduced with the channel padding in tor-0.3.1.1-alpha.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information