The circuit timeout prediction is not working properly
During prop224 service testing (#20657 (moved)), I've encountered a weird behavior. Tor will start closing circuits with circuit_expire_old_circuits_clientside()
and then open new ones for internal use (prediction) but set their timeout to 1 sec which leads to closing the circuit 30 sec later (NewCircuitPeriod
defaults to 30 sec).
Condensed log info:
Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 320 that has been unused for 27997 msec.
Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 318 that has been unused for 29997 msec.
Aug 03 19:59:30.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 319 that has been unused for 28979 msec.
Aug 03 19:59:30.000 [info] circuit_predict_and_launch_new(): Have 0 clean circs (0 internal), need another internal circ for my hidden service.
Aug 03 19:59:30.000 [info] origin_circuit_new(): Circuit 321 chose an idle timeout of 1 based on 0 seconds of predictive building remaining.
...
Aug 03 20:00:01.000 [info] circuit_expire_old_circuits_clientside(): Closing circuit 321 that has been unused for 30991 msec.
So notice the circuit 321 with a timeout of 1 sec and then being closed 30 sec later... Basically, it's looping like that non stop. What I think is happening is this:
predicted_ports_prediction_time_remaining()
returns 0 so the following computation always results in 1 sec (in origin_circuit_new()
):
int prediction_time_remaining =
predicted_ports_prediction_time_remaining(time(NULL));
circ->circuit_idle_timeout = prediction_time_remaining+1+
crypto_rand_int(1+prediction_time_remaining/20);
We call rep_hist_note_used_internal()
in the hs subsystem when we launch intro point and rendezvous circuits. That function sets last_prediction_add_time = now
. Which is what we want because then predicted_ports_prediction_time_remaining()
computes a idle_delta
that is below the prediction_timeout
(set between 30 and 60 mins by default, see channelpadding_get_circuits_available_timeout()
).
Which means that after let's say 61 min for a prediction_timeout = 60min
, idle_delta
becomes bigger than prediction_timeout
and thus returning 0 everytime ultimately making every new circuit timeout to 1 sec. See why below:
int
predicted_ports_prediction_time_remaining(time_t now)
{
time_t idle_delta = now - last_prediction_add_time;
/* Protect against overflow of return value. This can happen if the clock
* jumps backwards in time. Update the last prediction time (aka last
* active time) to prevent it. This update is preferable to using monotonic
* time because it prevents clock jumps into the past from simply causing
* very long idle timeouts while the monotonic time stands still. */
if (last_prediction_add_time > now) {
last_prediction_add_time = now;
idle_delta = 0;
}
/* Protect against underflow of the return value. This can happen for very
* large periods of inactivity/system sleep. */
if (idle_delta > prediction_timeout)
return 0;
[RIGHT HERE, we return 0]
...
This is pretty bad because it means that every HS that sees no client for at least 30 to 60 min, will enter this loop of create/close circuits raising flags at the Guard but also putting pressure on the network.
Seems that this was introduced with the channel padding in tor-0.3.1.1-alpha.