Opened 9 years ago

Closed 8 years ago

Last modified 7 years ago

#2004 closed defect (fixed)

Capping descriptor build times notice log

Reported by: Sebastian Owned by: mikeperry
Priority: High Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by Sebastian)

Recently we introduced a maximum circuit build time of twice what we ever observed, but when we do cap we emit a notice log, which can be quite spammy. We should not have all these cbt notice logs, I think - they don't really tell the user anything, they aren't necessarily an indication of a problem, etc.

Also, there seems to be a bug:

Oct 04 19:38:05.311 [notice] Circuit build measurement period of 118361ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:38:05.311 [notice] Based on 162 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 10 seconds.
Oct 04 19:38:10.075 [notice] Circuit build measurement period of 117281ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.

Oct 04 19:38:12.271 [notice] Circuit build measurement period of 117330ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:38:16.805 [notice] Circuit build measurement period of 120054ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:38:21.109 [notice] Circuit build measurement period of 117447ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:38:38.345 [notice] Circuit build measurement period of 122752ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:41:05.013 [notice] Circuit build measurement period of 135039ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:42:06.013 [notice] Circuit build measurement period of 148555ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:42:06.013 [notice] Based on 169 circuit times, it looks like we need to wait longer for circuits to finish. We will now assume a circuit is too slow to use after waiting 11 seconds.
Oct 04 19:44:08.013 [notice] Circuit build measurement period of 163425ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:45:09.013 [notice] Circuit build measurement period of 179783ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:45:09.013 [notice] Based on 171 circuit times, it looks like we need to wait longer for circuits to finish. We will now assume a circuit is too slow to use after waiting 12 seconds.
Oct 04 19:46:10.013 [notice] Circuit build measurement period of 197779ms is more than twice the maximum build time we have ever observed. Capping it to 116950ms.
Oct 04 19:46:10.013 [notice] Based on 172 circuit times, it looks like we need to wait longer for circuits to finish. We will now assume a circuit is too slow to use after waiting 13 seconds.

We never increase the value we're capping to, instead of doubling it.

Child Tickets

Change History (13)

comment:1 Changed 9 years ago by Sebastian

Description: modified (diff)

comment:2 Changed 9 years ago by nickm

Component: - Select a componentTor Client

comment:3 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

comment:4 Changed 9 years ago by Sebastian

Just as a note to consider when debugging this: Maybe we log this notice and then don't actually count the timeout at all?

comment:5 Changed 9 years ago by arma

We cap it to twice the longest build time we have on record. So if the longest build time on record didn't change, the cap we use wouldn't change.

comment:6 Changed 9 years ago by Sebastian

Erm. So that means we don't go to gradually worse timeouts? What happens when you have an insanely low timeout, your network becomes worse, and almost all your circuits fail except really few don't, but we never have a circuit timing out slower than our slowest timeout so far, but not worse than twice as slow?

comment:7 Changed 9 years ago by nickm

Owner: set to mikeperry
Status: newassigned

comment:8 in reply to:  6 Changed 9 years ago by arma

Replying to Sebastian:

Erm. So that means we don't go to gradually worse timeouts?

We do, in the general case. This hack is to handle the case where the distribution that Mike decided best fit the timeouts decides that your "80% percentile" timeout is actually higher than any observed circuit build time. This edge case can happen when you have too many right censored inputs, i.e. when too many circuits take more than twice the timeout so we give up on them.

The real problem here is that the distribution Mike picked is more complex than it needs to be, leading to weird hacks to try to make it realistic again, like the one we're talking about here.

What happens when you have an insanely low timeout, your network becomes worse, and almost all your circuits fail except really few don't, but we never have a circuit timing out slower than our slowest timeout so far, but not worse than twice as slow?

I don't know. Would that situation ever happen? It seems a very funny-shaped situation. Why would no circuits ever finish in the [x, 2x] interval but we would still want to raise x? I guess we could fit this with another hack where we don't cap it if x is less than some number.

This whole timeout calculation algorithm is a mess. Mike, how can we go about simplifying it, ditching the right censoring part, etc? It doesn't have to be perfect according to the math, it has to be approximately correct and simple to analyze and verify.

comment:9 Changed 9 years ago by arma

Priority: normalmajor

comment:10 Changed 9 years ago by mikeperry

Status: assignedneeds_review

We can safely just demote this message to info.

The patch is in mikeperry/bug2004. I also clarified some other notice messages.

comment:11 Changed 9 years ago by arma

patch looks ok to me

comment:12 Changed 8 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Rebased onto maint-0.2.2, added a changes file, and merged. Thanks!

comment:13 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.