Opened 7 years ago

Closed 6 years ago

#8164 closed enhancement (worksforme)

Validate flag-thresholds in consensus-health checker

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Core Tor/DocTor
Version:
Severity:
Keywords:
Cc: asn, arma, nickm, mikeperry, atagar
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

As of yesterday, votes may contain flag-thresholds lines like this one from moria1 (line breaks only added here):

flag-thresholds stable-uptime=693369
                stable-mtbf=153249
                fast-speed=40960
                guard-wfu=94.669%
                guard-tk=691200
                guard-bw-inc-exits=174080
                guard-bw-exc-exits=184320
                enough-mtbf=1

The consensus-health checker can look at these values and warn if one of them gets too high or too low. What values should it consider normal, and when should it begin warning?
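A minimal Python sketch of how a checker might turn such a line into numbers before comparing them against bounds. The function name parse_flag_thresholds and the constant MORIA1_EXAMPLE are hypothetical and not taken from any existing checker. The sketch assumes the line appears on a single row, as it does in an actual vote, and that percentages are kept as plain floats.

```python
def parse_flag_thresholds(line):
    """Parse a 'flag-thresholds' vote line into a dict of numeric values.

    Hypothetical helper for illustration only.
    """
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "flag-thresholds":
        raise ValueError("not a flag-thresholds line: %r" % line)

    thresholds = {}
    for entry in rest.split():
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError("malformed entry: %r" % entry)
        if value.endswith("%"):
            # Assumption: guard-wfu=94.669% is stored as the float 94.669.
            thresholds[key] = float(value[:-1])
        else:
            try:
                thresholds[key] = int(value)
            except ValueError:
                thresholds[key] = float(value)
    return thresholds


# The moria1 line from the description, joined back onto one row.
MORIA1_EXAMPLE = (
    "flag-thresholds stable-uptime=693369 stable-mtbf=153249 "
    "fast-speed=40960 guard-wfu=94.669% guard-tk=691200 "
    "guard-bw-inc-exits=174080 guard-bw-exc-exits=184320 enough-mtbf=1"
)

if __name__ == "__main__":
    for key, value in sorted(parse_flag_thresholds(MORIA1_EXAMPLE).items()):
        print("%s = %s" % (key, value))
```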

Attachments (6)

flag-thresholds-2013-02-06.png (116.4 KB) - added by karsten 7 years ago.
flag-thresholds-2013-02-18.png (181.6 KB) - added by karsten 7 years ago.
flag-thresholds-2013-03-05.png (197.0 KB) - added by karsten 7 years ago.
flag-thresholds-2013-04-09.png (217.4 KB) - added by karsten 6 years ago.
flag-thresholds-2013-04-11.png (199.7 KB) - added by karsten 6 years ago.
flag-thresholds-2013-08-15.png (309.0 KB) - added by karsten 6 years ago.

Change History (23)

comment:1 Changed 7 years ago by asn

Hm, I don't really know which static values we should consider normal in this case. Maybe #8145 is kind of related?

I guess that in an ideal future we would have some kind of anomaly detection (a dynamic system) to find abnormalities in those values. (Although, the current anomaly detection system we have for censorship events does not work too well, does it?)

comment:2 in reply to:  1 Changed 7 years ago by karsten

Replying to asn:

Hm, I don't really know which static values we should consider normal in this case. Maybe #8145 is kind of related?

Only if you can translate that into upper/lower bounds I can write into the consensus-health checker code.

I guess that in an ideal future we would have some kind of anomaly detection (a dynamic system) to find abnormalities in those values. (Although, the current anomaly detection system we have for censorship events does not work too well, does it?)

I hope static bounds will do the trick for now. I agree that we're not very good at writing anomaly detection systems.

I'm attaching a plot of flag thresholds reported by moria1 and gabelmoo, which I'm going to renew in 1 week and in 2 weeks. Then we can define bounds when we want to get notified.

Changed 7 years ago by karsten

Changed 7 years ago by karsten

comment:3 Changed 7 years ago by karsten

Status: new → needs_information

Just added a new graph. The values for stable-mtbf and guard-wfu deviate more than expected, which may well be a bug we simply didn't see before. I think we should wait for the other authorities to upgrade to 0.2.4.10-alpha-dev or higher and report these values, too. Then we can define thresholds for the consensus-health checker to warn.

comment:4 in reply to:  3 ; Changed 7 years ago by asn

Replying to karsten:

Just added a new graph. The values for stable-mtbf and guard-wfu deviate more than expected, which may well be a bug we simply didn't see before. I think we should wait for the other authorities to upgrade to 0.2.4.10-alpha-dev or higher and report these values, too. Then we can define thresholds for the consensus-health checker to warn.

Interesting graphs!

How come you graphed only those three authorities? Do they all run different versions of Tor? The stable_mtbf and guard_wfu graphs are kind of weird, indeed.

comment:5 in reply to:  4 Changed 7 years ago by karsten

Replying to asn:

How come you graphed only those three authorities? Do they all run different versions of Tor? The stable_mtbf and guard_wfu graphs are kind of weird, indeed.

These three are the only authorities running recent enough Tor versions to report flag thresholds. Once the other authorities upgrade, they'll be included in the graphs, too.

comment:6 in reply to:  4 Changed 7 years ago by arma

Replying to asn:

The stable_mtbf and guard_wfu graphs are kind of weird, indeed.

#8218 can explain a bit of it. I'm sure there's more too.

Changed 7 years ago by karsten

comment:7 Changed 7 years ago by karsten

Updated graph.

Changed 6 years ago by karsten

comment:8 Changed 6 years ago by karsten

Cc: mikeperry added

Updated the graph once more. Finally, we have all nine authorities reporting their flag thresholds, with interesting results. A few observations with respect to finding lower/upper bounds for what the consensus-health checker should consider normal:

  • The mean stable_uptime of most authorities is around 7.2 days (620000 seconds), whereas turtles' mean stable_uptime is 17.6 days. What's up with turtles, and should we still consider those values normal? How about 5 and 20 (or 10?) days as lower and upper bound to catch extreme values?
  • I can hardly see a stable state in stable_mtbf. Without turtles, I'd say that gabelmoo, moria1, and dannenberg are heading somewhere, but that process takes very long, probably too long for continuous consensus-health warnings. How about 1 second as lower bound and 3e+6 seconds (34.7 days) as upper bound to see what turtles is up to?
  • fast_speed looks quite stable, well, except for turtles. I'd say 25 and 75 kB/s would be good lower/upper bounds. But what turtles sets there seems too low.
  • guard_wfu looks okay. We could probably set a lower bound of 90 to learn about extremes (and 99.99 as upper bound, just to learn when authorities become too demanding).
  • guard_tk takes a while to get stable after authorities went down for some time (which is what I think was the case with dizum). We could warn about values below 4e+05 seconds (4.6 days) and above 8e+05 seconds (9.3 days).
  • guard_bw_inc_exits and guard_bw_exc_exits look quite stable, too. But what is turtles doing there? Without turtles, I'd say 1e+05 and 3e+05 are fine lower/upper bounds.
  • enough_mtbf looks like it's fine with a lower and upper bound of 1, so that we learn when it goes down to 0.

Do these limits make any sense? And what's the reason for turtles behaving different?
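To make the proposal concrete, here is a rough Python sketch of the static bounds listed above, written as a lookup table plus a check. All names (PROPOSED_BOUNDS, check_thresholds) are hypothetical, and the numbers are only the tentative values from this comment, not agreed limits. Two assumptions are mine: the upper bound for stable-uptime is taken as 20 days rather than 10, and 25 and 75 kB/s are converted to bytes per second using 1 kB = 1000 bytes.

```python
DAY = 24 * 60 * 60  # seconds per day

# Tentative (lower, upper) bounds from the list above. Illustrative only.
PROPOSED_BOUNDS = {
    "stable-uptime": (5 * DAY, 20 * DAY),    # seconds; upper bound could be 10 days
    "stable-mtbf": (1, 3000000),             # seconds
    "fast-speed": (25000, 75000),            # bytes/s, assuming 1 kB = 1000 bytes
    "guard-wfu": (90.0, 99.99),              # percent
    "guard-tk": (400000, 800000),            # seconds
    "guard-bw-inc-exits": (100000, 300000),  # bytes/s
    "guard-bw-exc-exits": (100000, 300000),  # bytes/s
    "enough-mtbf": (1, 1),                   # warn as soon as it drops to 0
}


def check_thresholds(authority, thresholds, bounds=PROPOSED_BOUNDS):
    """Return one warning string per threshold outside its bounds."""
    warnings = []
    for key, (low, high) in sorted(bounds.items()):
        if key not in thresholds:
            continue  # older authorities may not report every value
        value = thresholds[key]
        if value < low:
            warnings.append("%s: %s=%s is below lower bound %s"
                            % (authority, key, value, low))
        elif value > high:
            warnings.append("%s: %s=%s is above upper bound %s"
                            % (authority, key, value, high))
    return warnings


if __name__ == "__main__":
    # Values from the moria1 line in the ticket description.
    moria1 = {
        "stable-uptime": 693369, "stable-mtbf": 153249, "fast-speed": 40960,
        "guard-wfu": 94.669, "guard-tk": 691200,
        "guard-bw-inc-exits": 174080, "guard-bw-exc-exits": 184320,
        "enough-mtbf": 1,
    }
    for warning in check_thresholds("moria1", moria1):
        print(warning)
```

With these tentative bounds, the moria1 values from the description produce no warnings.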

comment:9 in reply to:  8 ; Changed 6 years ago by arma

Replying to karsten:

And what's the reason for turtles behaving different?

I think turtles is running the #8273 code. moria1 just started running it.

Changed 6 years ago by karsten

comment:10 in reply to:  9 Changed 6 years ago by karsten

Replying to arma:

Replying to karsten:

And what's the reason for turtles behaving different?

I think turtles is running the #8273 code. moria1 just started running it.

Yup, moria1 is now doing the same thing as turtles. See the updated graph.

comment:11 Changed 6 years ago by asn

Hi Karsten,

any chance we could have these graphs on metrics.torproject.org?
(I'd also be curious to see an up-to-date version of this graph.)

Changed 6 years ago by karsten

comment:12 Changed 6 years ago by karsten

See attached updated graph.

I agree that it would be neat to have this graph on the metrics website. However, I can't afford the time to write and maintain the additional code, and I don't think yatei can handle yet another thing to do. I'm happy to re-run the graphing script every now and then on my laptop if there's need for an updated graph. Sorry.

comment:13 Changed 6 years ago by karsten

Cc: atagar added

atagar, is this something you want to do in your Python DocTor? If not, I'd close this ticket, because I'm not working on the Java DocTor anymore. Thanks!

comment:14 Changed 6 years ago by atagar

Hi Karsten. I'd be happy to add flag-thresholds checks to DocTor if we have defined values and you can tell me what they indicate the authority operator should do (DocTor checks should be actionable; otherwise they aren't terribly helpful). Stem already checks the constraints on parameter values since those have defined bounds in the dir-spec, but the earlier correspondence didn't seem to settle on concrete ranges for flag-thresholds.

comment:15 Changed 6 years ago by karsten

Component: Metrics Website → DocTor

comment:16 Changed 6 years ago by karsten

Good points. Actually, I don't really know what an authority operator would do in such a case. Feel free to close, or to leave open at minor or trivial priority.

comment:17 Changed 6 years ago by atagar

Resolution: worksforme
Status: needs_information → closed

Resolving for now. Feel free to reopen if anyone would like to discuss this further.