Opened 3 months ago

Closed 3 weeks ago

#29773 closed enhancement (fixed)

Include highest latency within 1.5 IQR of upper quartile in circuit round-trip latencies graph

Reported by: karsten
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Website
Version:
Severity: Normal
Keywords: scalability
Cc: metrics-team
Actual Points:
Parent ID: #29507
Points:
Reviewer: irl
Sponsor:

Description

We have been asked to add graphs on (nearly) worst-case performance of our OnionPerf measurements, in addition to the average-case performance graphs we already have. In particular, we were asked to plot latency and bandwidth numbers. This ticket is about latency numbers. It's based on team-internal discussions in Brussels and follow-up discussions. This ticket is related to #29772.

We already have graphs on circuit round-trip latencies. They show the median and interquartile range, as do most of our OnionPerf graphs. Now we're asked to add graphs on (nearly) worst-case latencies, such as the 90th, 95th, or 99th percentile.

I'm attaching two graphs showing 99th percentile latency. As you can see, these graphs are highly susceptible to outliers. I don't really know how to fix that. I mean, we could plot 95th percentile and hope there won't be outliers in those, but there's no guarantee for that. We could use a log scale, but that will make the graph so much harder to interpret. Hmm.

The coding and deployment effort for bringing this graph to the Tor Metrics website would be really small, because we'd simply have to extend an existing database view that returns the 25th, 50th, and 75th percentiles so that it also returns the 90th, 95th, or 99th percentile.
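To illustrate what extending the percentile output means, here is a minimal Python sketch, not the actual database view. It assumes the latencies for one day and one server are available as a plain list; the function name `latency_percentiles` is hypothetical. `statistics.quantiles` with `method="inclusive"` interpolates linearly between closest ranks, which is assumed here to match PostgreSQL's `percentile_cont`.

```python
from statistics import quantiles

# Percentiles the existing view returns (25, 50, 75) plus the proposed
# nearly worst-case ones (90, 95, 99).
PERCENTILES = (25, 50, 75, 90, 95, 99)


def latency_percentiles(latencies_ms):
    """Return {percentile: value} for a list of round-trip latencies.

    Hypothetical helper; requires at least two measurements.
    """
    # n=100 yields 99 cut points; cut point p-1 is the p-th percentile.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in PERCENTILES}


if __name__ == "__main__":
    print(latency_percentiles(list(range(1, 101))))
```

With only a hundred or so measurements per day, the 99th percentile is interpolated from the top one or two values, which is why a single outlier can move it so much.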


Attachments (3)

onionperf-nwc-latencies-public.png (110.8 KB) - added by karsten 3 months ago.
onionperf-nwc-latencies-onion.png (102.5 KB) - added by karsten 3 months ago.
onionperf-latencies-public-2019-05-25.png (174.9 KB) - added by karsten 3 weeks ago.


Change History (16)

Changed 3 months ago by karsten

Changed 3 months ago by karsten

comment:1 Changed 3 months ago by karsten



comment:2 Changed 2 months ago by karsten

Status: new → needs_review

comment:3 Changed 2 months ago by gaba

Keywords: scalability added

comment:4 Changed 2 months ago by irl

Status: needs_review → needs_revision

comment:5 Changed 4 weeks ago by karsten

Parent ID: #29507

Adding this new graph is part of the larger task to evaluate existing OnionPerf data regarding worst-case performance.

comment:6 Changed 3 weeks ago by karsten

Summary: Plot nearly worst-case circuit round-trip latencies to [public|onion] server → Include highest latency within 1.5 IQR of upper quartile in circuit round-trip latencies graph

Updating the summary based on the discussion above to reflect our plan.

Changed 3 weeks ago by karsten

comment:7 Changed 3 weeks ago by karsten

Status: needs_revision → needs_review

Here's a graph similar to the one I just added to #29772:


Please take a look!

By the way, there's not much we can do about that spike. It's in the data. There was some issue with op-ab on that date, which I haven't investigated yet. On the one hand, it's annoying that this spike takes up so much space in the graph. On the other hand, it's good that we see such events in the graph and don't just smooth them away.

comment:8 Changed 3 weeks ago by irl

Status: needs_review → needs_revision

I know exactly what that issue was on op-ab and we can add it to the timeline. The FWSM in the Cisco Catalyst switch used as the router on that network ran out of memory and was running super slowly.

As with the bandwidths, I think this boxplot-style approach is really good at giving an overview of the measurements. I wonder if there's a name for this type of plot.

(needs_revision for the patch)

comment:9 Changed 3 weeks ago by karsten

Reviewer: irl
Status: needs_revision → needs_review

Glad to hear you like this plot! I don't know if there's a name for it. I only know that drawing it required quite a bit of code for extracting values from PostgreSQL. There are functions for quantiles, but not for the new high/low values without outliers. There's also no single function in R/ggplot2 to draw this graph. So, even if this graph has been made before, it might not have made its way into the statistics tools yet.
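The values that have no built-in quantile function are the whisker ends: the highest measurement within 1.5 IQR above the upper quartile and the lowest within 1.5 IQR below the lower quartile, the same rule classic box-and-whisker plots use. The following is a minimal Python sketch of that computation, not the PostgreSQL code in the branch. The function name `tukey_whiskers` is hypothetical, and the quartiles use linear interpolation, assumed to match what the database view computes.

```python
from statistics import quantiles


def tukey_whiskers(values, k=1.5):
    """Return (low, q1, median, q3, high) for a list of latencies.

    high is the largest value within k * IQR above the upper quartile;
    low is the smallest value within k * IQR below the lower quartile.
    Values beyond those fences are left out of the whiskers as outliers.
    Hypothetical helper; requires at least two values.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + k * iqr
    lower_fence = q1 - k * iqr
    # Both sets are non-empty: some value is <= q3 and some value is >= q1.
    high = max(v for v in values if v <= upper_fence)
    low = min(v for v in values if v >= lower_fence)
    return low, q1, median, q3, high


if __name__ == "__main__":
    # One extreme measurement (100) among otherwise small latencies.
    print(tukey_whiskers(list(range(1, 11)) + [100]))
```

In this example the upper fence is 16, so the extreme value 100 is excluded and the upper whisker ends at 10, whereas a 99th percentile would be pulled toward 100.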

Please review commit fd251d6 in my task-29773 branch.

(Earlier today I considered preparing the patch for #29772, but I'll instead work on another patch, because any revisions to this patch would also have to be made to #29772, and we shouldn't duplicate effort if we can avoid it.)

comment:10 Changed 3 weeks ago by irl

Status: needs_review → merge_ready

The R and SQL code look reasonable, although I did not run them. The descriptions look good.

comment:11 Changed 3 weeks ago by karsten

Thanks for looking! Merging and deploying now...

comment:12 Changed 3 weeks ago by karsten

Merged and deployed, except for the .jar file that I can only replace after the current update run is done later today. Keeping this ticket open until everything's deployed.

comment:13 Changed 3 weeks ago by karsten

Resolution: fixed
Status: merge_ready → closed

Everything looks good on the server. Closing.
