Opened 17 months ago

Last modified 5 months ago

#30798 reopened enhancement

Develop and deploy tgen model resembling ping

Reported by: karsten
Owned by: metrics-team
Priority: High
Milestone:
Component: Metrics/Onionperf
Version:
Severity: Normal
Keywords: metrics-team-roadmap-2020
Cc: metrics-team, acute
Actual Points:
Parent ID: #33324
Points: 5
Reviewer:
Sponsor: Sponsor59


At last week's tor-scaling meeting we discussed developing a second tgen model that resembles a ping service and deploying an OnionPerf instance with that model.

The current default tgen model in OnionPerf starts a new download every five minutes: a tiny request answered with a response of 50 KiB, 1 MiB, or 5 MiB.

This new model would send a tiny request once per second for, say, five minutes, and receive a tiny response back to each of these requests.
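To make the proposed traffic pattern concrete, here is a minimal sketch of a ping-like probe loop in plain Python. It is not a tgen model and does not use Tor: `start_echo_server` and `run_probes` are hypothetical helpers, the echo server is a local stand-in for the remote side, and the defaults (300 probes, one per second, a 4-byte payload) are assumptions chosen to match "once per second for five minutes".

```python
import socket
import threading
import time


def start_echo_server(host="127.0.0.1"):
    """Start a throwaway TCP echo server on a free port; return (host, port)."""
    server = socket.create_server((host, 0))

    def serve():
        # Handle a single connection and echo back whatever arrives.
        conn, _ = server.accept()
        with conn:
            while data := conn.recv(64):
                conn.sendall(data)
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()


def run_probes(address, count=300, interval=1.0, payload=b"ping"):
    """Send `count` tiny requests over one connection, one per `interval` seconds.

    Returns the measured round trip times in seconds.
    """
    rtts = []
    with socket.create_connection(address) as sock:
        for _ in range(count):
            started = time.monotonic()
            sock.sendall(payload)
            # Read until the full echoed payload has come back.
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(len(payload) - len(received))
                if not chunk:
                    raise ConnectionError("connection closed before reply arrived")
                received += chunk
            rtts.append(time.monotonic() - started)
            # Sleep for the rest of the interval so probes stay evenly spaced.
            time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return rtts
```

For a quick local check, `run_probes(start_echo_server(), count=5, interval=0.0)` returns five round trip times. All probes share one connection, which mirrors the idea of sending every request over the same stream.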

We wouldn't have to write analysis code that produces something like a .tpf file right now. Instead, we could start by analyzing the raw logs for this experiment and extracting some hopefully useful visualizations.

I could deploy this new model on my local machine (if it uses an onion service).

Raising priority to high, because it would be great to get this deployed before All Hands.


Change History (7)

comment:1 Changed 17 months ago by irl

This model would approximate ICMP ping, but only loosely. In Internet engineering, where you're dealing with packet-switched networks, ping can be useful to determine both round trip times (average, min, max, jitter) and packet loss. In our case we are dealing with virtual circuits overlaid onto a packet-switched network. This means that we're only going to get round trip times: if there is packet loss, we will just see the circuit go down and won't be able to probe further.

We can still do all the round trip time metrics:

  • average - how long does it typically take to get a reply?
  • min - upper bound for latency when network is unloaded
  • max - lower bound for latency when network is loaded
  • max minus min - lower bound for maximum load induced queuing delay
  • jitter - variation in latency
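The metrics listed above can be computed from a list of measured round trip times with a few lines of Python. This is a minimal sketch: `summarize_rtts` is a hypothetical helper, and jitter is taken here as the mean absolute difference between consecutive samples, which is one common definition and an assumption rather than something this ticket specifies.

```python
import statistics


def summarize_rtts(rtts):
    """Summarize a non-empty list of round trip times (in seconds)."""
    if not rtts:
        raise ValueError("need at least one round trip time")
    lowest, highest = min(rtts), max(rtts)
    # Assumed jitter definition: mean absolute change between consecutive probes.
    deltas = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "average": statistics.fmean(rtts),
        "min": lowest,
        "max": highest,
        "max_minus_min": highest - lowest,
        "jitter": statistics.fmean(deltas) if deltas else 0.0,
    }
```

For example, `summarize_rtts([1.0, 3.0, 2.0])` gives an average of 2.0, a min of 1.0, a max of 3.0, a max minus min of 2.0, and a jitter of 1.5.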

The second probe does depend on the first probe in a way that is not the case for ICMP. Being part of the same stream means that various counters and timers are going to be linked between probes. We should explicitly acknowledge this, work out what those counters/timers are (somewhere in the rate limiting code) and decide if we are affected by them or not.

If there is something like Nagle's algorithm going on then we should ensure that we're getting all the right flushes in and that we're not ending up batching our requests.
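On a plain TCP socket, Nagle's algorithm itself is switched off with the `TCP_NODELAY` option. The sketch below shows only that local step; `open_unbatched_connection` is a hypothetical helper, and whether tgen or Tor batch small writes elsewhere (which this option would not affect) is exactly what would still need to be checked.

```python
import socket


def open_unbatched_connection(address):
    """Open a TCP connection with Nagle's algorithm disabled on this socket."""
    sock = socket.create_connection(address)
    # TCP_NODELAY asks the kernel to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

This only affects the TCP hop that the socket belongs to, so it is a necessary check rather than a guarantee that requests are never batched further along the path.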

From discussion with Ana, the model should be relatively easy to implement, and all the data would be captured in the tgen log for analysis. I'm not sure whether it is easier to hack something together or to use the existing framework, which could be flexible enough to be modified simply.

If we're getting numbers that look reasonable then this is probably enough for the upcoming meeting but I'd want to make sure we're not confusing what these numbers mean. This work may also be useful for scaling in general to better understand what counters/timers exist. Maybe we can work with a network team person to understand this.

In the future it may also be interesting to port RRUL to a tgen model. That's not a cheap test to run though.

comment:2 Changed 15 months ago by irl

Resolution: invalid
Status: new → closed

comment:3 Changed 8 months ago by gaba

Keywords: metrics-team-roadmap-2020Q1 added
Parent ID: #33324

comment:4 Changed 8 months ago by gaba

Resolution: invalid
Status: closed → reopened

comment:5 Changed 7 months ago by gaba

Keywords: metrics-team-roadmap-2020April added; metrics-team-roadmap-2020Q1 removed

Move some of the tickets from last metrics roadmap to the roadmap in April.

comment:6 Changed 6 months ago by gaba

Keywords: metrics-team-roadmap-2020 added; metrics-team-roadmap-2020April removed
Points: 5
Sponsor: Sponsor59

comment:7 Changed 5 months ago by karsten

We briefly discussed the ping model as a possible use case for #29370, but then decided against implementing that ticket. Instead, we would develop the ping model as an internal model, where OnionPerf generates the TGen files, together with analysis and visualization code, as part of this ticket, assuming there is a need for such a model.
