Opened 4 months ago

Closed 5 weeks ago

#30798 closed enhancement (invalid)

Develop and deploy tgen model resembling ping

Reported by: karsten
Owned by: metrics-team
Priority: High
Component: Archived/Onionperf
Severity: Normal
Cc: metrics-team, acute


At last week's tor-scaling meeting we discussed developing a second tgen model that resembles a ping service and deploying an OnionPerf instance with that model.

The current default tgen model in OnionPerf makes a new download every five minutes. Each download is a tiny request with a response of 50 KiB, 1 MiB, or 5 MiB.

This new model would instead send a tiny request once per second for, say, five minutes, and receive a tiny response to each request.
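A minimal sketch of the probe loop such a model describes, written with plain Python sockets rather than as a tgen model (tgen's own configuration format is not reproduced here). It assumes a hypothetical echo service at `host:port` that returns one byte per request; a real OnionPerf deployment would go through Tor's SOCKS port to an onion service, which this sketch does not handle. `probe_rtts`, `count` and `interval` are illustrative names, not existing OnionPerf or tgen APIs.

```python
import socket
import time


def probe_rtts(host: str, port: int, count: int = 300, interval: float = 1.0) -> list[float]:
    """Send `count` one-byte requests over a single TCP stream, one every
    `interval` seconds, and return each round trip time in seconds.

    Hypothetical helper: assumes the peer echoes back one byte per request.
    A closed stream ends the run, roughly like a circuit going down.
    """
    rtts: list[float] = []
    with socket.create_connection((host, port), timeout=10) as sock:
        for _ in range(count):
            start = time.monotonic()
            sock.sendall(b"p")    # tiny request
            reply = sock.recv(1)  # tiny response
            if not reply:         # peer closed the stream: stop probing
                break
            rtts.append(time.monotonic() - start)
            # Sleep for whatever is left of this interval before the next probe.
            time.sleep(max(0.0, interval - (time.monotonic() - start)))
    return rtts
```

With the defaults (300 probes, one per second) a run lasts about five minutes. All probes share one stream on purpose, matching the idea of a single ping-like session rather than 300 independent downloads.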

We wouldn't have to write analysis code that produces something like a .tpf file right now, but could start by analyzing the raw logs from this experiment and extracting some hopefully useful visualizations.

I could deploy this new model on my local machine (if it uses an onion service).

Raising priority to high, because ideally we would get this deployed before All Hands.


Change History (2)

comment:1 Changed 4 months ago by irl

This model would approximate ICMP ping, but only loosely. In Internet engineering, where you're dealing with packet-switched networks, ping can be used to determine both round trip times (average, min, max, jitter) and packet loss. In our case we are dealing with virtual circuits overlaid onto a packet-switched network. This means we only get round trip times: if there is packet loss, we will just see the circuit go down and won't be able to probe further.

We can still compute all the round trip time metrics:

  • average - how long does it typically take to get a reply?
  • min - upper bound for latency when the network is unloaded
  • max - lower bound for latency when the network is loaded
  • max minus min - lower bound for the maximum load-induced queuing delay
  • jitter - variation in latency
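As a sketch, these metrics could be computed from a list of measured round trip times like this. Jitter has several definitions; this example assumes the mean absolute difference between consecutive RTTs (RFC 3550 uses a smoothed variant, and standard deviation is another common choice). `rtt_summary` is a hypothetical helper name.

```python
from statistics import mean


def rtt_summary(rtts: list[float]) -> dict[str, float]:
    """Summarise a series of round trip times (output units match input units).

    Assumption: jitter is the mean absolute difference between consecutive
    samples, one of several common definitions.
    """
    if len(rtts) < 2:
        raise ValueError("need at least two RTT samples")
    lo, hi = min(rtts), max(rtts)
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "average": mean(rtts),
        "min": lo,                 # upper bound on unloaded latency
        "max": hi,                 # lower bound on loaded latency
        "max_minus_min": hi - lo,  # lower bound on max load-induced queuing delay
        "jitter": mean(diffs),     # mean |RTT[i+1] - RTT[i]|
    }
```

For example, `rtt_summary([100, 120, 110, 200])` (milliseconds) gives an average of 132.5, min 100, max 200, max minus min 100, and jitter 40.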

Unlike with ICMP, the second probe depends on the first. Being part of the same stream means that various counters and timers are linked between probes. We should explicitly acknowledge this, work out what those counters and timers are (somewhere in the rate limiting code), and decide whether we are affected by them.
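One rough, purely statistical check for such dependence (a swapped-in technique, not something proposed in this ticket) is the sample lag-1 autocorrelation of the RTT series. It cannot identify which counters or timers are involved, and a high value could also come from network load changing slowly, so it only hints at where to look. `lag1_autocorrelation` is a hypothetical helper name.

```python
from statistics import mean


def lag1_autocorrelation(rtts: list[float]) -> float:
    """Sample lag-1 autocorrelation of an RTT series.

    Near 0: consecutive probes look roughly independent.
    Near +1: each RTT strongly predicts the next one.
    Negative: RTTs tend to alternate between high and low.
    """
    if len(rtts) < 3:
        raise ValueError("need at least three samples")
    m = mean(rtts)
    dev = [x - m for x in rtts]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant series: nothing to correlate
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    return num / denom
```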

If something like Nagle's algorithm is going on, we should ensure that we're getting all the right flushes in and that we're not ending up batching our requests.
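On the client side, Nagle's algorithm can be disabled per socket with the standard `TCP_NODELAY` option, as in this sketch. Note that it only affects the client's own TCP connection (for example to Tor's local SOCKS port); whether tor itself batches data into cells or delays flushes is a separate question that this does not answer. `connect_nodelay` is a hypothetical helper name.

```python
import socket


def connect_nodelay(host: str, port: int, timeout: float = 10.0) -> socket.socket:
    """Open a TCP connection with Nagle's algorithm disabled.

    With TCP_NODELAY set, small writes are sent immediately instead of
    being held back to coalesce with later writes.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```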

From discussion with Ana, the model should be relatively easy to implement, and all the data would be captured in the tgen log for analysis. I'm not sure whether it is easier to hack something together or to use the existing framework, which may be flexible enough to be modified simply.

If we get numbers that look reasonable, this is probably enough for the upcoming meeting, but I'd want to make sure we're not misinterpreting what these numbers mean. This work may also be useful for scaling in general, to better understand which counters and timers exist. Maybe we can work with someone from the network team on this.

In the future it may also be interesting to port RRUL (Realtime Response Under Load) to a tgen model, though that's not a cheap test to run.

comment:2 Changed 5 weeks ago by irl

Resolution: invalid
Status: new → closed