Opened 2 years ago

Closed 2 years ago

#22122 closed enhancement (implemented)

Add support for six new key-value pairs added by OnionPerf

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone: metrics-lib 1.7.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: robgjansen Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

OnionPerf adds six new key-value pairs to the .tpf format that Torperf/CollecTor did not produce: ENDPOINTLOCAL, ENDPOINTPROXY, ENDPOINTREMOTE, HOSTNAMELOCAL, HOSTNAMEREMOTE, and SOURCEADDRESS. We should add support for these keys to metrics-lib, so that we can start using their values.

Child Tickets

Change History (9)

comment:1 Changed 2 years ago by karsten

Status: newneeds_review

Please review my task-22122 branch. (Note to self: before merging, need to reformat JavaDocs a little and include @since tags.)

robgjansen, feel free to ignore the Java parts, but would you mind looking at the comments here and maybe suggest how to improve them? Thanks!

comment:2 Changed 2 years ago by robgjansen

They look good at a high level. I'm not sure how many details you want to include in the comments, so instead of editing your code comments directly, I will just give some explanation below and allow you to curate the details.

  /**
   * Return the TGen client socket name, IP address, and port, formatted as
   * <code>name:ip:port</code>, or <code>null</code> if the OnionPerf line
   * didn't contain that information. */
  public String getEndpointLocal();

The local endpoint is the TGen's client side socket. If a proxy is in use, which is always the case in OnionPerf when downloading over Tor, then the client side socket is the socket that is used to connect to the SOCKS proxy server that Tor runs. The name:ip:port may be NULL:0.0.0.0:0 if TGen was not able to find the info using getaddrinfo() and getnameinfo().

  /**
   * Return the proxy socket name, IP address, and port, formatted as
   * <code>name:ip:port</code>, or <code>null</code> if the OnionPerf line
   * didn't contain that information. */
  public String getEndpointProxy();

The proxy endpoint is the information that the TGen client uses to connect to the SOCKS proxy server that Tor runs. The name:ip:port may be NULL:0.0.0.0:0 if TGen was not able to find the info using getaddrinfo() and getnameinfo(). In OnionPerf, the SOCKS proxy is run on localhost at a well-known SOCKS port that was configured with Tor's SocksPort config option.

  /**
   * Return the TGen server host name, IP address, and port, formatted as
   * <code>hostname:ip:port</code>, or <code>null</code> if the OnionPerf line
   * didn't contain that information. */
  public String getEndpointRemote();

The remote endpoint is the information that the TGen client uses to connect to the remote server. This is the information that allows the SOCKS proxy to create a TCP connection from the exit relay to the TGen server. The name:ip:port may be NULL:0.0.0.0:0 if TGen was not able to find the info using getaddrinfo() and getnameinfo(), although in that case the download would not have succeeded because the exit relay would not know where to connect.

  /**
   * Return the client machine hostname, or <code>null</code> if the OnionPerf
   * line didn't contain that information. */
  public String getHostnameLocal();

This is the value returned by calling gethostname() on the TGen client. If that call failed, then I believe GLib uses (NULL) as the string.

  /**
   * Return the server machine hostname, or <code>null</code> if the OnionPerf
   * line didn't contain that information. */
  public String getHostnameRemote();

This is the value returned by calling gethostname() on the TGen server. The TGen server gets this value on its machine and sends it to the client as part of the header to the response to the clients transfer request. If that call failed, then I believe GLib uses (NULL) as the string.

  /**
   * Return the source IP address, or <code>null</code> if the OnionPerf line
   * didn't contain that information. */
  public String getSourceAddress();

This is collected by OnionPerf itself. This is the best guess of the public IP address of the OnionPerf client machine. This is collected by connecting to https://check.torproject.org/ and finding the IP address in the result. If that fails, the backup method is to connect to Google DNS server 8.8.8.8 on port 53, and then check the IP address listed on the client side socket of that connection. If both of those fail, then OnionPerf returns a string "unknown".

comment:3 Changed 2 years ago by karsten

Thanks, Rob, for the quick feedback! I included as many of those details as possible while trying to leave out implementation specifics. (I also considered my note to self above by reformatting JavaDocs and including @since tags.) Now waiting for a review of the Java parts before this can go in. Thanks again!

comment:4 Changed 2 years ago by iwakeh

Status: needs_reviewmerge_ready

The java side looks fine. -> merge-ready

Should the renaming to 'onionperf' everywhere be done in release 2.0.0?

comment:5 Changed 2 years ago by karsten

Thanks for looking! Squashed and pushed to master.

What other renaming needs to be done? Note that we don't need to rename everything, because we're still supporting parsing of historic Torperf files and because we're still using the Torperf format that OnionPerf kindly produces for us (rather than OnionPerf's own format).

comment:6 Changed 2 years ago by iwakeh

I think there should be an OnionperfResult class that also supports the old torperf format.

Eventually, there might be more use cases for the additional data in onionperf measurements.
And, onionperf already adds many keys that are not part of the torperf format.

So, all 'torperf' should become 'onionperf' soon.

comment:7 in reply to:  6 ; Changed 2 years ago by karsten

Replying to iwakeh:

I think there should be an OnionperfResult class that also supports the old torperf format.

Eventually, there might be more use cases for the additional data in onionperf measurements.
And, onionperf already adds many keys that are not part of the torperf format.

So, all 'torperf' should become 'onionperf' soon.

After giving this some thoughts I think it's more complicated than this. The reason is that we're using OnionPerf in a kind of Torperf-compatibility mode right now, where it uses the same measurement setup (with static downloads of 50KB, 1MB, 5MB files) and produces the same data format with only minimal additions.

But we'll soon want to add more realistic measurement setups (simulated website downloads), produce a richer data format (probably JSON based, IIRC), and present more useful measurement results on Tor Metrics. That's what we'll want to call OnionPerf, not what we're doing right now. And we'll still want to refer to past measurements and data formats as Torperf.

How about we:

  1. change the type annotation in .tpf files produced by OnionPerf to @type torperf 1.1, because that format is still backward compatible to @type torperf 1.0 but adds new fields that a parser for the old format does not recognize;
  2. keep referring to everything in CollecTor that downloads from OnionPerf instances as OnionPerf, because we're really downloading from that new tool, not from the old Torperf tool;
  3. update https://collector.torproject.org/#type-torperf to say that these are "Torperf and OnionPerf Measurement Results" and specify what fields are new in version 1.1;
  4. leave metrics-lib's TorperfResults unchanged and just say that it supports new fields added in version 1.1;
  5. rename metrics-web's onionperf.csv (beta) to torperf2.csv, because it still uses the same measurement setup and data format as the former torperf.csv. It just adds a new column that makes it backward-incompatible with torperf.csv.

And how about once we're moving to more complex measurement setups and a richer data format in OnionPerf we call that @type onionperf 1.0, OnionPerfMeasurement, onionperf.csv, and so on.

What do you think?

comment:8 in reply to:  7 ; Changed 2 years ago by iwakeh

Replying to karsten:

Replying to iwakeh:

I think there should be an OnionperfResult class that also supports the old torperf format.

Eventually, there might be more use cases for the additional data in onionperf measurements.
And, onionperf already adds many keys that are not part of the torperf format.

So, all 'torperf' should become 'onionperf' soon.

After giving this some thoughts I think it's more complicated than this. The reason is that we're using OnionPerf in a kind of Torperf-compatibility mode right now, where it uses the same measurement setup (with static downloads of 50KB, 1MB, 5MB files) and produces the same data format with only minimal additions.

But we'll soon want to add more realistic measurement setups (simulated website downloads), produce a richer data format (probably JSON based, IIRC), and present more useful measurement results on Tor Metrics. That's what we'll want to call OnionPerf, not what we're doing right now. And we'll still want to refer to past measurements and data formats as Torperf.

Valid concerns.

How about we:

  1. change the type annotation in .tpf files produced by OnionPerf to @type torperf 1.1, because that format is still backward compatible to @type torperf 1.0 but adds new fields that a parser for the old format does not recognize;

Agreed.

  1. keep referring to everything in CollecTor that downloads from OnionPerf instances as OnionPerf, because we're really downloading from that new tool, not from the old Torperf tool;

Agreed.

  1. update https://collector.torproject.org/#type-torperf to say that these are "Torperf and OnionPerf Measurement Results" and specify what fields are new in version 1.1;

Agreed.

  1. leave metrics-lib's TorperfResults unchanged and just say that it supports new fields added in version 1.1;

Agreed.

  1. rename metrics-web's onionperf.csv (beta) to torperf2.csv, because it still uses the same measurement setup and data format as the former torperf.csv. It just adds a new column that makes it backward-incompatible with torperf.csv.

I would suggest torperf-1.1.csv to reflect the type version.

And how about once we're moving to more complex measurement setups and a richer data format in OnionPerf we call that @type onionperf 1.0, OnionPerfMeasurement, onionperf.csv, and so on.

What do you think?

That all makes sense. Fine.

comment:9 in reply to:  8 Changed 2 years ago by karsten

Resolution: implemented
Status: merge_readyclosed

Replying to iwakeh:

Replying to karsten:

Replying to iwakeh:

I think there should be an OnionperfResult class that also supports the old torperf format.

Eventually, there might be more use cases for the additional data in onionperf measurements.
And, onionperf already adds many keys that are not part of the torperf format.

So, all 'torperf' should become 'onionperf' soon.

After giving this some thoughts I think it's more complicated than this. The reason is that we're using OnionPerf in a kind of Torperf-compatibility mode right now, where it uses the same measurement setup (with static downloads of 50KB, 1MB, 5MB files) and produces the same data format with only minimal additions.

But we'll soon want to add more realistic measurement setups (simulated website downloads), produce a richer data format (probably JSON based, IIRC), and present more useful measurement results on Tor Metrics. That's what we'll want to call OnionPerf, not what we're doing right now. And we'll still want to refer to past measurements and data formats as Torperf.

Valid concerns.

How about we:

  1. change the type annotation in .tpf files produced by OnionPerf to @type torperf 1.1, because that format is still backward compatible to @type torperf 1.0 but adds new fields that a parser for the old format does not recognize;

Agreed.

Opened #22263 for this.

  1. keep referring to everything in CollecTor that downloads from OnionPerf instances as OnionPerf, because we're really downloading from that new tool, not from the old Torperf tool;

Agreed.

Nothing to be done here.

  1. update https://collector.torproject.org/#type-torperf to say that these are "Torperf and OnionPerf Measurement Results" and specify what fields are new in version 1.1;

Agreed.

Also part of #22263.

  1. leave metrics-lib's TorperfResults unchanged and just say that it supports new fields added in version 1.1;

Agreed.

Nothing to be done here. We don't explicitly say which minor versions are supported. Maybe we should, but in that case we should do that for all descriptor types. In any case, this shouldn't block closing this ticket.

  1. rename metrics-web's onionperf.csv (beta) to torperf2.csv, because it still uses the same measurement setup and data format as the former torperf.csv. It just adds a new column that makes it backward-incompatible with torperf.csv.

I would suggest torperf-1.1.csv to reflect the type version.

Done.

And how about once we're moving to more complex measurement setups and a richer data format in OnionPerf we call that @type onionperf 1.0, OnionPerfMeasurement, onionperf.csv, and so on.

What do you think?

That all makes sense. Fine.

Okay, great. I think we're done here now. Thanks for reviewing! Closing.

Note: See TracTickets for help on using tickets.