Changes between Initial Version and Version 4 of Ticket #7358


Timestamp:
Nov 8, 2012, 8:00:16 PM (7 years ago)
Author:
robgjansen
Comment:

Replying to karsten:

Replying to robgjansen:

Client

  • download statistics (how long to get to first, last, ? byte)
  • circuit build times, build timeouts
  • which relays were chosen for each circuit, and during which time intervals

This is basically what Torperf does. The first item is what trivsocks-client tells us, and the second and third items are what we learn from control port events. Implementing the first item in Tor using control port events is sure tempting. We lack some information, e.g., the exact timestamp when Torperf started the download and the expected number of bytes we want to download. But maybe we can compensate for that. For example, we could collect times between sending the first byte and receiving 1B, 1kB, 2kB, 5kB, 10kB, 20kB, 50kB, 100kB, 200kB, 500kB, and so on, up to the last byte. Not exactly the same as the deciles we have so far, but should be fine for most purposes. Also, we'll have to do that for all circuits that the client opens on behalf of the user.
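
The threshold idea above could be sketched roughly as follows. This is only an illustration, not Tor or Torperf code: the function name `time_to_thresholds`, the `(timestamp, nbytes)` input format, and the choice of 1 kB = 1000 bytes are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Byte thresholds named in the comment above (assumption: 1 kB = 1000 bytes).
THRESHOLDS = (1, 1_000, 2_000, 5_000, 10_000, 20_000, 50_000,
              100_000, 200_000, 500_000)


def time_to_thresholds(
    first_byte_sent: float,
    reads: Iterable[Tuple[float, int]],
    thresholds: Sequence[int] = THRESHOLDS,
) -> Tuple[Dict[int, Optional[float]], Optional[float]]:
    """Return (seconds until each threshold was reached, seconds until last byte).

    `reads` is a hypothetical chronological list of (timestamp, nbytes)
    pairs, one per chunk of data received on the stream. Thresholds that
    were never reached map to None.
    """
    pending = sorted(thresholds)
    reached: Dict[int, Optional[float]] = {t: None for t in pending}
    total = 0
    idx = 0
    last: Optional[float] = None
    for timestamp, nbytes in reads:
        total += nbytes
        elapsed = timestamp - first_byte_sent
        last = elapsed
        # One read may cross several thresholds at once.
        while idx < len(pending) and total >= pending[idx]:
            reached[pending[idx]] = elapsed
            idx += 1
    return reached, last
```

Because the expected download size is unknown, thresholds larger than the transfer simply stay unset, and the time to the last byte is reported separately.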

I'm of the opinion that we should let the user application (Torperf, etc.) continue measuring end-user-specific performance characteristics, precisely because of the missing information and the imprecision that you mentioned. Tor should stick to things it's good at measuring, like throughput.

Another item for the client section here might be directory client operations. We might want to keep track of how many directory requests a client has made and how many bytes it has sent or received for them. Then we can compare different directory designs more easily.

Also, I think we need to move all statistics from client+relay that have to do with streams to the client-only section.

Great! I've updated the description.

client+relay

  • cell statistics: # queued, processed, waiting times
  • total number of or connections, circuits, and streams over time

What about other connections than OR?

Good point :) Updated.

  • various throughputs (stream, circuit, connection) over various intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
  • when streams, circuits, or connections change active/inactive status
  • indications of congestion (inferred by how fast/often token buckets were emptied/empty, queuing times from above)
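
As a rough illustration of throughput over several trailing intervals, one possible shape is a single sample buffer queried per window. This is a sketch under assumptions, not Tor code: the class name `WindowedThroughput`, its methods, and the caller-supplied timestamps are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Sequence, Tuple


class WindowedThroughput:
    """Track bytes transferred and report average rates over trailing windows."""

    def __init__(self, windows: Sequence[float] = (1, 10, 60, 300)) -> None:
        self.windows = tuple(windows)
        self.max_window = max(self.windows)
        self.samples: Deque[Tuple[float, int]] = deque()  # (timestamp, nbytes)

    def record(self, now: float, nbytes: int) -> None:
        """Note that `nbytes` were transferred at time `now` (seconds)."""
        self.samples.append((now, nbytes))
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Drop samples that are older than the longest window.
        while self.samples and self.samples[0][0] <= now - self.max_window:
            self.samples.popleft()

    def rates(self, now: float) -> Dict[float, float]:
        """Return bytes per second for each window, covering (now - w, now]."""
        self._expire(now)
        result = {}
        for w in self.windows:
            total = sum(n for ts, n in self.samples if ts > now - w)
            result[w] = total / w
        return result
```

The same object could be kept per stream, circuit, or connection. Summing over the buffer on every query is simple but linear in the number of samples, so a real implementation would likely use per-second buckets instead.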

I assume these will all be implemented as asynchronous control port events, right? Which of them will be emitted whenever there's a change, and which will be emitted periodically?

Another item might be statistics on crypto operations as described in #7134, but without the aggregation step, which isn't necessary if we collect these statistics in a simulation/testing environment. The two can probably share a lot of code.

This does seem important. My vision of a "stats" module in #7359 should help avoid duplication of code and help separate statistics from functionality.

relay

  • protocol overheads (raw client data vs protocol traffic)

We also have statistics on bi-directional connection usage already in Tor. But these are probably contained somewhere in the client+relay section.

And we might add statistics on directory server operations with the same reasoning as adding directory client operations.

Updated.

What am I missing?

Not sure, I guess we'll find out while working off this list. Count me in, this is fun stuff. :)

Awesome :)

  • Ticket #7358

    • Property Type changed from defect to task
  • Ticket #7358 – Description

    initial  v4
          1   1    Here are the statistics I speculate will be useful, and may or may not already be available in some form, and may only be available externally (outside of the Tor code). Keep in mind that some of this is not intended to be collected outside of an experimentation environment, else proper aggregation/scrubbing is required.
          2   2
          3      - Client
          4      -    * download statistics (how long to get to first, last, ? byte)
              3  + client
          5   4       * circuit build times, build timeouts
          6   5       * which relays were chosen for each circuit, and during which time intervals
              6  +    * number of streams over time
              7  +    * stream throughput over time
              8  +    * how long streams have been active/inactive
              9  +    * number of and bandwidth expended by client directory operations
          7  10
          8  11    client+relay
          9      -    * cell statistics: # queued, processed, waiting times
         10      -    * total number of or connections, circuits, and streams over time
         11      -    * various throughputs (stream, circuit, connection) over various intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
             12  +    * cell statistics: number queued and processed, waiting times
             13  +    * total number of circuits and the various connection types (AP, OR, EXIT, DIR) over time
             14  +    * throughput of circuits and the various connection types over time
         12  15       * when streams, circuits, or connections change active/inactive status
         13      -    * indications of congestion (inferred by how fast/often token buckets were emptied/empty, queuing times from above)
             16  +    * how fast/often token buckets were emptied/empty
             17  +    *
         14  18
         15  19    relay
         16  20       * protocol overheads (raw client data vs protocol traffic)
             21  +    * number of and bandwidth expended by directory server operations
         17  22
         18  23    What am I missing?