Opened 6 months ago

Last modified 5 months ago

#34231 new enhancement

Document and maybe improve how we're mapping TGen transfers to Tor streams/circuits

Reported by: karsten
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Onionperf
Version:
Severity: Normal
Keywords:
Cc: metrics-team, arma, acute, phw
Actual Points:
Parent ID: #33328
Points:
Reviewer:
Sponsor: Sponsor59-must

Description

OnionPerf uses TGen to make transfers using a local Tor client. OnionPerf also uses Stem to connect to the Tor client's control port and register for control events.

This ticket is about documenting how we can map TGen transfers to Tor streams and circuits. OnionPerf did this to produce the .tpf output format (which we just killed in #34141). But we'll also need this functionality to implement #34218 or #33260.

Here's what we're doing in metrics-lib right now to map transfers and streams:

  • Index Tor circuits by their circuit ID.
  • Index Tor streams by their source port; if there are two or more streams with the same source port, remember them all.
  • Go through TGen transfers one by one. For each, extract the local source port.
  • Go through Tor streams with the same source port and check whether the transfer end and the stream end happened within 150 seconds of each other.
  • If there's a match, look up the corresponding circuit by circuit ID.
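A minimal Python sketch of the steps above, assuming hypothetical Transfer, Stream and Circuit records whose end timestamps have already been parsed from the TGen and torctl logs. The record and function names are illustrative, not metrics-lib's. The ticket does not say what happens when several streams fall inside the window; picking the one with the closest end time is an assumption of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical record types; field names are illustrative only.
@dataclass
class Circuit:
    circ_id: int
    path: List[str]


@dataclass
class Stream:
    source_port: int
    circ_id: int
    end_time: float  # UNIX timestamp when the stream ended


@dataclass
class Transfer:
    source_port: int
    end_time: float  # UNIX timestamp when the TGen transfer ended


MAX_END_DELTA = 150.0  # seconds; the heuristic window described above


def map_transfers(
    transfers: List[Transfer], streams: List[Stream], circuits: List[Circuit]
) -> List[Tuple[Transfer, Optional[Stream], Optional[Circuit]]]:
    # Index circuits by circuit ID.
    circuits_by_id: Dict[int, Circuit] = {c.circ_id: c for c in circuits}
    # Index streams by source port, remembering every stream that used the port.
    streams_by_port: Dict[int, List[Stream]] = defaultdict(list)
    for s in streams:
        streams_by_port[s.source_port].append(s)

    results = []
    for t in transfers:
        # Candidates: same source port and an end time within the window.
        candidates = [
            s for s in streams_by_port.get(t.source_port, [])
            if abs(t.end_time - s.end_time) <= MAX_END_DELTA
        ]
        # Assumption: if several streams qualify, take the closest end time.
        stream = min(candidates, key=lambda s: abs(t.end_time - s.end_time), default=None)
        circuit = circuits_by_id.get(stream.circ_id) if stream is not None else None
        results.append((t, stream, circuit))
    return results
```

Transfers without a qualifying stream come back with None for both stream and circuit, which corresponds to the "missing mappings" mentioned below.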

Note that OnionPerf took a simpler approach when producing .tpf files: it remembered only one stream per source port and did not apply the 150-second heuristic. As a result, some mappings were wrong. The approach taken by metrics-lib leaves a few transfers unmapped (probably about as many as OnionPerf did) and apparently produces no wrong mappings.

Is there a way to have an exact mapping that doesn't require a heuristic? And is there a way to do it without having to wait for transfer and stream to end?

Child Tickets

Change History (5)

comment:1 Changed 6 months ago by gaba

Sponsor: Sponsor59-must

comment:2 Changed 6 months ago by gaba

Parent ID: #33328

comment:3 Changed 5 months ago by dennis.jackson

I have a need for similar functionality, with a slightly different twist. I would like to associate web requests from the Tor Browser with particular circuits. So the mapping would be User Action -> HTTP Request -> SOCKS Username/Password for Stream Isolation -> A Circuit.

As far as I understand, TorButton uses this exact functionality internally to display circuit information to the user. However, I haven't looked into it in enough detail to see whether it is robust:

https://gitweb.torproject.org/torbutton.git/tree/chrome/content/tor-circuit-display.js#n45

https://gitweb.torproject.org/torbutton.git/tree/components/domain-isolator.js#n124

In particular, the following comment is informative:

// Watches for STREAM SENTCONNECT events. When a SENTCONNECT event occurs, then
// we assume isolation settings (SOCKS username+password) are now fixed for the
// corresponding circuit. Whenever the first stream on a new circuit is seen,
// looks up u+p and records the node data in the credentialsToNodeDataMap.
// We need to update the circuit display immediately after any new node data
// is received. So the `updateUI` callback will be called at that point.
// See https://trac.torproject.org/projects/tor/ticket/15493
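A rough Python analogue of the logic that comment describes, as a sketch only: CredentialTracker and its methods are hypothetical names, and the events are assumed to be already reduced to plain arguments. Unlike the linked torbutton code, this sketch caches credentials from CIRC events, so a circuit is only recorded once its credentials have been seen; until then, later SENTCONNECT events on that circuit are retried. One property worth noting: the credentials-to-circuit mapping is available as soon as a stream is sent, without waiting for the stream to end.

```python
from typing import Dict, Optional, Set, Tuple

Credentials = Tuple[str, str]  # (SOCKS username, SOCKS password)


class CredentialTracker:
    """Hypothetical Python analogue of the torbutton logic quoted above."""

    def __init__(self) -> None:
        self.circuit_credentials: Dict[str, Credentials] = {}  # circuit ID -> credentials
        self.circuit_paths: Dict[str, str] = {}  # circuit ID -> relay path
        self.seen_circuits: Set[str] = set()  # circuits already recorded
        self.credentials_to_path: Dict[Credentials, str] = {}  # like credentialsToNodeDataMap

    def on_circ_event(self, circ_id: str, path: str,
                      username: Optional[str], password: Optional[str]) -> None:
        # Cache each circuit's path and, once present, its SOCKS credentials.
        self.circuit_paths[circ_id] = path
        if username is not None and password is not None:
            self.circuit_credentials[circ_id] = (username, password)

    def on_stream_event(self, status: str, circ_id: str) -> None:
        # Per the comment above, isolation settings are fixed at SENTCONNECT.
        if status != "SENTCONNECT" or circ_id in self.seen_circuits:
            return
        creds = self.circuit_credentials.get(circ_id)
        if creds is None:
            return  # credentials not known yet; retry on a later stream
        # First recorded stream on this circuit: store its node data for these creds.
        self.seen_circuits.add(circ_id)
        self.credentials_to_path[creds] = self.circuit_paths.get(circ_id, "")
```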

comment:4 Changed 5 months ago by acute

At the moment, Onionperf uses stem to log events from the Tor control socket corresponding to Onionperf's tor process, and later parses these logs (we refer to them as torctl logs) line by line at analysis time into CircuitEvents, StreamEvents, BandwidthEvents and BuildTimeoutSetEvents.

The StreamEvent is used to extract the port which originated the connection (source port) and circuit ID, which is what we currently use for matching. There don't seem to be any other useful StreamEvent variables that can help with matching (see https://stem.torproject.org/api/response.html).
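For illustration, a stdlib-only sketch of where those two fields come from in raw STREAM events. In OnionPerf, Stem does this parsing and exposes them as StreamEvent.source_port and StreamEvent.circ_id. As far as Tor's control-spec describes it, SOURCE_ADDR is only sent on NEW (and NEWRESOLVE) events, while the circuit ID is "0" until the stream is attached, so this sketch merges a stream's events by stream ID. The function name streams_by_id is hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# "650 STREAM StreamID StreamStatus CircuitID Target [Keyword=Value ...]"
_STREAM = re.compile(r"\b650 STREAM (\S+) (\S+) (\S+) (\S+)(.*)$")


def streams_by_id(torctl_lines: Iterable[str]) -> Dict[str, Tuple[Optional[int], Optional[str]]]:
    """Map stream ID -> (source port, circuit ID), merging that stream's events."""
    ports: Dict[str, int] = {}
    circs: Dict[str, str] = {}
    for line in torctl_lines:
        m = _STREAM.search(line)
        if m is None:
            continue  # not a STREAM event
        stream_id, _status, circ_id, _target, rest = m.groups()
        # "0" means not attached yet; if a stream is retried, the last circuit wins.
        if circ_id != "0":
            circs[stream_id] = circ_id
        for token in rest.split():
            if token.startswith("SOURCE_ADDR="):
                # SOURCE_ADDR is address:port; take the part after the last colon.
                ports[stream_id] = int(token.rsplit(":", 1)[1])
    return {sid: (ports.get(sid), circs.get(sid)) for sid in set(ports) | set(circs)}
```

For a made-up pair of lines such as `650 STREAM 1234 NEW 0 example.onion:80 SOURCE_ADDR=127.0.0.1:41234 PURPOSE=USER` followed by `650 STREAM 1234 SENTCONNECT 406 example.onion:80`, this yields source port 41234 on circuit 406 for stream 1234.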

However, I believe we can match tgen streams to Tor circuits in the torctl logs directly using SOCKS authentication.

TGen 1.0.0 supports generating random usernames and passwords for SOCKS authentication, which can be used to uniquely identify a transfer and match it to a CircuitEvent (Stem already fills in the socks_username and socks_password fields during parsing anyway).

I've done a quick test to check; this is what the log lines look like if we enable the random SOCKS authentication strings in tgen:

2020-05-23 18:01:14 1590253274.675001 [info] [tgen-transport.c:771] [_tgentransport_receiveSocksAuth] socks server localhost:127.0.0.1:34810 authentication succeeded with username='zRhBJ8o' and password='zRhBJ8o'


...and this is a sample line from the corresponding torctl log:

2020-05-23 18:01:17 1590253277.57 650 CIRC 406 EXTENDED $87C08DDFD32C62F3C56D371F9774D27BFDBB807B~Unnamed,$B9E7A637B00BBB77853A639CC33245A2FEB8F033~theykilledaaron,$3E13E2EB87CCF5690564EE33E9F9F9F80B229FBB~hotzenplotz BUILD_FLAGS=IS_INTERNAL,NEED_CAPACITY PURPOSE=HS_CLIENT_REND HS_STATE=HSCR_CONNECTING REND_QUERY=afa4fswz3ifwlbwsgk6va7vbbxj35m3geo3hvpc5u22w66yadr6xfayd TIME_CREATED=2020-05-23T17:01:16.357678 SOCKS_USERNAME="zRhBJ8o" SOCKS_PASSWORD="zRhBJ8o"

As far as the code goes, the change to the Onionperf parsers seems simple, and this is a better way of matching, because it depends neither on source ports nor on a timing heuristic.
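A minimal sketch of that parser change, working directly on raw log lines like the two samples above rather than through Stem. The regular expressions and function names are hypothetical, and the quoted-value patterns ignore escaped quotes. Because a circuit emits several CIRC events, and a stream might in principle be retried on another circuit with the same credentials, each credential pair maps to a list of circuit IDs.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Credentials = Tuple[str, str]  # (SOCKS username, SOCKS password)

# TGen log: "... authentication succeeded with username='X' and password='Y'"
_TGEN_AUTH = re.compile(r"authentication succeeded with username='([^']*)' and password='([^']*)'")
# torctl log: "... 650 CIRC <CircuitID> <CircStatus> ..."
_CIRC = re.compile(r"\b650 CIRC (\S+) ")
# Simplification: assumes the quoted values contain no escaped quotes.
_SOCKS_USER = re.compile(r'SOCKS_USERNAME="([^"]*)"')
_SOCKS_PASS = re.compile(r'SOCKS_PASSWORD="([^"]*)"')


def tgen_credentials(line: str) -> Optional[Credentials]:
    m = _TGEN_AUTH.search(line)
    return (m.group(1), m.group(2)) if m else None


def circ_credentials(line: str) -> Optional[Tuple[str, Credentials]]:
    circ, user, pw = _CIRC.search(line), _SOCKS_USER.search(line), _SOCKS_PASS.search(line)
    if circ is None or user is None or pw is None:
        return None
    return circ.group(1), (user.group(1), pw.group(1))


def match_by_credentials(tgen_lines: Iterable[str],
                         torctl_lines: Iterable[str]) -> Dict[Credentials, List[str]]:
    # Collect each circuit ID once per credential pair (EXTENDED, BUILT, ... repeat it).
    circs_by_creds: Dict[Credentials, List[str]] = {}
    for line in torctl_lines:
        parsed = circ_credentials(line)
        if parsed is None:
            continue
        circ_id, creds = parsed
        ids = circs_by_creds.setdefault(creds, [])
        if circ_id not in ids:
            ids.append(circ_id)
    # Each TGen transfer maps to the circuit(s) that carried its credentials.
    return {creds: circs_by_creds.get(creds, [])
            for creds in map(tgen_credentials, tgen_lines) if creds is not None}
```

Applied to the two sample lines above, this maps the credentials ('zRhBJ8o', 'zRhBJ8o') to circuit 406.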

Some questions/thoughts:

  • Turning on SOCKS authentication in Onionperf means we use stream isolation. My understanding is that each transfer (stream) would use a different circuit, which is what we expect anyway in Onionperf? Would this change affect measurements?
  • Is it likely that the TGen-generated SOCKS credentials would conflict?
  • If we have plans to change what we use to parse Onionperf logs, we should check the replacements support this.
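For a rough sense of the conflict risk, a birthday-bound calculation. The parameters are assumptions read off the single sample above, not checked against TGen's source: 7 characters drawn uniformly from [A-Za-z0-9] (62 symbols), and a password equal to the username, so the password adds no extra randomness. The function name is hypothetical.

```python
import math


def collision_probability(n: int, alphabet_size: int = 62, length: int = 7) -> float:
    """Probability that at least two of n uniformly random strings coincide."""
    space = alphabet_size ** length  # number of possible strings
    if n > space:
        return 1.0  # pigeonhole: a repeat is certain
    # log P(all distinct) = sum over i < n of log(1 - i/space)
    log_all_distinct = sum(math.log1p(-i / space) for i in range(n))
    # P(some collision) = 1 - P(all distinct), computed without cancellation
    return -math.expm1(log_all_distinct)
```

Under these assumptions, 1,000 transfers give a collision probability of roughly 1.4 × 10^-7, consistent with the usual approximation n(n-1)/(2N) for N = 62^7.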

comment:5 Changed 5 months ago by karsten

This looks like a very promising approach! Some thoughts:

  • We're currently using TGen 0.0.1. It sounds like we would first have to upgrade to 1.0.0 in order to use this feature (#33974).
  • If we're concerned about TGen-generated SOCKS credentials not being long enough to avoid conflicts, we could additionally look at source ports to match transfers and streams.
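One way to combine both keys, as a sketch: accept a transfer-to-circuit mapping only if the circuit carries the transfer's SOCKS credentials and also had a stream from the transfer's source port. TransferRecord, StreamRecord and match_transfer are hypothetical names, and the records are assumed to be already parsed from the TGen and torctl logs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Credentials = Tuple[str, str]  # (SOCKS username, SOCKS password)


@dataclass(frozen=True)
class TransferRecord:  # hypothetical: parsed from the TGen log
    credentials: Credentials
    source_port: int


@dataclass(frozen=True)
class StreamRecord:  # hypothetical: parsed from STREAM events
    source_port: int
    circ_id: str


def match_transfer(transfer: TransferRecord, streams: Iterable[StreamRecord],
                   circuit_creds: Dict[str, Credentials]) -> Optional[str]:
    # Circuits whose SOCKS credentials equal the transfer's credentials.
    by_creds = {cid for cid, creds in circuit_creds.items() if creds == transfer.credentials}
    # Circuits that carried a stream from the transfer's source port.
    by_port = {s.circ_id for s in streams if s.source_port == transfer.source_port}
    both = by_creds & by_port
    # Conservative: report a mapping only if both keys agree on exactly one circuit.
    return next(iter(both)) if len(both) == 1 else None
```

A credential collision is resolved by the source port and a reused source port by the credentials; if neither narrows the candidates to a single circuit, the transfer stays unmapped rather than being mapped wrongly.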