Opened 7 years ago

Last modified 4 days ago

#5830 assigned task

Write tool to automate web queries to Tor; and use Stem to track stream/circ allocation and results

Reported by: arma
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords: bounty, nickm-cares
Cc: robgjansen, karsten, gsathya, cwacek, arthuredelstein, gk
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by arma)

As part of #5752 we need to know how many circuits we're making now, how many we're discarding early because a stream didn't work, etc.

This is a two-part project: first is a tool to automatically make a series of requests to Tor, in a repeatable way, and second is a Tor controller script, probably using Stem, that watches stream and circuit events (and maybe more), and tracks which streams get allocated to which circuits, how many total circuits are made, how quickly results return, and other statistics. Then we would change the underlying Tor, replay the same set of requests, and know what circuit behaviors to expect.

I expect we'll also discover that we don't export enough info via the control protocol to make good conclusions; in that case we'll also want to modify Tor to export this info.
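As a rough illustration of the controller half described above, here is a minimal sketch of the bookkeeping, assuming Stem's `Controller.add_event_listener` with CIRC and STREAM events. The class name `CircuitUseTracker`, its counters, and the `attach` helper are hypothetical names made up for this sketch; they are not part of Stem or Tor, and the statistics shown are only a small subset of what the ticket asks for.

```python
from collections import defaultdict


class CircuitUseTracker:
    """Hypothetical helper: counts circuits and records stream-to-circuit allocation."""

    def __init__(self):
        self.circuits_launched = 0
        self.circuits_failed = 0
        self.built_circuits = set()
        self.streams_by_circuit = defaultdict(set)  # circuit id -> stream ids
        self.stream_outcome = {}                    # stream id -> SUCCEEDED / FAILED

    def on_circ(self, event):
        # Stem's CircuitEvent exposes .id and .status (string-valued enum).
        if event.status == "LAUNCHED":
            self.circuits_launched += 1
        elif event.status == "BUILT":
            self.built_circuits.add(event.id)
        elif event.status == "FAILED":
            self.circuits_failed += 1

    def on_stream(self, event):
        # Stem's StreamEvent exposes .id, .status and .circ_id; an unattached
        # stream has no circuit yet (None, or "0" on the wire).
        circ_id = getattr(event, "circ_id", None)
        if circ_id not in (None, "0"):
            self.streams_by_circuit[circ_id].add(event.id)
        if event.status in ("SUCCEEDED", "FAILED"):
            self.stream_outcome[event.id] = event.status

    def unused_circuits(self):
        """Built circuits that have not carried any stream so far."""
        used = {c for c, streams in self.streams_by_circuit.items() if streams}
        return self.built_circuits - used


def attach(controller, tracker):
    """Register the tracker on an already authenticated stem Controller."""
    from stem.control import EventType  # imported lazily; requires Stem

    controller.add_event_listener(tracker.on_circ, EventType.CIRC)
    controller.add_event_listener(tracker.on_stream, EventType.STREAM)
```

Running the same request series against two Tor builds and comparing `circuits_launched`, `circuits_failed`, and `unused_circuits()` would be one way to see how circuit behavior changed.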

Change History (21)

comment:1 Changed 7 years ago by arma

Owner: aagbsn deleted
Status: new → assigned

I'd be happy if aagbsn wants to do this, but I don't think he should be the default owner of this ticket.

comment:2 Changed 7 years ago by arma

Another tool that would be handy to put together with this would be something that auto-generates web fetches at specified times. To be realistic, maybe it actually launches a wget --mirror or the like, to pull down the images on the pages only after the initial HTML arrives. Or maybe this is better as one of those Firefox extensions that instruments Firefox to make automated clicks.
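A minimal sketch of the "fetches at specified times" idea, under a few assumptions: Tor's SOCKS port is at 127.0.0.1:9050, `curl` is installed, and a schedule is just a list of (offset in seconds, URL) pairs. The function names `fetch_via_tor` and `replay` are hypothetical. It fetches one URL at a time and does not pull page resources the way wget --mirror or a browser would.

```python
import subprocess
import time


def fetch_via_tor(url, socks="127.0.0.1:9050"):
    """Fetch one URL through Tor's SOCKS port; return seconds taken, or None on failure."""
    started = time.monotonic()
    result = subprocess.run(
        ["curl", "--silent", "--max-time", "60",
         "--socks5-hostname", socks, "--output", "/dev/null", url],
        check=False,
    )
    if result.returncode != 0:
        return None
    return time.monotonic() - started


def replay(schedule, fetch=fetch_via_tor, sleep=time.sleep, clock=time.monotonic):
    """Issue each (offset_seconds, url) request at its offset from the start.

    Returns a list of (offset, url, elapsed) tuples in the order requests ran.
    fetch, sleep and clock are injectable so the schedule logic can be tested
    without a running Tor.
    """
    start = clock()
    results = []
    for offset, url in sorted(schedule):
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        results.append((offset, url, fetch(url)))
    return results
```

Because requests run sequentially, a slow fetch delays the ones after it; a more faithful replay tool would probably issue them concurrently.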

comment:3 Changed 7 years ago by arma

Status: assigned → new

comment:4 Changed 7 years ago by arma

<rransom> armadev, re #5830: You'll want to add a CIRC2 event triggered when Tor 'abandons' a circuit.

comment:5 Changed 7 years ago by arma

It's likely that you'll have more fun (and make better progress) by ignoring Torflow and just using Stem (https://gitweb.torproject.org/stem.git) to hear the events. Or heck, just write your own little script to connect to the control port and pull down the events you want. Most of the work will be in deciding what to compute based on the events, rather than in parsing them.

Last edited 4 years ago by arma
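For the "write your own little script" route, a sketch of what that might look like with only the standard library. It assumes a control port on 127.0.0.1:9051 with no authentication configured, and it only understands the single-line `650 STREAM` and `650 CIRC` events. `parse_event` and `watch` are hypothetical names for this sketch.

```python
import socket


def parse_event(line):
    """Turn a '650 STREAM ...' or '650 CIRC ...' line into a dict, else None."""
    parts = line.strip().split(" ")
    if len(parts) < 4 or parts[0] != "650":
        return None
    if parts[1] == "STREAM" and len(parts) >= 6:
        # 650 STREAM StreamID StreamStatus CircuitID Target ...
        return {"type": "STREAM", "stream_id": parts[2], "status": parts[3],
                "circ_id": parts[4], "target": parts[5]}
    if parts[1] == "CIRC":
        # 650 CIRC CircuitID CircStatus [Path] ...
        return {"type": "CIRC", "circ_id": parts[2], "status": parts[3]}
    return None


def watch(host="127.0.0.1", port=9051):
    """Yield parsed CIRC/STREAM events from a control port (assumes no auth)."""
    with socket.create_connection((host, port)) as conn:
        reader = conn.makefile("r")
        for command in (b"AUTHENTICATE\r\n", b"SETEVENTS CIRC STREAM\r\n"):
            conn.sendall(command)
            reply = reader.readline()
            if not reply.startswith("250"):
                raise RuntimeError("control port refused %r: %s" % (command, reply))
        for line in reader:
            event = parse_event(line)
            if event is not None:
                yield event
```

Something like `for event in watch(): print(event)` would then dump events as they arrive; deciding what to compute from them remains the real work.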

comment:6 Changed 7 years ago by arma

Component: Torflow → Analysis

comment:7 Changed 7 years ago by atagar

I'm not entirely sure what you're looking for, but while stem is functional it's still pretty rough around the edges. Event parsing will be early in Ravi's project so it should be done somewhere in early June, but for now stem only provides the unparsed message objects. Here's an example for printing events...

# Simple script that starts a tor instance, attaches to it, and prints BW
# events for a few seconds.

import time

from stem.connection import connect_port, authenticate
from stem.control import BaseController
from stem.process import launch_tor, NO_TORRC

# controller class that simply prints the events that it receives
class EventPrinter(BaseController):
  def _handle_event(self, event_message):
    print event_message

# Start a tor instance that, hopefully, won't conflict with anything. We can
# connect to it and start using the instance when bootstrapping reaches 5%.

print "starting tor..."
tor_process = launch_tor(
  options = {'ControlPort': '2777'},
  torrc_path = NO_TORRC,
  completion_percent = 5,
)

with connect_port(control_port = 2777) as control_socket:
  controller = EventPrinter(control_socket)
  authenticate(controller)
  controller.msg('SETEVENTS BW')
  time.sleep(5)

tor_process.kill()

... and here's an example for doing something similar with TorCtl...

https://gitweb.torproject.org/pytorctl.git/blob/HEAD:/example.py

comment:8 Changed 6 years ago by arma

Keywords: bounty added

If anybody runs across a great developer who wants to get involved in Tor, get up to speed on Stem, and help us do research, this is a great bite-sized project -- write the tool to cause the series of requests to Tor, and the tool to hear (via the control port) how the streams were assigned to circuits, how they succeeded or failed, etc.

I'm marking as 'bounty' because we could do it as "trial" contract work for somebody.

comment:9 Changed 6 years ago by gsathya

Cc: gsathya added

comment:10 Changed 6 years ago by arma

Description: modified
Summary: Write stream/circ event parser to track circuit use → Write tool to automate web queries to Tor; and use Stem to track stream/circ allocation and results

comment:11 Changed 6 years ago by atagar

I'm happy to help if someone wants to take the lead on this. My comment was eight months back and stem should now have everything that we need. :)
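For reference, a sketch of how the BW example from comment:7 might look with a more recent Stem release on Python 3, assuming `stem.process.launch_tor_with_config`, `Controller.from_port`, and parsed `BW` events with `read` and `written` attributes. `describe_bw` and `main` are hypothetical names, and port 2777 is carried over from the earlier example.

```python
def describe_bw(event):
    """Format a BW event (bytes read/written during the last second)."""
    return "read %d B/s, written %d B/s" % (event.read, event.written)


def main(control_port=2777, seconds=5):
    """Start tor, attach to it, and print BW events for a few seconds."""
    import time

    import stem.process  # imported lazily; requires Stem and a tor binary
    from stem.control import Controller, EventType

    print("starting tor...")
    tor_process = stem.process.launch_tor_with_config(
        config={"ControlPort": str(control_port)},
        completion_percent=5,
    )
    try:
        with Controller.from_port(port=control_port) as controller:
            controller.authenticate()
            controller.add_event_listener(
                lambda event: print(describe_bw(event)), EventType.BW)
            time.sleep(seconds)
    finally:
        tor_process.kill()


if __name__ == "__main__":
    main()
```

Swapping `EventType.BW` for `EventType.CIRC` and `EventType.STREAM` gives the events this ticket cares about.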

comment:12 Changed 6 years ago by karsten

Cc: karsten added

Somewhat related, I'm planning to use Stem for the Torperf rewrite that fetches popular websites using Selenium/Firefox and that logs request times and circuit details for later analysis. That's a sponsor F deliverable which is due February 28.

comment:13 Changed 6 years ago by arma

Cc: amj703 robgjansen cwacek added

comment:14 Changed 5 years ago by mikeperry

Parent ID: #5752

Fine to do, but I don't think it blocks #5752, nor is it likely to give us any real data on how often users navigate between top-level sites in aggregate/on average (which is the real source of circuit creation under #5752).

comment:15 Changed 3 years ago by arthuredelstein

Cc: arthuredelstein added
Severity: Normal

comment:16 Changed 3 years ago by gk

Cc: gk added

comment:17 Changed 3 years ago by robgjansen

I think OnionPerf may already do much of what you want. It has a 'measure' mode to download data over Tor, a 'monitor' mode to log Tor control port events to file, an 'analyze' mode to process those log files into data files, and a 'visualize' mode to plot the results of the analysis.

I use it to process Shadow results.

https://github.com/robgjansen/onionperf

comment:18 Changed 3 years ago by robgjansen

Oh also, it will hopefully replace TorPerf one day, because the types of requests sent through Tor can be customized, allowing us to model much more complex behaviors than a single file of a specific size.

comment:19 Changed 16 months ago by nickm

Keywords: nickm-cares added

comment:20 Changed 16 months ago by karsten

Owner: set to metrics-team
Status: new → assigned

comment:21 Changed 4 days ago by amj703

Cc: amj703 removed