Opened 6 months ago

Last modified 8 weeks ago

#33399 accepted enhancement

Measure static guard nodes with OnionPerf

Reported by: acute
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Onionperf
Version:
Severity: Normal
Keywords: metrics-team-roadmap-2020, metrics-team-roadmap-2020-june
Cc: acute, mikeperry
Actual Points: 0.1
Parent ID: #33325
Points: 4
Reviewer:
Sponsor: Sponsor59-must

Description

The specifications for measuring this are as follows:

  • OP should measure one guard or one set of guards at a time. It could use NumEntryGuards in the torrc file to support more than one guard.
  • A new guard or set of guards must be chosen after a day of measurement.
  • All CBT data must be erased when choosing a new set of guards, after a day of measurements (at UTC midnight). This could mean erasing or replacing the state file for CBT.
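
As an illustration of the "rotate at UTC midnight" requirement, here is a minimal sketch of how a scheduler might compute the wait until the next rotation. The function name is hypothetical and not part of OnionPerf; it assumes a timezone-aware datetime is passed in.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> float:
    """Return the seconds from `now` (timezone-aware) until the next UTC midnight."""
    now_utc = now.astimezone(timezone.utc)
    # Move one day ahead, then truncate to 00:00:00 UTC of that day.
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


if __name__ == "__main__":
    print(seconds_until_utc_midnight(datetime.now(timezone.utc)))
```

A call made exactly at midnight returns a full day (86400 seconds), so a rotation that has just run is not triggered twice.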

Change History (18)

comment:1 Changed 5 months ago by gaba

Sponsor: Sponsor59

comment:2 Changed 4 months ago by acute

Currently guards are disabled in OP by setting UseEntryGuards=0 in the client torrc file. To enable them, UseEntryGuards should be set to 1, and additionally NumEntryGuards should be set to 1 (or a number >1 to test multiple guards). I have left an OP test instance running with this set to 3 to gather some data.
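
To make the torrc change concrete, here is a small sketch of how the two options could be generated from a guard count. The helper name `guard_torrc_lines` is an assumption for illustration only; the option names UseEntryGuards and NumEntryGuards are the ones mentioned above, written in the usual "Option value" torrc form.

```python
def guard_torrc_lines(num_guards: int) -> list:
    """Return torrc lines for a given number of entry guards.

    num_guards == 0 keeps the current behaviour (guards disabled);
    any positive value enables guards and pins their number.
    """
    if num_guards < 0:
        raise ValueError("num_guards must not be negative")
    if num_guards == 0:
        return ["UseEntryGuards 0"]
    return ["UseEntryGuards 1", "NumEntryGuards %d" % num_guards]


if __name__ == "__main__":
    # The test instance mentioned above runs with three guards.
    print("\n".join(guard_torrc_lines(3)))
```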

Purging the state:
To achieve this, the file called 'state' in the tor_client directory must be removed after log rotation. The guards previously measured could be extracted from this and added to the analysis output. The Tor process must be restarted/reloaded after the logs have rotated. All of this would only happen if the measurement mode is 'guard-enabled'.
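
A rough sketch of the purge step follows. It assumes the state file lists guards on lines starting with "Guard " followed by key=value fields (for example nickname= and rsa_id=); that layout is an assumption here and should be checked against the tor version in use. The function name is hypothetical. It would presumably run while tor is stopped, since a running tor may write the file again.

```python
from pathlib import Path


def extract_guards_and_purge_state(tor_client_dir):
    """Collect guard entries from <tor_client_dir>/state, then delete the file.

    Returns a list of dicts, one per "Guard" line, mapping field names
    to values. Returns an empty list if there is no state file.
    """
    state_path = Path(tor_client_dir) / "state"
    guards = []
    if not state_path.exists():
        return guards
    for line in state_path.read_text().splitlines():
        if not line.startswith("Guard "):
            continue
        # Assumed layout: "Guard key1=value1 key2=value2 ..."
        fields = dict(
            item.split("=", 1) for item in line.split()[1:] if "=" in item
        )
        guards.append(fields)
    # Removing the file discards guard state and CBT state together.
    state_path.unlink()
    return guards
```

The returned entries could then be attached to the analysis output for the day that just ended.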

Adding a new measurement mode:
A new mode should be made available to the cli, perhaps allowing the admin to specify how many guards to measure at once.

comment:3 Changed 3 months ago by gaba

Keywords: metrics-team-roadmap-2020 added

comment:4 Changed 3 months ago by karsten

Sponsor: Sponsor59 → Sponsor59-must

Moving to Sponsor59-must, because we should really do these in order to call Sponsor59 done.

comment:5 in reply to:  2 Changed 2 months ago by karsten

Some thoughts on this ticket:

  • IIRC, we're using UseEntryGuards=0 for the tor process on both client and server side. If we start using guards for a limited time now, we should do so on both sides.
  • We should experiment with the time we want to keep guards static. That time could range from (a) five minutes for a single measurement, (b) an hour, (c) a day, or even (d) several days.
    • A possible downside of changing guards at UTC midnight is that we might have a harder time identifying trends over time, because the choice of guards might overlay any other changes in the network.
    • If we pick a time that is too short, our results might be blurred by the stabilizing phase after choosing new guards.
    • Maybe we need to experiment with something like changing guards every hour and analyze how different the first few measurements in that hour are from those towards the end of the hour.
  • Rather than removing the state file, we might try out the DROPGUARDS controller command, which is supposed to achieve the same thing. What it might not do is remove circuit build timeout state, but maybe Tor is smart enough to consider the event of dropping all guards a drastic enough network change to reset the timeout back to the default and send a BUILDTIMEOUT_SET RESET event (I haven't checked). Note that even after going back to defaults, the first measurement or two will likely differ from those afterwards, because Tor will have to learn what a good timeout is with the new guard(s). Maybe it doesn't matter if we let Tor learn by itself that something has changed. This is related to the previous thought on how often to change guards.
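
The hourly experiment suggested above could be analyzed roughly as follows. This is only a sketch: the function name, the ten-minute edge width, and the (timestamp, value) input shape are assumptions, not OnionPerf's actual data model.

```python
from statistics import mean


def early_late_difference(measurements, window_start,
                          window_length=3600.0, edge=600.0):
    """Compare measurements at the start and at the end of a guard window.

    measurements: list of (unix_timestamp, value) pairs, e.g. download times.
    Returns mean(early) - mean(late), or None if either edge is empty.
    """
    window_end = window_start + window_length
    early = [v for t, v in measurements
             if window_start <= t < window_start + edge]
    late = [v for t, v in measurements
            if window_end - edge <= t < window_end]
    if not early or not late:
        return None
    return mean(early) - mean(late)
```

A clearly positive difference across many windows would hint that the stabilizing phase after choosing new guards blurs the early measurements.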

Leaving this ticket assigned to metrics-team. If somebody wants to grab it, please do!

comment:6 Changed 2 months ago by karsten

Actual Points: 0.1

comment:7 Changed 2 months ago by acute

Cc: mikeperry added

comment:8 Changed 2 months ago by mikeperry

The function to reset buildtimeout is circuit_build_times_reset(). It is called when there are too many timeouts. It is not called via DROPGUARDS.

We could make a DROPTIMEOUTS or similar command just like DROPGUARDS, that calls circuit_build_times_reset(), if that is simpler than removing the state file. I don't think DROPGUARDS should necessarily automatically reset CBT.

It takes 100 circuits to learn a circuit build timeout. During this phase, circuits are launched roughly every 10 seconds. So it takes about 1000 seconds to learn a timeout, at which point the BUILDTIMEOUT_SET COMPUTED event will be delivered again.
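
The arithmetic above, spelled out with the two figures from this comment (the constant names are just labels for illustration):

```python
CIRCUITS_NEEDED = 100          # circuits required to learn a timeout
LAUNCH_INTERVAL_SECONDS = 10   # approximate spacing between circuit launches

learning_seconds = CIRCUITS_NEEDED * LAUNCH_INTERVAL_SECONDS
learning_minutes = learning_seconds / 60

print(learning_seconds, "seconds, i.e. about",
      round(learning_minutes, 1), "minutes")
```

So each reset costs roughly a quarter of an hour of learning time, which matters when weighing very short guard rotation periods.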

During this time, fix-guards onionperf should not record perf measurements between RESET and SET (as per #33420).
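
One way to express that exclusion rule is sketched below. It assumes the BUILDTIMEOUT_SET events have already been collected as (timestamp, type) pairs; the function name and input shape are hypothetical.

```python
def filter_learning_phase(measurement_times, events):
    """Drop measurements taken between a RESET and the next COMPUTED event.

    measurement_times: list of timestamps of perf measurements.
    events: list of (timestamp, type) pairs for BUILDTIMEOUT_SET events,
            where type is e.g. "RESET" or "COMPUTED".
    """
    windows = []
    reset_at = None
    for ts, kind in sorted(events):
        if kind == "RESET" and reset_at is None:
            reset_at = ts
        elif kind == "COMPUTED" and reset_at is not None:
            windows.append((reset_at, ts))
            reset_at = None
    if reset_at is not None:
        # Still learning: exclude everything after the last RESET.
        windows.append((reset_at, float("inf")))
    return [t for t in measurement_times
            if not any(start <= t < end for start, end in windows)]
```

Measurements at or after the COMPUTED timestamp are kept, since a timeout has been learned by then.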

It makes sense that BUILDTIMEOUT_SET events other than COMPUTED are rare in onionperf production instances, because CBT only resets after many timeouts, and only SUSPENDs if TLS activity stops.

comment:9 Changed 2 months ago by karsten

Thanks for the input, mikeperry!

The idea of using a controller command for dropping timeouts rather than removing the state file came from robgjansen who was thinking about running similar experiments in Shadow. I'd say we should at least give it a try and see how complicated it is to implement such a command. Maybe we'll get help from friendly network team people.

Still leaving this ticket assigned to metrics-team to be picked up. It's certainly not a tiny amount of work, but that's already reflected in the 4.0 points estimated for this ticket. If somebody picks it up, please remember to release early and often by sharing intermediate results on this ticket. Thanks!

comment:10 Changed 2 months ago by mikeperry

One additional wrinkle: circuit_build_times_reset() does not emit a BUILDTIMEOUT_SET RESET event by itself. For sanity, I am guessing the DROPTIMEOUTS command should cause this RESET event to get emitted.

This DROPTIMEOUTS command should be a relatively simple patch. If you need it, I can probably hack that up in an hour or two.

comment:11 in reply to:  10 Changed 2 months ago by karsten

Replying to mikeperry:

> One additional wrinkle: circuit_build_times_reset() does not emit a BUILDTIMEOUT_SET RESET event by itself. For sanity, I am guessing the DROPTIMEOUTS command should cause this RESET event to get emitted.
>
> This DROPTIMEOUTS command should be a relatively simple patch. If you need it, I can probably hack that up in an hour or two.

That would be awesome. Yes, please!

comment:12 Changed 2 months ago by gaba

Keywords: metrics-team-roadmap-2020-june added

Adding all these tickets to the OnionPerf roadmap for June.

comment:13 Changed 2 months ago by mikeperry

Keywords: metrics-team-roadmap-2020-june removed

https://github.com/mikeperry-tor/tor/commits/droptimeouts provides this functionality.

https://github.com/mikeperry-tor/torspec/commits/droptimeouts provides the spec.

LMK if this looks good and I'll open a sub-ticket for network-team to merge.


comment:14 Changed 2 months ago by mikeperry

Keywords: metrics-team-roadmap-2020-june added

(Yay trac for removing new keywords because I had a stale tab open)

comment:15 Changed 2 months ago by karsten

Owner: changed from metrics-team to karsten
Status: new → accepted

Thanks! I'll give this a try today and possibly tomorrow.

comment:16 Changed 2 months ago by mikeperry

I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.

This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.


comment:17 Changed 2 months ago by karsten

I just moved the discussion of DROPTIMEOUTS to #33420. Let's focus on static guards in this ticket and leave everything related to circuit build timeouts for #33420. It might be that we'll want to use both features together once they exist, but development can happen in parallel in these two tickets.

comment:18 in reply to:  16 Changed 8 weeks ago by mikeperry

Replying to mikeperry:

> I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.
>
> This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.

Fix committed to the branch https://github.com/mikeperry-tor/tor/commits/droptimeouts in commit https://github.com/mikeperry-tor/tor/commit/2e341098f9388e02d849feca161d8992c2645427
