Measure static guard nodes with OnionPerf

Trac:
Parent Ticket: #33325 (moved)

added actualpoints::0.1 component::metrics/onionperf metrics-team-roadmap-2020 metrics-team-roadmap-2020-june owner::karsten parent::33325 points::4 priority::medium severity::normal sponsor::59-must status::accepted type::enhancement labels

Trac:
Sponsor: N/A to Sponsor59

Currently guards are disabled in OP by setting UseEntryGuards=0 in the client torrc file. To enable them, UseEntryGuards should be set to 1, and additionally NumEntryGuards should be set to 1 (or a number >1 to test multiple guards). I have left an OP test instance running with this set to 3 to gather some data.
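Here is a minimal sketch of how OnionPerf might switch between the two client configurations. `UseEntryGuards` and `NumEntryGuards` are real torrc options. The helper name `guard_torrc_lines` and the convention of passing `None` for the current guards-disabled behavior are assumptions made for illustration.

```python
from typing import List, Optional


def guard_torrc_lines(num_guards: Optional[int]) -> List[str]:
    """Return client torrc lines for the guard setup (hypothetical helper).

    num_guards=None keeps today's OnionPerf behavior (guards disabled);
    an integer >= 1 enables guards and limits how many are used at once.
    """
    if num_guards is None:
        # Current default: build every circuit without entry guards.
        return ["UseEntryGuards 0"]
    if num_guards < 1:
        raise ValueError("num_guards must be at least 1")
    # Guard-enabled mode: use entry guards, at most num_guards of them.
    return ["UseEntryGuards 1", f"NumEntryGuards {num_guards}"]


if __name__ == "__main__":
    # Same setting as the test instance above, which measures 3 guards.
    print("\n".join(guard_torrc_lines(3)))
```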

Purging the state: To achieve this, the file called 'state' in the tor_client directory must be removed after log rotation. The guards previously measured could be extracted from this and added to the analysis output. The Tor process must be restarted/reloaded after the logs have rotated. All of this would only happen if the measurement mode is 'guard-enabled'.
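The purge step could look roughly like the sketch below. It assumes that tor stores guards in the state file as lines like `Guard in=default rsa_id=<fingerprint> nickname=<name> ...`, meaning the keyword `Guard` followed by space-separated key=value pairs. Check this assumption against the tor version in use. `extract_guards` and `purge_state` are hypothetical names.

```python
import os
from typing import Dict, List


def extract_guards(state_path: str) -> List[Dict[str, str]]:
    """Collect guard entries from a tor state file.

    Assumes lines of the form 'Guard key=value key=value ...'; all other
    lines in the state file are ignored.
    """
    guards = []
    with open(state_path, encoding="utf-8") as state_file:
        for line in state_file:
            fields = line.split()
            if not fields or fields[0] != "Guard":
                continue
            # Turn 'key=value' tokens into a dict; skip tokens without '='.
            guards.append(dict(tok.split("=", 1) for tok in fields[1:] if "=" in tok))
    return guards


def purge_state(tor_client_dir: str) -> List[Dict[str, str]]:
    """Record the previously used guards, then delete the state file.

    Meant to run after log rotation in a 'guard-enabled' mode. Returns the
    guards so they can be added to the analysis output. tor still has to be
    restarted or reloaded afterwards.
    """
    state_path = os.path.join(tor_client_dir, "state")
    if not os.path.exists(state_path):
        return []
    guards = extract_guards(state_path)
    os.remove(state_path)
    return guards
```

A running tor also holds its state in memory. That is presumably why the restart/reload step matters: otherwise tor might simply write the old guards back.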

Adding a new measurement mode: A new mode should be made available to the cli, perhaps allowing the admin to specify how many guards to measure at once.
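A sketch of such a CLI switch using Python's standard `argparse` module follows. The `--num-guards` option is hypothetical and is not an existing OnionPerf flag.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser with a hypothetical option for the guard-enabled mode."""
    parser = argparse.ArgumentParser(description="OnionPerf measurement (sketch)")
    parser.add_argument(
        "--num-guards",
        type=int,
        default=None,
        metavar="N",
        help="enable entry guards and measure N guards at once "
        "(default: guards disabled, as today)",
    )
    return parser


def parse_num_guards(argv: Optional[List[str]] = None) -> Optional[int]:
    """Return the requested number of guards, or None for guards disabled."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.num_guards is not None and args.num_guards < 1:
        parser.error("--num-guards must be at least 1")  # exits with status 2
    return args.num_guards
```

Using `None` as the default, rather than 0, keeps the default run identical to the current guards-disabled behavior.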

Trac:
Keywords: N/A deleted, metrics-team-roadmap-2020 added

Moving to Sponsor59-must, because we should really do these in order to call Sponsor59 done.

Trac:
Sponsor: Sponsor59 to Sponsor59-must

Some thoughts on this ticket:

- IIRC, we're using UseEntryGuards=0 for the tor process on both client and server side. If we start using guards for a limited time now, we should do so on both sides.
- We should experiment with the time we want to keep guards static. That time could range from (a) five minutes for a single measurement, (b) an hour, (c) a day, or even (d) several days.
  - A possible downside of changing guards at UTC midnight is that we might have a harder time identifying trends over time, because the choice of guards might mask any other changes in the network.
  - If we pick a time that is too short, our results might be blurred by the stabilizing phase after choosing new guards.
  - Maybe we need to experiment with something like changing guards every hour, and then analyze how much the first few measurements in each hour differ from those towards the end of the hour.
- Rather than removing the state file, we might try out the DROPGUARDS controller command, which is supposed to achieve the same thing. What it might not do is remove circuit build timeout state. Maybe Tor is smart enough to treat dropping all guards as a drastic enough network change to reset the timeout to the default and send a BUILDTIMEOUT_SET RESET event; I haven't checked. Even after going back to defaults, the first measurement or two will likely differ from later ones, because Tor will have to learn what a good timeout is with the new guard(s). Maybe it doesn't matter if we let Tor learn by itself that something has changed. This is related to the previous thought on how often to change guards.
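The hourly experiment suggested above compares measurements at the start of a guard period with those near its end. One way to set this up is to bucket measurements by rotation window. The sketch below makes two assumptions: rotations happen on fixed boundaries aligned to the Unix epoch (so a one-day period rotates at UTC midnight), and each measurement is a `(unix_timestamp, duration_seconds)` pair. Neither reflects OnionPerf's actual data model, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

Measurement = Tuple[float, float]  # (unix timestamp, duration in seconds)


def split_by_window(measurements: Iterable[Measurement],
                    period_s: int) -> Dict[int, List[Measurement]]:
    """Group measurements by guard-rotation window.

    Windows start at multiples of period_s since the Unix epoch, so
    period_s=86400 rotates at UTC midnight and period_s=3600 every hour.
    """
    windows: Dict[int, List[Measurement]] = {}
    for ts, duration in measurements:
        windows.setdefault(int(ts // period_s), []).append((ts, duration))
    return windows


def early_vs_late(measurements: Iterable[Measurement],
                  period_s: int, k: int) -> Tuple[float, float]:
    """Median duration of the first k vs. the last k measurements per window.

    Raises statistics.StatisticsError if no window has 2*k measurements.
    """
    early: List[float] = []
    late: List[float] = []
    for window in split_by_window(measurements, period_s).values():
        window.sort()  # chronological order within the window
        if len(window) < 2 * k:
            continue  # too few measurements to separate start from end
        early.extend(d for _, d in window[:k])
        late.extend(d for _, d in window[-k:])
    return median(early), median(late)
```

A large gap between the two medians for short periods would suggest that the stabilizing phase after choosing new guards dominates the results.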

Leaving this ticket assigned to metrics-team. If somebody wants to grab it, please do!

Trac:
Actualpoints: N/A to 0.1

Trac:
Cc: acute to acute, mikeperry

The function to reset buildtimeout is circuit_build_times_reset(). It is called when there are too many timeouts. It is not called via DROPGUARDS.

We could make a DROPTIMEOUTS or similar command just like DROPGUARDS, that calls circuit_build_times_reset(), if that is simpler than removing the state file. I don't think DROPGUARDS should necessarily automatically reset CBT.
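Issuing DROPGUARDS, or a DROPTIMEOUTS command if one is added as proposed here, takes only a line-based exchange on tor's control port. The sketch below uses a plain socket instead of a controller library. It assumes the connection is already authenticated (AUTHENTICATE is not shown) and has no asynchronous events subscribed. It also does not handle multi-line data replies (`250+`). `send_control_command` and `drop_guards` are hypothetical helper names.

```python
import socket
from typing import List


def send_control_command(sock: socket.socket, command: str) -> List[str]:
    """Send one control-port command and return its reply lines.

    Assumes an authenticated connection with no asynchronous events
    subscribed; '250+' data replies are not handled, and any bytes after
    the final reply line are discarded.
    """
    sock.sendall(command.encode("ascii") + b"\r\n")
    buf = b""
    lines: List[str] = []
    while True:
        # Read until at least one complete CRLF-terminated line is buffered.
        while b"\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("control connection closed")
            buf += chunk
        raw, buf = buf.split(b"\r\n", 1)
        line = raw.decode("ascii")
        lines.append(line)
        # 'NNN ' (space) ends a reply; 'NNN-' marks a continuation line.
        if len(line) >= 4 and line[3] == " ":
            return lines


def drop_guards(sock: socket.socket) -> None:
    """Ask tor to forget its current guards; raise if tor does not reply 250."""
    reply = send_control_command(sock, "DROPGUARDS")
    if not reply[-1].startswith("250"):
        raise RuntimeError(f"DROPGUARDS failed: {reply}")
```

In practice the socket would come from something like `socket.create_connection(("127.0.0.1", control_port))`. A controller library such as stem wraps the authentication and reply parsing.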

It takes 100 circuits to learn a circuit build timeout. During this phase, circuits are launched roughly every 10 seconds. So it takes about 1000 seconds to learn a timeout, at which point the BUILDTIMEOUT_SET COMPUTED event will be delivered again.
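The arithmetic above, plus a parser for the event that signals the end of the learning phase, can be sketched as follows. The parser assumes the control-spec line form `650 BUILDTIMEOUT_SET <TYPE> KEY=VALUE ...`, where TYPE is, for example, COMPUTED or RESET. The constant and function names are hypothetical.

```python
from typing import Dict, Tuple

# Rough numbers from the comment above: ~100 circuits, one every ~10 s.
CIRCUITS_TO_LEARN = 100
SECONDS_BETWEEN_CIRCUITS = 10
LEARNING_TIME_S = CIRCUITS_TO_LEARN * SECONDS_BETWEEN_CIRCUITS  # ~1000 s


def parse_buildtimeout_set(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a BUILDTIMEOUT_SET event line into its type and key=value fields."""
    parts = line.strip().split()
    if len(parts) < 3 or parts[0] != "650" or parts[1] != "BUILDTIMEOUT_SET":
        raise ValueError(f"not a BUILDTIMEOUT_SET event: {line!r}")
    fields = dict(p.split("=", 1) for p in parts[3:] if "=" in p)
    return parts[2], fields
```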

During this time, a fixed-guards OnionPerf instance should not record performance measurements between the RESET event and the following COMPUTED event (as per #33420 (moved)).
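Here is a sketch of that filtering step. It assumes a learning phase starts at a BUILDTIMEOUT_SET RESET event and ends at the next COMPUTED event, and that event and measurement times share one clock. A phase with no COMPUTED event yet is treated as running to the end of the data. `exclude_learning_phases` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple


def exclude_learning_phases(measurement_times: Iterable[float],
                            cbt_events: Iterable[Tuple[float, str]]) -> List[float]:
    """Drop measurements taken while tor relearns its circuit build timeout.

    cbt_events holds (timestamp, type) pairs from BUILDTIMEOUT_SET events.
    Each phase spans [RESET time, next COMPUTED time).
    """
    phases: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, kind in sorted(cbt_events):
        if kind == "RESET" and start is None:
            start = ts
        elif kind == "COMPUTED" and start is not None:
            phases.append((start, ts))
            start = None
    if start is not None:
        # Still learning at the end of the data: exclude everything after.
        phases.append((start, float("inf")))
    return [t for t in measurement_times
            if not any(lo <= t < hi for lo, hi in phases)]
```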

It makes sense that BUILDTIMEOUT_SET events other than COMPUTED are rare in onionperf production instances, because CBT only resets after many timeouts, and only SUSPENDs if TLS activity stops.

Thanks for the input, mikeperry!

The idea of using a controller command for dropping timeouts rather than removing the state file came from robgjansen, who was thinking about running similar experiments in Shadow. I'd say we should at least give it a try and see how complicated it is to implement such a command. Maybe we'll get help from friendly network team people.

Still leaving this ticket assigned to metrics-team to be picked up. It's certainly not a tiny amount of work, but that's already reflected in the 4.0 points estimated for this ticket. If somebody picks it up, please remember to release early and often by sharing intermediate results on this ticket. Thanks!

One additional wrinkle: circuit_build_times_reset() does not emit a BUILDTIMEOUT_SET RESET event by itself. For sanity, I am guessing the DROPTIMEOUTS command should cause this RESET event to get emitted.

This DROPTIMEOUTS command should be a relatively simple patch. If you need it, I can probably hack that up in an hour or two.

Replying to mikeperry:

> One additional wrinkle: circuit_build_times_reset() does not emit a BUILDTIMEOUT_SET RESET event by itself. For sanity, I am guessing the DROPTIMEOUTS command should cause this RESET event to get emitted.
>
> This DROPTIMEOUTS command should be a relatively simple patch. If you need it, I can probably hack that up in an hour or two.

That would be awesome. Yes, please!

Adding all these tickets to the OnionPerf roadmap for June.

Trac:
Keywords: N/A deleted, metrics-team-roadmap-2020-june added

https://github.com/mikeperry-tor/tor/commits/droptimeouts provides this functionality.

https://github.com/mikeperry-tor/torspec/commits/droptimeouts provides the spec.

LMK if this looks good and I'll open a sub-ticket for network-team to merge.

Trac:
Keywords: metrics-team-roadmap-2020-june deleted, N/A added

(Yay Trac, for removing new keywords because I had a stale tab open.)

Trac:
Keywords: N/A deleted, metrics-team-roadmap-2020-june added

Thanks! I'll give this a try today and possibly tomorrow.

Trac:
Status: new to accepted
Owner: metrics-team to karsten

I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.

This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.

I just moved the discussion of DROPTIMEOUTS to #33420 (moved). Let's focus on static guards in this ticket and leave everything related to circuit build timeouts for #33420 (moved). It might be that we'll want to use both features together once they exist, but development can happen in parallel in these two tickets.

Replying to mikeperry:

> I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.
>
> This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.

Fix committed to the branch: https://github.com/mikeperry-tor/tor/commits/droptimeouts in https://github.com/mikeperry-tor/tor/commit/2e341098f9388e02d849feca161d8992c2645427

changed time estimate to 32h

added 48m of time spent

mentioned in issue #33420 (moved)

mentioned in issue #34257 (moved)

mentioned in issue #33325 (moved)
