Circuit padding trace simulator

Trac:
Child Ticket(s): #31790 (moved)

added circpad-researchers-want component::core tor/tor priority::medium severity::normal status::new type::enhancement wtf-pad labels

Trac:
Description: This is the parent ticket for the pieces of work required to make it possible to use circuitpadding.c in a trace simulator outside of Tor, so that defenses could be re-applied to crawl traces quickly without needing to re-crawl a set of sites.

to

This is the parent ticket for the pieces of work required to make it possible to use circuitpadding.c in a trace simulator outside of Tor, so that defenses could be re-applied to crawl traces quickly without needing to re-crawl a set of sites.

An alternate way to do this, instead of extracting this code, is to make use of our unit testing framework and build the tracer as a unit test. We have mechanisms to mock the networking functions so that they output new traces, and then we can use the unit-test style to read in a trace file and output a new one, instead of performing a test.
Cc: N/A to pulls, asn

I added nickm and dgoulet to Cc since they are most familiar with how circuitmux, etc looks on the wire, and how libevent callbacks in a mocked scenario may differ from reality, and to generally sanity check this.

Here are the specific details on the approach:

This simulator needs two components. First: instrumentation of Tor itself, so that researchers can produce reference trace sets from a crawl. Second: A unit-test driven defense simulator, which reads trace files, applies padding machines to them, and outputs new defense-applied trace files.


The undefended reference crawl traces must be collected at three locations: the client, the guard, and the middle node.

For the client and the middle node, the simplest place to collect these traces are in the circpad_cell_event_* callbacks, as well as the circpad_machine_event_* callbacks. Those callbacks can have added loglines/fprintfs that print out a timestamp, an event type, and a cell direction. Note also that the patches in #29494 will make these callback locations more closely reflect actual network wire write time. The middle-node callbacks should also check that this is a researcher machine that is meant to be recorded before emitting loglines, though.

For the guard node, we can allow our tracer clients to use a special magic third padding machine (if we bump CIRCPAD_MAX_MACHINES and increase the field width) just to hop 1. Then, to signal to the guard node which circuits to record, this machine can be negotiated with the guard, so it can mark circuits as belonging to the researcher, for safe collection, by way of adding a dummy padding machine. The same circpad_cell_event callbacks as were used for the client and middle can then be used for guard collection in that case, since a padding machine will be open there, only on the researcher's circuits (though the callbacks should check that this is a researcher machine that is meant to be recorded, just as in the middle node case).

Be warned though: this guard approach may get hairy because of the need to use index 3 on the client, but index 1 on the guard, as well as make sure the machine_index bitfield and CIRCPAD_MAX_MACHINES are bumped to 3 for the client.

The timing differences for cell transit time from middle to guard and client to guard should be used to compute some statistics for simulating this delay upon defense application, as the simulator will not have a guard node. 

Alternatively, all of the classifier input could simply use client traces, with an appropriate model for guard latency built in. This will eliminate the need to pin the guard node in all crawls. However, without at least some live traces for reference, this model may miss out in varying congestion and queuing delays on the guard node from the live scenario.

For the simulator, our unit testing framework should be used to set up client and relay padding machines on mock circuits, with a mock circuitmux between them. Trace files can then be read, advance mocked monotime as per the timestamps in the trace (via timers_advance_and_run() function or similar code), and and call the appropriate circpad_cell_event_* and circpad_machine_event_* callbacks as per the trace file.

These callbacks themselves should then directly emit loglines/fprints as per their already instrumented code for the live trace collections. Remember to model guard latency and congestion into the client output, to create new guard traces for the classifier.

Example unit tests that already do this sort of thing are test_circuitpadding_rtt(), test_circuitpadding_negotiation(), and test_circuitpadding_circuitsetup_machine() (see #30578 (moved) for a fixup of this last test... it may be the most useful) in ./src/test/test_circuitpadding.c.

There is an extra wrinkle that after each event, the simulator must check if the next scheduled padding packet (if any, checked via circ->padding_info[N]->padding_scheduled_at_usec) would happen before or after the next timestamp in the trace. If it would happen before the next timestamp, we need to only advance monotime to that next padding callback, and not to the next timestamp in the trace, so that the callback can fire with an appropriately correct timestamp from monotime.

This simulator will also not measure the delays from libevent to the callbacks, because mocked monotime will prevent it from doing so. To capture this delay, calls to tor_gettimeofday() before advancing monotime, and then again in the callbacks can be used to try to measure this delta and move mocked monotime forward appropriately, but this may still be short of reality, since on relays especially, other libevent event callbacks will delay padding callbacks if they are made first.

Trac:
Cc: pulls, asn to pulls, asn, dgoulet, nickm

Thank you for the detailed instructions Mike, very useful. I've started working on implementing a simulator, goal is to have something useful by the end of the month. Also thank you Nick for merging doc/HACKING/design, it helped me a lot.

The implementation now requires a client and relay trace (got a lazy python script to simulate a relay trace from a client trace as well). My biggest gripe right now is time. Every time a client/relay machine triggers padding, a corresponding event has to be added to the relay/client trace with an estimated time. This estimate will always be, well, wrong. I'm not sure it's possible to make this estimate in such a way that it'll fool time-based classifiers, even if we add in guard traces for better estimates and patches as you mention Mike.

Right now I think it might be best as a starting point to just try to use the simulator to find optimal machines against attacks like Deep Fingerprinting that ignores time. Once we have a better understanding of how feasible and costly that is we can look more closely at how time changes things.

Any thoughts on this? Have I missed some other reason than time estimates for including guard traces?

Also, if some other researcher working on this wants to collaborate please reach out.

Replying to pulls:

The implementation now requires a client and relay trace (got a lazy python script to simulate a relay trace from a client trace as well). My biggest gripe right now is time. Every time a client/relay machine triggers padding, a corresponding event has to be added to the relay/client trace with an estimated time. This estimate will always be, well, wrong. I'm not sure it's possible to make this estimate in such a way that it'll fool time-based classifiers, even if we add in guard traces for better estimates and patches as you mention Mike.

To be clear: I believe this simulator will only be accurate enough to do preliminary tuning of defenses against attacks, especially for expensive classifiers. I think final attack and defense evaluation, and possibly even some final tuning, should be done on the live network. At least until we discover that for all of our tested attack+defense combinations, the live network and the simulator agree.

What do you mean by "wrong" though? We should try to make the simulator as close as possible. We are aware of the circuitmux problem, as well as delay introduced by libevent callbacks. These are both paths we hope we can optimize, though. Are there others?

Right now I think it might be best as a starting point to just try to use the simulator to find optimal machines against attacks like Deep Fingerprinting that ignores time. Once we have a better understanding of how feasible and costly that is we can look more closely at how time changes things.

Do you mean ignores the time deltas between the client/middles and the guard?

Any thoughts on this? Have I missed some other reason than time estimates for including guard traces?

Well, I have always assumed that the most realistic adversary for these attacks is one that runs them from inside the Tor network, where they have much higher resolution over circuit construction and usage, and have full circuit multiplexing information.

We can simulate such an adversary by looking at client traces, or guard TLS traces, I suppose.

Also, if some other researcher working on this wants to collaborate please reach out.

I now have some time to help with this a bit for the next couple weeks. Can you put your work in a branch on github?

Replying to mikeperry:

Replying to pulls:

The implementation now requires a client and relay trace (got a lazy python script to simulate a relay trace from a client trace as well). My biggest gripe right now is time. Every time a client/relay machine triggers padding, a corresponding event has to be added to the relay/client trace with an estimated time. This estimate will always be, well, wrong. I'm not sure it's possible to make this estimate in such a way that it'll fool time-based classifiers, even if we add in guard traces for better estimates and patches as you mention Mike.

To be clear: I believe this simulator will only be accurate enough to do preliminary tuning of defenses against attacks, especially for expensive classifiers. I think final attack and defense evaluation, and possibly even some final tuning, should be done on the live network. At least until we discover that for all of our tested attack+defense combinations, the live network and the simulator agree.

What do you mean by "wrong" though? We should try to make the simulator as close as possible. We are aware of the circuitmux problem, as well as delay introduced by libevent callbacks. These are both paths we hope we can optimize, though. Are there others?

Agree, we're on the same page. By "wrong" I mean in addition to what you mentioned inside of tor everything between traffic leaving tor at client/relay until it ends up in tor at the relay/client: basically the Internet! ;) We can never accurately capture this in a simulation now, that's more something for Shadow++ to strive for.

Right now I think it might be best as a starting point to just try to use the simulator to find optimal machines against attacks like Deep Fingerprinting that ignores time. Once we have a better understanding of how feasible and costly that is we can look more closely at how time changes things.

Do you mean ignores the time deltas between the client/middles and the guard?

Ignores time completely (beyond the ordering of cells and their directions). Deep Fingerprinting, like Wang's kNN and so on, operates on pure cell/packet traces:

1 1 -1 -1

Etc, no time there at all. Since the simulator cannot get time exactly right and some nasty deep learning machinery likely can pick-up on padding cells being sampled from some non-real distribution given enough samples, a first step is to get the simulation to produce correct cell traces with high probability. Finding (reasonably efficient) machines that can defend against attacks operating only on cells would be an awesome first step and hopefully teach us a lot.

Any thoughts on this? Have I missed some other reason than time estimates for including guard traces?

Well, I have always assumed that the most realistic adversary for these attacks is one that runs them from inside the Tor network, where they have much higher resolution over circuit construction and usage, and have full circuit multiplexing information.

We can simulate such an adversary by looking at client traces, or guard TLS traces, I suppose.

Thanks for clarifying, makes sense. With access to the client and (padding) relay traces, it should be possible to get closer to an "ideal" trace for an attacker (not necessarily the most realistic, but stronger)? For example, observing the exact time that cells are sent at a client and when cells are forwarded from relay (at the relay) to client seems close to ideal, right? That way you minimize the network noise. Padding machines that can defend from such an attacker should be able to deal with attackers with more noisy traces.

Also, if some other researcher working on this wants to collaborate please reach out.

I now have some time to help with this a bit for the next couple weeks. Can you put your work in a branch on github?

Awesome, it's here: https://github.com/pylls/circpad-sim . Added you as collaborator. Updated the README with the current state of things. Going to work on input and output next so that I can clean up some of the debug code. Will continue to work actively on it now sans trip Sunday-Wednesday.

mentioned in issue #31789 (moved)

mentioned in issue #31790 (moved)

moved to tpo/core/tor#31788

Circuit padding trace simulator

Child items 0

Activity