I want to have information about how much data is actually for applications, so that external controllers can check for circuit side channel attacks.
To do this, I want to add two fields to CIRC_BW events: a PACKAGED field that reports the sum of all rh.length values for all relay cells, and a DELIVERED field that reports the sum of all rh.length for valid RELAY_DATA cells.
That way, controllers can check for large differences between PACKAGED and DELIVERED, to see if lots of data is being sent on a circuit that is getting silently dropped. They can also check for differences between the raw CIRC_BW value and DELIVERED, to see if an adversary is underfilling cells (though this may be far more prone to false positives).
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related.
Learn more.
Ok I forgot that rh.length will be 0 or small for many relay commands. So we're not able to get a dropped data count with the PACKAGED stat as-is. I might just remove that and only have DELIVERED, unless I can think of something clever.
Ideally the values reported here would allow us to differentiate between excess data on the circuit due to cells being valid but not full vs excess data due to dropped cells.
In those two, I add DELIVERED and OVERHEAD. DELIVERED is the same as before (sum of RELAY_COMMAND_DATA rh.length fields). OVERHEAD is the sum of the extra unused space in RELAY_COMMAND_DATA fields.
The idea is that the original READ and WRITTEN fields can be used to count all of the cell data sent on the circuit. Then you can add DELIVERED+OVERHEAD, and subtract that from the original CIRC_BW fields and get the count of bytes consumed due to all non-application cells. This should be an approximation of the dropped cell data on the circuit (not counting legit non-application data like SENDME cells, HS state machine cells).
The plan is to use this to set limits in the vanguards script on the ratio of RELAY_COMMAND_DATA to other stuff on the circuit.
Ok tested this for a while with the vanguards script and realized that I will also need to count stream BEGIN, RESOLVE, and END as delivered application data, or this thing is going to have false positives wrt dropped data.
Trac: Summary: Add PACKAGED and DELIVERED fields to CIRC_BW events to Add OVERHEAD and DELIVERED fields to CIRC_BW events Status: needs_review to needs_revision
This version counts any relay cell the client decides to write as valid delivered written data. For received data, it counts relay cells as valid delivered data if they do not trigger an error condition in the code that handles them.
FYI, I updated the changes file and one of the commit messages here. The new branch is bug25903_v4, which is also in a pull request at: https://github.com/torproject/tor/pull/71
Ok I think I found a bug in tor. I am not sure, but I think there should be a return in connection_edge_process_relay_cell_not_open() when the connected cell can't be parsed. The side effect of the parse failure is the addr will be AF_UNSPEC later, so I moved the accounting call inside that block.
Ok I think I found a bug in tor. I am not sure, but I think there should be a return in connection_edge_process_relay_cell_not_open() when the connected cell can't be parsed. The side effect of the parse failure is the addr will be AF_UNSPEC later, so I moved the accounting call inside that block.
Okay, left a review for you on PR#101. There's just one small thing I'm worried about with an assert() not necessarily being true all the time in the future.
Hrmm. Right now there are no relay cell payloads larger than RELAY_PAYLOAD_SIZE, and checks against this max are earlier in all of the code that calls this function.
Is this your only issue with the branch? IMO, in that case we should merge this, and then change this and the other checks on relay payload length to whatever mechanism we have for specifying the new max length depending on cell type once that invariant changes. Right now, we don't seem to have an alternative mechanism.
16:52 <+nickm> mikeperry: are you asking about wide-create stuff?16:52 <+nickm> that doesn't affect relay payload sizes; they remain the same as before
This makes sense. It isn't the relay payload size that will change with wide creates. You simply have the ability to split a create over multiple relay cells, whose payload body max will still be RELAY_PAYLOAD_SIZE.
So this code will continue to work correctly in that case, properly measuring overhead and delivered data.
Setting to needs_review in case you have other issues. If not, please set to merge_ready.