Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.
A similar situation exists for spurious RELAY_ENDs.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Better idea: since malicious endpoints can withhold the ACK, we might as well just introduce a half-closed state for edge connections. With a simple struct, we can keep track of the outstanding END, SENDME windows, and data windows, and decrement those, and treat packets within the window as valid, until we get an END from the other side.
This also has the advantage that we don't have to wait for anyone to upgrade.
Trac: Summary: Create a RELAY_COMMAND_END_ACK cell type to Track half-closed stream IDs
Wrt DropMark, we can still count very early invalid cells as dropped in the vanguards addon (and eventually in tor-core), and react to those. This behavior won't prevent that, since there will be no streams at that point.
Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.
A similar situation exists for spurious RELAY_ENDs.
This isn't a full fix, because malicious relays can withhold the ack, but having it in place would simplify a lot of the code for dealing with unexpected packets.
Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.
A similar situation exists for spurious RELAY_ENDs.
Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.
A similar situation exists for spurious RELAY_ENDs.
Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.
A similar situation exists for spurious RELAY_ENDs.
Removing the needs-proposal and 035 tags because this version of the fix does not change the protocol or even change client behavior. It only changes the values reported to the CIRC_BW event for the vanguards controller to react to.
teor: If you care, this needs another block of the same checks when the circuit is in purpose PATH_BIAS_TESTING. I also want to reorg the branch a bit when I do that.
Despite the backport snafu, I would like review on 0.3.4 branch until we're satisfied with it. I imagine review will generate a bunch of changes that will be a pain to fix on both branches, and I'd like to do 0.3.4 first so I can keep testing it with the vanguards addon as we go. Then we can merge a rebased version of that result forward to master with just one merge commit for a spot-check as opposed to juggling fixups on both branches and generating a lot of confusion.
Overall, this code seems ok, but it's very large for a backport to a release candidate.
(I understand you've already spoken with Nick on IRC about backporting the code. I don't think this ticket is a good place for backport conversations. Perhaps sending an email to tor-dev would be a good way of addressing concerns?)
Ok I have pushed fixup commits for your code review comments and linked them in reply on the github pull request review.
Additionally, I pushed a squashed branch to mikeperry/ticket25573-v2-squashed, for any fresh review and for reference in backport discussion (which I think I'll take up on network-team).
Ok I also decided to go ahead and sort the list of half-closed connections, so we can bsearch them on the common operations of cell arrival, and also guard against duplicate stream ids. I added tests for duplicates and list management correctness as well. These are additional commits on each branch (the pull request with the fixups from the first review, and also as separate commits on top of mikeperry/ticket25573-v2-squashed).
When we think this is merge-ready, I can do the rebase dance so we actually merge to master, but I'd still like to avoid doing that until we're sure we like it.
Do you test what happens when stream ids wrap? Do you think that's important now with all the other changes?
There are a couple of different types of "wrapping". One is if we pick the initial stream_id for a circuit near UINT16_MAX, and that wraps around. This wrap will happen in get_unique_stream_id_by_circ(), which I don't currently test, but should be fine. I can easily test this, and I will in a fixup. (I also just realized I forgot to switch to bsearch in get_unique_stream_id_by_circ(), so I should do that and test it).
connection_half_edge_add() itself should succeed until the opened list is empty and the half-closed list is full of 65536 IDs. Since connection_half_edge_add() enforces uniqueness (which I already test for), by pigeon-hole principle with streamid_t (uint16_t), it will thus fail due when full to a duplicate stream ID in the list. So wrapping in the half-closed code itself is currently effectively tested to not be possible.
The other way that they can "wrap" is if more than 65536 streams total are split between the opened list and/or the half-closed list. Since get_unique_stream_id_by_circ() enforces uniqueness, by pigeon-hole principle, all stream IDs will be taken, so get_unique_stream_id_by_circ() should fail upon new stream creation.
However, I do not test for the uniqueness property of get_unique_stream_id(), and I don't think anything else does either. I can test get_unique_stream_id_by_circ() with a full (and full-1) half-closed array and ensure it succeeds and then fails, as a way of testing the uniqueness property as well as a full array.
I also do not test the consequences of this failure (there are 3 places in the codebase that check for it), but failure means that the circuit is marked unusable for new streams, and the stream is retried. Going farther out than one call layer from the change seems like test-creep though, so I don't plan to test all of these places. Hopefully they are covered in their respective component tests. They look simple and solid anyway tho.
Ok the following fixups have been pushed to mikeperry/ticket25573-v2:
The space is fixed in c2dbb09dc3e4ed2b9d5a845a592a2d59d2d5e225.
Using bsearch in get_unique_stream_id_by_circ() is fixed in 99153878cb4b7ae309eb7f5109747e3a49ac00e9.
Tests for stream id wrapping and other get_unique_stream_id_by_circ() cases are in b8ae08caa059dc8f937a4afe9f929035b8beb712.
I have pushed a cleanly reorganized and squashed branch to mikeperry/ticket25573-v2-squashed2 (aka https://github.com/torproject/tor/pull/296). Its has 0 delta to mikeperry/ticket25573-v2, and everything has been squashed into 4 commits (a const for smartlist_bsearch; all half-stream tracking code+tests; prevent stream-id reuse code+tests; and truncate accounting code+tests). Only the last two commits change client behavior.
Ok the following fixups have been pushed to mikeperry/ticket25573-v2:
The space is fixed in c2dbb09dc3e4ed2b9d5a845a592a2d59d2d5e225.
Using bsearch in get_unique_stream_id_by_circ() is fixed in 99153878cb4b7ae309eb7f5109747e3a49ac00e9.
These commits seem fine.
Tests for stream id wrapping and other get_unique_stream_id_by_circ() cases are in b8ae08caa059dc8f937a4afe9f929035b8beb712.
I suggest you use tor's crypto_rand() for these tests.
Here are some reasons from our ChangeLog:
- Use our own weak RNG when we need a weak RNG. Windows's rand() and Irix's random() only return 15 bits; Solaris's random() returns more bits but its RAND_MAX says it only returns 15, and so on. Motivated by the fix for bug 7801; bugfix on 0.2.2.20-alpha.
Also, I'm pretty sure some compilers on some OSes will warn if you use rand().
I have pushed a cleanly reorganized and squashed branch to mikeperry/ticket25573-v2-squashed2 (aka https://github.com/torproject/tor/pull/296). Its has 0 delta to mikeperry/ticket25573-v2, and everything has been squashed into 4 commits (a const for smartlist_bsearch; all half-stream tracking code+tests; prevent stream-id reuse code+tests; and truncate accounting code+tests). Only the last two commits change client behavior.
The old branch was cancelled on appveyor, and the new branch timed out:
Ok, I pushed b13db83b7db00d9dd8d4cd6842e5e2f5c9558542 to mikeperry/ticket25573-v2, which is e5c5f90b6311cc469befbdeb2bb31abc7d53da04 on top of mikeperry/ticket25573-v2-squashed2.
The new fully squashed branch is mikeperry/ticket25573-v2-squashed3 with pull request at https://github.com/torproject/tor/pull/297. So glad I'm not trying to also juggle a branch on top of master yet.
I chose not to bother to seed the rng because we don't have direct helpers for that (other than a small maze of AES mocking) and it doesn't affect coverage anyhow.
Ok. I have renamed the 034 branch to ticket25573-034. It is 0 delta to ticket25573-v2 and ticket25573-v2-squashed3. It lives at: https://github.com/torproject/tor/pull/299
Let me know if that merge commit to master is how we want to do things. This is my first merge across the reorg border.
If anyone wants further changes, please advise how you want me to manage the branches/merge commits (ie fixups on the merge commit, or on the previous commits, or squashed or rebased or not, etc).
As for 034 backport, I guess the key question is: are any of these things "Smaller bugs that significantly impact user experience", as per https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases. If vanguards usage counts as user experience, then I think yes. The stream id data corruption issue that this branch fixes might count, too, but it's probably rare in practice.
This looks really clean and comprehensible. Nicely done.
I'm going to merge ticket25573-master to master, and if it doesn't cause any trouble over the first couple of 0.3.5-alpha releases, we should backport.
src/test/test_relaycell.c:41:5: warning: type of ‘pathbias_count_valid_cells’ does not match original declaration [-Wlto-type-mismatch] int pathbias_count_valid_cells(origin_circuit_t *circ, ^src/feature/client/circpathbias.c:930:1: note: return value type mismatch pathbias_count_valid_cells(circuit_t *circ, const cell_t *cell) ^src/feature/client/circpathbias.c:930:1: note: type ‘void’ should match type ‘int’src/feature/client/circpathbias.c:930:1: note: ‘pathbias_count_valid_cells’ was previously declared heresrc/feature/client/circpathbias.c:930:1: note: code may be misoptimized unless -fno-strict-aliasing is used
ow; there a bunch of those mismatches. I've done a better fix as part of #27772 (moved) and backported it to bug25573-034-typefix. Not yet merged in master or 0.3.4.