Track half-closed stream IDs

changed milestone to %Tor: 0.3.4.x-final

added 034-backport component::core tor/tor guard-discovery-stats milestone::Tor: 0.3.4.x-final parent::25574 priority::medium resolution::fixed reviewer::teor severity::normal sponsor::V-can status::closed type::defect labels

Trac:
Parent: N/A to #25574 (moved)

Trac:
Milestone: N/A to Tor: unspecified
Keywords: guard-discovery-stats deleted, guard-discovery-stats 035-roadmap-proposed needs-proposal added

Trac:
Cc: N/A to dmr

Better idea: since malicious endpoints can withhold the ACK, we might as well just introduce a half-closed state for edge connections. With a simple struct, we can keep track of the outstanding END, SENDME windows, and data windows, and decrement those, and treat packets within the window as valid, until we get an END from the other side.

This also has the advantage that we don't have to wait for anyone to upgrade.

Trac:
Summary: Create a RELAY_COMMAND_END_ACK cell type to Track half-closed stream IDs

Wrt DropMark, we can still count very early invalid cells as dropped in the vanguards addon (and eventually in tor-core), and react to those. This behavior won't prevent that, since there will be no streams at that point.

Trac:
Description: In order to eliminate a side channel attack described in https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf we need a way to determine if a stream id is invalid.

Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.

A similar situation exists for spurious RELAY_ENDs.

This isn't a full fix, because malicious relays can withhold the ack, but having it in place would simplify a lot of the code for dealing with unexpected packets.

to

In order to eliminate a side channel attack described in https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf we need a way to determine if a stream id is invalid.

Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.

A similar situation exists for spurious RELAY_ENDs.

Replying to mikeperry:

Wrt DropMark, [...]

Adding parenthetical to tie that term 'DropMark' to the paper (it might not otherwise be obvious by context).

Trac:
Description: In order to eliminate a side channel attack described in https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf we need a way to determine if a stream id is invalid.

Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.

A similar situation exists for spurious RELAY_ENDs.

to

In order to eliminate a side channel attack described in https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf ("DropMark" attack) we need a way to determine if a stream id is invalid.

Many clients (particularly Firefox) will hang up on streams that still have data in flight. In this case, Tor clients send RELAY_COMMAND_END when they are done with a stream, and immediately remove that stream ID from their valid stream mapping. The remaining application data continues to arrive, but is silently dropped by the Tor client. The result is that this ignored stream data currently can't be distinguished from injected dummy traffic with completely random stream IDs, and this fact can be used to mount side channel attacks.

A similar situation exists for spurious RELAY_ENDs.

https://github.com/torproject/tor/pull/262

Removing the needs-proposal and 035 tags because this version of the fix does not change the protocol or even change client behavior. It only changes the values reported to the CIRC_BW event for the vanguards controller to react to.

As such it would be great if we could backport this to 0.3.4 so that we can eliminate these classes of false positives in the vanguard addon. I have tested it with vanguards and it fixes both https://github.com/mikeperry-tor/vanguards/issues/3 and https://github.com/mikeperry-tor/vanguards/issues/25, which I am able to easily reproduce otherwise.

Trac:
Keywords: guard-discovery-stats 035-roadmap-proposed needs-proposal deleted, guard-discovery-stats 034-backport added
Status: new to needs_review
Milestone: Tor: unspecified to Tor: 0.3.4.x-final
Cc: dmr to dmr, asn

Trac:
Reviewer: N/A to teor

teor: If you care, this needs another block of the same checks when the circuit is in purpose PATH_BIAS_TESTING. I also want to reorg the branch a bit when I do that.

Despite the backport snafu, I would like review on 0.3.4 branch until we're satisfied with it. I imagine review will generate a bunch of changes that will be a pain to fix on both branches, and I'd like to do 0.3.4 first so I can keep testing it with the vanguards addon as we go. Then we can merge a rebased version of that result forward to master with just one merge commit for a spot-check as opposed to juggling fixups on both branches and generating a lot of confusion.

Trac:
Status: needs_review to needs_revision

https://github.com/torproject/tor/pull/270 - Ok new reorganized pull request.

Trac:
Status: needs_revision to needs_review

Please see my comments on the pull request.

Overall, this code seems ok, but it's very large for a backport to a release candidate.

(I understand you've already spoken with Nick on IRC about backporting the code. I don't think this ticket is a good place for backport conversations. Perhaps sending an email to tor-dev would be a good way of addressing concerns?)

Trac:
Status: needs_review to needs_revision

Ok I have pushed fixup commits for your code review comments and linked them in reply on the github pull request review.

Additionally, I pushed a squashed branch to mikeperry/ticket25573-v2-squashed, for any fresh review and for reference in backport discussion (which I think I'll take up on network-team).

Trac:
Status: needs_revision to needs_review

Ok I also decided to go ahead and sort the list of half-closed connections, so we can bsearch them on the common operations of cell arrival, and also guard against duplicate stream ids. I added tests for duplicates and list management correctness as well. These are additional commits on each branch (the pull request with the fixups from the first review, and also as separate commits on top of mikeperry/ticket25573-v2-squashed).

When we think this is merge-ready, I can do the rebase dance so we actually merge to master, but I'd still like to avoid doing that until we're sure we like it.

Please make check-spaces:

perl ./scripts/maint/checkSpace.pl -C \
		./src/common/*.[ch] \
		./src/or/*.[ch] \
		./src/test/*.[ch] \
		./src/test/*/*.[ch] \
		./src/tools/*.[ch]
...
 DoubleNL:./src/test/test_relaycell.c:530
make[1]: *** [check-spaces] Error 1
make[1]: *** Waiting for unfinished jobs....

https://travis-ci.org/torproject/tor/jobs/421209942#L3699

Trac:
Status: needs_review to needs_revision

The branch looks good now.

Here are my remaining minor concerns:

Please make check-spaces
Do you test what happens when stream ids wrap? Do you think that's important now with all the other changes?

Replying to teor:

The branch looks good now.

Here are my remaining minor concerns:

Please make check-spaces

Fixed, sorry about that.

Do you test what happens when stream ids wrap? Do you think that's important now with all the other changes?

There are a couple of different types of "wrapping". One is if we pick the initial stream_id for a circuit near UINT16_MAX, and that wraps around. This wrap will happen in get_unique_stream_id_by_circ(), which I don't currently test, but should be fine. I can easily test this, and I will in a fixup. (I also just realized I forgot to switch to bsearch in get_unique_stream_id_by_circ(), so I should do that and test it).

connection_half_edge_add() itself should succeed until the opened list is empty and the half-closed list is full of 65536 IDs. Since connection_half_edge_add() enforces uniqueness (which I already test for), by pigeon-hole principle with streamid_t (uint16_t), it will thus fail due when full to a duplicate stream ID in the list. So wrapping in the half-closed code itself is currently effectively tested to not be possible.

The other way that they can "wrap" is if more than 65536 streams total are split between the opened list and/or the half-closed list. Since get_unique_stream_id_by_circ() enforces uniqueness, by pigeon-hole principle, all stream IDs will be taken, so get_unique_stream_id_by_circ() should fail upon new stream creation.

However, I do not test for the uniqueness property of get_unique_stream_id(), and I don't think anything else does either. I can test get_unique_stream_id_by_circ() with a full (and full-1) half-closed array and ensure it succeeds and then fails, as a way of testing the uniqueness property as well as a full array.

I also do not test the consequences of this failure (there are 3 places in the codebase that check for it), but failure means that the circuit is marked unusable for new streams, and the stream is retried. Going farther out than one call layer from the change seems like test-creep though, so I don't plan to test all of these places. Hopefully they are covered in their respective component tests. They look simple and solid anyway tho.

Ok the following fixups have been pushed to mikeperry/ticket25573-v2:

The space is fixed in c2dbb09dc3e4ed2b9d5a845a592a2d59d2d5e225.

Using bsearch in get_unique_stream_id_by_circ() is fixed in 99153878cb4b7ae309eb7f5109747e3a49ac00e9.

Tests for stream id wrapping and other get_unique_stream_id_by_circ() cases are in b8ae08caa059dc8f937a4afe9f929035b8beb712.

I have pushed a cleanly reorganized and squashed branch to mikeperry/ticket25573-v2-squashed2 (aka https://github.com/torproject/tor/pull/296). Its has 0 delta to mikeperry/ticket25573-v2, and everything has been squashed into 4 commits (a const for smartlist_bsearch; all half-stream tracking code+tests; prevent stream-id reuse code+tests; and truncate accounting code+tests). Only the last two commits change client behavior.

Trac:
Status: needs_revision to needs_review

Replying to mikeperry:

Ok the following fixups have been pushed to mikeperry/ticket25573-v2:

The space is fixed in c2dbb09dc3e4ed2b9d5a845a592a2d59d2d5e225.

Using bsearch in get_unique_stream_id_by_circ() is fixed in 99153878cb4b7ae309eb7f5109747e3a49ac00e9.

These commits seem fine.

Tests for stream id wrapping and other get_unique_stream_id_by_circ() cases are in b8ae08caa059dc8f937a4afe9f929035b8beb712.

I suggest you use tor's crypto_rand() for these tests.

Here are some reasons from our ChangeLog:

    - Use our own weak RNG when we need a weak RNG. Windows's rand() and
      Irix's random() only return 15 bits; Solaris's random() returns more
      bits but its RAND_MAX says it only returns 15, and so on. Motivated
      by the fix for bug 7801; bugfix on 0.2.2.20-alpha.

Also, I'm pretty sure some compilers on some OSes will warn if you use rand().

I have pushed a cleanly reorganized and squashed branch to mikeperry/ticket25573-v2-squashed2 (aka https://github.com/torproject/tor/pull/296). Its has 0 delta to mikeperry/ticket25573-v2, and everything has been squashed into 4 commits (a const for smartlist_bsearch; all half-stream tracking code+tests; prevent stream-id reuse code+tests; and truncate accounting code+tests). Only the last two commits change client behavior.

The old branch was cancelled on appveyor, and the new branch timed out:

So I pushed your branch to my github to trigger another build:

https://ci.appveyor.com/project/teor2345/tor/build/1.0.102

And made myself an appveyor admin so I could trigger future builds more easily (#27372 (moved)).

Trac:
Status: needs_review to needs_revision

Ok, I pushed b13db83b7db00d9dd8d4cd6842e5e2f5c9558542 to mikeperry/ticket25573-v2, which is e5c5f90b6311cc469befbdeb2bb31abc7d53da04 on top of mikeperry/ticket25573-v2-squashed2.

The new fully squashed branch is mikeperry/ticket25573-v2-squashed3 with pull request at https://github.com/torproject/tor/pull/297. So glad I'm not trying to also juggle a branch on top of master yet.

I chose not to bother to seed the rng because we don't have direct helpers for that (other than a small maze of AES mocking) and it doesn't affect coverage anyhow.

Trac:
Status: needs_revision to needs_review

Ok, looks good to me.

Start the merge discussion :-)

Trac:
Status: needs_review to merge_ready

Ok. I have renamed the 034 branch to ticket25573-034. It is 0 delta to ticket25573-v2 and ticket25573-v2-squashed3. It lives at: https://github.com/torproject/tor/pull/299

I created a merge commit to master from that branch as ticket25573-master: https://github.com/torproject/tor/pull/298

Let me know if that merge commit to master is how we want to do things. This is my first merge across the reorg border.

If anyone wants further changes, please advise how you want me to manage the branches/merge commits (ie fixups on the merge commit, or on the previous commits, or squashed or rebased or not, etc).

As for 034 backport, I guess the key question is: are any of these things "Smaller bugs that significantly impact user experience", as per https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases. If vanguards usage counts as user experience, then I think yes. The stream id data corruption issue that this branch fixes might count, too, but it's probably rare in practice.

This looks really clean and comprehensible. Nicely done.

I'm going to merge ticket25573-master to master, and if it doesn't cause any trouble over the first couple of 0.3.5-alpha releases, we should backport.

src/test/test_relaycell.c:41:5: warning: type of ‘pathbias_count_valid_cells’ does not match original declaration [-Wlto-type-mismatch]
 int pathbias_count_valid_cells(origin_circuit_t *circ,
     ^
src/feature/client/circpathbias.c:930:1: note: return value type mismatch
 pathbias_count_valid_cells(circuit_t *circ, const cell_t *cell)
 ^
src/feature/client/circpathbias.c:930:1: note: type ‘void’ should match type ‘int’
src/feature/client/circpathbias.c:930:1: note: ‘pathbias_count_valid_cells’ was previously declared here
src/feature/client/circpathbias.c:930:1: note: code may be misoptimized unless -fno-strict-aliasing is used

Added a fix in bug25573-034-typefix; merged to master.

ow; there a bunch of those mismatches. I've done a better fix as part of #27772 (moved) and backported it to bug25573-034-typefix. Not yet merged in master or 0.3.4.