Opened 11 years ago

Last modified 7 years ago

#779 closed defect (Fixed)

Don't cross the exit streams!

Reported by: arma
Owned by:
Priority: High
Milestone:
Component: Core Tor/Tor
Version: 0.1.2.19
Severity:
Keywords:
Cc: arma, nickm, SATtva, slush, hanss
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

slush reports in
http://archives.seul.org/or/talk/Jul-2008/msg00061.html

that when he makes 300 circuits in a short span of time and then
refreshes his page, he gets a page with some content from a different
stream mixed in. See e.g. http://www.slush.cz/centrumyahoo.png for an
example.

Jens has independently reported this too.

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (18)

comment:1 Changed 11 years ago by arma

nickm> here's a better plan: confirm whether this is or is not happening at
cell boundaries.
nickm> if it is, then it suggests a much smaller range of possible failure
points

perhaps sebastian's test network could be set up with a hundred clients who
each fetch pages of all 1's, all 2's, etc.
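
A minimal sketch of how the cell-boundary check could work, assuming the test pages really are a single repeated byte and that a relay data cell carries at most 498 bytes of stream payload in this series (RELAY_PAYLOAD_SIZE); it reads a fetched page from stdin and reports where any foreign bytes fall, so we can see whether they line up with relay-cell-sized chunks:

{{{
/* Sketch only: scan a page fetched through Tor (piped in on stdin) and
 * report every run of bytes that differs from the expected filler byte. */
#include <stdio.h>

#define RELAY_PAYLOAD_SIZE 498 /* assumed max stream bytes per relay data cell */

int
main(int argc, char **argv)
{
  int expected = (argc > 1) ? argv[1][0] : '1'; /* e.g. a page of all 1's */
  size_t pos = 0, run_start = 0;
  int in_run = 0, c;

  while ((c = getchar()) != EOF) {
    if (c != expected && !in_run) {
      in_run = 1;
      run_start = pos;
    } else if (c == expected && in_run) {
      in_run = 0;
      printf("foreign bytes at %zu..%zu (len %zu, len %% %d = %zu)\n",
             run_start, pos, pos - run_start, RELAY_PAYLOAD_SIZE,
             (pos - run_start) % RELAY_PAYLOAD_SIZE);
    }
    ++pos;
  }
  if (in_run)
    printf("foreign bytes at %zu..end (%zu)\n", run_start, pos);
  return 0;
}
}}}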

nickm> suppose that a circuit is torn down and another is created right away
on the same conn with the same circuit ID... or suppose the same thing
happens with streams...
nickm> I wonder what will happen.

in theory we choose the next circuit ID in order, so for that to happen
in practice, either something would have to choose IDs in a nonstandard way,
or most of the circuit IDs would have to be in use already, so that when we
close one, that ID is the next one available.
nickm> slush *is* making a whole lot of circuits. this might have something
to do with it

nickm> circuit_get_by_circid_orconn() returns null if the circuit is marked
for close
nickm> and command_process_create_cell()'s duplicate circid test uses
circuit_get_by_circid_orconn()
nickm> So it is indeed possible to have two circuits with the same circid and
orconn exist at once, if one is marked for close
nickm> I do not immediately see how this might cause the bug
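
For reference, the behavior being described looks roughly like this; a simplified paraphrase with stand-in structures and field names, not the actual 0.1.2.x code:

{{{
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the real structures; only the fields relevant here. */
typedef struct or_connection_t or_connection_t;
typedef struct circuit_t {
  struct circuit_t *next;
  uint16_t p_circ_id;       /* circuit ID on this OR connection */
  or_connection_t *p_conn;  /* the OR connection that ID lives on */
  int marked_for_close;
} circuit_t;

static circuit_t *global_circuitlist = NULL;

/* Paraphrase of circuit_get_by_circid_orconn(): circuits that are marked
 * for close are skipped, so they are invisible to this lookup. */
static circuit_t *
get_by_circid_orconn(uint16_t circ_id, or_connection_t *conn)
{
  circuit_t *circ;
  for (circ = global_circuitlist; circ; circ = circ->next) {
    if (circ->marked_for_close)
      continue;  /* <-- a dying circuit no longer "owns" its ID here */
    if (circ->p_circ_id == circ_id && circ->p_conn == conn)
      return circ;
  }
  return NULL;
}

/* The duplicate-circid test in the CREATE handler relies on that lookup, so
 * a CREATE cell reusing the ID of a marked-for-close circuit is accepted:
 * until circuit_close_all_marked() actually frees the old circuit, two
 * circuits can share one (circid, orconn) pair. */
static int
circ_id_looks_free(uint16_t circ_id, or_connection_t *conn)
{
  return get_by_circid_orconn(circ_id, conn) == NULL;
}
}}}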

comment:2 Changed 11 years ago by nickm

Also to note: all 3 of the exit nodes that slush was using are running 0.1.2.18/0.1.2.19. It's conceivable that
this bug is already fixed in trunk or 0.2.0.x, so we should make sure that we're looking at the 0.1.2.x series for
this.

[The entry nodes are 0.2.0.28-rc and 0.1.2.19, but I'm pretty sure that if this is a server-side bug, it must be at the
exit node: since the client can decrypt data from the wrong page, the data must be getting encrypted at the exit server
with the right list of keys for their circuit.]

comment:3 Changed 11 years ago by nickm

circuit_close_all_marked is indeed called only once per second...

comment:4 Changed 11 years ago by nickm

This is a partial analysis. Keep in mind that some of the statements about the code might be wrong. (After all,
bugs don't happen when everything in the code does what we think it should.)

Anyway, if I am reading the code right, 0.1.2.x and 0.2.0.x use circuit_get_by_edge_conn() to decide
where to send data cells from exit connections. (See connection_edge_package_raw_inbuf in relay.c.) This function
in turn uses the edge_connection_t->on_circuit field.

on_circuit is set from connection_exit_begin_conn(), which gets a circuit as an argument from
connection_edge_process_relay_cell(), which gets it from circuit_receive_relay_cell(), which gets it from
command_process_relay_cell(), which (finally!) gets it by calling circuit_get_by_circid_orconn().
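
To make that chain easier to follow, here is the same path written out schematically; the function names are the ones above, but the bodies are condensed to just the steps that matter for this bug, and the n_stream local is my shorthand for the new exit stream:

{{{
/* Condensed view of the exit-side data path described above; not the
 * literal code, just the shape of it.
 *
 * 1. An inbound relay cell arrives on an OR connection:
 *      command_process_relay_cell()
 *        circ = circuit_get_by_circid_orconn(cell->circ_id, conn);
 *        circuit_receive_relay_cell(cell, circ, ...);
 *
 * 2. A RELAY_BEGIN cell reaches the exit and a new stream is created:
 *      connection_edge_process_relay_cell(cell, circ, ...)
 *        connection_exit_begin_conn(cell, circ)
 *          n_stream->on_circuit = circ;   <-- the binding everything relies on
 *
 * 3. Data coming back from the destination server is packaged onto whatever
 *    circuit the edge connection is bound to:
 *      connection_edge_package_raw_inbuf(conn, ...)        (relay.c)
 *        circ = circuit_get_by_edge_conn(conn);            (reads on_circuit)
 *        ... wrap the data in RELAY_DATA cells for circ ...
 *
 * So if step 1 ever returns the wrong circuit_t for a (circid, orconn) pair,
 * step 2 binds the new stream to the wrong circuit, and step 3 then
 * faithfully sends the server's reply down that wrong circuit. */
}}}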

comment:5 Changed 11 years ago by slush

Hi everybody. As I wrote to or-talk, I will prepare a better testing scenario and try to bring better material to
analyze. Unfortunately, I'm without Internet access this week. Please write down what you think is important and
should affect the testing results.

My ideas:
- The version of Tor on the exit node: I will test 0.1.x and 0.2.x servers separately.
- I will not use Privoxy and a browser, but direct socket access.
- I will check whether the bug depends on having many sockets to a specific exit relay (so I will open many circuits
  to one specific exit node instead of three exit nodes as in the original scenario).
- I will check whether the bug appears randomly (above some number of circuits) or whether the probability depends on
  the number of circuits (for example, if there really is a problem with marking circuits as closed, the probability
  should be higher with thousands of circuits).

Any other ideas?

PS: If there is any private testing network, it could be interesting to run tests on circuits built through
computers under our/your control, as somebody wrote.

comment:6 Changed 11 years ago by nickm

For my analysis above, it would be good to know roughly how many TCP streams you need to try before the corruption
occurs, and how often it occurs. If it usually takes a lot of streams to make the bug appear, that would be
consistent with my theory above. If not, then something else is going on.

I've committed a bugfix as r16136 that will prevent the circid collision from occurring; if the circid collision is
the cause of this bug, and we can verify that, we should backport that fix and we'll be done.
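
If the circid-collision theory is right, the essential property a fix needs is that a circuit ID counts as in use even while its circuit is merely marked for close. A minimal sketch of that check (stand-in structures again, as in the sketch under comment:1; not the actual r16136 patch):

{{{
#include <stdint.h>
#include <stddef.h>

typedef struct or_connection_t or_connection_t;
typedef struct circuit_t {
  struct circuit_t *next;
  uint16_t p_circ_id;
  or_connection_t *p_conn;
  int marked_for_close;
} circuit_t;

static circuit_t *global_circuitlist = NULL;

/* Unlike the regular lookup, this check deliberately does NOT skip circuits
 * that are marked for close: an ID stays "in use" until its circuit is
 * really freed, so it cannot be handed out (or accepted from a CREATE cell)
 * a second time while the old circuit is still dying. */
static int
circ_id_in_use_on_orconn(uint16_t circ_id, or_connection_t *conn)
{
  circuit_t *circ;
  for (circ = global_circuitlist; circ; circ = circ->next) {
    if (circ->p_circ_id == circ_id && circ->p_conn == conn)
      return 1;
  }
  return 0;
}
}}}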

comment:7 Changed 11 years ago by nickm

Backported the r16136 fix to 0.2.0.x as r16463 and to 0.1.2.x as r16464.

comment:8 Changed 11 years ago by slush

Nick: Thank you...

Well... I'm working on a demo these days. Is it still helpful or not?

comment:9 Changed 11 years ago by nickm

It would be very helpful if you can find out whether this bug occurs, or does not occur, with patched exit nodes.

Specifically, if the bug still occurs with exit nodes running 0.2.1.3-alpha or later, that will tell us that the
fix I did was not good enough. But if the bug _doesn't_ occur with those "fixed" exit nodes, then we will know
we fixed it.

The "fix" will also be in 0.2.0.31 and 0.1.2.20, once those are out.

comment:10 Changed 11 years ago by hanss

I made a small tar.gz (about 2 MB) about reproducing the mixing of streams on Tortila.
There isn't really much content, except the mixed-up pages saved as complete webpages. The method is trivial.
If this is of interest, where can I send it so the people here can see it? A mail address?

comment:11 Changed 11 years ago by slush

Hans, please send it to slush at centrum dot cz. I will upload it to my server.

comment:12 Changed 11 years ago by hanss

Putting this somewhere public is not a problem, but I think for now it might be a good idea to keep this
information with those who might be able to fix the bug. Objections?

comment:14 Changed 11 years ago by nickm

So we've fixed one bug that could cause this, and failed to find others despite serious searching. I'm changing the
bug status to "requires testing", since we need to confirm that we really fixed this bug.

comment:15 Changed 11 years ago by slush

Hi everybody,

after many tests with the original Python code, it looks like everything is working well now. To be honest, I'm not
able to reproduce the bug in any configuration at this time.

I think it may be related to the "Tortila as a bad exit" issue
(http://archives.seul.org/or/talk/Aug-2008/msg00089.html etc.), because Tortila was one of the exit nodes in my exploit.
Tortila is no longer online, so I cannot verify this suggestion.

Because I'm not able to reproduce the bug with any combination of exit node versions, I'm voting to change the bug status to "SOLVED".

Thanks,
Marek

comment:16 Changed 10 years ago by nickm

Marking as fixed. This was maybe Tortila's fault, or maybe Tortila's ISP's fault, or maybe the fault of the
circuit/stream bug I fixed in August. Or maybe not, in which case we should reopen this bug if it reoccurs.

comment:17 Changed 10 years ago by nickm

flyspray2trac: bug closed.

comment:18 Changed 7 years ago by nickm

Component: Tor Relay → Core Tor/Tor