Opened 7 years ago

Closed 7 years ago

#7691 closed enhancement (implemented)

Path bias code should probe unusable circuits

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: MikePerry201212
Cc: nickm Actual Points: 6
Parent ID: Points:
Reviewer: Sponsor:

Description

There are a couple of cases where the path bias "use" accounting from #7440 can run into issues. In particular, circuits used in attempts to connect to unresponsive external hosts are indistinguishable from malicious failure. Also, cannibalized circuits have a similar problem, in that they are technically immediately "dirty" but they are actually unused.

So the plan is to issue a probe RELAY_BEGIN cell upon circuit close to an internal address such as 0.a.b.c:25. This will cause well-behaved exit nodes to kick us an EXITPOLICY RELAY_END cell back, which we can then use to declare the circuit as functional, avoiding the path bias false positive.

For some additional best-practice checks, we should perhaps locally track the a.b.c tuple for each probe to ensure it is the same in the response (yes, the IP is echoed, but not the port), and we should ensure no other unexpected/corrupted RELAY cells arrive on that same circuit, otherwise we should close it and mark it failed. Hopefully this latter property is already always enforced. If not, we should probably enforce it while we're at it.

Child Tickets

Change History (10)

comment:1 Changed 7 years ago by mikeperry

Actual Points: 4
Status: newneeds_review

Ok, I pushed a draft branch that seems to work to mikeperry/bug7691. I have some questions in the XXX's, and it would also be useful to know if what I've done is totally insane.

The branch is based off of mikeperry/bug7341, and may get rebased (since the same tests apply for this bug, #7341, #7440, and #7157).

I'll probably test this more thoroughly overnight.

comment:2 Changed 7 years ago by mikeperry

In the next couple days, I will probably have to rebase this on top of the fixed 209-path-bias-changes from Nick's reviews in #7157.

However, for the record, the current tip of mikeperry/bug7961 (71aa4f3a0213b23d56807d86229154b83758e708) has survived ~12,000 circuit constructions and usages over a period of a few days (hidserv client and server, ipv4, DNS success, DNS fail). This was after previous extensive testing uncovered a number of edge cases in this code and 209, which I also fixed by rebasing this branch.

comment:3 Changed 7 years ago by nickm

For the first commit on the branch: What's with the denominator on all of the 14/6, 9/6, 10/6 fractions? Where does that 6 come from? (And for that matter, where do the numerators come from?)

What's the %hhd format? It sounds like it should be a short short int, but that doesn't make sense. You can just format shorts as ints, since integer types smaller than int are promoted to int as part of calling a varargs function. the %hd format is only for scanf.

As you tested this, what kind of /what amount testing did you do with hidden services?

comment:4 Changed 7 years ago by nickm

Also: the indentation on the block after "/* Log an info message if we're going to launch a new intro circ in parallel */" seems to have gotten messed up.

Also: if an introduction never actually happens, do the circuits eventually time out? If so, what kills them?

Will read the 2nd commit next.

comment:5 in reply to:  3 ; Changed 7 years ago by mikeperry

Hrmm, Technically this is all #7341 material below. I guess that's my fault for not making my email clear to you that two bugs were in this branch (one per commit).

Replying to nickm:

For the first commit on the branch: What's with the denominator on all of the 14/6, 9/6, 10/6 fractions? Where does that 6 come from? (And for that matter, where do the numerators come from?)

See the diagram and the cases in the comment immediately above those calculations. 6 is the number of hop-RTTs in a normal 3 hop circuit construction, due to telescoping. The other timeouts are scaled by the number of hop-RTTs they have relative to a normal circuit.

To get the numerators, use the diagram at the top of the comment and imagine counting how many times cells cross each of those links between hops A, B, and C during construction for each case. Then add up the hop-RTT sums for each link a, b, c. Then assume all links are equal.

What's the %hhd format? It sounds like it should be a short short int, but that doesn't make sense. You can just format shorts as ints, since integer types smaller than int are promoted to int as part of calling a varargs function. the %hd format is only for scanf.

This is probably just cruft. I was getting what seemed to be nonsense value for purpose in the logs at one point, so as a wild guess I tried to be precise in case there was va_list weirdness... My printf manpage says hhd is for int8_t, is what the args are.

To be honest, I forget what the real problem was here. I *think* I was getting nonsense values because I was misbraining the cases in the crazy state machine in circuit_expire_building() and had some bug in how I was changing the states around. Whatever it was, I fixed it somehow :).

As you tested this, what kind of /what amount testing did you do with hidden services?

I tested the code on both client and server side for hidden service connections (wget loop). I did several thousand connections on both sides over the course of revising these commits. For the most part, I tested with 'MaxCircuitDirtiness 10' so as to perform as many intro+rend constructions as possible.

Also: if an introduction never actually happens, do the circuits eventually time out? If so, what kills them?

They should get killed at the "hs_extremely_old_cutoff" value, which is a bit long (something like 2 minutes under normal circumstances), but it does kill them. IIRC, this code was exercised in my live tests.

I also did watch for leaked circuits during testing. I did have to fix a couple cases that caused them in early revisions (this is what led me to discover and subsequently rip out Roger's old implementation of the idea), but there no cases I observed in the 12k+ run I mentioned above, nor for several previous runs as I worked on the code for #7691.

comment:6 in reply to:  5 Changed 7 years ago by rransom

Replying to mikeperry:

I tested the code on both client and server side for hidden service connections (wget loop). I did several thousand connections on both sides over the course of revising these commits. For the most part, I tested with 'MaxCircuitDirtiness 10' so as to perform as many intro+rend constructions as possible.

What does ‘MaxCircuitDirtiness 10’ mean for hidden-service rendezvous circuits today?

comment:7 Changed 7 years ago by nickm

Status: needs_reviewneeds_revision

Reading the second one now.

--

 /// XXX: Generate a random 0.a.b.c address

This is trivial to do. It could be approximately:

  char *probe_nonce;
  uint32_t addr;
  crypto_rand(&addr, sizeof(addr));
  addr &= 0x00ffffff;
  probe_nonce = tor_dup_ip(addr);

(unchecked). The port should probably be random too, right?

-- The documentation is sparser than I would prefer. Like, "Sends a probe down a circuit that wasn't usable." Not usable how? A probe of what type? For what purpose? (I know the answers, but a reader later on is going to have to figure out what's going on here.)

-- Forcing the response length to be 9 seems wrong. As a rule, we allow more bytes than expected and ignore them.

-- No raw memcmp calls; either tor_memeq() or fast_memeq() will be correct.

-- The cell_t* argument to pathbias_check_probe_response should probably be const.

Other than that, it seems plausible so far.

comment:8 in reply to:  7 Changed 7 years ago by mikeperry

Replying to nickm:

Reading the second one now.

--

 /// XXX: Generate a random 0.a.b.c address

This is trivial to do. It could be approximately:

  char *probe_nonce;
  uint32_t addr;
  crypto_rand(&addr, sizeof(addr));
  addr &= 0x00ffffff;
  probe_nonce = tor_dup_ip(addr);

Implemented and tested. Seems to work fine. Thanks for pointing out the helpers. I am obviously somewhat ignorant to most of the support stuff in src/common.

(unchecked). The port should probably be random too, right?

I opted not to randomize the port because it is not echoed in the reply. It seemed not to add any actual security for this reason, and keeping it set at 25 saved me from worrying about potential special cases that might exist (like 0?).

-- The documentation is sparser than I would prefer. Like, "Sends a probe down a circuit that wasn't usable." Not usable how? A probe of what type? For what purpose? (I know the answers, but a reader later on is going to have to figure out what's going on here.)

-- Forcing the response length to be 9 seems wrong. As a rule, we allow more bytes than expected and ignore them.

-- No raw memcmp calls; either tor_memeq() or fast_memeq() will be correct.

-- The cell_t* argument to pathbias_check_probe_response should probably be const.

Other than that, it seems plausible so far.

I think all of these should be fixed in mikeperry/bug7691-rebased. I rebased it to origin/master to retest it briefly, but it should be the same two commits plus one commit each for the review changes for each bug (#7341 and this one). The top commit hash is d05ff310a5547b15433314617d6f1b9e9ccfe5b8.

comment:9 Changed 7 years ago by mikeperry

Actual Points: 46
Status: needs_revisionneeds_review

comment:10 Changed 7 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

okay. I don't see any more bugs here; if I missed some, I guess we'll find them as we go. Merging.

Note: See TracTickets for help on using tickets.