Opened 5 months ago

Closed 5 months ago

#24895 closed defect (fixed)

MAX_REND_FAILURES is 1, but we will try three times

Reported by: arma Owned by:
Priority: Medium Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 032-backport, 031-backport, 030-backport, 029-backport, 025-backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description (last modified by arma)

Off-by-2 error counting to MAX_REND_FAILURES for onion services:

In can_relaunch_service_rendezvous_point(), we check

  if (circ->build_state->failure_count > MAX_REND_FAILURES ||
      circ->build_state->expiry_time <= time(NULL)) {

to decide whether to abort the relaunch.

But the incrementing of failure_count happens in the relaunch, i.e. after this code. Also, it says ">" rather than ">=".

Yet the definition of MAX_REND_FAILURES is

/** How many times will a hidden service operator attempt to connect to a
 * requested rendezvous point before giving up? */
#define MAX_REND_FAILURES 1

So when the first attempt fails, failure_count is 0, which is not > 1, so we try again. When the second attempt fails, failure_count is 1, which is also not > 1. It's only after the third attempt fails that we decide that MAX_REND_FAILURES has been reached.

This bug affects legacy onion services, and it will also affect nextgen onion services once #24894 is fixed.

Child Tickets

Change History (13)

comment:1 Changed 5 months ago by arma

Description: modified (diff)

comment:2 Changed 5 months ago by arma

Status: newneeds_review

I made a bug24895 branch in my arma. It's based on maint-0.2.9, since I think fixing it that far back is wise given that 0.2.9 will remain LTS for so long.

There will probably be some simple conflicts as we forward-port, but it's a one-liner plus a comment so hopefully it won't be too bad.

Really I want to change the logic around so we increment the failure count earlier in the function, but I think we should wait until 0.3.3 for that refactoring.

comment:3 Changed 5 months ago by dgoulet

Cc: dgoulet removed
Keywords: 031-backport 030-backport 029-backport 025-backport added
Reviewer: dgoulet
Status: needs_reviewmerge_ready

We'll need to backport this back to 025... Shouldn't be complicated.

I agree with Roger here that ideally we would increment the failure count much earlier that is before the check is done on the failure count.

But this is simple enough for now to backport far away and it has a comment explaining why so we aren't lost about this gymnastic.

This does affect both v2 and v3 where the code changed in 032 but not the logic and code flow.

lgtm;

comment:4 Changed 5 months ago by teor

Status: merge_readyneeds_revision

I think reducing the effective value of MAX_REND_FAILURES from 3 to 1 will have serious reachability consequences for some onion services.

Do you think we should set MAX_REND_FAILURES to 2 instead?

comment:5 Changed 5 months ago by arma

Stay tuned for a revised branch where I propose that we make it into a consensus param, which defaults to 2 but which we set to 1 in the consensus for the next while.

comment:6 Changed 5 months ago by dgoulet

Datapoint: In months of running v2 and v3 onions, I've seen twice a relaunch of a rendezvous point circuit. If the Guard can keep up with the service circuit creation, it is something I've rarely seen failing.

Ok agree that perhaps having two tries to reach the RP is what I think we should have in normal circumstances, not only 1 which is the current patch. I like the idea of having a consensus parameters so we can adjust accordingly depending on the network load.

But for the current network situation, I think we want to bring it down to 1 for now because right now 1 million clients introducing would be inducing two million circuits by the services they are trying to reach. At that scale, I'm ready to call it amplification attack vector.

comment:7 Changed 5 months ago by arma

Status: needs_revisionneeds_review

My bug24895-029 branch adds this consensus param, and applies to maint-0.2.9. (I don't think any relevant huge services still run on 0.2.5, so I am fine leaving oldoldstable alone here.)

I picked a max of 10 (out of my hat, "chosen by fair dice roll"), and a default of 2 (based on discussion above). I could maybe have picked a default of 1, since we're going to deploy 1 on the network as soon as this patch goes out, and we have no plans to change it from 1. But I went with 2 in case we later decide to try it.

I also went with the consensus param name "maxrendfailures", which is maybe a bit vague because it doesn't specify whether I mean the client side or the service side. If somebody likes a different size bike shed, now is the time to speak up. :)

comment:8 Changed 5 months ago by dgoulet

This looks good! Ok, so I made two tiny winy changes here and I'll explain why. They are in the fixup commit 9e3b63b041f0e60e:

  1. I changed the name maxrendfailures to hs_service_max_rdv_failures which, like you said above, is to specifically point it out to be service side. Secondly, because this needs to be merged forward, with v3 in 032, that value is used by both versions and moved to hs_common.h so the naming is important to clearly name space it to the "hs" subsystem and identify it as service. Third, rend in the code is associated with v2 thus the use of rdv instead. Finally, I think a name like that will be more explicit when read from the consensus or torrc file by the dirauth.
  1. I simply made the default, min and max values a #define which, when merging forward, will be moved to hs_common.c.

This patch won't merge forward cleanly because of the big v3 change starting in 031 so here are the respective branches per version.

NOTE: The fixup commit from above is in all the branches so --autosquash might be desirable before merging. And I hope I made the merge forward OK... Let me know if anything goes wrong. There will be a very easy conflict from 029 to 030.

Branch: ticket24895_029_01, ticket24895_031_01 and ticket24895_032_01
Spec branch: ticket24895_01

comment:9 Changed 5 months ago by nickm

For other reviewers: The tor branches are called bug... not ticket...

comment:10 Changed 5 months ago by nickm

Status: needs_reviewmerge_ready

These look fine to me. Roger, do you like them?

comment:11 Changed 5 months ago by arma

Sure, sounds great. Thanks.

(Once they're all merged, I'll go add the oxford comma to "Default, minimum and maximum" in master. :)

comment:12 Changed 5 months ago by dgoulet

I've squashed the fixup commit and created _02 branches. The git diff should be zero with the _01 branches.

Last thing, I edited the last commit to reflect the new consensus parameter name.

bug24895_029_02, bug24895_031_02 and bug24895_032_02

comment:13 Changed 5 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

woo. Merging now!

Note: See TracTickets for help on using tickets.