Opened 7 months ago

Closed 7 months ago

#21771 closed defect (fixed)

Cannot change bridge (with Tor 0.3.0.4-rc)

Reported by: torvlnt33r Owned by: asn
Priority: Very High Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version: Tor: 0.3.0.4-rc
Severity: Normal Keywords: bridge 030-backport
Cc: asn Actual Points:
Parent ID: Points: 1
Reviewer: Sponsor:

Description

Trying to replace a bridge in torrc (that had been successfully used for a long time) with another one results in many warnings "[warn] Failed to find node for hop 0 of our path. Discarding this circuit.", and failure to connect.

Only way out found is to edit "state" file and remove the "Guard" line for the previous bridge (with "listed=0"), or come back to previous bridge option in torrc.

I found this while testing further after reporting what I initially thought was a bug in TorBrowser (https://trac.torproject.org/projects/tor/ticket/21768)

Child Tickets

Change History (9)

comment:1 Changed 7 months ago by nickm

Keywords: 030-backport added
Milestone: Tor: 0.3.1.x-final
Priority: MediumVery High

comment:2 Changed 7 months ago by nickm

Cc: asn added
Owner: set to nickm
Status: newassigned

Adding asn to cc, since he's got a pretty good sense of how this system works. Assigning it to myeslf, since it's probably my breakage on prop271.

comment:3 Changed 7 months ago by nickm

Owner: changed from nickm to asn
Points: 1

Assigning to asn with permission. (gk has bisected to confirm that it's somewhere in my original prop271 code, but asn says it's easy to reproduce, and plans to give it a try in the next couple of days.)

comment:4 Changed 7 months ago by gk

Closed #21768 as duplicate.

comment:5 Changed 7 months ago by asn

OK here is some recon.

It seems like we are continuously hitting the following code path with this bug:

[warn] Not expanding the guard sample any further; just hit the maximum sample threshold of 1

which brings us to the following code:

  smartlist_t *eligible_guards = get_eligible_guards(options, gs, &n_guards);
...
  const int max_sample = get_max_sample_size(gs, n_guards);
...
    /* Has our sample grown too large to expand? */
    if (n_sampled >= max_sample) {
      log_warn(LD_GUARD, "Not expanding the guard sample any further; "
               "just hit the maximum sample threshold of %d",
               max_sample);
      goto done;
    }

with get_max_sample_size() like this:

get_max_sample_size(guard_selection_t *gs,
                    int n_guards)
{
...
  /* With bridges, max_sample is "all of them" */
  if (using_bridges)
    return n_guards;

and get_eligible_guards() sets n_guards like this:

  if (gs->type == GS_TYPE_BRIDGE) {
    const smartlist_t *bridges = bridge_list_get();
    SMARTLIST_FOREACH_BEGIN(bridges, bridge_info_t *, bridge) {
      ++n_guards;
      if (NULL != get_sampled_guard_for_bridge(gs, bridge)) {
        continue;
      }
      smartlist_add(eligible_guards, bridge);
    } SMARTLIST_FOREACH_END(bridge);

So in our case, because we swapped bridge X for bridge Y, the above code will only do a single smartlist loop (since only Y is in our bridge list), and it will set n_guards to 1.

But since our sampled guards list already contains bridge X, tor will refuse to add bridge Y since we "just hit the maximum sample threshold of 1".

Some suggested ways forward:
a) We could reconsider how get_max_sample_size() handles the bridge case.
b) We could edit get_eligible_guards() so that it also takes into account our sampled guards and not just the size of our bridge list (i.e. how many bridges are in our torrc).

comment:6 Changed 7 months ago by nickm_mobile

I talked with asn a little and here's what I think we decided: That (a) is the better approach, and that the right solution is to make the sample size unlimited for bridges.

comment:7 Changed 7 months ago by asn

With Nick we discussed that the right fix is probably to implement the original intention of the code snippet here:

get_max_sample_size(guard_selection_t *gs,
                    int n_guards)
{
...
  /* With bridges, max_sample is "all of them" */
  if (using_bridges)
    return n_guards;

and actually make the max_sample size be "all of them" in the bridge case instead of just being the number of bridges in torrc.

Possible implementations:

a) The obvious (but slightly dirty) way of doing this would be to change return n_guards above to return INT_MAX. I verified that this would fix the bug of this ticket.

b) If we want a nicer semantic than INT_MAX we could perhaps return the number of bridges in our torrc plus the number of currently sampled bridges.

c) The third alternative would be to return DFLT_MAX_SAMPLE_SIZE but that would not actually fix all edge cases, since if our sampled guard list contains lots of old bridges we could actually hit the limit.

Personally, I think going with (a) and a small comment describing the logic is fine. Alternatively, we could go with (b) if we want to seem smarter.

I can write a quick patch here after we decide on the approach.

comment:8 in reply to:  7 Changed 7 months ago by asn

Status: assignedneeds_review

Replying to asn:

With Nick we discussed that the right fix is probably to implement the original intention of the code snippet here:

get_max_sample_size(guard_selection_t *gs,
                    int n_guards)
{
...
  /* With bridges, max_sample is "all of them" */
  if (using_bridges)
    return n_guards;

and actually make the max_sample size be "all of them" in the bridge case instead of just being the number of bridges in torrc.

Possible implementations:

a) The obvious (but slightly dirty) way of doing this would be to change return n_guards above to return INT_MAX. I verified that this would fix the bug of this ticket.

Please check my branch bug21771 for a fix using logic (a) above.

comment:9 Changed 7 months ago by nickm_mobile

Resolution: fixed
Status: needs_reviewclosed

Cherry-picked to maint-0.3.0 and merged forward!

Note: See TracTickets for help on using tickets.