Opened 2 months ago

Closed 4 weeks ago

#21052 closed defect (fixed)

Bad prop271 behavior when exhausting all guards

Reported by: asn
Owned by: asn
Priority: High
Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: prop271, tor-guard, regression
Cc:
Actual Points: 0.6
Parent ID: #20822
Points: 1.5
Reviewer: nickm
Sponsor:

Description

Here is a weird prop271 behavior. If you get Tor to exhaust all of its
primary/confirmed/sampled guards, it will just get stuck until a guard gets
marked for retry (which can take half an hour or more).

Specifically, if you disable the network until Tor starts hitting:

    /* Every sampled guard has been tried and has failed; there is
     * nothing left to pick. */
    if (guard == NULL) {
      log_warn(LD_GUARD, "Absolutely no sampled guards were available.");
      return NULL;
    }

it will just get stuck in an "Absolutely no sampled guards were available."
loop until a guard gets marked for retry.

In our pre-prop271 guard algorithm, we used to mark all guards as retriable
when we exhausted all of them. I think that is strictly better behavior than
just waiting until a guard retry timeout triggers.
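
As a simplified, self-contained sketch of that retry-on-exhaustion idea (toy
stand-in types and names for illustration only; this is not Tor's real
entry_guard_t API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Toy stand-in for a guard entry; fields are illustrative only. */
    typedef struct {
      const char *nickname;
      bool unreachable;   /* true after a failed connection attempt */
      bool can_retry;     /* true once the guard is marked retriable */
    } toy_guard_t;

    /* If every guard has failed and none is already marked for retry,
     * mark them all retriable at once instead of waiting for the
     * per-guard retry timeouts to expire. */
    void
    retry_all_if_exhausted(toy_guard_t *guards, size_t n)
    {
      for (size_t i = 0; i < n; ++i) {
        if (!guards[i].unreachable || guards[i].can_retry)
          return; /* At least one guard is still worth trying. */
      }
      for (size_t i = 0; i < n; ++i)
        guards[i].can_retry = true;
      printf("Exhausted all %zu guards; marking every guard for retry.\n", n);
    }

    int
    main(void)
    {
      toy_guard_t guards[] = {
        { "guardA", true, false },
        { "guardB", true, false },
      };
      retry_all_if_exhausted(guards, 2);
      return 0;
    }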

Furthermore, in our pre-prop271 guard algorithm, when we exhausted all of our
guards, we marked the network as likely-down. The idea is that if the network
was marked as likely-down and we then managed to connect to a guard, we would
treat that as a network-up event and start trying guards from the top of our
list.

This heuristic was quite effective at reducing guard exposure for people with
unstable internet connections. We have a similar one for prop271, but it's
only based on time (get_internet_likely_down_interval() etc.), not on behavior.
I think porting the old heuristic to prop271 might be a great idea. Here is how
it could work:

When we exhaust all of our guards, we mark all guards as retriable. The next
time we manage to connect to a guard, we stall that circuit and call
mark_primary_guards_maybe_reachable() so that we attempt to connect to our
primaries again before using that low-priority guard.
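
Here is a minimal sketch of that heuristic under the same toy model as above.
mark_primary_guards_maybe_reachable() is the real function named in this
proposal; everything prefixed toy_ is made up for illustration:

    #include <stdbool.h>
    #include <stddef.h>

    /* Same toy model as the earlier sketch, plus a primary flag. */
    typedef struct {
      bool is_primary;
      bool can_retry;
    } toy_guard_t;

    typedef struct {
      bool net_likely_down;  /* set when we exhausted every guard */
      toy_guard_t *guards;
      size_t n_guards;
    } toy_guard_state_t;

    /* Toy stand-in for mark_primary_guards_maybe_reachable(). */
    static void
    toy_mark_primaries_retriable(toy_guard_state_t *gs)
    {
      for (size_t i = 0; i < gs->n_guards; ++i) {
        if (gs->guards[i].is_primary)
          gs->guards[i].can_retry = true;
      }
    }

    /* Called when a connection to some guard finally succeeds. Returns
     * true if the circuit may be used right away, false if it should be
     * stalled while the primary guards get another chance. */
    bool
    toy_on_guard_success(toy_guard_state_t *gs, const toy_guard_t *guard)
    {
      if (gs->net_likely_down) {
        gs->net_likely_down = false;  /* treat this as a network-up event */
        toy_mark_primaries_retriable(gs);
        if (!guard->is_primary)
          return false;  /* Stall: retry primaries before this guard. */
      }
      return true;
    }

    int
    main(void)
    {
      toy_guard_t guards[] = { { true, false }, { false, false } };
      toy_guard_state_t gs = { true, guards, 2 };
      /* We connected to the non-primary guard while "down": stall it. */
      return toy_on_guard_success(&gs, &guards[1]) ? 1 : 0;
    }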

Change History (8)

comment:1 Changed 2 months ago by nickm

  • Keywords regression added
  • Milestone changed from Tor: 0.3.1.x-final to Tor: 0.3.0.x-final
  • Owner set to nickm
  • Priority changed from Medium to High
  • Status changed from new to accepted

comment:2 Changed 2 months ago by nickm

  • Points changed from 0.7 to 1.5

comment:3 Changed 6 weeks ago by asn

  • Actual Points set to 0.6
  • Status changed from accepted to needs_review

Please check branch bug21052 in my repo for a simple fix that marks all guards for retry when we exhaust our sampled set. I tested it and it seems to work; I will be doing more tests with Tor Browser over the following days.
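
For readers without the branch handy, the shape of the fix at the exhaustion
point shown in the description is roughly this (a hedged reconstruction, not
the actual patch; the helper name mark_all_guards_maybe_reachable() is my
assumption):

    if (guard == NULL) {
      log_warn(LD_GUARD, "Absolutely no sampled guards were available.");
      /* Instead of looping on this warning until a per-guard retry
       * timer fires, mark the whole sampled set as retriable so the
       * next attempt has candidates again. (Helper name assumed.) */
      mark_all_guards_maybe_reachable(gs);
      return NULL;
    }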

comment:4 Changed 6 weeks ago by nickm

  • Owner changed from nickm to asn
  • Reviewer set to nickm
  • Status changed from needs_review to assigned

comment:5 Changed 6 weeks ago by nickm

  • Status changed from assigned to needs_review

comment:6 follow-up: Changed 4 weeks ago by nickm

Yes, I think this is a good approach. Nice code reuse too. I've merged it to master.

I do wonder whether this can ever cause us to hammer all of our sampled guards, over and over. But maybe that suggests we should rate-limit circuits when we seem to be down, or something. In any case, it's not a regression from 0.2.9.x behavior as far as I can tell.

As a followup, could you please write a quick note about this in guard-spec.txt before we close the ticket?

comment:7 in reply to: ↑ 6 Changed 4 weeks ago by asn

Replying to nickm:

> Yes, I think this is a good approach. Nice code reuse too. I've merged it to master.
>
> I do wonder whether this can ever cause us to hammer all of our sampled guards, over and over. But maybe that suggests we should rate-limit circuits when we seem to be down, or something. In any case, it's not a regression from 0.2.9.x behavior as far as I can tell.
>
> As a followup, could you please write a quick note about this in guard-spec.txt before we close the ticket?

ACK. Please see branch bug21052 in my torspec repo.

comment:8 Changed 4 weeks ago by nickm

  • Resolution set to fixed
  • Status changed from needs_review to closed

Thanks! Merged!
