Opened 5 months ago

Closed 5 months ago

#25843 closed defect (fixed)

Make NumEntryGuards consistent with #271 consensus params

Reported by: mikeperry Owned by:
Priority: Medium Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords:
Cc: asn, arma, nickm Actual Points:
Parent ID: Points:
Reviewer: nickm Sponsor:

Description

Right now, NumEntryGuards tries to do some magic to set "guard-n-primary-guards-to-use" to the torrc value, and "guard-n-primary-guards" to twice that value.

This prevents us from testing our current favored Proposal 291 params (2 for each). So the torrc option could either be changed to set both of those values to the same, or we could make two torrc options.

We should also ensure that whatever we do, we have the ability to set the torrc such that we could get the same behavior as *all* existing clients would get, with those consensus params, if we wanted. This includes directory guards (which we currently believe will and should be the same as the two primary guards).

asn has a testing branch here: https://github.com/asn-d6/tor/tree/prop291_testing. This is not ready for merge, be could be adapted into a mergable, backportable thing.

He also has a useful guard log promotion patch here: https://github.com/asn-d6/tor/tree/guard_monitor

Child Tickets

Change History (13)

comment:1 Changed 5 months ago by mikeperry

Here's a list of things we're trying to learn from this testing:

  1. How often does the code decide that one of these two primaries is "down" when it is not?
  2. How often does the code prefer one guard over the other? (They should be split roughly 50/50 with this patch as-is, unless you get unlucky with path restrictions.. does that happen a lot?).
  3. How often do we decide to use guards other than our two primaries with this patch?
  4. What circumstances cause us to use guards other than our two primaries with this patch?
  5. Do we use the same two directory guards as our primary guards?
  6. Do we ever have microdescriptor shortages or 503 directory busy issues with this patch?
  7. What happens when we wander into the uncharted "sampled guard" territory of prop271?
  8. Do our failure modes for the above/other issues ever result in complete downtime for the client? (Can we fix that easily?)
  9. Can the client be induced to spam or otherwise thrash on its guards when it thinks one or both are down/unreachable?
  10. How does the vanguard controller behave with this patch?

I think asn has some of this information already.

comment:2 Changed 5 months ago by asn

Status: newneeds_review

Upstream pull request can be found here: https://github.com/torproject/tor/pull/57
Branch name: bug25843

Please let me know if this behavior is not right for upstream. I thought about it for a bit, and decided that it's a reasonable behavior and probably less arbitrary than the previous one. It's always a bit unclear what users who set NumEntryGuards are thinking, so since we (the devs) are thinking of adopting this behavior, it's probably the right one.

Last edited 5 months ago by asn (previous) (diff)

comment:3 in reply to:  1 Changed 5 months ago by asn

Replying to mikeperry:

Here's a list of things we're trying to learn from this testing:

OK, I've been running the above branch for a week or so. Here are the things I can answer already:


  1. How often does the code decide that one of these two primaries is "down" when it is not?

This is the same behavior as single-guard, so not much has changed here.

Tor doesn't have many false positives here when the network is good, but it can have false negatives [i.e. tor keeps on thinking that a guard is up, but that guard is overheating and sending DESTROYs to every circuit (#25347)].

When the network is bad, it's a totally different situation. See question 3.

  1. How often does the code prefer one guard over the other? (They should be split roughly 50/50 with this patch as-is, unless you get unlucky with path restrictions.. does that happen a lot?).

I think we good here. It's a smartlist_choose() in select_entry_guard_for_circuit().

  1. How often do we decide to use guards other than our two primaries with this patch?
  2. What circumstances cause us to use guards other than our two primaries with this patch?

This is related to question 1. Usually this can happen when the network is bad, and Tor thinks that some guards are down when they are not. There are cases where Tor can end up thinking that the primaries are down, and it will use the guards below the primaries (i.e. enter the wilderness). I tested this manually using iptables and I immediately found #25783. We need to fix this bug and re-test this, to see what else appears.

The iptables command was a simple:
iptables -A OUTPUT -d 202.54.1.22 -j DROP
where 202.54.1.22 was my top primary guard.

  1. Do we use the same two directory guards as our primary guards?

Yes. We only use our two primary guards for directory guards.

  1. Do we ever have microdescriptor shortages or 503 directory busy issues with this patch?

I haven't encountered such a thing yet, but also I haven't looked for it. I haven't encountered #21969 either.

  1. What happens when we wander into the uncharted "sampled guard" territory of prop271?
  2. Do our failure modes for the above/other issues ever result in complete downtime for the client? (Can we fix that easily?)

I haven't done enough testing here, but sometimes things work perfectly, and some other times Tor gets stuck for a bit and then gets unstuck and works fine. In the case of #25783 Tor got stuck for about 6 minutes tho... We really need to look into this more.

  1. Can the client be induced to spam or otherwise thrash on its guards when it thinks one or both are down/unreachable?

This is #25347. Tor will keep on trying to establish circuits to its guards even tho they are overloaded and sending DESTROYs all the time. There is no single best approach to solve that problem.

  1. How does the vanguard controller behave with this patch?

I think asn has some of this information already.

comment:4 Changed 5 months ago by dgoulet

Reviewer: nickm

comment:5 Changed 5 months ago by nickm

Status: needs_reviewneeds_revision

This seems good for testing, but I don't think this is actually the right behavior if we want to make these values adjustable IRL. Instead, I think we should make separate, independent configuration options. That way, if we (or anybody else!) want to experiment with different values, we can actually experiment with the full range of possibilities.

comment:6 in reply to:  5 Changed 5 months ago by asn

Status: needs_revisionmerge_ready

Replying to nickm:

This seems good for testing, but I don't think this is actually the right behavior if we want to make these values adjustable IRL. Instead, I think we should make separate, independent configuration options. That way, if we (or anybody else!) want to experiment with different values, we can actually experiment with the full range of possibilities.

Good point. Please check branch bug25843_v2 which introduces the NumPrimaryGuards torrc option.

comment:7 Changed 5 months ago by asn

Status: merge_readyneeds_review

comment:8 Changed 5 months ago by nickm

Status: needs_reviewneeds_revision

Better! Two fixes here:

1) Should we we allow NumEntryGuards to be greater than NumPrimaryGuards? I think the answer should be no.
2) There should be a manpage entry for this.

comment:9 in reply to:  8 Changed 5 months ago by asn

Replying to nickm:

Better! Two fixes here:

1) Should we we allow NumEntryGuards to be greater than NumPrimaryGuards? I think the answer should be no.
2) There should be a manpage entry for this.

ACK. Please see branch bug25843_v2 which includes a squash commit for the above two comments.

comment:10 Changed 5 months ago by asn

Status: needs_revisionneeds_review

comment:11 Changed 5 months ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Squashed and merged!

comment:12 Changed 5 months ago by mikeperry

Resolution: implemented
Status: closedreopened

Eep. This patch rejects if NumEntryGuards > NumPrimaryGuards. For current Prop#291 testing, we want both to be the same (set to 2).

I will double-check this for other properties (such as using the same dirguards) and push a branch in a bit.

comment:13 Changed 5 months ago by mikeperry

Resolution: fixed
Status: reopenedclosed

Err nevermind. I briefly confused the two. Sorry for the alarm.

Note: See TracTickets for help on using tickets.