Opened 6 years ago

Closed 6 years ago

Last modified 18 months ago

#8683 closed defect (fixed)

moria1 isn't voting for Fast flag when it should

Reported by: arma Owned by:
Priority: Immediate Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-auth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

moria1 running master (and thus #8273 and #8435) is voting weirdly about its flags -- particularly Fast and Exit.

For example, this one really ought to get Fast:

r Ferrari458 ABwTs6Vacbl3ymXshVOdecZTo/w r5fmkGFLqpZj8tHabjbPP5IidFo 2013-04-11 03:37:00 68.38.171.200 9001 9030
s HSDir Running V2Dir Valid
v Tor 0.2.3.25
w Bandwidth=605 Measured=1120
p reject 1-65535
m 8,9,10,11,12,13,14,15,16,17 sha256=oRdjVNraRMF9pWQJzl8UxZdfNBoHSWF0GlYWailMsBU

And this one really ought to get Exit:

r sumkledi ABPSI4nNUNC3hKPkBhyzHozozrU S6rBWKQIhhXr/+EVxRX/DT9AB8I 2013-04-11 04:01:46 178.218.213.229 80 0
s Running Valid
v Tor 0.2.2.35
w Bandwidth=51 Measured=20
p accept 80,443
m 8,9,10,11,12,13,14,15,16,17 sha256=9K0eJJ5HKhiAianDAkw+0Zs8WFIUDE9D6p/VY3w0WOE

Child Tickets

Change History (21)

comment:1 Changed 6 years ago by karsten

In f93f7e3, we start requiring relays to have measured bandwidth in dirserv.c's dirserv_compute_performance_thresholds(). However, assigning the Exit flag shouldn't depend on having the relay's bandwidth. See dirserv.c line 1971. That may explain why turtles and moria1 assign so few Exit flags to relays. It doesn't explain the rest of this bug though.

comment:2 Changed 6 years ago by nickm

Losing Exit seems just plain weird. Looking at dirserv.c, we seem to set rs->is_exit from node->is_exit, and we set node->is_exit as:

      node->is_exit = (!router_exit_policy_rejects_all(ri) &&
                       exit_policy_is_general_exit(ri->exit_policy));

I don't know what could have gone wrong with that.

On the Fast issue: we set is_fast as:

  rs->is_fast = node->is_fast =
    router_is_active(ri, node, now) &&
    !dirserv_thinks_router_is_unreliable(now, ri, 0, 1);

And dirserv_thinks_router_is_unreliable(ri)'s return value is based on comparing (fast_bandwidth, which is set based on dirserv_get_credible_bandwidth()) to dirserv_get_bandwidth_for_router(ri). Is that correct?

comment:3 in reply to:  2 Changed 6 years ago by arma

Replying to nickm:

Losing Exit seems just plain weird. Looking at dirserv.c, we seem to set rs->is_exit from node->is_exit, and we set node->is_exit as:

      node->is_exit = (!router_exit_policy_rejects_all(ri) &&
                       exit_policy_is_general_exit(ri->exit_policy));

I don't know what could have gone wrong with that.

We only set it if router_counts_toward_thresholds() is true. That function demands (have_mbw
!require_mbw).

comment:4 Changed 6 years ago by nickm

Hm. What fraction of nodes _do_ have measured bandwidth? Well, both of the above have Measured= lines in them, so I don't think that (have_mbw
!require_mbw) can be true.

comment:5 Changed 6 years ago by nickm

I have some more or less untested stuff in branch bug8683_ideas. Needs review and consideration.

comment:6 Changed 6 years ago by nickm

Keywords: tor-auth added

comment:7 in reply to:  5 ; Changed 6 years ago by arma

Replying to nickm:

I have some more or less untested stuff in branch bug8683_ideas. Needs review and consideration.

Looks like we're changing the meaning of fast_bandwidth to be in kilobytes rather than bytes?

Unexpected side effects:

  • The FastFlagMinThreshold consensus param will cause current clients to freak out if you set it to 4 but the clients think 4096 is the minimum.
  • It changes dirserv_get_flag_thresholds_line(), so Karsten's scripts to track thresholds will all need to check what version generated the line and handle it there.

comment:8 in reply to:  7 Changed 6 years ago by nickm

Replying to arma:

Replying to nickm:

I have some more or less untested stuff in branch bug8683_ideas. Needs review and consideration.

Looks like we're changing the meaning of fast_bandwidth to be in kilobytes rather than bytes?

We already did that as part of #8273; we just didn't do it consistently.

I think the alternative is worse, but I could be wrong.

Unexpected side effects:

  • The FastFlagMinThreshold consensus param will cause current clients to freak out if you set it to 4 but the clients think 4096 is the minimum.

Not really; were' changing the minimum to 4, after all, and I think clients don't even look at that value. The worst that will happen is that authorities that don't upgrade will give a warning. But authorities should upgrade. Or we could say that these thresholds are in bytes, and divide them by 1000. That's cool too.

  • It changes dirserv_get_flag_thresholds_line(), so Karsten's scripts to track thresholds will all need to check what version generated the line and handle it there.

For that, adding the appropriate multiplies sounds like the right thing to do.

comment:9 Changed 6 years ago by nickm

I have added a branch "bug8683_kb_rename" to do the kb renaming you suggested on top of bug8683_ideas. It's pretty big, though...

commit d1a9f04b4b629b8c3e88373adf995f752edcd2f5
Author: Nick Mathewson <nickm@torproject.org>
Date:   Thu Apr 11 11:43:40 2013 -0400

    Rename all fields which measure bw in kb to end with _kb

 src/or/dirserv.c     | 182 +++++++++++++++++++++++++--------------------------
 src/or/dirserv.h     |   4 +-
 src/or/dirvote.c     |  42 ++++++------
 src/or/dirvote.h     |   2 +-
 src/or/or.h          |   6 +-
 src/or/routerlist.c  |   4 +-
 src/or/routerparse.c |  35 +++++-----
 src/test/test_dir.c  | 104 ++++++++++++++---------------
 8 files changed, 191 insertions(+), 188 deletions(-)

comment:10 Changed 6 years ago by nickm

Status: newneeds_review

That patch also restores the dirserv_get_flag_thresholds_line output, and the semantics of FastFlag{Min,Max}Threshold.

comment:11 Changed 6 years ago by arma

moria1 is running this new branch. At first glance it looks like it works.

I notice that the Exit flag is now not assigned unless you have the Running flag. I think that might be different behavior from before (and doesn't make much sense to me -- but then it's not actually a bug until somebody does #8685).

comment:12 Changed 6 years ago by arma

Here's an example scenario to consider: what if the other authorities vote Running for this guy, but we don't? Do we want to vote against his Exit flag in this case?

Same with Fast -- I'd say we should vote as reflects our thresholds and definitions, and not take away flags just because we didn't find the relay reachable.

Unless I misunderstand how the new decisions are made?

comment:13 Changed 6 years ago by nickm

Are you sure we used to vote for the Exit flag on non-Running routers? Looking at 0.2.3, I see that we do:

    if (ri && router_is_active(ri, node, now)) {
      // ...
      node->is_exit = (!router_exit_policy_rejects_all(ri) &&
                       exit_policy_is_general_exit(ri->exit_policy));

And router_is_active returned 0 if the node's is_running flag is not set.

comment:14 Changed 6 years ago by nickm

(I just updated the branch again to remove a couple of no-longer-used computed values)

comment:15 in reply to:  13 Changed 6 years ago by arma

Replying to nickm:

Are you sure we used to vote for the Exit flag on non-Running routers? Looking at 0.2.3, I see that we do:

    if (ri && router_is_active(ri, node, now)) {
      // ...
      node->is_exit = (!router_exit_policy_rejects_all(ri) &&
                       exit_policy_is_general_exit(ri->exit_policy));

And router_is_active returned 0 if the node's is_running flag is not set.

tor26 this hour (running 0.2.4.11-alpha) voted this about sumkledi:

r sumkledi ABPSI4nNUNC3hKPkBhyzHozozrU S6rBWKQIhhXr/+EVxRX/DT9AB8I 2013-04-11 04:01:46 178.218.213.229 80 0
s Exit Named Valid
v Tor 0.2.2.35
w Bandwidth=51
p accept 80,443

comment:16 Changed 6 years ago by nickm

Huh. Where did my analysis go wrong?

comment:17 Changed 6 years ago by nickm

(I can't see how that can happen in 0.2.4.11-alpha unless vote_on_reachability is false, in which case nobody would get Running.)

comment:18 Changed 6 years ago by arma

Fun!

Here's my theory: we set node->is_exit while router_is_active is true, and once router_is_active stops being true, we never change our opinion about is_exit.

You would think that there would then be a subtle bug where the relay changes its exit policy while staying unreachable, and then we'd mistakenly keep voting Exit. But I think when the exit policy changes, the node_t gets replaced with the new one, which starts out without the flags.

There *is* a different sort of subtle bug, where if we've never found a relay reachable, we don't vote the Exit flag for it, even if its exit policy warrants it. That happens for example when the directory authority starts up.

I just checked, and now that moria1 has been running for a while, it is in fact voting Exit-but-not-Running on some relays. So it looks like we have retained the old behavior. (I'm not sure if that's counter to the goals of #8435, since I don't know what those goals were.)

comment:19 Changed 6 years ago by nickm

Okay, so that part isn't a regression. I say we merge what we have here (branch "bug8683_kb_rename"), and open tickets for whatever else you think should get fixed.

comment:20 Changed 6 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Okay, rebased and merged. Please open any remaining issues as new tickets.

comment:21 Changed 18 months ago by teor

Severity: Normal

Set all tickets without a severity to "Normal"

Note: See TracTickets for help on using tickets.