Opened 10 years ago

Closed 9 years ago

Last modified 7 years ago

#1206 closed defect (not a bug)

Fluxe3 is not a guard

Reported by: Sebastian
Owned by:
Priority: Low
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor
Version: 0.2.2.6-alpha
Severity:
Keywords:
Cc: Sebastian, arma, nickm, BarkerJr
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by mikeperry)

Fluxe3 isn't considered a guard (only one of the authorities votes for it).
It has been up for 13 days, before that it had a downtime for ~30 minutes,
and before that it was running for a week. Before that it was offline.

There are a few things that seem to be strange. When arma looked at
what moria thinks about fluxe3 three days ago, moria thought that fluxe3's
uptime was 18 days. It somehow missed the fact that the descriptor changed
to reflect the new uptime.

The other question is, does this influence why Fluxe3 is not a guard? Or are
there other issues?

[Automatically added by flyspray2trac: Operating System: All]

Change History (13)

comment:1 Changed 10 years ago by arma

Here's moria1's router-stability entry for fluxe3:

R ED13D1D13C1E57C6A406DD64551D2F905AB99AFF
+MTBF 9759 0.03464 S=2009-12-18 07:23:00
+WFU 9759 149363

So moria1 thinks it's been up since 2009-12-18 (20 days or so), and before
that it was up for (weighted somehow) 9759 seconds out of (weighted somehow
else) 149363 seconds.

So if 20 days is 1728000 seconds, moria1 considers its wfu to be
(1728000+9759) / (1728000+149363) = 1737759/1877363 which is only 92.5% wfu,
so not enough to be a guard.

So I think it's following the algorithm correctly, and the algorithm has a
problem.
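
The arithmetic in this comment can be reproduced with a few lines of C. This is only a sketch of the by-hand calculation, not Tor's rephist code: the helper names `wfu_with_current_run` and `print_fluxe3_wfu` are made up for illustration, and the inputs are the +WFU values and the 20-day run quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a Tor function): fold the current run into the
 * weighted history, the same way the comment does by hand. */
double
wfu_with_current_run(double run_sec, double weighted_up_sec,
                     double weighted_total_sec)
{
  double total = run_sec + weighted_total_sec;
  assert(total > 0.0);
  return (run_sec + weighted_up_sec) / total;
}

/* Print the estimate for the fluxe3 numbers quoted in the comment. */
void
print_fluxe3_wfu(void)
{
  double run_sec = 20.0 * 24 * 60 * 60; /* 20 days = 1728000 s */
  double wfu = wfu_with_current_run(run_sec, 9759.0, 149363.0);
  printf("estimated WFU: %.2f%%\n", wfu * 100.0);
}
```

Calling `print_fluxe3_wfu()` should give a value of roughly 92.5%, in line with the figure above.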

comment:2 Changed 10 years ago by BarkerJr

I'm seeing the same issue on three of my relays: BarkerJrNet, BarkerJrCoast2, and BarkerJrCoast3. These relays have uptimes over a month and are Fast and Stable.

comment:3 Changed 10 years ago by BarkerJr

Weirdly enough, when I upped the bandwidth to 100KB, it became a guard. Then I dropped to 75KB and it lost the Guard flag. Does this mean that the "Fast" flag is incorrectly defined?

comment:4 Changed 10 years ago by arma

Fast is given to the top 7/8 of the relays by bandwidth.

Guard is given to the top 1/2 of the relays by bandwidth, if (and
only if) they also have sufficiently high weighted-fractional-uptime.

See sec 3.3 of dir-spec.txt for the details.
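
As a rough illustration of the two rules, the sketch below derives both bandwidth cutoffs from a list of relay bandwidths and then checks one relay against them. It is not the authority code: `cutoffs_t`, `compute_cutoffs` and `is_guard_candidate` are hypothetical names, the index choice for each cutoff is an assumption, and other Guard conditions (such as time-known) are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two bandwidth cutoffs. */
typedef struct {
  uint32_t fast_bw;  /* lowest bandwidth still in the top 7/8 */
  uint32_t guard_bw; /* lowest bandwidth still in the top 1/2 */
} cutoffs_t;

static int
cmp_u32(const void *a, const void *b)
{
  uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
  return (x > y) - (x < y);
}

/* Sort a copy of the bandwidths (ascending) and read off both cutoffs.
 * Assumption: the cutoff is the element at n/8 (Fast) or n/2 (Guard). */
cutoffs_t
compute_cutoffs(const uint32_t *bw, size_t n)
{
  cutoffs_t c;
  uint32_t *sorted;
  assert(n > 0);
  sorted = malloc(n * sizeof(uint32_t));
  assert(sorted != NULL);
  memcpy(sorted, bw, n * sizeof(uint32_t));
  qsort(sorted, n, sizeof(uint32_t), cmp_u32);
  c.fast_bw = sorted[n / 8];
  c.guard_bw = sorted[n / 2];
  free(sorted);
  return c;
}

/* A relay is a Guard candidate only if it clears both the bandwidth
 * cutoff and the WFU threshold. */
int
is_guard_candidate(uint32_t bw, double wfu, const cutoffs_t *c,
                   double wfu_threshold)
{
  return bw >= c->guard_bw && wfu >= wfu_threshold;
}
```

With eight made-up bandwidths of 10 through 80, this picks 20 as the Fast cutoff and 50 as the Guard cutoff, so a relay at 60 with a WFU of 92.5% passes on bandwidth but still fails a 98% WFU threshold.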

comment:5 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

So, the right solution here may be to lower the WFU threshold, or increase the decay factor on the weighting so that older time counts even less. Can somebody with an authority tell me what the current wfu thresholds are?

comment:6 Changed 9 years ago by mikeperry

Description: modified (diff)

Actually, I'm now less convinced that the WFU calculation was done wrong in this case.

Based on the WFU numbers, prior to the 20 days of uptime, the router should have been up for something like 9759/(0.95^40) or 81325s (22 hours) in the last 149363/(0.95^20) or 1244691s (14 days).

Sebastian, in your original report you said that before the 20 days of uptime, Fluxe3 was "offline". Might it have been offline for about 2 weeks, maybe with a day of uptime somewhere in there? If so, WFU might be correct, and we just need to think about setting better thresholds.

comment:7 Changed 9 years ago by mikeperry

Thanks for nothing, trac.

The math should read:

"9759/(0.95^40) or 81325s (22 hours) in the last 149363/(0.95^20) or 1244691s (14 days)"

and is based on compounding alpha (0.95) every 12 hours.
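
The un-decaying step can be sketched as follows, assuming the whole weighted history is 20 days old and has been multiplied by alpha once per 12-hour interval since then. The macro and function names are illustrative and not taken from Tor's source.

```c
#include <assert.h>

#define ALPHA 0.95                    /* decay factor quoted in this comment */
#define INTERVAL_SEC (12 * 60 * 60)   /* decay is compounded every 12 hours */

/* Factor by which a weighted value shrinks over elapsed_sec seconds. */
double
decay_factor(long elapsed_sec)
{
  long n = elapsed_sec / INTERVAL_SEC; /* whole intervals elapsed */
  double f = 1.0;
  while (n-- > 0)
    f *= ALPHA;
  return f;
}

/* Estimate the raw (unweighted) seconds behind a decayed value. */
double
undecay(double weighted_sec, long elapsed_sec)
{
  return weighted_sec / decay_factor(elapsed_sec);
}
```

For 20 days (1728000 s) this compounds 40 times, giving a factor of about 0.1285. Under that assumption `undecay(9759.0, 1728000L)` comes out near 76000 s (about 21 hours) and `undecay(149363.0, 1728000L)` near 1.16 million s (about 13.5 days), which is in the same range as the 22 hours and 14 days quoted above.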

comment:8 Changed 9 years ago by Sebastian

Yes, I think it was offline for about 2 weeks or so. I don't remember exactly, maybe we can get an answer with metrics?

comment:9 Changed 9 years ago by Sebastian

Hm, it appears a lot more likely that this was not in fact a bug. What is the process on choosing lower thresholds? Is this even the right place to discuss that?

comment:10 in reply to: 5 Changed 9 years ago by arma

Replying to nickm:

So, the right solution here may be to lower the WFU threshold, or increase the decay factor on the weighting so that older time counts even less. Can somebody with an authority tell me what the current wfu thresholds are?

Sep 03 16:50:01.094 [info] Cutoffs: For Stable, 247593 sec uptime, 589401 sec MTBF. For Fast: 20480 bytes/sec. For Guard: WFU 98.000%, time-known 680372 sec, and bandwidth 71680 or 102400 bytes/sec. We have enough stability data.
Sep 03 17:50:01.121 [info] Cutoffs: For Stable, 266189 sec uptime, 635182 sec MTBF. For Fast: 20480 bytes/sec. For Guard: WFU 98.000%, time-known 691200 sec, and bandwidth 76800 or 102400 bytes/sec. We have enough stability data.

moria1's wfu threshold has been 98% for the past few weeks. Presumably ever since Mike lowered this constant:

#define WFU_TO_GUARANTEE_GUARD (0.98)

See also

  guard_wfu = median_double(wfus, n_familiar);
  if (guard_wfu > WFU_TO_GUARANTEE_GUARD)
    guard_wfu = WFU_TO_GUARANTEE_GUARD;

So it looks like the median WFU is over 98%, and the guarantee cuts it down to 98%?
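
A self-contained sketch of that clamp is shown below. `median_of` is a hypothetical stand-in for `median_double` (it is assumed here to return the lower-middle element of the sorted values), `guard_wfu_cutoff` is an illustrative wrapper, and the WFU values used with it are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WFU_TO_GUARANTEE_GUARD (0.98)

static int
cmp_double(const void *a, const void *b)
{
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Hypothetical stand-in for median_double(): sort a copy and return the
 * lower-middle element. */
double
median_of(const double *v, size_t n)
{
  double *tmp, m;
  assert(n > 0);
  tmp = malloc(n * sizeof(double));
  assert(tmp != NULL);
  memcpy(tmp, v, n * sizeof(double));
  qsort(tmp, n, sizeof(double), cmp_double);
  m = tmp[(n - 1) / 2];
  free(tmp);
  return m;
}

/* The Guard WFU cutoff is the median WFU, capped at the guarantee value. */
double
guard_wfu_cutoff(const double *wfus, size_t n)
{
  double guard_wfu = median_of(wfus, n);
  if (guard_wfu > WFU_TO_GUARANTEE_GUARD)
    guard_wfu = WFU_TO_GUARANTEE_GUARD;
  return guard_wfu;
}
```

If most relays have a WFU above 98%, for example {0.999, 0.95, 0.995, 0.99, 1.0} with a median of 0.995, the cutoff is capped at 0.98. With a lower median such as 0.95, the median itself is used.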

comment:11 Changed 9 years ago by arma

Triage: like all the WFU bugs, this one shouldn't block 0.2.2 (alas). We're going to need research and a new proposal, is my guess.

comment:12 Changed 9 years ago by Sebastian

Resolution: None → not a bug
Status: new → closed

Closing this as not a bug as per my comment above.

comment:13 Changed 7 years ago by nickm

Component: Tor Relay → Tor