While asn and I were investigating HSDir votes in the consensus for relays without the V2Dir flag (no DirPort open), we discovered strange behavior from faravahar:

network-status-version 3 microdesc
vote-status consensus
consensus-method 18
valid-after 2015-03-24 23:00:00
fresh-until 2015-03-25 00:00:00
valid-until 2015-03-25 02:00:00
voting-delay 300 300

consensus: 2503 HSDir relays
faravahar: 1376 HSDir relays

consensus: 5302 Stable relays
faravahar: 3236 Stable relays

consensus: 2437 Stable and HSDir relays
faravahar: 1309 Stable and HSDir relays

consensus: 66 not Stable but HSDir relays
faravahar: 67 not Stable but HSDir relays

All HSDir relays in the consensus also have the V2Dir flag, because the majority of DAs haven't upgraded to mitigate #14202.

However, faravahar voted the HSDir flag for 520 relays that do not have the V2Dir flag and, obviously, these relays are not among the HSDir relays in the agreed consensus. So we can say faravahar actually voted for 1376 - 520 = 856 of the HSDir relays in the agreed (usable by clients) consensus.

(From these 520 non-V2Dir HSDir relays:

  • 23 relays are missing the Stable flag.
  • 497 are Stable.)
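For anyone who wants to reproduce counts like these, here is a minimal sketch of the kind of tally involved. It assumes the vote or consensus has already been saved as text and relies only on the fact that each relay's flags sit on a line starting with "s ". The function names and the three-relay sample are made up for illustration; this is not the script we used.

```python
from collections import Counter

def count_flags(status_text):
    """Count how many relay entries carry each flag in a v3 status document."""
    counts = Counter()
    for line in status_text.splitlines():
        # Each relay entry lists its flags on a line starting with "s ".
        if line.startswith("s "):
            counts.update(line.split()[1:])
    return counts

def count_with_without(status_text, have, lack):
    """Count relay entries that carry flag `have` but not flag `lack`."""
    total = 0
    for line in status_text.splitlines():
        if line.startswith("s "):
            flags = set(line.split()[1:])
            if have in flags and lack not in flags:
                total += 1
    return total

# Made-up sample entries (hypothetical relays, not real data).
sample = """\
r relayA ...
s Fast HSDir Running Stable V2Dir Valid
r relayB ...
s Fast HSDir Running Valid
r relayC ...
s Fast Running Stable Valid
"""

flags = count_flags(sample)
print(flags["HSDir"], flags["Stable"])
print(count_with_without(sample, "HSDir", "V2Dir"))
```

Running count_with_without(vote_text, "HSDir", "V2Dir") on a single DA's vote gives the "HSDir without V2Dir" figure, and swapping "V2Dir" for "Stable" gives the "not Stable but HSDir" figure.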

faravahar did not vote 1647 of the consensus HSDir flags (2503 - 856)
faravahar did not vote 2066 of the consensus Stable flags (5302 - 3236)

Strictly speaking, we cannot say that it failed to vote for them, since faravahar does not have to agree with the rest of the DAs on each and every vote; this is why we have a consensus based on a majority of votes. Still, the difference should be smaller than what we currently see.

While it is normal for DAs not to vote identically, the difference for HSDir relays is 1647. This is more than 50% of all HSDirs voted by the other 8 DAs.
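The arithmetic behind these figures can be checked with a few lines. This is only a worked calculation on the numbers quoted in this ticket; the variable names are illustrative, and the Stable difference is a plain subtraction of totals (it assumes every relay faravahar marks Stable is also Stable in the consensus, which we did not verify relay by relay).

```python
# Counts taken from the 2015-03-24 23:00:00 snapshot quoted in this ticket.
consensus_hsdir = 2503
faravahar_hsdir = 1376
faravahar_hsdir_no_v2dir = 520  # voted HSDir by faravahar, but lacking V2Dir
consensus_stable = 5302
faravahar_stable = 3236

# HSDir relays that faravahar and the consensus agree on.
hsdir_in_common = faravahar_hsdir - faravahar_hsdir_no_v2dir

# Consensus flags that faravahar did not vote.
hsdir_missing = consensus_hsdir - hsdir_in_common
stable_missing = consensus_stable - faravahar_stable

# Share of consensus HSDirs that faravahar did not vote.
hsdir_missing_share = hsdir_missing / consensus_hsdir

print(hsdir_in_common, hsdir_missing, stable_missing)
print(f"{hsdir_missing_share:.1%}")
```

The share works out to roughly two thirds of the consensus HSDirs.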

For the HSDir flags missed by faravahar, we first suspected that it has a very high value for MinUptimeHidServDirectoryV2 and only votes the HSDir flag for relays with very high uptime. A closer look at faravahar's HSDir votes rules out that assumption. For reference:

Fingerprints of relays for which faravahar voted HSDir | uptime at time of writing (24.03.2015):

003000C32D9E16FCCAEFD89336467C01E16FB00D 5 days
00339258B444376593E069201C4A4AA45F95AA87 15 days
0063D0DE32C80691A0AC1A968A8CCF5ABA420E29 57 days
0086EF7F056983D5A0EBB37F36A44CB738B16D97 20 days
0124398EE6783F402A6FC430D0AD110982E503AC 16 days
016DD78B2C3A468DDD48AF63F48D25660F93DAAC 4 days
0192067CFC14F3E3022F99D32FC39016C270AC4C 9 days
0473E3701C9EA2F8367DBAB453B6DC4EE78DEE1B 17 days
0522B7D9FBDFA6FDA81DC6B12A8FA0BA1F7084B4 6 days
C309A31AD772FFDD0805C9FECB6D4748A7CBF684 4 days

5 days later the behavior had not changed:
network-status-version 3
vote-status vote
consensus-methods 13 14 15 16 17 18 19 20
published 2015-03-28 21:50:01
valid-after 2015-03-28 22:00:00
fresh-until 2015-03-28 23:00:00
valid-until 2015-03-29 01:00:00

3145 HSDir relays in the consensus
1585 HSDir relays seen by faravahar

5165 Stable relays in the consensus
3214 Stable relays seen by faravahar

And it is missing only HSDir and Stable flags, because it sees almost all relays in the consensus as running:
running relays in consensus: 6623
running relays seen by faravahar: 6443
The difference for running relays is much smaller than it is for HSDir and Stable relays.
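To put the second snapshot on one scale, the counts above can be turned into the fraction of consensus relays that faravahar also sees with each flag. This is just a calculation on the numbers quoted in this ticket; the dictionary layout is illustrative.

```python
# (relays in consensus, relays seen by faravahar), 2015-03-28 22:00:00 snapshot.
snapshot = {
    "HSDir": (3145, 1585),
    "Stable": (5165, 3214),
    "Running": (6623, 6443),
}

# Fraction of the consensus count that faravahar also reports, per flag.
coverage = {flag: seen / total for flag, (total, seen) in snapshot.items()}

for flag, ratio in coverage.items():
    print(f"{flag}: {ratio:.1%}")
```

Running comes out close to full agreement, while HSDir sits near one half and Stable a bit above that.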

comment:1 Changed 4 years ago by arma

My first guess is that it's failing to reach many of these relays on a few reachability tests each -- so they get the Running flag because they rarely fail two in a row, but failing even one is enough to reset faravahar's internal uptime counter for that relay.
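A toy model of that guess, to make the mechanism concrete. This is not how Tor actually computes Running or uptime; the function name, the "two failures in a row" rule, and the probe-count threshold are all simplifying assumptions taken from the wording above.

```python
def toy_flags(probe_results, min_uptime_probes):
    """Toy model only. probe_results is a list of bools, True = test succeeded.

    Assumptions (hypothetical, not Tor's real logic):
      - a relay counts as running unless its last two tests both failed
      - a single failed test resets the uptime counter to zero
    """
    uptime = 0
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            uptime += 1
            consecutive_failures = 0
        else:
            uptime = 0
            consecutive_failures += 1
    running = consecutive_failures < 2
    uptime_ok = uptime >= min_uptime_probes
    return running, uptime_ok

# A relay with one isolated failure: still running, but its uptime was reset.
flaky = [True] * 30 + [False] + [True] * 5
# A relay that never failed a test.
steady = [True] * 36

print(toy_flags(flaky, min_uptime_probes=20))
print(toy_flags(steady, min_uptime_probes=20))
```

In this model a single isolated failure leaves the relay running but wipes the uptime it had accumulated, which would match a DA that agrees on Running and disagrees on the uptime-based flags.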

(I also wonder if we should make a 'Tor network' component in trac, for tickets like this one.)

comment:2 Changed 4 years ago by s7r

That makes sense. 1 out of 9 DAs misbehaving should not be enough to affect the consensus, but should we leave it like this? I still think this is worth fixing.

comment:3 Changed 4 years ago by arma

Cc: inf0 added

comment:4 Changed 4 years ago by s7r

Component: - Select a component → Torflow
Owner: set to aagbsn

comment:5 Changed 4 years ago by starlight

Cc: starlight.2015q2@… added

comment:6 Changed 17 months ago by teor

Component: Core Tor/Torflow → Core Tor/DirAuth
Resolution: worksforme
Severity: Blocker
Status: new → closed

This is apparently working now.

