Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#3327 closed defect (fixed)

Overzealous descriptor regeneration bug remains

Reported by: arma
Owned by:
Priority: High
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-relay
Cc: Falo
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Olaf reports fast relays dropping from the consensus even once they use the #1810 patch.

Rather than trying to debug it over email, a trac entry seems a fine place so we have a record (and so others can participate).

Change History (19)

comment:1 Changed 8 years ago by arma

Once you're running the #1810 patch, you should be seeing info-severity log lines like:

May 30 16:59:32.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:20.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
May 30 17:05:22.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: config change
May 30 17:08:18.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: ORPort found reachable
May 30 17:08:19.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: DirPort found reachable
May 30 17:39:09.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: rotated onion key
May 31 09:12:59.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: set onion key
Jun 01 03:13:26.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: time for new descriptor

(that one is from moria1)

What does your set of log lines look like, ideally including a period where publication was working and a period where it fell out of the consensus?
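To compare log sets from different relays, the publication reasons can be tallied with a short script. This is a minimal sketch, assuming the info-severity lines have exactly the format quoted above; the name `tally_reasons` and the regular expression are illustrative and not part of Tor.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines above; this is an assumption,
# not a format guarantee made by Tor.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[info\] "
    r"mark_my_descriptor_dirty\(\): "
    r"Decided to publish new relay descriptor: (?P<reason>.+)$"
)

def tally_reasons(lines):
    """Count how often each publication reason appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts
```

Feeding it the lines of an info-level log file (for example `tally_reasons(open(path))`, with `path` pointing at your own log) gives a per-reason count, which makes an unusually frequent reason easy to spot.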

comment:2 Changed 8 years ago by Sebastian

Cc: Falo added

Adding Falo to CC per request by rransom

comment:3 Changed 8 years ago by Falo

The #1810 patch has been running on blutmagie since yesterday. No "Decided to publish new relay descriptor" logging has occurred so far. I'll keep you posted...

comment:4 Changed 8 years ago by Sebastian

This might be related to karsten's trouble. We should see if we have any indication of this on non-dirauths.

comment:5 Changed 8 years ago by nickm

Milestone: Tor: unspecified → Tor: 0.2.2.x-final

Moving this to 0.2.2.x-final, since it seems to look important. We can kick it out again if it isn't.

comment:6 Changed 8 years ago by Falo

Right now, at 22:58 UTC+2, all four blutmagie routers are flagged Running on the Tor Project's consensus-health web site, whereas only blutmagie3 and blutmagie4 are flagged Running on my Tor node dedicated to feeding the tns site. blutmagie and blutmagie2 lack the Running flag. There's no indication that these routers recently decided to publish a new relay descriptor.

torstatus:~/tmp# telnet localhost 9051

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
authenticate "XXX-censored-XXX"

250 OK
getinfo ns/name/blutmagie
250+ns/name/blutmagie=
r blutmagie YpexOmh7UhpZxr15GIolAewDoGU bbf9bJUi0nIosJVoE4M52OI5sZk 2011-06-05 01:15:17 192.251.226.206 443 80
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=47100
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie2
250+ns/name/blutmagie2=
r blutmagie2 Z+yEN22cTEZ9zoYhqsoQkWC1Jk4 m49nalEUoKBjRfHcc4ScLq2bXdM 2011-06-05 01:15:18 192.251.226.206 8080 707
s Exit Fast Guard HSDir Named Stable V2Dir Valid
w Bandwidth=65500
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie3
250+ns/name/blutmagie3=
r blutmagie3 ZsqH4WTxz86MO7XAlSF6KFeLi68 +ZmFcRXlziVExGfe2xB/gMeOtr4 2011-06-05 01:15:18 192.251.226.205 443 80
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=36400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
getinfo ns/name/blutmagie4
250+ns/name/blutmagie4=
r blutmagie4 e2mNMn8WlVkECP7ZXN7hVld00TY v9Ah1Xy8dmuIigYgqaBkU8MINR0 2011-06-05 01:15:19 192.251.226.205 22 21
s Exit Fast Guard HSDir Named Running Stable V2Dir Valid
w Bandwidth=40400
p reject 25,119,135-139,445,465,563,587,1214,4661-4666,6346-6429,6660-6999
.
250 OK
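The manual check above (looking for Running on each "s" line) can be automated once the reply bodies are captured as text. This is a minimal sketch that only parses text shaped like the replies shown above; it does not talk to the control port itself, and the names `parse_ns_entry` and `missing_running` are hypothetical helpers, not part of Tor.

```python
def parse_ns_entry(reply_text):
    """Extract nickname, flags, and bandwidth from one ns/name/<nick> reply body."""
    entry = {"nickname": None, "flags": set(), "bandwidth": None}
    for line in reply_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) > 1:
            # "r" line: nickname is the first field after the keyword.
            entry["nickname"] = parts[1]
        elif parts[0] == "s":
            # "s" line: space-separated status flags.
            entry["flags"] = set(parts[1:])
        elif parts[0] == "w":
            # "w" line: key=value pairs such as Bandwidth=47100.
            for item in parts[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth" and value.isdigit():
                    entry["bandwidth"] = int(value)
    return entry

def missing_running(entries):
    """Return the nicknames of entries whose flag set lacks "Running"."""
    return [e["nickname"] for e in entries if "Running" not in e["flags"]]
```

Applied to the four replies above, `missing_running` would list the relays for which this node's view lacks the Running flag, which can then be compared against the consensus-health page.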

comment:7 Changed 8 years ago by Falo

I upgraded the Tor instance feeding torstatus.blutmagie.de from tor-0.2.2.24-alpha to tor-0.2.3.1-alpha. Let's see if this changes anything. Today tor-0.2.2.24-alpha still reported individual routers as not running, even though the Tor Metrics Portal's Consensus Health page flagged them "Running".

comment:8 Changed 8 years ago by Sebastian

0.2.3.1 also has the bug. The recently released 0.2.2.28-beta should not have the most common case of it.

comment:9 Changed 8 years ago by Sebastian

Oh wait, I misunderstood your comment. When upgrading to 0.2.3.2 or 0.2.2.26 or later, remember to add the FetchV2Networkstatus config option.

comment:10 Changed 8 years ago by arma

Any news here?

moria1's output looks great, but I would expect it to.

comment:11 Changed 8 years ago by Falo

Since upgrading my Tor network status box from 0.2.2.24-alpha to 0.2.3.1-alpha two weeks ago, I have not seen my routers missing the Running flag again. Thus the issue seems to be solved. It looks like it was not a problem with the routers blutmagie1-4 running 0.2.3.1-alpha.

Please close this ticket.

comment:12 Changed 8 years ago by Sebastian

Well, isn't that fun. The #1810 fix is not in 0.2.3.1, so we still have some bug left in 0.2.2.x that probably got miraculously resolved in 0.2.3.1.

comment:13 Changed 8 years ago by nickm

rransom suggests that a #535 solution is needed here.

comment:14 Changed 8 years ago by nickm

Status: new → needs_review

See branch bug3327 in my public repo. If we like it upon review, I propose that we test it out first in 0.2.3.x, and merge it into 0.2.2.x only when it's had some testing in the wild.

comment:15 Changed 8 years ago by arma

bug3327 looks reasonable. I'd suggest making it a 'major' feature, and changing 'our' to 'their' in the changes file. And maybe s/Routers/Relays/ while we're at it.

Note that we're going to see more publication attempts, and thus more descriptors in the wild, in some cases. The first case that comes to mind is a relay that thinks it's reachable but a quorum of directory authorities can't reach it. I expect we'll see 12 times as many descriptors for those relays. Clients won't see them, so it's not so bad, but karsten's metrics datasets will bloat. I wonder if relays that cache v2 data will fetch them too, if tor26 thinks they're reachable.

It sure will be tricky to figure out if this patch is working right, since it only kicks in for the case that we don't think exists much right now. I wonder if we might tell the directory authority our reason for generating the new descriptor, e.g. as an http header when we post? That would let us keep a better eye on whether (and for whom) this patch is seeing action.

I agree that we'll be happier putting this feature into 0.2.3.
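One way to arrive at a "12 times" figure like the one above is to divide a normal republish interval by a faster retry interval. The sketch below assumes roughly 18 hours between regular descriptor publications and roughly 90 minutes between retries for a relay that is not listed; neither number is stated in this ticket, so treat both as illustrative assumptions.

```python
# Both intervals are assumptions for illustration; this ticket does not state them.
REGULAR_INTERVAL_HOURS = 18.0  # assumed interval between routine republications
FAST_RETRY_HOURS = 1.5         # assumed retry interval while the relay is unlisted

def descriptor_multiplier(regular_hours, retry_hours):
    """Ratio of descriptors published under fast retry versus routine publication."""
    return regular_hours / retry_hours

multiplier = descriptor_multiplier(REGULAR_INTERVAL_HOURS, FAST_RETRY_HOURS)
```

Under those assumed values the ratio is 18 / 1.5 = 12, so a relay that stays unlisted would publish about twelve descriptors in the time a listed relay publishes one.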

comment:16 Changed 8 years ago by nickm

Uploaded a new bug3327 branch with the changes you suggested. What do you think now?

comment:17 Changed 8 years ago by nickm

Resolution: fixed
Status: needs_review → closed

Okay, I've cleaned it up, squashed it, tested it a little, and merged it.

comment:18 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:19 Changed 7 years ago by nickm

Component: Tor Relay → Tor