Opened 9 years ago

Closed 7 years ago

Last modified 7 years ago

#2479 closed defect (fixed)

be more lenient about changed descriptors

Reported by: arma
Owned by:
Priority: High
Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-auth
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We have a series of bugs where relays publish a descriptor within 12 hours of their last descriptor, but the authorities drop it because it's not different "enough" from the last one and it was published too soon after the last one.

The original goal of this idea was to a) reduce the number of new descriptors authorities accept (and thus have to store) and b) reduce the total number of descriptors that clients and mirrors fetch. It's a defense against bugs where relays publish a new descriptor every minute.

Now that we're putting out one consensus per hour, we're doing better at limiting the total damage that can be caused by 'b'.

There are broader-scale design changes that would help here, and we've had a trac entry open for years about how relays should recognize that they're not in the consensus, or recognize when their publish failed, and republish sooner.

In the meantime, I think we should change some of the parameters to make the problem less painful.

The first is

/** Any changes in a router descriptor's publication time larger than this are
 * automatically non-cosmetic. */
#define ROUTER_MAX_COSMETIC_TIME_DIFFERENCE (12*60*60)

Let's change that to 1 or 2 hours. That will reduce the number of times we encounter this problem.
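
To make the threshold concrete, here is a minimal sketch of just the publication-time part of the check. The helper name published_gap_may_be_cosmetic and its max_gap parameter are made up for illustration; this is not the authority code itself, which also compares other fields of the two descriptors.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper, not Tor's actual function. Returns 1 if the gap
 * between two publication times is at most max_gap seconds, so the change
 * could still be treated as cosmetic; returns 0 if the gap alone already
 * makes the new descriptor non-cosmetic. */
int
published_gap_may_be_cosmetic(time_t old_published, time_t new_published,
                              long max_gap)
{
  double gap;
  assert(max_gap >= 0);
  gap = difftime(new_published, old_published);
  if (gap < 0)
    gap = -gap; /* Only the size of the gap matters, not its direction. */
  return gap <= (double)max_gap;
}
```

With max_gap set to 12*60*60, a descriptor republished two hours after the last one can still be dropped as cosmetic; with 60*60 the same gap forces it to be treated as a real change.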

The second proposed parameter change is

/** How old can a router get before we (as a server) will no longer
 * consider it live? In seconds. */
#define ROUTER_MAX_AGE_TO_PUBLISH (60*60*20)

I'd like to move that to 23 or 24 hours.

Ideally it should be in the 48 hour or longer range, since if a relay is still getting the Running flag assigned to it, let's keep using it, and if it's not, no harm in voting about it. But I worry that clients will fetch a 36 hour old descriptor, drop it because it's old, and get into a cycle. (I *think* we made it so they wouldn't get into such a cycle, but what do I know.)
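
A rough sketch of the age cutoff discussed above, with the limit passed in so the 20 hour and 24 hour settings can be compared. The function name descriptor_is_live is hypothetical, and treating a descriptor exactly at the limit as still live is an assumption of this sketch rather than something the ticket states.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper, not Tor's actual function. Returns 1 if a descriptor
 * published at published_on is, as of now, no older than max_age seconds.
 * Assumption: a descriptor exactly max_age seconds old still counts as live. */
int
descriptor_is_live(time_t published_on, time_t now, long max_age)
{
  assert(max_age >= 0);
  return difftime(now, published_on) <= (double)max_age;
}
```

For example, a descriptor that is 22 hours old fails this check with a limit of 60*60*20 and passes it with 60*60*24.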

Change History (9)

comment:1 Changed 9 years ago by arma

I wonder if there are cycles where moria1 puts a cosmetically similar descriptor into its vote, other authorities fetch it, discard it because it's cosmetically similar, etc.

I'm going to change moria1 to these new parameters and see what happens from it.

comment:2 Changed 8 years ago by arma

I've been running moria1 for weeks with the new params, and everything seems peachy.

The main difference is that http://metrics.torproject.org/consensus-health.html lists

tor26       3320 total   2546 Running
ides        3319 total   2574 Running
maatuska    3311 total   2532 Running
dannenberg  3318 total   2534 Running
urras       3320 total   2522 Running
moria1      3492 total   2562 Running
dizum       3321 total   2535 Running
gabelmoo    3311 total   2524 Running
consensus   2538 total   2538 Running

That is, moria1 is voting on about 3500 relays when most people are voting on about 3300. moria1 produces a few more Running votes than most of the others, but not many more.

I haven't done the analysis to see if it's voting Running for nodes that are in the "last couple of hours before they're going to fall out of moria1's vote too" category. We could put this on the bottom end of Karsten's todo list to check, or we could just decide this change is potentially helpful and not harmful and run with it.

Specifically, I'm running with these two params changed:

 /** How old can a router get before we (as a server) will no longer
  * consider it live? In seconds. */
-#define ROUTER_MAX_AGE_TO_PUBLISH (60*60*20)
+#define ROUTER_MAX_AGE_TO_PUBLISH (60*60*24)
 /** Any changes in a router descriptor's publication time larger than this are
  * automatically non-cosmetic. */
-#define ROUTER_MAX_COSMETIC_TIME_DIFFERENCE (12*60*60)
+#define ROUTER_MAX_COSMETIC_TIME_DIFFERENCE (60*60)

comment:3 Changed 8 years ago by arma

Milestone changed from Tor: 0.2.2.x-final to Tor: 0.2.3.x-final
Priority changed from normal to major

Looking at those two params again, I would be more confident trying to get the second one (ROUTER_MAX_COSMETIC_TIME_DIFFERENCE) into 0.2.2 than trying to get the first one (ROUTER_MAX_AGE_TO_PUBLISH) in.

That said, putting these in master is fine too since they're authority parameters. Changing milestone.

comment:4 Changed 8 years ago by arma

Component changed from Tor Relay to Tor Directory Authority

comment:5 Changed 8 years ago by nickm

I'm still okay with doing these for 0.2.3.x, assuming they only affect authorities.

comment:6 Changed 7 years ago by nickm

Status changed from new to needs_review

So, lowering ROUTER_MAX_COSMETIC_TIME_DIFFERENCE to 1h would potentially cause us to replace descriptors every hour for routers with one of the bugs that would make them publish way too frequently. So I'm going to be a coward and reduce it to 3h rather than 1h. If it really has to be 1h and/or we have reason to think that won't matter, please say.
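
To put rough numbers on that trade-off, here is a toy simulation rather than anything from the Tor tree. It assumes a buggy relay publishes an otherwise identical descriptor at a fixed interval, and that the authority accepts a new one only when its publication time is more than the cosmetic threshold after the last accepted one. The function count_accepted and all of its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical simulation, not Tor code. Counts how many descriptors an
 * authority accepts during duration seconds from a relay that publishes a
 * cosmetically identical descriptor every interval seconds. The first
 * descriptor is always accepted; later ones are accepted only when the gap
 * since the last accepted one exceeds max_cosmetic_gap. */
long
count_accepted(long duration, long interval, long max_cosmetic_gap)
{
  long accepted = 0;
  long last_accepted = 0;
  long t;
  assert(duration >= 0 && interval > 0 && max_cosmetic_gap >= 0);
  for (t = 0; t < duration; t += interval) {
    if (accepted == 0 || t - last_accepted > max_cosmetic_gap) {
      last_accepted = t;
      ++accepted;
    }
  }
  return accepted;
}
```

Under these assumptions, a relay publishing every 60 seconds for 24 hours gets 2 descriptors accepted with a 12h threshold, 8 with 3h, and 24 with 1h.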

See branch bug2497 in my public repo.

Arma, do you still think this is a good idea? Do we still think it would make a difference like it did 13 months ago, or did we fix enough of the underlying bugs that this doesn't matter any more?

comment:7 Changed 7 years ago by arma

Resolution set to fixed
Status changed from needs_review to closed

We should still do it. Even if we resolved all the bugs like that for now, there will be new ones someday.

Done.

comment:8 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:9 Changed 7 years ago by nickm

Component changed from Tor Directory Authority to Tor