Make the interval to become a hsdir a little longer

OK to close as a duplicate of #2709? The patch there accomplishes the goals here too.

Trac:
Component: Tor Relay to Tor Directory Authority

No it doesn't, why would it?

You're right, it's a separate issue. Never mind.

This ticket is related to #2716, in that you might want more than a 30- or 60-minute grace period if we don't resolve #2716.

Trac:
Milestone: N/A to Tor: 0.2.3.x-final

Replying to Sebastian:

Currently, a relay needs to be up for at least 24 hours to be considered an HSDir. I think we should change this to 24 hours and 30 minutes, or to 25 hours, because this will give the directory authorities a little more time to notice that a relay has disappeared before voting HSDir for it. This helps because a lot of our relays are on connections that disconnect exactly once every 24 hours (which is the reason for the 24-hour interval in the first place), and it might help to ensure better reachability of the relays performing HSDir duties.

Sounds like a fine plan. Let me know when you have a patch.

Ideally we would get some feedback from our analysis project (what I'm starting to call the part of metrics that looks at data and tells us answers about what's going on) about how quickly nodes that have the HSDir flag disappear, to prove that this change is needed (and to prove that the new number we've picked is a good one). Or we could just guess a good number and switch to it.

Does this change need a proposal or is discussion here fine?

Trac:
Milestone: Tor: 0.2.3.x-final to Tor: 0.2.2.x-final

#2649 does need to go into the next 0.2.2.x-alpha.
...the one about changing the required uptime for being HSDir?
nickm: Yes.
rransom: ok. why? (or say why on the ticket)
nickm: The set of routers with the HSDir flag is not currently stable enough, and the only way to test a new criterion for the HSDir flag is to actually use it on the live network, and I assume that requires that it be put into a 0.2.2.x-alpha release so the non-developer-operated DAs will use it.
rransom: fair enough; changed the milestone

Trac:
Priority: normal to major

Trac:
Status: new to assigned
Owner: N/A to rransom

See bug2649 ( ssh://mob@repo.or.cz/srv/git/tor/rransom.git bug2649 ) for a branch that (a) allows DAs to not vote on the HSDir flag (and makes them default to not voting on it), and (b) increases the default minimum uptime from 24 hours to 25 hours.

The reason to make DAs default to not voting on the HSDir flag is that I doubt that 25 hours is high enough to keep the HSDir set stable, and I don't know what minimum uptime is high enough, so we will need to experiment for a while, and that is much easier if the non-developer-operated DAs don't need to be updated and/or reconfigured for every test of a new value.
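The two changes described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the branch: the type, the field names, and the function `example_hsdir_vote` are all hypothetical, and the real decision depends on more than uptime and reachability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical authority-side settings; the names are illustrative only. */
typedef struct {
  bool vote_on_hsdir;            /* (a) does this DA vote on HSDir at all? */
  uint32_t min_hsdir_uptime;     /* (b) required uptime, in seconds */
} example_hsdir_policy_t;

#define EXAMPLE_UPTIME_24H (24u * 60u * 60u)
#define EXAMPLE_UPTIME_25H (25u * 60u * 60u)

typedef enum {
  EXAMPLE_NO_OPINION, /* the DA does not vote on the flag */
  EXAMPLE_VOTE_NO,    /* the DA votes, and the relay does not qualify */
  EXAMPLE_VOTE_YES    /* the DA votes, and the relay qualifies */
} example_hsdir_vote_t;

/* Decide what a DA would say about the HSDir flag for one relay. */
example_hsdir_vote_t
example_hsdir_vote(const example_hsdir_policy_t *policy,
                   uint32_t relay_uptime, bool relay_is_reachable)
{
  assert(policy != NULL);
  if (!policy->vote_on_hsdir)
    return EXAMPLE_NO_OPINION;
  if (!relay_is_reachable || relay_uptime < policy->min_hsdir_uptime)
    return EXAMPLE_VOTE_NO;
  return EXAMPLE_VOTE_YES;
}
```

With the threshold at 25 hours, a relay that reconnected 24 hours and 30 minutes ago would not yet qualify, which is the extra margin the ticket asks for.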

Trac:
Status: assigned to needs_review

Do we actually have good statistics on hsdir stability? If so, where? Without those already in place we shouldn't start experimenting. Also, we should figure out what super-increased stability entails - one goal of the design is that the position in the ring shifts slowly.

The first patch sets a bad default IMO. If we want to do it at all (I'd prefer we don't), then it should be enabled by default, not disabled. We shouldn't keep adding options so that a subset of the dirauths gets to decide something and the others don't.

I agree with Sebastian. It's okay to ask authority ops to disable the thing for now for testing, but it's kind of iffy IMO to disable it by default. I doubt whether all of the more responsive relay ops would even remember to turn this on.

Other than that, this looks fine... except that set_routerstatus_from_routerinfo is becoming one of those functions with multiple flag arguments. That's starting to get error-prone. If we need to add any more flags, we should change it to take an unsigned bitfield.
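For illustration, the bitfield approach could look something like the sketch below. The flag names, the struct, and `example_decode_flags` are hypothetical and are not the actual signature of set_routerstatus_from_routerinfo; the point is only that call sites name each option instead of passing a row of positional ints.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; each call site names the options it wants. */
#define EX_LIST_BAD_EXITS  (1u << 0)
#define EX_LIST_BAD_DIRS   (1u << 1)
#define EX_VOTE_ON_HSDIRS  (1u << 2)
#define EX_ALL_FLAGS \
  (EX_LIST_BAD_EXITS | EX_LIST_BAD_DIRS | EX_VOTE_ON_HSDIRS)

typedef struct {
  bool list_bad_exits;
  bool list_bad_dirs;
  bool vote_on_hsdirs;
} example_vote_opts_t;

/* Unpack an unsigned bitfield into named options. */
example_vote_opts_t
example_decode_flags(unsigned flags)
{
  example_vote_opts_t opts;
  /* Reject unknown bits so that a typo at a call site fails loudly. */
  assert((flags & ~EX_ALL_FLAGS) == 0);
  opts.list_bad_exits = (flags & EX_LIST_BAD_EXITS) != 0;
  opts.list_bad_dirs = (flags & EX_LIST_BAD_DIRS) != 0;
  opts.vote_on_hsdirs = (flags & EX_VOTE_ON_HSDIRS) != 0;
  return opts;
}
```

A call such as `example_decode_flags(EX_LIST_BAD_EXITS | EX_VOTE_ON_HSDIRS)` is harder to get wrong than `f(..., 1, 0, 1)`, where swapping two arguments compiles silently.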

Replying to nickm:

See bug2649 ( git://git.torproject.org/rransom/tor.git bug2649 ) for a fixup commit; remember to run git rebase -i --autosquash 09d7af7789d1b5cd1fdad59fc7eafa7748b4bb57 before merging.

That's missing the manpage fix. I still think we should only apply the 25-hour patch.

Trac:
hsdir-set-instability-graph.pdf

graph of HSDir set instability

See hsdir-set-instability-graph.pdf for a graph of (size of set symmetric difference between HSDir sets in consensus N and consensus N+i)/(size of HSDir set in consensus N), for i = 1..4. (I'm counting relays entering the HSDir set as well as relays leaving the set because both events make an HSDir relay unavailable to hidden services and clients.)
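The ratio plotted in the graph can be computed along these lines. This is a minimal sketch, not the scripts used for the graph: it assumes each HSDir set is given as a sorted, duplicate-free array of unsigned integers standing in for relay fingerprints, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements that appear in exactly one of two sorted,
 * duplicate-free arrays (the size of their symmetric difference). */
size_t
example_symdiff_size(const unsigned *a, size_t na,
                     const unsigned *b, size_t nb)
{
  size_t i = 0, j = 0, diff = 0;
  while (i < na && j < nb) {
    if (a[i] == b[j]) {        /* in both sets: not part of the difference */
      i++;
      j++;
    } else if (a[i] < b[j]) {  /* only in a: relay left the HSDir set */
      diff++;
      i++;
    } else {                   /* only in b: relay entered the HSDir set */
      diff++;
      j++;
    }
  }
  /* Whatever remains in either array is unmatched. */
  return diff + (na - i) + (nb - j);
}

/* Instability of the HSDir set between consensus N (old_set) and
 * consensus N+i (new_set), relative to the size of the set in N. */
double
example_hsdir_instability(const unsigned *old_set, size_t n_old,
                          const unsigned *new_set, size_t n_new)
{
  assert(n_old > 0);
  return (double)example_symdiff_size(old_set, n_old, new_set, n_new)
         / (double)n_old;
}
```

Both relays that leave and relays that enter are counted, matching the definition above, so the ratio can exceed 1 if the set changes almost completely while growing.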

The ratio shown in the graph is an estimate of the probability that an HS will be unable to deliver a single copy of its descriptor to a client due to the HS, its client, and the HSDir relay responsible for that copy having different consensuses; my understanding is that clients (both HS clients and HS servers) routinely have consensuses out of date by two or three hours, and sometimes four hours. We can probably assume that the disruptions are uniformly distributed around the HSDir ring, in which case the probability that an HS will be entirely unavailable to a client for a given hour due to HSDir-set instability is roughly one-sixth the three-hour probability shown in the graph for that hour.

The scripts used to generate this graph are currently in task-2649 ( git://git.torproject.org/rransom/metrics-tasks.git task-2649 ).

1/6 of what looks to be generally under 10% seems like an ok starting point.

If we wanted to get it lower, how should we decide what our desired stability is?

I'm for squashing and merging the patch series, then opening another ticket to figure out the best HSDir value.

I've put a squashed and rebased version in bug2649_squashed in my public repo. Please have a close look: the squash and rebase was nontrivial.

Replying to nickm:

Looks good.

Okay, merging this one onto 0.2.2 and master. Thanks!

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Trac:
Keywords: N/A deleted, tor-auth added

Trac:
Component: Tor Directory Authority to Tor

moved to tpo/core/tor#2649 (closed)