Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#2649 closed defect (fixed)

Make the interval to become a hsdir a little longer

Reported by: Sebastian
Owned by: rransom
Priority: High
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-auth
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Currently, a relay needs to be up for at least 24 hours to be considered an HSDir. I think we should change this to 24 hours and 30 minutes, or to 25 hours, because this will give the directory authorities a little more time to notice that a relay has disappeared before voting HSDir for it. This helps because a lot of our relays are on connections that disconnect exactly once every 24 hours (which is the reason for the 24-hour interval in the first place), and it might help ensure better reachability of the relays performing HSDir duties.
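
The timing argument can be illustrated with a minimal Python sketch. This is not Tor's actual code: the helper `qualifies_for_hsdir` is hypothetical, and the sketch assumes an authority simply compares a relay's observed uptime against a configured minimum, sampling the relay at the very end of a 24-hour session.

```python
from datetime import timedelta

# Hypothetical helper, not Tor's implementation: an authority votes HSDir
# only for relays whose observed uptime meets the configured minimum.
def qualifies_for_hsdir(uptime: timedelta, min_uptime: timedelta) -> bool:
    return uptime >= min_uptime

# Assumption: the relay's connection resets every 24 hours, and the
# authority samples its uptime just before the reset.
uptime_before_reset = timedelta(hours=24)

current_rule = qualifies_for_hsdir(uptime_before_reset, timedelta(hours=24))
proposed_rule = qualifies_for_hsdir(uptime_before_reset, timedelta(hours=25))

print(current_rule, proposed_rule)  # True False
```

Under these assumptions, the 24-hour rule can hand the flag to a relay that is about to drop off, while a 25-hour rule never does.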

Does this change need a proposal or is discussion here fine?

Attachments (1)

hsdir-set-instability-graph.pdf (167.3 KB) - added by rransom 8 years ago.
graph of HSDir set instability

Change History (22)

comment:1 Changed 8 years ago by arma

Component: Tor Relay → Tor Directory Authority

Ok to close as a duplicate of #2709? The patch there accomplishes the goals here too.

comment:2 Changed 8 years ago by Sebastian

No it doesn't, why would it?

comment:3 Changed 8 years ago by arma

You're right, it's a separate issue. Never mind.

This ticket is related to #2716, in that you might want more than 30 or 60 minutes grace period if we don't resolve #2716.

comment:4 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-final

comment:5 in reply to:  description Changed 8 years ago by arma

Replying to Sebastian:

Currently, a relay needs to be up for at least 24 hours to be considered an HSDir. I think we should change this to 24 hours and 30 minutes, or to 25 hours, because this will give the directory authorities a little more time to notice that a relay has disappeared before voting HSDir for it. This helps because a lot of our relays are on connections that disconnect exactly once every 24 hours (which is the reason for the 24-hour interval in the first place), and it might help ensure better reachability of the relays performing HSDir duties.

Sounds like a fine plan. Let me know when you have a patch.

Ideally we would get some feedback from our analysis project (what I'm starting to call the part of metrics that looks at data and tells us answers about what's going on) about how quickly nodes that have the HSDir flag disappear, to prove that this change is needed (and to prove that the new number we've picked is a good one). Or we could just guess a good number and switch to it.

Does this change need a proposal or is discussion here fine?

comment:6 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-final → Tor: 0.2.2.x-final

comment:7 Changed 8 years ago by rransom

<rransom> #2649 does need to go into the next 0.2.2.x-alpha.
<nickm> ...the one about changing the required uptime for being HSDir?
<rransom> nickm: Yes.
<nickm> rransom: ok. why?
<nickm> (or say why on the ticket)
<rransom> nickm: The set of routers with the HSDir flag is not currently stable enough, and the only way to test a new criterion for the HSDir flag is to actually use it on the live network, and I assume that requires that it be put into a 0.2.2.x-alpha release so the non-developer-operated DAs will use it.
<nickm> rransom: fair enough; changed the milestone

comment:8 Changed 8 years ago by arma

Priority: normal → major

comment:9 Changed 8 years ago by rransom

Owner: set to rransom
Status: new → assigned

comment:10 Changed 8 years ago by rransom

Status: assigned → needs_review

See bug2649 ( ssh://mob@repo.or.cz/srv/git/tor/rransom.git bug2649 ) for a branch that (a) allows DAs to not vote on the HSDir flag (and makes them default to not voting on it), and (b) increases the default minimum uptime from 24 hours to 25 hours.

The reason to make DAs default to not voting on the HSDir flag is that I doubt that 25 hours is high enough to keep the HSDir set stable, and I don't know what minimum uptime is high enough, so we will need to experiment for a while, and that is much easier if the non-developer-operated DAs don't need to be updated and/or reconfigured for every test of a new value.

comment:11 Changed 8 years ago by Sebastian

Do we actually have good statistics on hsdir stability? If so, where? Without those already in place we shouldn't start experimenting. Also, we should figure out what super-increased stability entails - one goal of the design is that the position in the ring shifts slowly.

The first patch sets a bad default, in my opinion. If we want to do it at all (I'd prefer we don't), then it should be _enabled_ by default, not disabled. We shouldn't keep adding options that let a subset of the dirauths decide something while the others don't.

comment:12 Changed 8 years ago by nickm

I agree with Sebastian. It's okay to ask authority ops to disable the thing for now for testing, but it's kind of iffy IMO to disable it by default. I doubt whether all of the more responsive relay ops would even remember to turn this on.

Other than that, this looks fine... except that set_routerstatus_from_routerinfo is becoming one of those functions with multiple flag arguments. That's starting to get error-prone. If we need to add any more flags, we should change it to take an unsigned bitfield.
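
As a rough illustration of the bitfield idea, here is a minimal sketch in Python rather than Tor's C. The flag names and the helper `describe_vote_options` are hypothetical and are not the real parameters of set_routerstatus_from_routerinfo; the point is only that one combined flags value replaces several positional boolean arguments that are easy to transpose.

```python
from enum import IntFlag

# Hypothetical option bits; these names are illustrative only and are
# not taken from Tor's source.
class VoteFlags(IntFlag):
    NAMING_ENABLED = 1
    LIST_BAD_EXITS = 2
    VOTE_ON_HSDIRS = 4

def describe_vote_options(flags: VoteFlags) -> list:
    """Return the names of the option bits set in `flags`."""
    return [f.name for f in VoteFlags if f in flags]

# Callers name each option explicitly instead of passing a row of 0/1 values.
options = VoteFlags.NAMING_ENABLED | VoteFlags.VOTE_ON_HSDIRS
enabled = describe_vote_options(options)
print(enabled)  # ['NAMING_ENABLED', 'VOTE_ON_HSDIRS']
```

In C the same shape would be an unsigned integer argument combined with named bit constants.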

comment:13 in reply to:  12 Changed 8 years ago by rransom

Replying to nickm:

I agree with Sebastian. It's okay to ask authority ops to disable the thing for now for testing, but it's kind of iffy IMO to disable it by default. I doubt whether all of the more responsive relay ops would even remember to turn this on.

See bug2649 ( git://git.torproject.org/rransom/tor.git bug2649 ) for a fixup commit; remember to run git rebase -i --autosquash 09d7af7789d1b5cd1fdad59fc7eafa7748b4bb57 before merging.

comment:14 Changed 8 years ago by Sebastian

That's missing the manpage fix. I still think we should only apply the 25-hour patch.

Changed 8 years ago by rransom

graph of HSDir set instability

comment:15 Changed 8 years ago by rransom

See hsdir-set-instability-graph.pdf for a graph of (size of set symmetric difference between HSDir sets in consensus N and consensus N+i)/(size of HSDir set in consensus N), for i = 1..4. (I'm counting relays entering the HSDir set as well as relays leaving the set because both events make an HSDir relay unavailable to hidden services and clients.)
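
The ratio described above can be sketched in a few lines of Python. This is not the script from the task-2649 branch; the function name `hsdir_instability` and the relay identifiers are made up for illustration, and the sketch assumes each consensus has already been reduced to the set of relays carrying the HSDir flag.

```python
def hsdir_instability(hsdirs_n: set, hsdirs_later: set) -> float:
    """Size of the symmetric difference between the two HSDir sets,
    divided by the size of the HSDir set in consensus N."""
    if not hsdirs_n:
        raise ValueError("reference HSDir set is empty")
    # The ^ operator counts relays that left the set and relays that joined it.
    return len(hsdirs_n ^ hsdirs_later) / len(hsdirs_n)

# Made-up example: one relay leaves and one joins between two consensuses.
consensus_n = {"relayA", "relayB", "relayC", "relayD"}
consensus_n_plus_1 = {"relayB", "relayC", "relayD", "relayE"}

ratio = hsdir_instability(consensus_n, consensus_n_plus_1)
print(ratio)  # 0.5
```

Calling the same function with consensus N+2, N+3, and N+4 as the second argument would give the other three series, for i = 2..4.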

The ratio shown in the graph is an estimate of the probability that an HS will be unable to deliver a single copy of its descriptor to a client due to the HS, its client, and the HSDir relay responsible for that copy having different consensuses; my understanding is that clients (both HS clients and HS servers) routinely have consensuses out of date by two or three hours, and sometimes four hours. We can probably assume that the disruptions are uniformly distributed around the HSDir ring, in which case the probability that an HS will be entirely unavailable to a client for a given hour due to HSDir-set instability is roughly one-sixth the three-hour probability shown in the graph for that hour.

The scripts used to generate this graph are currently in task-2649 ( git://git.torproject.org/rransom/metrics-tasks.git task-2649 ).

comment:16 Changed 8 years ago by arma

1/6 of what looks to be generally under 10% seems like an ok starting point.

If we wanted to get it lower, how should we decide on what is our desired stability?

comment:17 Changed 8 years ago by nickm

I'm for squashing and merging the patch series, then opening another ticket to figure out the best HSDir value.

I've put a squashed and rebased version in bug2649_squashed in my public repo. Please have a close look: the squash and rebase was nontrivial.

comment:18 in reply to:  17 Changed 8 years ago by rransom

Replying to nickm:

I'm for squashing and merging the patch series, then opening another ticket to figure out the best HSDir value.

I've put a squashed and rebased version in bug2649_squashed in my public repo. Please have a close look: the squash and rebase was nontrivial.

Looks good.

comment:19 Changed 8 years ago by nickm

Resolution: fixed
Status: needs_review → closed

Okay, merging this one onto 0.2.2 and master. Thanks!

comment:20 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:21 Changed 7 years ago by nickm

Component: Tor Directory Authority → Tor