Opened 7 years ago

Last modified 2 years ago

#7986 needs_revision enhancement

Lengthen the consensus validity interval

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-client, nickm-patch, low-bandwidth, sponsor4 needs-analysis stability
Cc: arma, mikeperry Actual Points:
Parent ID: #2681 Points: medium
Reviewer: Sponsor: Sponsor4

Description

Right now, the 24 hour "build a consensus in this interval or die horribly" time is chosen completely arbitrarily. I know of no reason to not allow 48 or 72 hours.

See comments on #2681 for more discussion of this point. There's some discussion there of choosing an optimal time, but I suggest we just have some clients try 48 or 72 hours and see how it goes.

I'm boldly going with 72.

cc'ing arma because he remembers where the pain points are on the directory info download food chain, and mike because he has felt the pain.

Child Tickets

Change History (41)

comment:1 Changed 7 years ago by nickm

Keywords: 023-backport added
Status: newneeds_review

Implemented in branch "bug7986" in my public repository using 72 hours as the interview. Should I merge this? Marking as 023-backportable.

comment:2 Changed 6 years ago by nickm

Oops. The branch name is "feature7986", apparently.

comment:3 Changed 6 years ago by andrea

Looks like this is just d155db89820468439e1ce6016f3a41cf87bf64e1, which is a trivial change. No doubt it has the intended effect. It sounds like a reasonable idea, but the other suggestion in 2681 of making clients depend on when they received the consensus as well as how old it is would be better, if more work. I say merge it and consider that improvement later.

comment:4 Changed 6 years ago by nickm

Okay. I'll give armadev and mikeperry a little longer to see if they have any feedback here, and if not, merge.

This is a very simple patch; it just raises a constant interval to 72 hours.

comment:5 Changed 6 years ago by asn

I like the idea. I also agree with Andrea about merging this for now and deciding on a better improvement later.

(BTW, in #7241 there are some funky graphs (ticket:7241:comment:5) that say that in a period 72 hours, the Tor network remains more than 95% the same wrt bandwidth weights.)

comment:6 Changed 6 years ago by mikeperry

Status: needs_reviewneeds_revision

There's also a NS_EXPIRY_SLOP define in main.c with a confusing comment that says it needs to be the same as REASONABLY_LIVE_TIME, but complains that the REASONABLY_LIVE_TIME value was "too high" even though they *were* the same.

I also looked at some other defines we may want to consolidate or turn into functions of this value in my proposal:
https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/212-using-old-consensus.txt#l71

It doesn't seem like we need to change any of them yet as they are still all higher than what it seemed like they needed to be if we raise this value, but maybe it might be worth considering cleaning them up now?

comment:7 Changed 6 years ago by nickm

Status: needs_revisionneeds_review

NS_EXPIRY_SLOP looks like all it's used to do is tell us to call router_dir_infor_changed. So that one can change too. I have no idea what I was talking about when I called a value one "Too high".

See feature7986 again -- does that look like what you meant?

comment:8 Changed 6 years ago by arma

I want to see us put this feature in. It would be good for Tor.

That said, I worry about several classes of bugs we're likely to see.

The most important class of bug is "decide we don't want a directory object, throw it out, realize we want it, fetch it, goto step 0". If we have some logic somewhere that decides a relay descriptor isn't useful if it's 36 hours out-of-date, but we have one referenced from our consensus, we're in for some sad times. I vaguely recall that in the past we had exactly this issue: we had a plan to make it so if the descriptor is still referenced by the consensus, it didn't matter if it was more than 24 hours old. We nearly implemented that plan, but ran into problems of the sort in this paragraph and backed out. I don't think we've fixed any of those problems since then.

The less exciting but still relevant class of bug is "don't admit to having a descriptor if it's more than 24 hours old". For example, we have one of those right now with this branch: networkstatus_getinfo_by_purpose() won't list any relays or bridges whose descriptors are more than 24 hours old. What others do we have like that?

So, how to proceed? I think the most important question for me is how Tor can recognize that it's hit one of the "drop and refetch and drop" bugs.

comment:9 Changed 6 years ago by arma

In the interest of forward progress, here's another question we might try to answer: what's our testing plan here? If we rarely hit the "need to go past 24 hours" mark, then changing our behavior in that case will produce untested behavior. See also #4580.

Maybe we should put some time into reproducing #4580 on a testing tor network, and then when we change things from this ticket, we can see what changes in the testing tor network?

comment:10 in reply to:  9 ; Changed 6 years ago by andrea

Replying to arma:

In the interest of forward progress, here's another question we might try to answer: what's our testing plan here? If we rarely hit the "need to go past 24 hours" mark, then changing our behavior in that case will produce untested behavior. See also #4580.

Maybe we should put some time into reproducing #4580 on a testing tor network, and then when we change things from this ticket, we can see what changes in the testing tor network?

Hmm - suffice to set up a client, let it get bootstrapped and grab a consensus, then use iptables to block all outgoing connections on the dirport so it can't download a new consensus?

comment:11 in reply to:  10 Changed 6 years ago by nickm

Replying to andrea:>

Hmm - suffice to set up a client, let it get bootstrapped and grab a consensus, then use iptables to block all outgoing connections on the dirport so it can't download a new consensus?

There are too many "dirports", alas. Also, the failure mode involved making connections, asking for a consensus, finding out that there wasn't one you liked, and then going from there.

Another idea might be setting up a medium chutney testing network or a small shadow network, plus something to generate a little traffic on the clients, letting it bootstrap, then killing the authorities, then seeing what happens.

For more ideas, I guess we could alter the code so that it acts like a node that's seeing an old consensus, or acts like a node that isn't getting a consensus, or some such, somehow. That could be harder, though, and more error prone.

comment:12 Changed 6 years ago by nickm

Keywords: 024-backport added; 023-backport removed
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final
Priority: majorcritical

This is going to need a heap more analysis and testing before it's close to mergeable. It simply can't go into 0.2.4 as-is at this point without said testing and analysis.

comment:13 Changed 5 years ago by andrea

Keywords: 025-backport added
Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final
Priority: criticalmajor

Deferring per triage with nickm, and adding 025-backport since it makes little enough sense for it to be in 0.2.6 and have 024-backport but not 025.

comment:14 Changed 5 years ago by nickm

Keywords: 026-triaged added; 024-backport 025-backport removed

comment:15 Changed 5 years ago by nickm

Status: needs_reviewneeds_revision

comment:16 Changed 5 years ago by nickm

Keywords: 026-triaged-0 added; 026-triaged removed

comment:17 Changed 5 years ago by nickm

Keywords: nickm-patch added

Add nickm-patch keyword to needs_revision tickets in 0.2.6 where (I think) I wrote the patch, so I know which ones are my responsibility to revise.

comment:18 Changed 5 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final

comment:19 Changed 4 years ago by nickm

Keywords: 027-triaged-1-out added

Marking triaged-out items from first round of 0.2.7 triage.

comment:20 Changed 4 years ago by nickm

Keywords: PostFreeze027 added

If we wind up with a nice patch for any of these in the appropriate window, we should sure merge it.

comment:21 Changed 4 years ago by nickm

Keywords: PostFreeze027 removed
Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:22 Changed 4 years ago by nickm

Keywords: 028-triaged added

comment:23 Changed 4 years ago by nickm

Points: medium

comment:24 Changed 4 years ago by nickm

Keywords: pre028-patch added

comment:25 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

It is impossible that we will fix all 188 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "needs_revision" tickets, looking for things to move to 0.2.9.

Note that if the requested revisions happen in advance of the 0.2.8 freeze, they can get considered for 0.2.8.

comment:26 Changed 3 years ago by nickm

Priority: HighMedium

comment:27 Changed 3 years ago by isabela

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

tickets market to be removed from milestone 029

comment:28 Changed 3 years ago by arma

Keywords: low-bandwidth added
Severity: Normal

comment:29 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:30 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:31 Changed 3 years ago by nickm

Keywords: sponsor4 added

Bulk-adding "sponsor4" keyword to items that would appear to reduce low-bw clients' directory bandwidth usage. But we shouldn't build these without measurement/proposals: see #21205 and #21209.

comment:32 Changed 2 years ago by nickm

Sponsor: Sponsor4

Setting "sponsor4" sponsor on all tickets that have the sponsor4 keyword, but have no sponsor set.

comment:33 Changed 2 years ago by nickm

Remove Sponsor4 keyword, now that Sponsor4 is the value of the Sponsor field.

comment:34 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:35 Changed 2 years ago by nickm

Keywords: 027-triaged-in added

comment:36 Changed 2 years ago by nickm

Keywords: 027-triaged-in removed

comment:37 Changed 2 years ago by nickm

Keywords: 027-triaged-1-out removed

comment:38 Changed 2 years ago by nickm

Keywords: 026-triaged-0 removed

comment:39 Changed 2 years ago by nickm

Keywords: 028-triaged removed

comment:40 Changed 2 years ago by nickm

Keywords: pre028-patch removed

comment:41 Changed 2 years ago by nickm

Keywords: needs-analysis stability added
Note: See TracTickets for help on using tickets.