The right time to put fallback-consensus in is before make dist, not after.
I think that the effect of the desired "voodoo" should be to have make dist include it if it's present
and nonempty, and to have dist-rpm do the same.
How to achieve this voodoo does not immediately leap out at me; more effort may help.
This could be as simple as adding a %{_datadir}/fallback-consensus to the %files section of the spec.
But maybe it needs to be made conditional somehow so that the rpm build doesn't fail if the fallback-consensus
is absent. That part, I don't know how to do.
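For what it's worth, here's an untested sketch of one way to do the conditional part, using rpm's standard build-conditional macros so the build doesn't fail when the file is absent. The conditional name fallback_consensus is made up, and I've used a %{_datadir}/tor path in anticipation of the next question:

{{{
# Default is "with"; pass "rpmbuild --without fallback_consensus" to skip it.
%bcond_without fallback_consensus

# (in the install section) stage the file only when the conditional is on:
%if %{with fallback_consensus}
install -m 644 fallback-consensus \
  %{buildroot}%{_datadir}/tor/fallback-consensus
%endif

# (in the files section) guard the entry the same way:
%files
%if %{with fallback_consensus}
%{_datadir}/tor/fallback-consensus
%endif
}}}

Detecting "present and nonempty" automatically at spec parse time (say, with a %(test -s ... && echo 1 || echo 0) shell macro) might also work, but I haven't verified that.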
Also, maybe we should be installing this into /usr/share/tor rather than into /usr/share?
Well, the goal was for new clients to hit a long-lived directory cache rather than hitting an authority per se, and it's still a good idea to do that through some means. The "FallbackConsensus" mechanism doesn't seem like it's necessarily a great one, though. Perhaps we should remove it and go for something else.
The main remaining goal is to give Tor clients a lot more than 8 IP addresses to bootstrap into the network.
We haven't moved forward on the fallback consensus design because we would use it in place of the 8 directory authorities when trying to bootstrap, and if half the relays in it are down, we spend a lot of time timing out before we find one that works.
I think a good compromise approach would be to only fall back to the fallback consensus when the 8 known bootstrap points have failed. It will mean it takes longer for your Tor to bootstrap in the case where before it wouldn't have worked at all, but it won't slow things down if they're going to work normally.
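To make that ordering concrete, here's a minimal standalone sketch (invented names; nothing here is real Tor code) of "authorities first, fallback consensus only after every authority has failed":

{{{
/* Toy model of the proposed bootstrap order; not Tor code. */
#include <stdio.h>
#include <stdbool.h>

static const char *authorities[] = { "auth1", "auth2", "auth3" };
static const char *fallbacks[]   = { "fb1", "fb2" };

/* Stand-in for a real directory fetch; pretend every authority fails. */
static bool try_fetch(const char *src) {
  printf("trying %s\n", src);
  return src[0] == 'f';
}

int main(void) {
  size_t i;
  for (i = 0; i < sizeof(authorities)/sizeof(*authorities); i++)
    if (try_fetch(authorities[i])) { puts("bootstrapped via authority"); return 0; }
  /* Authorities exhausted: only now touch the (possibly stale) fallbacks. */
  for (i = 0; i < sizeof(fallbacks)/sizeof(*fallbacks); i++)
    if (try_fetch(fallbacks[i])) { puts("bootstrapped via fallback"); return 0; }
  puts("bootstrap failed");
  return 1;
}
}}}

In the normal case the first loop succeeds immediately, so nothing gets slower; the second loop only ever runs in the case where the old behavior would have given up entirely.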
Another approach would just be to stick a pile of IP addresses in each release, and run some metrics function over the recent directory archives to spit out the 500 addresses that seem most promising. But formatting things as a consensus seems like it should save some coding, specifying, etc.
In the distant future we could add some sort of flag like IsLikelyToStillBeInThisLocationLater (I think there's a proposal for that), but if the only use for that flag is having clients read it from disk in their fallback consensus, it's kind of weird to tell it to all clients all the time.
Yet another approach would be having directory authorities able to vote on a special consensus that includes only relays they think will be around for a while. They would vote on it and write it to disk every day or so, and we could just package the most recent version of that file in each release.
Yet yet another approach would be to have the metrics script generate something that looks like a consensus but has no signatures, and then we'd ship that and the only Tor code changes would be having Tor not worry about signatures when handling that file.
This is something I'd be excited to see designed and deployed in the 0.2.3 timeframe.
Most of the infrastructure is in place -- we just need to not use it unless the connections to the 'main' bootstrap mechanisms fail. See also #2878 for an interacting issue.
Trac: Milestone: Tor: unspecified → Tor: 0.2.3.x-final; Priority: minor → major
No. This ticket is for "design and implement a Tor feature that will try bootstrapping from the directory authorities first but if they don't work then load the fallback consensus file that got shipped with Tor and try some of the relays in it."
This would technically be great to have as a DoS resistance measure in case the dirauths need to rate limit or block external network traffic, or simply crash. See #2665 and #2664 for related sibling tickets.
This ticket's solution would also solve #4483, I think; might it be possible to mark that one as a dup of this?
I'd love to see a solution here. I think that we might want to step back and think about whether fallbackconsensus is such a great idea, though. To make it work right, I think we'd need to get proposal 146 done, so that clients that try to bootstrap from the fallback consensus do so with a plausible list of routers.
An alternative approach is just to ship clients with a list of directory sources (IP:ORPort:IDDigest) and generate such a list manually and/or via some metrics-based process.
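For illustration only (the address and digest below are made up, using a TEST-NET IP), an entry in such a list might look like:

{{{
203.0.113.5:9001:0123456789ABCDEF0123456789ABCDEF01234567
}}}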
Currently, an authority's IP:Port is used for about 4 things:
1. A place for clients to fetch initial directory info.
2. A place for servers to upload their descriptors.
3. A place for directory caches to fetch up-to-date directory info.
4. A place for authorities to contact one another for voting.
It seems to me that the first usage above is probably generating the lion's share of the load on authorities, followed probably by the third. Decoupling the first usage is the main way to achieve this ticket's goals, in my eyes. I have no horse in the race of whether fallbackconsensus is the right way to do that; it may well not be.
I've started work on a draft branch (which will be rebased). The idea is to remove the fallback consensus entirely, and instead have a FallbackDir option that lists other directory mirrors to fall back to.
Quick review of dbac20ffbd608d58d89fe644397bcf8fce2e00c0 for nickm/fallback_dirsource:
1. Is the plan to hardcode these in add_default_fallback_dir_servers(), ship a default torrc, or add support for an additional data dir file? I assume the first one?
2. I don't actually see the new "FallbackDir" command in config.c?
3. How sensitive to downtime are these fallback servers? It appears that the bootstrap consensus/descriptor fetching code wasn't changed for them?
Related to 3: I have not yet investigated what it is about the code that makes things go so badly when individual dirauths are unreachable for a while (#4483). Might this change make it safer to either make the timeouts much lower or schedule multiple parallel requests? That way, we can safely add lots and lots of possibly unstable dir mirrors to our list, and also tolerate dirauth failure through this change. We risk DoSing the directory servers with too many requests through bugs this way, but that's what loglines are for. For the alpha series, we could add notice-level logs to router_pick_dirserver_generic() and/or the actual connection attempt code, and maybe even the server side, too.
I don't understand the question. We should pick ones that don't have a lot of downtime, and are going to keep the same IP for some while? Either way, that seems like another ticket entirely, right?
> I don't understand the question. We should pick ones that don't have a lot of downtime, and are going to keep the same IP for some while? Either way, that seems like another ticket entirely, right?
Yeah. I guess I was convolutedly asking if we could close #4483 if we get this working. It sounds like "no" right now, but see #4483 for further questions.
Also, I think it would serve us to design this so that we can put pretty much anyone in this fallback mirror set without consequence: even if 80% of them end up being down in a year. If this means we really should solve #4483 too, perhaps #4483 should be made into a child of this ticket, instead of #2664 directly?
Ok, I reviewed this, but a couple more comments/questions:
1. Making the dirauths also fallback dir mirrors might complicate how we want to handle #4483. I think we want to be able to say "attempt to get the consensus from D dirauths and M fallback mirrors" somehow. See comment 5 there for details. It looks like we can still inspect the dir_server_t is_authority flag to get this done (see the toy sketch after this comment), but that might be a reason for keeping them in fully disjoint lists instead? Or maybe not, since it's not the common case...
2. Can you add an example fallback dirserver line in the comments for add_default_fallback_dir_servers()? It looks like a subset of the dirauth line syntax is acceptable, but an example would be great.
Note: I don't know enough about directory activity to properly evaluate the correct use of the fallback dir servers vs dirauths in all cases, so perhaps my review doesn't count as exhaustive. But I can also test both this and #4483 with some firewall rules to block access to all of the dirauths once both are ready.
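Here's a toy sketch of what I mean in point 1, with a simplified stand-in for dir_server_t (illustration only, not the real structs or selection logic): keep one combined list and pick D authorities plus M mirrors by flag:

{{{
#include <stdio.h>

/* Simplified stand-in for Tor's dir_server_t; the real struct has more fields. */
typedef struct {
  const char *nickname;
  unsigned is_authority : 1;
} dir_server_stub_t;

/* Pick up to want_auths authorities and want_mirrors fallback mirrors
 * from one combined list, keyed on the is_authority flag. */
static void
pick_sources(const dir_server_stub_t *all, size_t n,
             size_t want_auths, size_t want_mirrors)
{
  size_t a = 0, m = 0, i;
  for (i = 0; i < n; i++) {
    if (all[i].is_authority && a < want_auths) {
      printf("authority: %s\n", all[i].nickname); a++;
    } else if (!all[i].is_authority && m < want_mirrors) {
      printf("fallback:  %s\n", all[i].nickname); m++;
    }
  }
}

int main(void) {
  const dir_server_stub_t servers[] = {
    { "authA", 1 }, { "authB", 1 }, { "mirrorA", 0 }, { "mirrorB", 0 },
  };
  pick_sources(servers, 4, /* D= */ 1, /* M= */ 2);
  return 0;
}
}}}

If we ever do split them into disjoint lists, this loop just becomes two loops; either way the flag keeps the "D + M" policy easy to express.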
> 1. Making the dirauths also fallback dir mirrors might complicate how we want to handle #4483. I think we want to be able to say "attempt to get the consensus from D dirauths and M fallback mirrors" somehow. See comment 5 there for details. It looks like we can still inspect the dir_server_t is_authority flag to get this done, but that might be a reason for keeping them in fully disjoint lists instead? Or maybe not, since it's not the common case...
I think it's pretty easy to tweak the rules for how frequently trusted authorities get picked when we're looking for fallback mirrors. I think that while the design for #4483 is still in progress (and IMO maybe in need of a proposal), it'd be fine to merge this with the idea that we might need to tweak it some later.
> 2. Can you add an example fallback dirserver line in the comments for add_default_fallback_dir_servers()? It looks like a subset of the dirauth line syntax is acceptable, but an example would be great.
Good idea. I'll add one once we know that we want to have fallback dir servers. :) For now, the manpage should be pretty much what you want. (I hope I added a manpage entry or wow will that last sentence sound stupid.)
How will we pick the fallback servers/how long is the list expected to be? In 5c51b3f1f0d4c394392aa6fce89bbe0960117771, for example, you're introducing router_get_fallback_dirserver_by_digest() that searches a smartlist, which was fine for the short list of trusted authorities, but in comments further up on this ticket I'm seeing numbers like 500 servers being bandied about, which makes me a bit nervous about linear-time searches. Is this going to be so infrequent an operation it isn't worth worrying about, though?
I don't think I have any big concerns about this other than that one thing, but I'd like a clearer idea of how we get the default list of fallback dirservers and how long it's expected to be before signing off on this.
I think about 30-50 dirservers is more reasonable than 500. I think the right way to pick them is to look at long-lived high-uptime caches whose operators we know and which have had a persistent IP for a very long time.
For the linear search, I'm happy to move it to a lookup by digestmap if need be, but I don't see anywhere that we call it in the critical path. If that's so, I'd prefer to wait until we see a profile: I'll bet you an imaginary cookie that it doesn't show up in a profile.
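For the record, a toy version of the linear scan in question (stand-in types, not Tor's actual containers); at 30-50 entries this costs a few dozen memcmp() calls, which only matters if it somehow lands on a hot path:

{{{
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 20

/* Simplified stand-in for a fallback dirserver entry. */
typedef struct {
  char digest[DIGEST_LEN];
  const char *address;
} fallback_stub_t;

/* Linear scan, O(n); a hash map like Tor's digestmap_t would make this
 * O(1), at the cost of maintaining the map alongside the list. */
static const fallback_stub_t *
find_fallback_by_digest(const fallback_stub_t *list, size_t n,
                        const char *digest)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (!memcmp(list[i].digest, digest, DIGEST_LEN))
      return &list[i];
  return NULL;
}

int main(void) {
  fallback_stub_t list[2] = { { {0}, "192.0.2.1:80" }, { {1}, "192.0.2.2:80" } };
  char key[DIGEST_LEN] = {1};
  const fallback_stub_t *hit = find_fallback_by_digest(list, 2, key);
  printf("%s\n", hit ? hit->address : "not found");
  return 0;
}
}}}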