The right time to put fallback-consensus in is before make dist, not after.
I think that the effect of the desired "voodoo" should be to have make dist include it if it's present
and nonempty, and to have dist-rpm do the same.
How to achieve this voodoo does not immediately leap out at me; more effort may help.
This could be as simple as adding a %{_datadir}/fallback-consensus to the %files section of the spec.
But maybe it needs to be made conditional somehow so that the rpm build doesn't fail if the fallback-consensus
is absent. That part, I don't know how to do.
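For what it's worth, here's an untested sketch of one way to do the conditional part, using rpm's standard build-conditional macros so the build doesn't fail when the file is absent. The conditional name fallback_consensus is made up, and I've used a %{_datadir}/tor path in anticipation of the next question:

{{{
# Default is "with"; pass "rpmbuild --without fallback_consensus" to skip it.
%bcond_without fallback_consensus

# (in the install section) stage the file only when the conditional is on:
%if %{with fallback_consensus}
install -m 644 fallback-consensus \
  %{buildroot}%{_datadir}/tor/fallback-consensus
%endif

# (in the files section) guard the entry the same way:
%files
%if %{with fallback_consensus}
%{_datadir}/tor/fallback-consensus
%endif
}}}

Detecting "present and nonempty" automatically at spec parse time (say, with a %(test -s ... && echo 1 || echo 0) shell macro) might also work, but I haven't verified that.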
Also, maybe we should be installing this into /usr/share/tor rather than into /usr/share?
Well, the goal was for new clients to hit a long-lived directory cache rather than hitting an authority per se, and it's still a good idea to do that through some means. The "FallbackConsensus" mechanism doesn't seem like it's necessarily a great one, though. Perhaps we should remove it and go for something else.
The main remaining goal is to give Tor clients a lot more than 8 IP addresses to bootstrap into the network.
We haven't moved forward on the fallback consensus design because we would use it in place of the 8 directory authorities when trying to bootstrap, and if half the relays in it are down, we spend a lot of time timing out before we find one that works.
I think a good compromise approach would be to only fall back to the fallback consensus when the 8 known bootstrap points have failed. It will mean it takes longer for your Tor to bootstrap in the case where before it wouldn't have worked at all, but it won't slow things down if they're going to work normally.
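To make that ordering concrete, here's a minimal standalone sketch (invented names; nothing here is real Tor code) of "authorities first, fallback consensus only after every authority has failed":

{{{
/* Toy model of the proposed bootstrap order; not Tor code. */
#include <stdio.h>
#include <stdbool.h>

static const char *authorities[] = { "auth1", "auth2", "auth3" };
static const char *fallbacks[]   = { "fb1", "fb2" };

/* Stand-in for a real directory fetch; pretend every authority fails. */
static bool try_fetch(const char *src) {
  printf("trying %s\n", src);
  return src[0] == 'f';
}

int main(void) {
  size_t i;
  for (i = 0; i < sizeof(authorities)/sizeof(*authorities); i++)
    if (try_fetch(authorities[i])) { puts("bootstrapped via authority"); return 0; }
  /* Authorities exhausted: only now touch the (possibly stale) fallbacks. */
  for (i = 0; i < sizeof(fallbacks)/sizeof(*fallbacks); i++)
    if (try_fetch(fallbacks[i])) { puts("bootstrapped via fallback"); return 0; }
  puts("bootstrap failed");
  return 1;
}
}}}

In the normal case the first loop succeeds immediately, so nothing gets slower; the second loop only ever runs in the case where the old behavior would have given up entirely.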
Another approach would just be to stick a pile of IP addresses in each release, and run some metrics function over the recent directory archives to spit out the 500 addresses that seem most promising. But formatting things as a consensus seems like it should save some coding, specifying, etc.
In the distant future we could add some sort of flag like IsLikelyToStillBeInThisLocationLater (I think there's a proposal for that), but if the only use for that flag is having clients read it from disk in their fallback consensus, it's kind of weird to tell it to all clients all the time.
Yet another approach would be having directory authorities able to vote on a special consensus that includes only relays they think will be around for a while. They would vote on it and write it to disk every day or so, and we could just package the most recent version of that file in each release.
Yet yet another approach would be to have the metrics script generate something that looks like a consensus but has no signatures, and then we'd ship that and the only Tor code changes would be having Tor not worry about signatures when handling that file.
This is something I'd be excited to see designed and deployed in the 0.2.3 timeframe.
Most of the infrastructure is in place -- we just need to not use it unless the connections to the 'main' bootstrap mechanisms fail. See also #2878 for an interacting issue.
Trac: Milestone: Tor: unspecified → Tor: 0.2.3.x-final; Priority: minor → major
No. This ticket is for "design and implement a Tor feature that will try bootstrapping from the directory authorities first but if they don't work then load the fallback consensus file that got shipped with Tor and try some of the relays in it."
This would technically be great to have as a DoS resistance measure in case the dirauths need to rate limit or block external network traffic, or simply crash. See #2665 and #2664 for related sibling tickets.
This ticket's solution would also solve #4483, I think; might it be possible to mark that one as a dup of this?
I'd love to see a solution here. I think that we might want to step back and think about whether fallbackconsensus is such a great idea, though. To make it work right, I think we'd need to get proposal 146 done, so that clients that try to bootstrap from the fallback consensus do so with a plausible list of routers.
An alternative approach is just to ship clients with a list of directory sources (IP:ORPort:IDDigest) and generate such a list manually and/or via some metrics-based process.
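For illustration only (the address and digest below are made up, using a TEST-NET IP), an entry in such a list might look like:

{{{
203.0.113.5:9001:0123456789ABCDEF0123456789ABCDEF01234567
}}}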
Currently, an authority's IP:Port is used for about 4 things:
1. A place for clients to fetch initial directory info.
2. A place for servers to upload their descriptors.
3. A place for directory caches to fetch up-to-date directory info.
4. A place for authorities to contact one another for voting.
It seems to me that the first usage above is probably generating the lion's share of the load on authorities, followed probably by the third. Decoupling the first usage is the main way to achieve this ticket's goals, in my eyes. I have no horse in the race of whether fallbackconsensus is the right way to do that; it may well not be.
I've started work on a draft branch (which will be rebased). The idea is to remove the fallback consensus entirely, and instead have a FallbackDir option that lists other directory mirrors to fall back to.
Quick review of dbac20ffbd608d58d89fe644397bcf8fce2e00c0 for nickm/fallback_dirsource:
1. Is the plan to hardcode these in add_default_fallback_dir_servers(), ship a default torrc, or add support for an additional data dir file? I assume the first one?
2. I don't actually see the new "FallbackDir" command in config.c?
3. How sensitive to downtime are these fallback servers? It appears that the bootstrap consensus/descriptor fetching code wasn't changed for them?
Related to 3: I have not yet investigated what it is about the code that makes things go so badly when individual dirauths are unreachable for a while (#4483). Might this change make it safer to either make the timeouts much lower or schedule multiple parallel requests? That way, we can safely add lots and lots of possibly unstable dir mirrors to our list, and also tolerate dirauth failure through this change. We risk DoSing the directory servers with too many requests through bugs this way, but that's what loglines are for. For the alpha series, we could add notice-level logs to router_pick_dirserver_generic() and/or the actual connection attempt code, and maybe even the server side, too.
I don't understand the question. We should pick ones that don't have a lot of downtime, and are going to keep the same IP for some while? Either way, that seems like another ticket entirely, right?
> I don't understand the question. We should pick ones that don't have a lot of downtime, and are going to keep the same IP for some while? Either way, that seems like another ticket entirely, right?
Yeah. I guess I was convolutedly asking if we could close #4483 if we get this working. It sounds like "no" right now, but see #4483 for further questions.
Also, I think it would serve us to design this so that we can put pretty much anyone in this fallback mirror set without consequence: even if 80% of them end up being down in a year. If this means we really should solve #4483 too, perhaps #4483 should be made into a child of this ticket, instead of #2664 directly?
Ok, I reviewed this, but a couple more comments/questions:
1. Making the dirauths also fallback dir mirrors might complicate how we want to handle #4483. I think we want to be able to say "attempt to get the consensus from D dirauths and M fallback mirrors" somehow. See comment 5 there for details. It looks like we can still inspect the dir_server_t is_authority flag to get this done (see the toy sketch after this comment), but that might be a reason for keeping them in fully disjoint lists instead? Or maybe not, since it's not the common case...
2. Can you add an example fallback dirserver line in the comments for add_default_fallback_dir_servers()? It looks like a subset of the dirauth line syntax is acceptable, but an example would be great.
Note: I don't know enough about directory activity to properly evaluate the correct use of the fallback dir servers vs dirauths in all cases, so perhaps my review doesn't count as exhaustive. But I can also test both this and #4483 with some firewall rules to block access to all of the dirauths once both are ready.
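Here's a toy sketch of what I mean in point 1, with a simplified stand-in for dir_server_t (illustration only, not the real structs or selection logic): keep one combined list and pick D authorities plus M mirrors by flag:

{{{
#include <stdio.h>

/* Simplified stand-in for Tor's dir_server_t; the real struct has more fields. */
typedef struct {
  const char *nickname;
  unsigned is_authority : 1;
} dir_server_stub_t;

/* Pick up to want_auths authorities and want_mirrors fallback mirrors
 * from one combined list, keyed on the is_authority flag. */
static void
pick_sources(const dir_server_stub_t *all, size_t n,
             size_t want_auths, size_t want_mirrors)
{
  size_t a = 0, m = 0, i;
  for (i = 0; i < n; i++) {
    if (all[i].is_authority && a < want_auths) {
      printf("authority: %s\n", all[i].nickname); a++;
    } else if (!all[i].is_authority && m < want_mirrors) {
      printf("fallback:  %s\n", all[i].nickname); m++;
    }
  }
}

int main(void) {
  const dir_server_stub_t servers[] = {
    { "authA", 1 }, { "authB", 1 }, { "mirrorA", 0 }, { "mirrorB", 0 },
  };
  pick_sources(servers, 4, /* D= */ 1, /* M= */ 2);
  return 0;
}
}}}

If we ever do split them into disjoint lists, this loop just becomes two loops; either way the flag keeps the "D + M" policy easy to express.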
> 1. Making the dirauths also fallback dir mirrors might complicate how we want to handle #4483. I think we want to be able to say "attempt to get the consensus from D dirauths and M fallback mirrors" somehow. See comment 5 there for details. It looks like we can still inspect the dir_server_t is_authority flag to get this done, but that might be a reason for keeping them in fully disjoint lists instead? Or maybe not, since it's not the common case...
I think it's pretty easy to tweak the rules for how frequently trusted authorities get picked when we're looking for fallback mirrors. I think that while the design for #4483 is still in progress (and IMO maybe in need of a proposal), it'd be fine to merge this with the idea that we might need to tweak it some later.
> 2. Can you add an example fallback dirserver line in the comments for add_default_fallback_dir_servers()? It looks like a subset of the dirauth line syntax is acceptable, but an example would be great.
Good idea. I'll add one once we know that we want to have fallback dir servers. :) For now, the manpage should be pretty much what you want. (I hope I added a manpage entry or wow will that last sentence sound stupid.)
How will we pick the fallback servers/how long is the list expected to be? In 5c51b3f1f0d4c394392aa6fce89bbe0960117771, for example, you're introducing router_get_fallback_dirserver_by_digest() that searches a smartlist, which was fine for the short list of trusted authorities, but in comments further up on this ticket I'm seeing numbers like 500 servers being bandied about, which makes me a bit nervous about linear-time searches. Is this going to be so infrequent an operation it isn't worth worrying about, though?
I don't think I have any big concerns about this other than that one thing, but I'd like a clearer idea of how we get the default list of fallback dirservers and how long it's expected to be before signing off on this.
I think about 30-50 dirservers is more reasonable than 500. I think the right way to pick them is to look at long-lived high-uptime caches whose operators we know and which have had a persistent IP for a very long time.
For the linear search, I'm happy to move it to a lookup by digestmap if need be, but I don't see anywhere that we call it in the critical path. If that's so, I'd prefer to wait until we see a profile: I'll bet you an imaginary cookie that it doesn't show up in a profile.
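For the record, a toy version of the linear scan in question (stand-in types, not Tor's actual containers); at 30-50 entries this costs a few dozen memcmp() calls, which only matters if it somehow lands on a hot path:

{{{
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 20

/* Simplified stand-in for a fallback dirserver entry. */
typedef struct {
  char digest[DIGEST_LEN];
  const char *address;
} fallback_stub_t;

/* Linear scan, O(n); a hash map like Tor's digestmap_t would make this
 * O(1), at the cost of maintaining the map alongside the list. */
static const fallback_stub_t *
find_fallback_by_digest(const fallback_stub_t *list, size_t n,
                        const char *digest)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (!memcmp(list[i].digest, digest, DIGEST_LEN))
      return &list[i];
  return NULL;
}

int main(void) {
  fallback_stub_t list[2] = { { {0}, "192.0.2.1:80" }, { {1}, "192.0.2.2:80" } };
  char key[DIGEST_LEN] = {1};
  const fallback_stub_t *hit = find_fallback_by_digest(list, 2, key);
  printf("%s\n", hit ? hit->address : "not found");
  return 0;
}
}}}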