During the entry guard discussions, we have decided that it's a good idea to make all relays directory servers. We mainly needed the entry guards to be directories, but it seems easier and more elegant to just turn all relays to directory servers.
This is easier nowadays than in the past because BEGIN_DIR makes it so that directory servers don't need to have a separate DirPort open. (However, maybe relays get the V2Dir flag only if they have a DirPort open?)
Also, since all relays have all the directory documents anyway, it doesn't further bloat their disk to become directory servers.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
It's easier to have all relays be caches than to have all guards be caches: until now, there is nothing in Tor (AFAIK) where a relay behaves differently depending on any flag it has been assigned in any consensus. If we say "be a dir cache if you're a guard", then we need to consider cases where the client realizes that a relay is a guard before the relay itself realizes, and so on.
(For one particularly hairy case: imagine that a negative feedback loop is created, where the extra burden of being a directory cache lowers a relay's visible bandwidth, so it is no longer a guard, so it loses the guard flag, so its visible bandwidth increases, etc etc.)
So I think "all servers cache" is a better model than "all guards cache."
- if (!server_mode(options) || !advertised_server_mode())+ if (options->ClientOnly || !advertised_server_mode())
Also, this code doesn't actually change client behavior: it would make all servers cache, but it wouldn't do anything to let clients know that they're caching so they can use them. Also IIUC it wouldn't make servers actually download directory information on the same schedule caches do, or make them keep it as long.
Trac: Milestone: N/Ato Tor: 0.2.6.x-final Status: needs_review to needs_revision
- if (!server_mode(options) || !advertised_server_mode())+ if (options->ClientOnly || !advertised_server_mode())}}}
server_mode() checks two things: 1) Is this relay a client and 2) does it have an open ORPort. We already check if it has an open ORPort so I didn't see a reason for adding a possible context switch and rechecking it. Another alternative could be:
{{{
if (options->BridgeRelay || options->DirPort_set)
if (options->BridgeRelay || options->DirPort_set || server_mode(options))
return 1;
if (!server_mode(options) || !advertised_server_mode())
if (!advertised_server_mode())
return 0;
> Also, this code doesn't actually change client behavior: it would make all servers cache, but it wouldn't do anything to let clients know that they're caching so they can use them. Also IIUC it wouldn't make servers actually download directory information on the same schedule caches do, or make them keep it as long.Yes, sorry, I posted this branch prematurely.
asn opened #12597 (moved) as a dupe, but I think that description is better, so stealing it. Adding needs-proposal because I think this will need a change in the spec.
Trac: Status: needs_revision to new Cc: bastik.public@gmx-topmail.detobastik.public@gmx-topmail.de, asn Description: Now that tor supports BEGIN_DIR it will be very nice if all relays are directory caches, as well. This will be especially useful if we move to a single guard, thus removing the possibility that the guard is not a dir cache.
to
During the entry guard discussions, we have decided that it's a good idea to make all relays directory servers. We mainly needed the entry guards to be directories, but it seems easier and more elegant to just turn all relays to directory servers.
This is easier nowadays than in the past because BEGIN_DIR makes it so that directory servers don't need to have a separate DirPort open. (However, maybe relays get the V2Dir flag only if they have a DirPort open?)
Also, since all relays have all the directory documents anyway, it doesn't further bloat their disk to become directory servers. Keywords: N/Adeleted, needs-proposal, tor-guard added Type: enhancement to task
FWIW, if we assume that the basic idea of the proposal is reasonable, we will probably want to implement the directory authority parts first, and deploy it to authorities. Then implement the relay part. And finally implement the client part.
To implement the relay side of this: Most of the stuff that checks options->DirPort_set should now call some function that wraps server_mode(). And add the appropriate fields to the router descriptor generated in router_dump_router_to_string.
To implement the authority side: add a supports_tunnelled_dir_requests field to routerinfo_t, and have that field get set in router_parse_entry_from_string when the appropriate entry is present in the descriptor, OR whenever DirPort is nonzero. Then copy it into a new "is_v2_dir" field in routerstatus when generating routerstatus entries in set_routerstatus_from_routerinfo(). Lastly, look at is_v2_dir in routerstatus when deciding whether to emit V2Dir in routerstatus_format_entry().
To implement the client side: adjust the node_is_dir function so that it returns true if it has a dirport, or if it has a routerinfo with a supports_tunnelled_dir_requests field, or if it has a routerstatus with the is_v2_dir field set. Finally, adjust router_pick_directory_server_impl() so that it checks node_is_dir() instead of status->dir_port.
(Then, audit the rest of the code that uses these functions, and test that it won't do anything funny, like try to make connections to port 0. Test, test, test.)
Anybody want to take a stab at some or all of this?
I pushed bug12538_prop_squash to my pub repo. It was tested with Chutney in a network containing relays both with and without dirports, and also in a network with zero relays with dirports (but authorities had them, as required). After the first consensus period clients successfully sent tunnelled directory requests in both networks.
Note, counter to proposal 237, clients prefer sending tunnelled requests over direct. I forgot this was the existing functionality, and I decided against changing it (soley so we match the proposal).
Slightly bikesheddy, I named the function which decides if a relay accepts directory request as "is_directory_server()", I feel like this should be changed to "am_directory_server()", because it's asking itself, so it's almost more grammatically correct. Thoughts/preferences?
I'm wondering if it would be useful to have "is_directory_server()" function that could take a routerinfo_t and test that object to see if it's a directory instead of the "options" object parsing?
It could then be used with "desc_routerinfo" object representing our router information thus fixing the semantic "issue".
Also, I'm a big fan of namespacing function with the file name so this function is in "router.c" and public thus maybe should be called "router_is_directory_server()" ? (unless there is a coding rules I'm not aware of)
I'm wondering if it would be useful to have "is_directory_server()" function that could take a routerinfo_t and test that object to see if it's a directory instead of the "options" object parsing?
It could then be used with "desc_routerinfo" object representing our router information thus fixing the semantic "issue".
I considered taking a routerinfo_t as a parameter, but I decided against it because 1) I wanted to match the signature of server_mode() as closely as possible, and 2) this function is only called when a router must decide if it serves dir requests, so an or_options_t should be sufficient.
Until this changes in the future, simply checking the is_v2_dir field of another router's routerinfo_t is all that is needed to determine if it is a dir server. I think it would be unnecessarily complicated otherwise, so I think it's okay that we have different interfaces for answering the questions "am I a directory server?" verses "is she a directory server?".
Also, I'm a big fan of namespacing function with the file name so this function is in "router.c" and public thus maybe should be called "router_is_directory_server()" ? (unless there is a coding rules I'm not aware of)
I agree, but I'll follow Nick's suggestions below in this case.
I don't think it's necessary because supports_tunnelled_dir_requests is only set when the descriptor is parsed in router_parse_entry_from_string and we set it true if either the dirport is set or "tunnelled-dir-server" was in the descriptor. Is there a case I'm missing?
+ /* This router accepts tunnelled directory requests via begindir if it has+ * an open dirport or it included "tunnelled-dir-server". */+ if (find_opt_by_keyword(tokens, K_DIR_TUNNELLED) || router->dir_port > 0) {+ router->supports_tunnelled_dir_requests = 1;+ }
dir_server_mode() or running_as_dir_server() would be a better name for is_directory_server()
Agreed, let's use the former so it matches server_mode() and public_server_mode().
Do we have a test for router_pick_directory_server_impl()? If not, should we get one somehow?
No, and yes.
It's not a very elegant solution (directly including routerlist.c). Alternatively I can make router_pick_directory_server_impl and routerlist_insert STATIC or create TOR_UNIT_TESTS-specific wrappers around them. The existing non-static wrappers added unnecessary complexity considering I'm trying to test a single function, but if using a wrapper is more mergeable then I can expand the unit test so that will work. Which do you prefer?
Making them STATIC seems like a fine idea to me, if the goal is to be able to call them from the unit tests. That's what STATIC is for, after all.
Great, done. Pushed bug12538_prop_squash_3 which squashed the changed into the original commits and rebases on current master. I should also mention I ifdefs a new function networkstatus_set_current_consensus_from_ns in networkstatus on TOR_UNIT_TESTS. It's a bit of a hack which allows setting the global current_consensus without trying to fetch new descriptors at the end. Let me know if there's a better way to do this.
I think the log_info(LD_BUG) should be a log_warn ? Otherwise, we'll never see it.
Isn't there a define for the length of the output of tor_addr_to_str? Also, doesn't it always nul-terminate?
#endif //TOR_UNIT_TESTING should refer to TOR_UNIT_TESTS instead.
If we use a switch in networkstatus_set_current_consensus_from_ns() , then we'll get a warning if we forget to add more cases in the future.
current_ns_consensus = NULL and current_md_consensus=NULL are both redundant.
Why return -2 instead of -1 ?
More important:
We need to look at something like decide_to_advertise_dirport() [maybe a modified version of it] when deciding whether to set tunneled-dir-server. The bandwidthrate stuff is particularly important.
The directory_fetches_from_authorities() function should also return 0 if we decided not to set dir_port or tunneled-dir-requests. Otherwise low-bandwidth ORs will hit the authorities rather than the caches, which is not what we want.
Oh. Also, make sure you try compiling with --enable-gcc-warnings , and fix all the warnings this finds. (I think there's one in the unused arg of test_nodelist_node_is_dir().)
In the tests:
+ memset(rs3->status.descriptor_digest, 181818, DIGEST_LEN) makes no sense; the second argument to memset needs to be a byte.
The new test code seems to be substantially similar to other tests in test_dir. Can the common code be factored out? Copy-and-paste is a bad practice in programming; that's what we have functions for.
thanks for the reviews! These issues should be resolved now.
I also tested this branch in mixed chutney nets with 0.2.5 - one network had five authdirs running this patch and another network had three 0.2.5 and two authdirs running this patch, both nets had a mix of clients and relays running 0.2.5 and this patch. No warning during voting and flags were assigned correctly.
When all five dirauths ran this patch, all relays which declared "tunnelled-dir-server" were assigned v2dir, regardless of dirport configuration and patched clients chose them and successfully used them. 0.2.5 clients only chose the dirauths and relays with advertised dirports.
In the net with three 0.2.5 dirauths and two patched dirauths, the relays were assigned v2dir only if they had an open dirport. Patched relays and clients fellback to the current functionality - only choosing dircaches with open dirports.
There are two branches. feature12538_rebased_2 has subsequent changes/corrections as new commits. feature12538_rebased_2_fixup squashed and fixed up the changes with the original commits. Both are rebased and apply cleanly on master.
Re-reviewing based on the git diff master... output:
directory_fetches_from_authorities() has a semantic change that seems wrong in the replacement of me->dir_port. But now I see that dir_server_mode() is not quite the same...
In the router.c changes, are we at any risk of having bridges not be directory caches? Bridges all need to be dircaches.
Shouldn't the documentation for supports_tunnelled_dir_requests say it's also set if dir_port > 0?
To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.
Re-reviewing based on the git diff master... output:
directory_fetches_from_authorities() has a semantic change that seems wrong in the replacement of me->dir_port. But now I see that dir_server_mode() is not quite the same...
yes and yes. The semantic change was wrong, I think I made it correct now, though. dir_server_mode did change slightly since the last the you looked at this - now we confirm we think we have enough available bandwidth before we advertise we're a dircache.
In the router.c changes, are we at any risk of having bridges not be directory caches? Bridges all need to be dircaches.
No, I don't think so. I re-reviewed the changes and I don't see anything that impacts the bridge logic. As a sanity check I ran a bridges and it accepted and responded to tunneled dir connections without a problem.
Shouldn't the documentation for supports_tunnelled_dir_requests say it's also set if dir_port > 0?
Yes. Corrected.
Otherwise this looks ok to me now.
Great! But now things are changing slightly, but not too much, I hope.
To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.
Sorry, I see now you mentioned this in [comment:15 comment 15].
Sebastian and I discussed this, and the current design is not-quite-right. We can certainly make every relay a dircache, but that doesn't avoid unnecessary consequences. As an example, this would have a significant impact on memory-constrained systems. As such, more precisely, our goal is that, after the network has sufficiently upgraded, whenever a client chooses a guard it should always be a dircache. This leaves two options, either all guards become a dircache (when they get the guard flag) or being a dircache is a requirement for getting the guard flag. The former is tricky, as arma said above, the later requires a spec change.
The current plan, given this, is implementing the latter option based on my current patch. Step 0, add a torrc option so relay operators can control whether or not their relay is a dircache. Step 1, get this merged into 0.2.6, as planned. Step 2, write proposal amending Guard flag criteria. Step 3, implement proposal and merge into 0.2.7.
Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)
Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)
It is the right line for now, but the code hasn't been merged. This argues strongly for not merging anything until we have the spec updated.
Understood. I decided to merge the code anyway, because we shouldn't discard these bridge descriptors because of this new line. If the format changes in the future, we'll have to add the new format as well. Either let me know when that happens, or expect me to ask whenever CollecTor starts shouting at me. Thanks for the information, and sorry for the noise.
Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)
Ah, Yikes! Sorry Karsten! That was my test bridge :(
I wasn't sure if the spec should be updated before or after the patch was merged, but now I think Sebastian is correct. Spec update on prop237_spec_update in https://git.torproject.org/user/sysrqb/torspec.git.
To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.
Sorry, I see now you mentioned this in [comment:15 comment 15].
Sebastian and I discussed this, and the current design is not-quite-right. We can certainly make every relay a dircache, but that doesn't avoid unnecessary consequences. As an example, this would have a significant impact on memory-constrained systems. As such, more precisely, our goal is that, after the network has sufficiently upgraded, whenever a client chooses a guard it should always be a dircache. This leaves two options, either all guards become a dircache (when they get the guard flag) or being a dircache is a requirement for getting the guard flag. The former is tricky, as arma said above, the later requires a spec change.
The current plan, given this, is implementing the latter option based on my current patch. Step 0, add a torrc option so relay operators can control whether or not their relay is a dircache. Step 1, get this merged into 0.2.6, as planned. Step 2, write proposal amending Guard flag criteria. Step 3, implement proposal and merge into 0.2.7.
Hm. If there's a special flag you need to set in order to become a guard, I think that would probably be bad for the network. I wonder if we couldn't have high-bandwidth relays (as determined by some criteria) automatically turn this on?
We should also adjust our planned timing for amending the Guard flag criteria so that we don't cause a large number of Guards to disappear because they were running older versions or hadn't set the flag.
The idea is rather that the flag is turned on by default but you can turn it off. We looked at the numbers yesterday, even if we upgraded right away we'd only lose about 15% of all guards (that's way too much to do it, but not so much to make this unfeasible imo)
Yes, the idea is the flag will be on by default, but operators may opt-out, if they want. I also agree 0.2.7 is almost certainly too soon, but the timing of that change can happen when the conditions are correct, no rush.
This implements a new configuration option, named DirCache. Manpage and some tests are included. Changes file is slightly amended so it mentions this new option. A chutney net seemed to operate without any surprises, and bridges should not be affected and do not seem affected based on testing. During configuration validation tor explicitly rejects configs which both disable DirCache and enable a DirPort or BridgeRelay. During validation tor also emits a warning if its detected the system has less than 300MB of memory. 300MB is a bit arbitrary. I originally considered warning at 400MB, but decided to decrease it.
Same a last time, two branches, feature12538_rebased_3 was append-only and _fixup was squashed/fixed-up. Rebased on master as of ~10 hours ago.
feature12538_rebased_3
feature12538_rebased_3_fixup
In the spec, you should be concrete with the version that introduced it, like "versions until 0.2.6.2-alpha don't recognize this".
In the code, I noticed this comment:
+ /** True iff this router included "tunnelled-dir-server" in its descriptor,+ * implies it accepts tunnelled directory requests, or it advertised+ * dir_port > 0. */
should the "implies" be implying? When I read it first I wonder if there are three ways this can be true.
I haven't yet review the test changes, only the code. Will hopefully get to the tests tonight
I'm concerned about the routerlist_free_all() call because what happens with the node_t things that we created along with it? Just calling nodelist_free_all() would be a workaround, but I think this may have exposed a deeper problem with routerlist_free_all(). Hrm.
I'm concerned about the routerlist_free_all() call because what happens with the node_t things that we created along with it? Just calling nodelist_free_all() would be a workaround, but I think this may have exposed a deeper problem with routerlist_free_all(). Hrm.
I added the workaround. Good catch! After thinking about this some more, if it were the case where routerlist_free_all() were called at any other time, excluding its call during cleanup prior to exit and in tests, I wonder if we should explicitly call any other _free_all(), such as for the microdescs. At this point, I don't think we need to because I don't see any other dangling pointers remaining after calling routerlist_free_all() and nodelist_free_all(). The microdesc referenced by the node_t should still be ref'd in the microdesc cache, and the routerstatus should still be ref'd in the latest consensus. Is this not the case in some situations?
Thanks for the review, Sebastian!
Rebased on master and pushed the changes:
feature12538_rebased_4
feature12538_rebased_4_fixup