Opened 5 years ago

Closed 4 years ago

Last modified 3 years ago

#12538 closed task (fixed)

Make all relays automatically be dir caches

Reported by: cypherpunks Owned by:
Priority: High Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: tor-guard, tor-relay, prop237, 026-triaged-1, sebastian-review, 027-triaged-1-out, 028-triage, 028-triaged, mike-can, pre028-patch, TorCoreTeam201512, 201511-deferred
Cc: bastik.public@…, asn, ln5, mikeperry, isis Actual Points:
Parent ID: Points: medium/large
Reviewer: Sponsor:

Description (last modified by sysrqb)

During the entry guard discussions, we have decided that it's a good idea to make all relays directory servers. We mainly needed the entry guards to be directories, but it seems easier and more elegant to just turn all relays to directory servers.

This is easier nowadays than in the past because BEGIN_DIR makes it so that directory servers don't need to have a separate DirPort open. (However, maybe relays get the V2Dir flag only if they have a DirPort open?)

Also, since all relays have all the directory documents anyway, it doesn't further bloat their disk to become directory servers.

Child Tickets

Change History (100)

comment:1 Changed 5 years ago by cypherpunks

Component: - Select a componentTor

comment:2 Changed 5 years ago by bastik

Cc: bastik.public@… added
Keywords: tor-relay added

(I'm not the creator of this ticket, I just want to get updates)

if we move to a single guard, thus removing the possibility that the guard is not a dir cache.

At least all Guards would have to be dir caches then.

comment:3 Changed 5 years ago by nickm

It's easier to have all relays be caches than to have all guards be caches: until now, there is nothing in Tor (AFAIK) where a relay behaves differently depending on any flag it has been assigned in any consensus. If we say "be a dir cache if you're a guard", then we need to consider cases where the client realizes that a relay is a guard before the relay itself realizes, and so on.

(For one particularly hairy case: imagine that a negative feedback loop is created, where the extra burden of being a directory cache lowers a relay's visible bandwidth, so it is no longer a guard, so it loses the guard flag, so its visible bandwidth increases, etc etc.)

So I think "all servers cache" is a better model than "all guards cache."

comment:4 in reply to:  3 Changed 5 years ago by sysrqb

Status: newneeds_review

Replying to nickm:

So I think "all servers cache" is a better model than "all guards cache."

Great, I agree.

First attempt at a patch on branch bug12538 in my repo. gitweb. Mostly untested.

comment:5 Changed 5 years ago by nickm

Milestone: Tor: 0.2.6.x-final
Status: needs_reviewneeds_revision

I'm not sure I understand the purpose of:

-  if (!server_mode(options) || !advertised_server_mode())
+  if (options->ClientOnly || !advertised_server_mode())

Also, this code doesn't actually change client behavior: it would make all servers cache, but it wouldn't do anything to let clients know that they're caching so they can use them. Also IIUC it wouldn't make servers actually download directory information on the same schedule caches do, or make them keep it as long.

comment:6 in reply to:  5 Changed 5 years ago by sysrqb

Replying to nickm:

I'm not sure I understand the purpose of:

-  if (!server_mode(options) || !advertised_server_mode())
+  if (options->ClientOnly || !advertised_server_mode())

server_mode() checks two things: 1) Is this relay a client and 2) does it have an open ORPort. We already check if it has an open ORPort so I didn't see a reason for adding a possible context switch and rechecking it. Another alternative could be:

-  if (options->BridgeRelay || options->DirPort_set)
+  if (options->BridgeRelay || options->DirPort_set || server_mode(options))
    return 1;
-  if (!server_mode(options) || !advertised_server_mode())
+  if (!advertised_server_mode())
    return 0;

Also, this code doesn't actually change client behavior: it would make all servers cache, but it wouldn't do anything to let clients know that they're caching so they can use them. Also IIUC it wouldn't make servers actually download directory information on the same schedule caches do, or make them keep it as long.

Yes, sorry, I posted this branch prematurely.

comment:7 Changed 5 years ago by sysrqb

Cc: asn added
Description: modified (diff)
Keywords: tor-guard needs-proposal added
Status: needs_revisionnew
Type: enhancementtask

asn opened #12597 as a dupe, but I think that description is better, so stealing it. Adding needs-proposal because I think this will need a change in the spec.

comment:8 Changed 5 years ago by asn

Matt posted a proposal draft here:
https://lists.torproject.org/pipermail/tor-dev/2014-July/007247.html

FWIW, if we assume that the basic idea of the proposal is reasonable, we will probably want to implement the directory authority parts first, and deploy it to authorities. Then implement the relay part. And finally implement the client part.

comment:9 Changed 5 years ago by nickm

Keywords: 026-triaged-1 added

comment:10 Changed 5 years ago by nickm

Keywords: prop237 added; needs-proposal removed

Matthew Finkel's proposal is now number 237.

comment:11 Changed 5 years ago by nickm

To implement the relay side of this: Most of the stuff that checks options->DirPort_set should now call some function that wraps server_mode(). And add the appropriate fields to the router descriptor generated in router_dump_router_to_string.

To implement the authority side: add a supports_tunnelled_dir_requests field to routerinfo_t, and have that field get set in router_parse_entry_from_string when the appropriate entry is present in the descriptor, OR whenever DirPort is nonzero. Then copy it into a new "is_v2_dir" field in routerstatus when generating routerstatus entries in set_routerstatus_from_routerinfo(). Lastly, look at is_v2_dir in routerstatus when deciding whether to emit V2Dir in routerstatus_format_entry().

To implement the client side: adjust the node_is_dir function so that it returns true if it has a dirport, or if it has a routerinfo with a supports_tunnelled_dir_requests field, or if it has a routerstatus with the is_v2_dir field set. Finally, adjust router_pick_directory_server_impl() so that it checks node_is_dir() instead of status->dir_port.

(Then, audit the rest of the code that uses these functions, and test that it won't do anything funny, like try to make connections to port 0. Test, test, test.)

Anybody want to take a stab at some or all of this?

comment:12 Changed 5 years ago by arma

Summary: A relay is automatically a dir cacheMake all relays automatically be dir caches

comment:13 Changed 5 years ago by Sebastian

Should this maybe be "All guards should be dir caches"? We're adding a nuch of diskspace requirements for dir caches with the consensus diff stuff

comment:14 Changed 5 years ago by arma

See the above comments on why making only guards do this behavior is tricky -- how does a relay know whether it's a guard?

I haven't been paying attention, but I hope the new diskspace requirements shouldn't be more than tens of megabytes, right?

comment:15 in reply to:  14 ; Changed 5 years ago by Sebastian

Replying to arma:

See the above comments on why making only guards do this behavior is tricky -- how does a relay know whether it's a guard?

I meant the other way around - you have to cache dir info to become a guard, and being a dir cache is a config option (default on?)

I haven't been paying attention, but I hope the new diskspace requirements shouldn't be more than tens of megabytes, right?

One week of consensuses worth, however much that turns out to be

comment:16 in reply to:  15 Changed 5 years ago by asn

Replying to Sebastian:

I haven't been paying attention, but I hope the new diskspace requirements shouldn't be more than tens of megabytes, right?

One week of consensuses worth, however much that turns out to be

microdescriptor consensuses currently take around 1.2MB each. one week worth of consensuses will be around 190 to 210 MBs.

comment:17 Changed 5 years ago by sysrqb

I pushed bug12538_prop_squash to my pub repo. It was tested with Chutney in a network containing relays both with and without dirports, and also in a network with zero relays with dirports (but authorities had them, as required). After the first consensus period clients successfully sent tunnelled directory requests in both networks.

Note, counter to proposal 237, clients prefer sending tunnelled requests over direct. I forgot this was the existing functionality, and I decided against changing it (soley so we match the proposal).

Slightly bikesheddy, I named the function which decides if a relay accepts directory request as "is_directory_server()", I feel like this should be changed to "am_directory_server()", because it's asking itself, so it's almost more grammatically correct. Thoughts/preferences?

comment:18 Changed 5 years ago by sysrqb

Status: newneeds_review

Forgot to set it.

comment:19 Changed 5 years ago by dgoulet

I'm wondering if it would be useful to have "is_directory_server()" function that could take a routerinfo_t and test that object to see if it's a directory instead of the "options" object parsing?

It could then be used with "desc_routerinfo" object representing our router information thus fixing the semantic "issue".

Also, I'm a big fan of namespacing function with the file name so this function is in "router.c" and public thus maybe should be called "router_is_directory_server()" ? (unless there is a coding rules I'm not aware of)

comment:20 Changed 5 years ago by nickm

  • In set_routerstatus_from_routerinfo, shouldn't it also set is_v2_dir to true if ri->dir_port is nonzero? That's what the proposal says, I think.
  • dir_server_mode() or running_as_dir_server() would be a better name for is_directory_server()
  • Do we have a test for router_pick_directory_server_impl()? If not, should we get one somehow?

comment:21 Changed 5 years ago by sysrqb

Pushed bug12538_prop_squash_1, adding function rename and unit test.


Replying to dgoulet:

I'm wondering if it would be useful to have "is_directory_server()" function that could take a routerinfo_t and test that object to see if it's a directory instead of the "options" object parsing?

It could then be used with "desc_routerinfo" object representing our router information thus fixing the semantic "issue".

I considered taking a routerinfo_t as a parameter, but I decided against it because 1) I wanted to match the signature of server_mode() as closely as possible, and 2) this function is only called when a router must decide if it serves dir requests, so an or_options_t should be sufficient.

Until this changes in the future, simply checking the is_v2_dir field of another router's routerinfo_t is all that is needed to determine if it is a dir server. I think it would be unnecessarily complicated otherwise, so I think it's okay that we have different interfaces for answering the questions "am I a directory server?" verses "is she a directory server?".

Also, I'm a big fan of namespacing function with the file name so this function is in "router.c" and public thus maybe should be called "router_is_directory_server()" ? (unless there is a coding rules I'm not aware of)

I agree, but I'll follow Nick's suggestions below in this case.

Replying to nickm:

  • In set_routerstatus_from_routerinfo, shouldn't it also set is_v2_dir to true if ri->dir_port is nonzero? That's what the proposal says, I think.
@@ -2090,6 +2090,7 @@ set_routerstatus_from_routerinfo(routerstatus_t *rs,
   strlcpy(rs->nickname, ri->nickname, sizeof(rs->nickname));
   rs->or_port = ri->or_port;
   rs->dir_port = ri->dir_port;
+  rs->is_v2_dir = ri->supports_tunnelled_dir_requests;

I don't think it's necessary because supports_tunnelled_dir_requests is only set when the descriptor is parsed in router_parse_entry_from_string and we set it true if either the dirport is set or "tunnelled-dir-server" was in the descriptor. Is there a case I'm missing?

+  /* This router accepts tunnelled directory requests via begindir if it has
+   * an open dirport or it included "tunnelled-dir-server". */
+  if (find_opt_by_keyword(tokens, K_DIR_TUNNELLED) || router->dir_port > 0) {
+    router->supports_tunnelled_dir_requests = 1;
+  }
  • dir_server_mode() or running_as_dir_server() would be a better name for is_directory_server()

Agreed, let's use the former so it matches server_mode() and public_server_mode().

  • Do we have a test for router_pick_directory_server_impl()? If not, should we get one somehow?

No, and yes.

It's not a very elegant solution (directly including routerlist.c). Alternatively I can make router_pick_directory_server_impl and routerlist_insert STATIC or create TOR_UNIT_TESTS-specific wrappers around them. The existing non-static wrappers added unnecessary complexity considering I'm trying to test a single function, but if using a wrapper is more mergeable then I can expand the unit test so that will work. Which do you prefer?

comment:22 Changed 5 years ago by nickm

Is there a case I'm missing?

Oh. I guess not.

Alternatively I can...

Making them STATIC seems like a fine idea to me, if the goal is to be able to call them from the unit tests. That's what STATIC is for, after all.

comment:23 in reply to:  22 Changed 5 years ago by sysrqb

Replying to nickm:

Alternatively I can...

Making them STATIC seems like a fine idea to me, if the goal is to be able to call them from the unit tests. That's what STATIC is for, after all.

Great, done. Pushed bug12538_prop_squash_3 which squashed the changed into the original commits and rebases on current master. I should also mention I ifdefs a new function networkstatus_set_current_consensus_from_ns in networkstatus on TOR_UNIT_TESTS. It's a bit of a hack which allows setting the global current_consensus without trying to fetch new descriptors at the end. Let me know if there's a better way to do this.

comment:24 Changed 5 years ago by nickm

Reviewing from scratch, with dgoulet:

Less important, but should get fixed:

  • I think the log_info(LD_BUG) should be a log_warn ? Otherwise, we'll never see it.
  • Isn't there a define for the length of the output of tor_addr_to_str? Also, doesn't it always nul-terminate?
  • #endif //TOR_UNIT_TESTING should refer to TOR_UNIT_TESTS instead.
  • If we use a switch in networkstatus_set_current_consensus_from_ns() , then we'll get a warning if we forget to add more cases in the future.
  • current_ns_consensus = NULL and current_md_consensus=NULL are both redundant.
  • Why return -2 instead of -1 ?

More important:

  • We need to look at something like decide_to_advertise_dirport() [maybe a modified version of it] when deciding whether to set tunneled-dir-server. The bandwidthrate stuff is particularly important.
  • The directory_fetches_from_authorities() function should also return 0 if we decided not to set dir_port or tunneled-dir-requests. Otherwise low-bandwidth ORs will hit the authorities rather than the caches, which is not what we want.

comment:25 Changed 5 years ago by nickm

Status: needs_reviewneeds_revision

comment:26 Changed 5 years ago by nickm

Oh. Also, make sure you try compiling with --enable-gcc-warnings , and fix all the warnings this finds. (I think there's one in the unused arg of test_nodelist_node_is_dir().)

In the tests:

  • + memset(rs3->status.descriptor_digest, 181818, DIGEST_LEN) makes no sense; the second argument to memset needs to be a byte.
  • The new test code seems to be substantially similar to other tests in test_dir. Can the common code be factored out? Copy-and-paste is a bad practice in programming; that's what we have functions for.

comment:27 Changed 4 years ago by sysrqb

Status: needs_revisionneeds_review

thanks for the reviews! These issues should be resolved now.

I also tested this branch in mixed chutney nets with 0.2.5 - one network had five authdirs running this patch and another network had three 0.2.5 and two authdirs running this patch, both nets had a mix of clients and relays running 0.2.5 and this patch. No warning during voting and flags were assigned correctly.

When all five dirauths ran this patch, all relays which declared "tunnelled-dir-server" were assigned v2dir, regardless of dirport configuration and patched clients chose them and successfully used them. 0.2.5 clients only chose the dirauths and relays with advertised dirports.

In the net with three 0.2.5 dirauths and two patched dirauths, the relays were assigned v2dir only if they had an open dirport. Patched relays and clients fellback to the current functionality - only choosing dircaches with open dirports.

There are two branches. feature12538_rebased_2 has subsequent changes/corrections as new commits. feature12538_rebased_2_fixup squashed and fixed up the changes with the original commits. Both are rebased and apply cleanly on master.

https://git.torproject.org/user/sysrqb/tor.git

comment:28 Changed 4 years ago by nickm

Re-reviewing based on the git diff master... output:

  • directory_fetches_from_authorities() has a semantic change that seems wrong in the replacement of me->dir_port. But now I see that dir_server_mode() is not quite the same...
  • In the router.c changes, are we at any risk of having bridges _not_ be directory caches? Bridges all need to be dircaches.
  • Shouldn't the documentation for supports_tunnelled_dir_requests say it's also set if dir_port > 0?

Otherwise this looks ok to me now.

comment:29 Changed 4 years ago by Sebastian

To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.

comment:30 Changed 4 years ago by nickm

Keywords: sebastian-review added

comment:31 in reply to:  28 Changed 4 years ago by sysrqb

Replying to nickm:

Re-reviewing based on the git diff master... output:

  • directory_fetches_from_authorities() has a semantic change that seems wrong in the replacement of me->dir_port. But now I see that dir_server_mode() is not quite the same...

yes and yes. The semantic change was wrong, I think I made it correct now, though. dir_server_mode did change slightly since the last the you looked at this - now we confirm we think we have enough available bandwidth before we advertise we're a dircache.

  • In the router.c changes, are we at any risk of having bridges _not_ be directory caches? Bridges all need to be dircaches.

No, I don't think so. I re-reviewed the changes and I don't see anything that impacts the bridge logic. As a sanity check I ran a bridges and it accepted and responded to tunneled dir connections without a problem.

  • Shouldn't the documentation for supports_tunnelled_dir_requests say it's also set if dir_port > 0?

Yes. Corrected.

Otherwise this looks ok to me now.

Great! But now things are changing slightly, but not too much, I hope.

comment:32 in reply to:  29 ; Changed 4 years ago by sysrqb

Replying to Sebastian:

To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.

Sorry, I see now you mentioned this in comment 15.

Sebastian and I discussed this, and the current design is not-quite-right. We can certainly make every relay a dircache, but that doesn't avoid unnecessary consequences. As an example, this would have a significant impact on memory-constrained systems. As such, more precisely, our goal is that, after the network has sufficiently upgraded, whenever a client chooses a guard it should always be a dircache. This leaves two options, either all guards *become* a dircache (when they get the guard flag) or being a dircache is a *requirement* for getting the guard flag. The former is tricky, as arma said above, the later requires a spec change.

The current plan, given this, is implementing the latter option based on my current patch. Step 0, add a torrc option so relay operators can control whether or not their relay is a dircache. Step 1, get this merged into 0.2.6, as planned. Step 2, write proposal amending Guard flag criteria. Step 3, implement proposal and merge into 0.2.7.

Does any of this sound crazy or wrong?

comment:33 Changed 4 years ago by karsten

Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)

comment:34 in reply to:  33 Changed 4 years ago by Sebastian

Replying to karsten:

Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)

It is the right line for now, but the code hasn't been merged. This argues strongly for not merging anything until we have the spec updated.

comment:35 Changed 4 years ago by karsten

Understood. I decided to merge the code anyway, because we shouldn't discard these bridge descriptors because of this new line. If the format changes in the future, we'll have to add the new format as well. Either let me know when that happens, or expect me to ask whenever CollecTor starts shouting at me. Thanks for the information, and sorry for the noise.

comment:36 in reply to:  33 Changed 4 years ago by sysrqb

Replying to karsten:

Not sure if this is the right place to ask, but I'm receiving warnings from CollecTor about bridge descriptors containing unrecognized "tunnelled-dir-server\n" lines. Is that line going to be included in the spec and using that format (no arguments, just the keyword)? If so, I'll modify CollecTor to keep that line. Right now, these descriptors are discarded by CollecTor. Thanks! (And if there's a better place to ask this, please let me know.)

Ah, Yikes! Sorry Karsten! That was my test bridge :(

I wasn't sure if the spec should be updated before or after the patch was merged, but now I think Sebastian is correct. Spec update on prop237_spec_update in https://git.torproject.org/user/sysrqb/torspec.git.

comment:37 in reply to:  32 Changed 4 years ago by nickm

Replying to sysrqb:

Replying to Sebastian:

To restate my concern from above, doing this would mean requiring all relays need to have ~200 MB of disk around to cache old documents, without need because we mainly want guards to be caches.

Sorry, I see now you mentioned this in comment 15.

Sebastian and I discussed this, and the current design is not-quite-right. We can certainly make every relay a dircache, but that doesn't avoid unnecessary consequences. As an example, this would have a significant impact on memory-constrained systems. As such, more precisely, our goal is that, after the network has sufficiently upgraded, whenever a client chooses a guard it should always be a dircache. This leaves two options, either all guards *become* a dircache (when they get the guard flag) or being a dircache is a *requirement* for getting the guard flag. The former is tricky, as arma said above, the later requires a spec change.

The current plan, given this, is implementing the latter option based on my current patch. Step 0, add a torrc option so relay operators can control whether or not their relay is a dircache. Step 1, get this merged into 0.2.6, as planned. Step 2, write proposal amending Guard flag criteria. Step 3, implement proposal and merge into 0.2.7.

Hm. If there's a special flag you need to set in order to become a guard, I think that would probably be bad for the network. I wonder if we couldn't have high-bandwidth relays (as determined by some criteria) automatically turn this on?

We should also adjust our planned timing for amending the Guard flag criteria so that we don't cause a large number of Guards to disappear because they were running older versions or hadn't set the flag.

comment:38 Changed 4 years ago by Sebastian

The idea is rather that the flag is turned on by default but you can turn it off. We looked at the numbers yesterday, even if we upgraded right away we'd only lose about 15% of all guards (that's way too much to do it, but not so much to make this unfeasible imo)

comment:39 in reply to:  38 Changed 4 years ago by sysrqb

Yes, the idea is the flag will be on by default, but operators may opt-out, if they want. I also agree 0.2.7 is almost certainly too soon, but the timing of that change can happen when the conditions are correct, no rush.

This implements a new configuration option, named DirCache. Manpage and some tests are included. Changes file is slightly amended so it mentions this new option. A chutney net seemed to operate without any surprises, and bridges should not be affected and do not seem affected based on testing. During configuration validation tor explicitly rejects configs which both disable DirCache and enable a DirPort or BridgeRelay. During validation tor also emits a warning if its detected the system has less than 300MB of memory. 300MB is a bit arbitrary. I originally considered warning at 400MB, but decided to decrease it.

Same a last time, two branches, feature12538_rebased_3 was append-only and _fixup was squashed/fixed-up. Rebased on master as of ~10 hours ago.
feature12538_rebased_3
feature12538_rebased_3_fixup

comment:40 Changed 4 years ago by Sebastian

Hey, I'm not finding much here. :)

In the spec, you should be concrete with the version that introduced it, like "versions until 0.2.6.2-alpha don't recognize this".

In the code, I noticed this comment:

+  /** True iff this router included "tunnelled-dir-server" in its descriptor,
+   * implies it accepts tunnelled directory requests, or it advertised
+   * dir_port > 0. */

should the "implies" be implying? When I read it first I wonder if there are three ways this can be true.

I haven't yet review the test changes, only the code. Will hopefully get to the tests tonight

comment:41 Changed 4 years ago by Sebastian

Ok, for a bunch of stuff you're using free() instead of tor_free(). That needs to be fixed (not just in the unit tests, also in the code).

comment:42 Changed 4 years ago by Sebastian

I'm concerned about the routerlist_free_all() call because what happens with the node_t things that we created along with it? Just calling nodelist_free_all() would be a workaround, but I think this may have exposed a deeper problem with routerlist_free_all(). Hrm.

comment:43 Changed 4 years ago by Sebastian

Status: needs_reviewneeds_revision

That concludes my review. Needs revision.

comment:44 in reply to:  42 Changed 4 years ago by sysrqb

Replying to Sebastian:

should the "implies" be implying?

Yes, fixed.

Replying to Sebastian:

Ok, for a bunch of stuff you're using free() instead of tor_free().

Fixed

Replying to Sebastian:

I'm concerned about the routerlist_free_all() call because what happens with the node_t things that we created along with it? Just calling nodelist_free_all() would be a workaround, but I think this may have exposed a deeper problem with routerlist_free_all(). Hrm.

I added the workaround. Good catch! After thinking about this some more, if it were the case where routerlist_free_all() were called at any other time, excluding its call during cleanup prior to exit and in tests, I wonder if we should explicitly call any other _free_all(), such as for the microdescs. At this point, I don't *think* we need to because I don't see any other dangling pointers remaining after calling routerlist_free_all() and nodelist_free_all(). The microdesc referenced by the node_t should still be ref'd in the microdesc cache, and the routerstatus should still be ref'd in the latest consensus. Is this not the case in some situations?

Thanks for the review, Sebastian!

Rebased on master and pushed the changes:
feature12538_rebased_4
feature12538_rebased_4_fixup

comment:45 Changed 4 years ago by sysrqb

Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final

comment:46 Changed 4 years ago by cypherpunks

This invalidates some parts of proposal 237, we need to update that too it seems. -- Sebastian

comment:47 Changed 4 years ago by ln5

Cc: ln5 added

comment:48 Changed 4 years ago by nickm

Note: This would supersede #11103 .

comment:49 Changed 4 years ago by nickm

Keywords: 027-triaged-1-out added

Marking triaged-out items from first round of 0.2.7 triage.

comment:50 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:51 Changed 4 years ago by asn

Anyone knows what's the current status here?

Are there unresolved protocol issues (that need proposal changes)? Or is it just a simple matter of programming? If the latter, what's wrong with sysrqb's latest branches?

comment:52 Changed 4 years ago by someone_else

Per current consensus:

Guards: 1533
Guards with V2Dir flag: 1178 (77%)
Guard Bandwidth: 26208661
V2Dir Guard Bandwidth: 21596681 (82%)

comment:53 Changed 4 years ago by mikeperry

Keywords: 0.2.8.x-triage added

comment:54 Changed 4 years ago by mikeperry

Keywords: 028-triage added; 0.2.8.x-triage removed

comment:55 Changed 4 years ago by teor

Status: needs_revisionneeds_review

comment:56 Changed 4 years ago by nickm

Keywords: 028-triaged added

comment:57 Changed 4 years ago by nickm

Points: medium/large

comment:58 Changed 4 years ago by mikeperry

Cc: mikeperry added

If this stalls out due to needing more changes after review and sysrqb is unavailable, I can pick this up. I want to solve #10969, and I think it will depend on this.

comment:59 Changed 4 years ago by mikeperry

Keywords: mike-can added

comment:60 Changed 4 years ago by nickm

Priority: MediumHigh

comment:61 Changed 4 years ago by nickm

Keywords: pre028-patch added

comment:62 Changed 4 years ago by isis

Cc: isis added
Severity: Normal

comment:63 Changed 4 years ago by nickm

I see a feature_12538_rebased_6 in sysrqb's repository. I guess that's the one to look at now.

comment:64 Changed 4 years ago by nickm

Reviewing feature_12538_rebased_6. The CHECK notes below are for me. :)

fa54b983901c396911d8e0732a7cf1ba8135443a Authorities must set a router's V2Dir flag if it supports tunnelled reqs

  • Looks okay.

05f95c8ec50b81ff6e7a7b3cc09f51971fd4f14d A relay now advertises "tunnelled-dir-server" in its descriptor

  • [FIXED] The new warning in directory_initiate_command_rend() should be LD_BUG.
  • CHECK: There should be no more instances of DirPort_set! Grep, just in case. (Yes, looks okay. The remaining instances seem fine.)
  • I worry about the definition of dir_server_mode(). Possibly, if we have advertised that we're a dirserver, we should actually be a dirserver?
  • [FIXED] The check for dir_server_mode(options) in router_dump_to_string() is wrong: we should be looking at router->supports_tunneled_dir_requests, not at our configuration. We should set that field based on our actual settings in router_build_fresh_descriptor().
  • Are we making sure to download information and be a directory cache whenever dir_server_mode() is true? (Yes, because directory_caches_dir_info() already calls dir_server_mode().)

d4c08da145564a03ffca9ef234808342613eb2d8 Client should check if dir server has open dir port or handles tunnelled requests

  • Looks okay.

91d1600328f4f6d9a0c3dcb5ab3558e23af891b4 Add unit test for router_pick_directory_server_impl

  • Looks okay.

90f2cfb81a383481cdf6e31d623c564840de7133 Add new DirCache configuration option

  • Looks okay.

be7d522274d21db2bb6bbc8d422fa35b59a4da12 Add NotDir status flag and dir-cache server entry

  • CHECK: re-read the thread above; why are we doing this? Not sure it's an improvement.

763d03da5a87727fc44a1369650694ec91197619 Add the BelieveNotDir consensus parameter

  • CHECK: re-read the thread above; why are we doing this? Not sure it's an improvement.

8a9ac52a9e41ba2db3d7045124f7c055ced85721 Fix wide line in or.h
b1efeb2c693042c7a7a01520e8b713c1cbae0960 Correct widelines in tests

  • trivially correct

02ae35e59c12920e0ba6a8d9c1712d30ca1a7ffa Rebuild descriptor when DirCache is {dis,en}abled
ae1647fd1565e0ed5718f6d82244b8ea2d80d4ee {dis,en}abling DirCache is a semantic change

  • Yup.

a832b7e12cb553dd49b69817ac7d534a6dc8aa07 A router must be a dir cache before it may be HSDir

  • Yup.

5f294acfcfe5915dc1d040e3f431446e9c862d90 Let make_consensus_method_list be used in tests

  • looks ok.

239be79713ac388861728bc1a09f815bf348eddd Automatically generate md-con method vers in test

  • [FIXED] Memory leak here?

8cd810633a8cf8a911bc652c6d405608c04bfe58 Assert rs are added in con and con_md tests

  • Looks ok.
Last edited 4 years ago by nickm (previous) (diff)

comment:65 Changed 4 years ago by nickm

feature12538_rebased_6 in my own public repository has a few commits to fix some of the (marked) issues from my review above. Only a few issues remain. Comments invited on my open questions!

comment:66 in reply to:  64 Changed 4 years ago by asn

Replying to nickm:

Reviewing feature_12538_rebased_6. The CHECK notes below are for me. :)

be7d522274d21db2bb6bbc8d422fa35b59a4da12 Add NotDir status flag and dir-cache server entry

  • CHECK: re-read the thread above; why are we doing this? Not sure it's an improvement.

763d03da5a87727fc44a1369650694ec91197619 Add the BelieveNotDir consensus parameter

  • CHECK: re-read the thread above; why are we doing this? Not sure it's an improvement.

I think the rationale for the NotDir flag was mentioned here:
https://lists.torproject.org/pipermail/tor-dev/2015-May/008883.html
however I still don't understand the rationale completely.

comment:67 Changed 4 years ago by nickm

18:17 < nickm> sysrqb: So, I'd be more comfortable just taking it without the 
               NotDir/BelieveNotDir stuff, since it doesn't do any harm to keep 
               a flag around (thanks to compression)
18:17 < nickm> I think I'll patch those out unless you're really set on them, 
               since it's extra complexity?
18:17 < sysrqb> nickm: nope, i've no objection to you deleting those. my main 
                concern was the continued use of V2Dir
18:18 < sysrqb> (which is becoming increasingly outdated)

comment:68 Changed 4 years ago by nickm

feature12538_rebased_7 in my public repository is rebased on master, with the NotDir/BelieveNotDir stuff removed, and the bugs mentioned above fixed.

My only remaining issue is:

  • I worry about the definition of dir_server_mode(). Possibly, if we have advertised that we're a dirserver, we should actually be a dirserver?

Also we should test this on chutney to verify that everything works correctly for servers without DirPort support.

comment:69 in reply to:  68 Changed 4 years ago by teor

Replying to nickm:

Also we should test this on chutney to verify that everything works correctly for servers without DirPort support.

We can do this in chutney by commenting out the line:

DirPort $dirport

in chutney/torrc_templates/relay-non-exit.tmpl

It's important that these relays work in mixed dirport / no dirport networks, which probably means creating a new torrc template and a new network, then testing that network as part of make test-network-all in tor.

The torrc_templates include structure is currently:

  • common.i
    • relay-non-exit.tmpl
      • relay.tmpl (also includes exit-v4.i and exit-v6.i)
      • relay-exit-v4-only.tmpl (also includes exit-v4.i)
      • relay-exit-v6-only.tmpl (also includes exit-v6.i)

It could be changed to:

  • common.i
    • relay-non-dirport.tmpl (contains the previous content of relay-non-exit.tmpl without DirPort)
      • relay-non-exit.tmpl (also includes dirport.i, which contains DirPort $dirport)
        • relay.tmpl (also includes exit-v4.i and exit-v6.i)
        • relay-exit-v4-only.tmpl (also includes exit-v4.i)
        • relay-exit-v6-only.tmpl (also includes exit-v6.i)

But then we would also need to make templates for exit relays without a dirport:

  • common.i
    • relay-non-dirport.tmpl (contains the previous content of relay-non-exit.tmpl without DirPort)
      • relay-non-dirport-exit-v4-v6.tmpl (also includes exit-v4.i and exit-v6.i)
      • relay-non-dirport-exit-v4-only.tmpl (also includes exit-v4.i)
      • relay-non-dirport-exit-v6-only.tmpl (also includes exit-v6.i)

(We need to do a better job of turning features like this on and off in chutney 2.0.)

comment:70 in reply to:  68 ; Changed 4 years ago by asn

Replying to nickm:

feature12538_rebased_7 in my public repository is rebased on master, with the NotDir/BelieveNotDir stuff removed, and the bugs mentioned above fixed.

My only remaining issue is:

  • I worry about the definition of dir_server_mode(). Possibly, if we have advertised that we're a dirserver, we should actually be a dirserver?

What are we worrying about here? What do you mean "actually be a dirserver"?
dir_server_mode() is:

int
dir_server_mode(const or_options_t *options)
{
  if (!options->DirCache)
    return 0;
  return (server_mode(options) || options->DirPort_set) &&
          router_should_be_directory_server(options, 0);
}

An awkward thing I noticed is that in router_build_fresh_descriptor() we do:

  ri->supports_tunnelled_dir_requests = dir_server_mode(options);

which could be false if our router had a DirPort but no support for tunneled dir conns. However, looking at the code, I think there is no way for this to happen, since all relays with DirCache enabled also support tunneled dir conns. So I think the code is OK here.

---

BTW, do we have plans for deprecating the DirPort completely? So changing the client logic to use BEGIN_DIR for everything, and eventually ripping off the DirPort code from tor?

comment:71 Changed 4 years ago by nickm

My issue is that dir_server_mode() can tell us "true" when we build a descriptor, and tell us "false" later on when we decide whether to serve directory information.

BTW, do we have plans for deprecating the DirPort completely? So changing the client logic to use BEGIN_DIR for everything, and eventually ripping off the DirPort code from tor?

Plausible, but I don't think we've made any such plan yet.

comment:72 in reply to:  70 Changed 4 years ago by arma

Replying to asn:

BTW, do we have plans for deprecating the DirPort completely? So changing the client logic to use BEGIN_DIR for everything, and eventually ripping off the DirPort code from tor?

I would want to explore the load (cpu and bandwidth) from relay descriptor uploads on the dir auths before taking that step. Making all the relays do a full v3-or-whatever handshake just to send us their little blob of text is not really great.

That said, maybe we should turn that around and argue for breaking out dir auth functionality as a whole.

comment:73 in reply to:  68 ; Changed 4 years ago by sysrqb

Replying to nickm:

feature12538_rebased_7 in my public repository is rebased on master, with the NotDir/BelieveNotDir stuff removed, and the bugs mentioned above fixed.

My only remaining issue is:

  • I worry about the definition of dir_server_mode(). Possibly, if we have advertised that we're a dirserver, we should actually be a dirserver?

Alternatively,

1) if we have the V2Dir flag then dir_server_mode() returns 1, else the function is unchanged.
2) when options->DirCache is disabled, dir_server_mode() only uses options->DirCache after the next consensus is published. Serve the documents we currently have without caching new docs until then.
3) split dir_server_mode() into dir_server_mode() and dir_cache_mode(), where dir_server_mode() looks at our is_v2_dir flag rather than options->DirCache shortcircuit; and dir_cache_mode() is:
return options->DirCache && dir_server_mode(); . Then directory_permits_begindir_requests() depends on dir_server_mode() and we continue serving clients until we notice we lost the V2Dir flag.

I worry because each of these is slightly surprising for relay operators in different ways. Tweaking the man page's description of DirCache so it says the relay stops caching new directory documents and stops accepting client dir connections after it loses its V2Dir flag (if it earned it), is easy. Or, we differentiate between caching and serving, always serving if we have V2Dir but only cache if DirCache is set. This could be sad for client's though, depending on how often they see cache misses, but maybe this is a general improvement and not something to be overly concerned about.

Also we should test this on chutney to verify that everything works correctly for servers without DirPort support.


Replying to teor:

It's important that these relays work in mixed dirport / no dirport networks, which probably means creating a new torrc template and a new network, then testing that network as part of make test-network-all in tor.

When I ran previous branches in chutney with mixed versions and mixed dirport, I modified them manually. By this I mean I created 100 nodes, then disabled the dirport on a handful of them, then I started the test network. When all nodes were running, I stopped a few of the dirport-enabled instances and restarted them with a binary of a different tor version. It wasn't the most elegant, but it worked.

comment:74 in reply to:  73 ; Changed 4 years ago by teor

Replying to sysrqb:

Replying to nickm:

feature12538_rebased_7 in my public repository is rebased on master, with the NotDir/BelieveNotDir stuff removed, and the bugs mentioned above fixed.

My only remaining issue is:

  • I worry about the definition of dir_server_mode(). Possibly, if we have advertised that we're a dirserver, we should actually be a dirserver?

Alternatively,

1) if we have the V2Dir flag then dir_server_mode() returns 1, else the function is unchanged.
2) when options->DirCache is disabled, dir_server_mode() only uses options->DirCache after the next consensus is published. Serve the documents we currently have without caching new docs until then.
3) split dir_server_mode() into dir_server_mode() and dir_cache_mode(), where dir_server_mode() looks at our is_v2_dir flag rather than options->DirCache shortcircuit; and dir_cache_mode() is:
return options->DirCache && dir_server_mode(); . Then directory_permits_begindir_requests() depends on dir_server_mode() and we continue serving clients until we notice we lost the V2Dir flag.

I worry because each of these is slightly surprising for relay operators in different ways. Tweaking the man page's description of DirCache so it says the relay stops caching new directory documents and stops accepting client dir connections after it loses its V2Dir flag (if it earned it), is easy. Or, we differentiate between caching and serving, always serving if we have V2Dir but only cache if DirCache is set. This could be sad for client's though, depending on how often they see cache misses, but maybe this is a general improvement and not something to be overly concerned about.

I think if an operator sets DirCache 0, caching *and* serving should stop.
This is analogous with setting DirPort 0, and obeys the "principle of least surprise".
Otherwise, what can an operator do to stop serving directory documents right away?
(For example, what if they are close to quota for the month?)

If DirCache is 1 but we don't yet have the V2Dir flag:
To be consistent with DirPort directories, we could serve as soon as DirCache is 1.
(This also allows self-testing. Do we do that for DirCache directories in this patch?)
Just like DirPort directories, we could trust clients to obey the consensus and not contact us until we get the V2Dir flag. (Is there any harm here?)

Clients are already pretty tolerant of directory misses.
#4483 will make clients much more tolerant of directory misses during initial bootstrap.
(Clients have very little server information during bootstrap, so misses on the few servers they have are more of an issue.)

Also we should test this on chutney to verify that everything works correctly for servers without DirPort support.


Replying to teor:

It's important that these relays work in mixed dirport / no dirport networks, which probably means creating a new torrc template and a new network, then testing that network as part of make test-network-all in tor.

When I ran previous branches in chutney with mixed versions and mixed dirport, I modified them manually. By this I mean I created 100 nodes, then disabled the dirport on a handful of them, then I started the test network. When all nodes were running, I stopped a few of the dirport-enabled instances and restarted them with a binary of a different tor version. It wasn't the most elegant, but it worked.

Yes, chutney works well with manual modification - I do it all the time to test new features.

But this feature is so important I want to make sure we test it regularly. If we integrate it into chutney's test suite, and then into tor's "make test-network-all", it will get tested on a regular basis.

comment:75 in reply to:  74 ; Changed 4 years ago by nickm

Replying to teor:

I think if an operator sets DirCache 0, caching *and* serving should stop.
This is analogous with setting DirPort 0, and obeys the "principle of least surprise".
Otherwise, what can an operator do to stop serving directory documents right away?
(For example, what if they are close to quota for the month?)

Right, I agree that DirCache 0 should always mean "don't cache."

The issue I was thinking about is another part of the definition of dir_server_mode:

int
dir_server_mode(const or_options_t *options)
{
  if (!options->DirCache)
    return 0;
  return (server_mode(options) || options->DirPort_set) &&
          router_should_be_directory_server(options, 0);
}

This means that if router_should_be_directory_server() returns false, we'll stop serving directory information, even if DirCache was set to 1 and/or we advertised a dir_port.

And the router_should_be_directory_server() function looks not only at the configuration, but also at the current accounting state state, which is the part that makes me a little concerned.

comment:76 in reply to:  75 ; Changed 4 years ago by teor

Replying to nickm:

Replying to teor:

I think if an operator sets DirCache 0, caching *and* serving should stop.
This is analogous with setting DirPort 0, and obeys the "principle of least surprise".
Otherwise, what can an operator do to stop serving directory documents right away?
(For example, what if they are close to quota for the month?)

Right, I agree that DirCache 0 should always mean "don't cache."

The issue I was thinking about is another part of the definition of dir_server_mode:

int
dir_server_mode(const or_options_t *options)
{
  if (!options->DirCache)
    return 0;
  return (server_mode(options) || options->DirPort_set) &&
          router_should_be_directory_server(options, 0);
}

This means that if router_should_be_directory_server() returns false, we'll stop serving directory information, even if DirCache was set to 1 and/or we advertised a dir_port.

And the router_should_be_directory_server() function looks not only at the configuration, but also at the current accounting state state, which is the part that makes me a little concerned.

There's an existing Tor feature that disables the DirPort if a relay operator has bandwidth accounting limits enabled. This is intended to save upload bandwidth for (symmetrical) circuit traffic, rather than (upload-heavy) directory responses.

Are you suggesting we remove this feature, and, if so, for relays using the new all-cache feature, or all directories?

comment:77 in reply to:  76 ; Changed 4 years ago by nickm

Replying to teor:

Replying to nickm:

Replying to teor:

I think if an operator sets DirCache 0, caching *and* serving should stop.
This is analogous with setting DirPort 0, and obeys the "principle of least surprise".
Otherwise, what can an operator do to stop serving directory documents right away?
(For example, what if they are close to quota for the month?)

Right, I agree that DirCache 0 should always mean "don't cache."

The issue I was thinking about is another part of the definition of dir_server_mode:

int
dir_server_mode(const or_options_t *options)
{
  if (!options->DirCache)
    return 0;
  return (server_mode(options) || options->DirPort_set) &&
          router_should_be_directory_server(options, 0);
}

This means that if router_should_be_directory_server() returns false, we'll stop serving directory information, even if DirCache was set to 1 and/or we advertised a dir_port.

And the router_should_be_directory_server() function looks not only at the configuration, but also at the current accounting state state, which is the part that makes me a little concerned.

There's an existing Tor feature that disables the DirPort if a relay operator has bandwidth accounting limits enabled. This is intended to save upload bandwidth for (symmetrical) circuit traffic, rather than (upload-heavy) directory responses.

Does this feature turn off the DIrPort entirely, or only prevent us from advertising it?

Are you suggesting we remove this feature, and, if so, for relays using the new all-cache feature, or all directories?

I was under the assumption that it made us stop advertising our DirPort, but that it left the DirPort enabled. If it turns the DirPort off entirely, then it's probably okay to leave the feature as-is in this patch, and I am just being a forgetful old hacker.

comment:78 in reply to:  77 Changed 4 years ago by teor

Replying to nickm:

Replying to teor:

Replying to nickm:

Replying to teor:

I think if an operator sets DirCache 0, caching *and* serving should stop.
This is analogous with setting DirPort 0, and obeys the "principle of least surprise".
Otherwise, what can an operator do to stop serving directory documents right away?
(For example, what if they are close to quota for the month?)

Right, I agree that DirCache 0 should always mean "don't cache."

The issue I was thinking about is another part of the definition of dir_server_mode:

int
dir_server_mode(const or_options_t *options)
{
  if (!options->DirCache)
    return 0;
  return (server_mode(options) || options->DirPort_set) &&
          router_should_be_directory_server(options, 0);
}

This means that if router_should_be_directory_server() returns false, we'll stop serving directory information, even if DirCache was set to 1 and/or we advertised a dir_port.

And the router_should_be_directory_server() function looks not only at the configuration, but also at the current accounting state state, which is the part that makes me a little concerned.

There's an existing Tor feature that disables the DirPort if a relay operator has bandwidth accounting limits enabled. This is intended to save upload bandwidth for (symmetrical) circuit traffic, rather than (upload-heavy) directory responses.

Does this feature turn off the DIrPort entirely, or only prevent us from advertising it?

It leaves it on, but prevents us from advertising it.

And it only happens if:

  • AccountingMax is turned on, and
  • we predict that we will hibernate before the end of this interval.


Are you suggesting we remove this feature, and, if so, for relays using the new all-cache feature, or all directories?

I was under the assumption that it made us stop advertising our DirPort, but that it left the DirPort enabled. If it turns the DirPort off entirely, then it's probably okay to leave the feature as-is in this patch, and I am just being a forgetful old hacker.

No, you're correct, I've never had to look into it that closely. (I think that makes me a outspoken young hacker.)

To match the DirPort behaviour, all relays should be directory caches by default.
If they think they'll run out of AccoountingMax bandwidth, they shouldn't advertise that they are a directory cache. (That is, they should drop the line from their descriptor that makes authorities give them the V2Dir flag in the consensus.)

comment:79 in reply to:  77 ; Changed 4 years ago by sysrqb

Replying to nickm:

Replying to teor:

Are you suggesting we remove this feature, and, if so, for relays using the new all-cache feature, or all directories?

I was under the assumption that it made us stop advertising our DirPort, but that it left the DirPort enabled. If it turns the DirPort off entirely, then it's probably okay to leave the feature as-is in this patch, and I am just being a forgetful old hacker.

Your less-youngness is not affecting your memory in this case. This is something that I changed, although unintentionally, in the original branch.

@@ -1153,7 +1153,7 @@ directory_caches_dir_info(const or_options_t *options)
 int
 directory_permits_begindir_requests(const or_options_t *options)
 {
-  return options->BridgeRelay != 0 || options->DirPort_set;
+  return options->BridgeRelay != 0 || dir_server_mode(options);
 }

0061807334d54a56ba10d2ae956961291e8183d6 in feature12538_rebased_7.

And the accounting thresholds were only used when publishing a new descriptor. It doesn't seem like there's a good reason for changing this behavior, so changing directory_permits_begindir_requests() so it doesn't depend on router_should_be_directory_server() seems like a good idea.

I'm now wondering if we don't want to use dir_server_mode() other places, as well. In particular, currently the behavior is now different for caching directory info, too. Relays won't cache them if they're near their bandwidth usage limit. Maybe this is better than the old behavior? I'm thinking we should stick with what we currently do unless there's a better reason for doing something different.

comment:80 in reply to:  79 Changed 4 years ago by teor

Replying to sysrqb:

I'm now wondering if we don't want to use dir_server_mode() other places, as well. In particular, currently the behavior is now different for caching directory info, too.

I'm confused. Current / different as in "your latest patch", or "the latest master"?

Relays won't cache them if they're near their bandwidth usage limit. Maybe this is better than the old behavior? I'm thinking we should stick with what we currently do unless there's a better reason for doing something different.

It would be helpful to match the current behaviour for DirPort directories, so that operators aren't surprised by this new feature. If we feel there's a need for a change to directory caching behaviour in general, let's do it in a separate ticket.

comment:81 Changed 4 years ago by nickm

I've added a commit to feature_12538_rebased_7 to (almost) restore the old behavior. What do we think now?

comment:82 Changed 4 years ago by teor

Keywords: TorCoreTeam201511 added

comment:83 Changed 4 years ago by dgoulet

Status: needs_reviewneeds_revision

Here is an issue I found:

  • In src/or/or.h:

node_is_dir() checks both the tunnelled support and dir_port which can by pass the fact that we shouldn't be advertising to be a dircache:

+    return ri->supports_tunnelled_dir_requests ||
+           ri->dir_port > 0;

It's possible for supports_tunnelled_dir_requests to be 0, for instance if the Accounting is enabled and we've reached our max. But if we have a DirPort, it will bypass it and return 1. Seems to me that we maybe don't want to be used as a directory cache in that case?

I think just testing supports_tunnelled_dir_requests is enough since it will be 1 if the DirPort is set and if the rest of the requirements are met (Accounting for instance).

  • Also fun fact, this below will make that a relay can opt-out of being an HSDir once it's accounting has reached the max or it's bandwitdh has changed dynamically (because of router_should_be_directory_server()):
+  return (router->wants_to_be_hs_dir &&
+          router->supports_tunnelled_dir_requests &&
...

which could lead to client reachability issue (I think it won't be severe) if an HSDir can come and go at each consensus in a 24 hour period. _BUT_ it could also explain why we are seeing HSDir responding NOT_FOUND when they are suppose to have the descriptor because Accounting max was reached in that time period?...

The rest lgtm;

comment:84 in reply to:  83 Changed 4 years ago by teor

Replying to dgoulet:

Here is an issue I found:

  • In src/or/or.h:

node_is_dir() checks both the tunnelled support and dir_port which can by pass the fact that we shouldn't be advertising to be a dircache:

+    return ri->supports_tunnelled_dir_requests ||
+           ri->dir_port > 0;

It's possible for supports_tunnelled_dir_requests to be 0, for instance if the Accounting is enabled and we've reached our max. But if we have a DirPort, it will bypass it and return 1. Seems to me that we maybe don't want to be used as a directory cache in that case?

Definitely - AccountingMax needs to work properly.

I think just testing supports_tunnelled_dir_requests is enough since it will be 1 if the DirPort is set and if the rest of the requirements are met (Accounting for instance).

I agree. (See my teor for the behaviour we're trying for - trying to match the current directory-with-DirPort behaviour.)

  • Also fun fact, this below will make that a relay can opt-out of being an HSDir once it's accounting has reached the max or it's bandwitdh has changed dynamically (because of router_should_be_directory_server()):
+  return (router->wants_to_be_hs_dir &&
+          router->supports_tunnelled_dir_requests &&
...

which could lead to client reachability issue (I think it won't be severe) if an HSDir can come and go at each consensus in a 24 hour period. _BUT_ it could also explain why we are seeing HSDir responding NOT_FOUND when they are suppose to have the descriptor because Accounting max was reached in that time period?...

Unfortunately, we need AccountingMax to work reliably, even if it causes reachability issues for hidden services. But we could fix this behaviour in a different ticket so the HSDir flag is more stable. But is there anything we could change here to make it more stable?

The rest lgtm;

I agree, it's time!

comment:85 Changed 4 years ago by nickm

Keywords: TorCoreTeam201512 201511-deferred added; TorCoreTeam201511 removed

Bulk-move uncompleted items to december. :/

comment:86 Changed 4 years ago by dgoulet

Status: needs_revisionneeds_review

I've fixed the latest concerns: feature12538_028_01 (based on nickm's branch feature12538_rebased_7)

Two commits on top, one fixup and one that is fixing a line flagged by teor in a commit on master. This is also rebased on latest master for 028 merge.

Note for the HS subsystem, the new DirCache option is now the way to disable the relay to be an HSDir (which was before this patch not having a DirPort).

comment:87 Changed 4 years ago by teor

Looks good, let's get this merged!

Two comments that I don't think should delay this merge:

  • This will break a unit test fix I just made in #4483, but I don't know which order they'll merge, so I'm happy to fix this after they both merge.
  • This will exacerbate #17848 (tor doesn't check for tunneled directory downloads when checking if relays are busy), but we can work on that later.

comment:88 in reply to:  87 Changed 4 years ago by teor

Replying to teor:

Looks good, let's get this merged!

One more thing:

In router_pick_directory_server_impl, we check !status->dir_port and skip the server. This should change to the routerstatus_t flag for directory servers introduced by this patch.

comment:89 Changed 4 years ago by nickm

teor says:

In router_pick_directory_server_impl, we check !status->dir_port and skip the server. This should change to the routerstatus_t flag for directory servers introduced by this patch.

I don't see that code there any more; what am I missing?

comment:90 in reply to:  89 Changed 4 years ago by teor

Replying to nickm:

teor says:

In router_pick_directory_server_impl, we check !status->dir_port and skip the server. This should change to the routerstatus_t flag for directory servers introduced by this patch.

I don't see that code there any more; what am I missing?

Oops, I must have been looking at the wrong branch.

comment:91 Changed 4 years ago by nickm

okay, then I'm good to squash and merge. I'm planning to wait a little while, though, for people to try my #17752 fix, before I merge anything of appreciable size.

comment:92 Changed 4 years ago by nickm

After squashing and merging to master, these tests failed:

nodelist/node_is_dir
config/directory_fetch
routerlist/pick_directory_server_impl

I took a shot at fixing them, but I'd like somebody else to sanity-check whether my changes make sense in light of the test code.

Please see the 3 latest commits on my branch "bug12538_merged":

f0a4282 fix routerlist/pick_directory_server_impl in light of 12538
f5f35e9 Fix config/directory_fetch after 12538 merge
a6c9fcc Fix nodelist/node_is_dir test wrt 12538.

They're all very short.

comment:93 in reply to:  92 Changed 4 years ago by teor

Replying to nickm:

After squashing and merging to master, these tests failed:

nodelist/node_is_dir
config/directory_fetch
routerlist/pick_directory_server_impl

I took a shot at fixing them, but I'd like somebody else to sanity-check whether my changes make sense in light of the test code.

Please see the 3 latest commits on my branch "bug12538_merged":

f0a4282 fix routerlist/pick_directory_server_impl in light of 12538
f5f35e9 Fix config/directory_fetch after 12538 merge
a6c9fcc Fix nodelist/node_is_dir test wrt 12538.

They're all very short.

f0a4282 I can't understand why we're inverting the sense of the tests here. But I'm not familiar enough with the code or the tests to make a better suggestion.

f5f35e9 is almost exactly what I expected to have to do once this merged.

a6c9fcc looks plausible.

comment:94 Changed 4 years ago by nickm

wrt f0a4282, I don't think we're changing the sense of the test -- rather, we're changing the example-data that the test uses in order to make clients behave right. Clients used to believe anything was a directory if it had a dirport -- now they require the is_v2_dir flag. Also, when we fixed the bug that let us pick directory guards that weren't guards, I think that did something to mess up the other test, which was testing for bad behavior. :/ But I don't know why that didn't show up in master. Maybe I/we should investigate more?

Does that all seem right?

comment:95 in reply to:  94 Changed 4 years ago by teor

Replying to nickm:

wrt f0a4282, I don't think we're changing the sense of the test -- rather, we're changing the example-data that the test uses in order to make clients behave right. Clients used to believe anything was a directory if it had a dirport -- now they require the is_v2_dir flag. Also, when we fixed the bug that let us pick directory guards that weren't guards, I think that did something to mess up the other test, which was testing for bad behavior. :/ But I don't know why that didn't show up in master. Maybe I/we should investigate more?

There is a unit test that makes sure we only choose guards with the guard flag, but it only applies to OR guards, not directory guards. I think that's why it didn't show up, as the bug was about directory guards.

Does that all seem right?

That seems OK.

comment:96 Changed 4 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Okay, merged!

comment:97 Changed 4 years ago by cypherpunks

Resolution: implemented
Status: closedreopened

Jenkins has two complaints.

../src/test/test_dir.c:29:29: fatal error: test_dir_common.h: No such file or directory
 #include "test_dir_common.h"
                             ^

This seems to be a header missing from the build configuration.

src/or/config.c:4102:28: error: implicit conversion loses integer precision: 'const uint64_t' (aka 'const unsigned long long') to 'size_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32]
      total_mem = options->MaxMemInQueues;
                ~ ~~~~~~~~~^~~~~~~~~~~~~~

After looking at the have_enough_mem_for_dircache function where this error occurred i have some questions.

  • Why overwrite the total_mem parameter instead of using a separate variable? It makes it a bit harder to read.
  • Why does the get_total_system_memory function have a size_t parameter instead of an uint64_t? Using an uint64_t would simplify the function and, after changing the relevant type signatures, would solve the error.
    • Also, what is up with the yoda statements in get_total_system_memory and get_total_system_memory_impl?

comment:98 Changed 4 years ago by nickm

Resolution: fixed
Status: reopenedclosed

Fixed the compilation problems and added a comment. Not sure what you mean about the comments, or which comments you mean.

comment:99 in reply to:  98 ; Changed 4 years ago by cypherpunks

Replying to nickm:

Fixed the compilation problems and added a comment.

Thanks

Not sure what you mean about the comments, or which comments you mean.

I'll rephrase to add clarity.

The have_enough_mem_for_dircache function modifies the total_mem parameter internally instead of using a separate variable throughout the function. Your added comment makes my comment on this obsolete because of the possibility that that piece of code will be removed. (A quick look at how the options_validate function assigns MaxMemInQueues suggests that it already does the necessary checks by calling compute_real_max_mem_in_queues making the checks in have_enough_mem_for_dircache redundant).

My second point was about the get_total_system_memory function having a pointer to size_t parameter. Its internals use get_total_system_memory_impl which returns an uint64_t. The result is clamped to fit within the size_t range. My suggestion is to have get_total_system_memory have a pointer to uint64_t parameter so the result does not need to be clamped down. It would also remove the need for clamping in have_enough_mem_for_dircache.

My last sub-point was about statements such as if (0 == m) { in get_total_system_memory and if (-1 == (fd = tor_open_cloexec("/proc/meminfo",O_RDONLY,0))) in get_total_system_memory_impl. These are called Yoda conditions and have some disadvantages as described in the Criticism section.

If it's still not clear, I'm willing to submit patches for these points.

comment:100 in reply to:  99 Changed 3 years ago by nickm

Replying to cypherpunks:

Replying to nickm:

Fixed the compilation problems and added a comment.

Thanks

Not sure what you mean about the comments, or which comments you mean.

I'll rephrase to add clarity.

(Thanks, and sorry for the delay! It's time off for me.)

The have_enough_mem_for_dircache function modifies the total_mem parameter internally instead of using a separate variable throughout the function. Your added comment makes my comment on this obsolete because of the possibility that that piece of code will be removed. (A quick look at how the options_validate function assigns MaxMemInQueues suggests that it already does the necessary checks by calling compute_real_max_mem_in_queues making the checks in have_enough_mem_for_dircache redundant).

My second point was about the get_total_system_memory function having a pointer to size_t parameter. Its internals use get_total_system_memory_impl which returns an uint64_t. The result is clamped to fit within the size_t range. My suggestion is to have get_total_system_memory have a pointer to uint64_t parameter so the result does not need to be clamped down. It would also remove the need for clamping in have_enough_mem_for_dircache.

Hm. Seems like a good idea, but possibly another ticket.

My last sub-point was about statements such as if (0 == m) { in get_total_system_memory and if (-1 == (fd = tor_open_cloexec("/proc/meminfo",O_RDONLY,0))) in get_total_system_memory_impl. These are called Yoda conditions and have some disadvantages as described in the Criticism section.

Hm. I don't find the arguments on that page persuasive; I'm happy to let C programmers write 0 == m or NULL == x or whatever, if that's what they're used to.

Note: See TracTickets for help on using tickets.