In launch_descriptor_downloads(), if we are missing between 1 and 15 mds (i.e. fewer than MAX_DL_TO_DELAY), Tor will delay the md download for 10 mins (or until we are missing >= 16 mds). See TestingClientMaxIntervalWithoutRequest for the 10 min delay.
This is bad when combined with #21969, since if one of the 15 missing mds is for one of your top two primary guards, tor will hang for 10 mins with missing descriptors for primary guards before bootstrapping.
The probability of this happening is small (about 0.004 I think, for 6.4k mds in total), but given the number of clients we have, this is bound to happen for some of them.
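As a rough sanity check on that estimate, here is a minimal sketch of the calculation. It assumes the missing mds are a uniformly random subset of all mds, which is a simplification and not something the ticket states; the function name is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not part of tor.
 * Probability that at least one of n_special specific descriptors is among
 * n_missing descriptors, assuming the missing ones are drawn uniformly at
 * random (without replacement) from n_total descriptors. */
double
prob_any_special_missing(int n_total, int n_missing, int n_special)
{
  double p_none = 1.0;
  for (int i = 0; i < n_special; i++) {
    /* Chance that the i-th special descriptor is present, given that the
     * previous i special descriptors are all present. */
    p_none *= (double)(n_total - n_missing - i) / (double)(n_total - i);
  }
  return 1.0 - p_none;
}

/* Example: 6400 mds in total, 15 missing, 2 primary guards we care about. */
void
print_example(void)
{
  printf("%.4f\n", prob_any_special_missing(6400, 15, 2)); /* prints 0.0047 */
}
```

Under that assumption, the worst case of 15 missing mds gives a value just under 0.005, in the same ballpark as the figure above. With fewer than 15 missing the probability is lower.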
Perhaps the fix here is to disable this delay functionality if the missing descriptors are delaying bootstrap (e.g. if missing descs are primary guard descs)?
Before we do that, let's consider the purpose of waiting here: we don't want anybody to be able to force us to use a secondary guard by denying us descriptors for any primary guards.
Hm. I meant to suggest we disable the TestingClientMaxIntervalWithoutRequest delay functionality if we are missing descs of primary guards. That is, not wait 10 mins before fetching the primary guard descriptors.
I think perhaps you understood that I suggested we disable the "waiting for primary guard descriptors" functionality, which was certainly not my intention. Sorry for the unclear wording above.
Ah! You mean something like
diff --git a/src/or/routerlist.c b/src/or/routerlist.c
index f04e2ca160331b..f587bfadcef1a1 100644
--- a/src/or/routerlist.c
+++ b/src/or/routerlist.c
@@ -5010,6 +5010,11 @@ launch_descriptor_downloads(int purpose,
       log_debug(LD_DIR,
                 "There are enough downloadable %ss to launch requests.",
                 descname);
+    } else if (! router_have_minimum_dir_info()) {
+      log_debug(LD_DIR,
+                "We are only missing %d %ss, but we'll fetch anyway, since "
+                "we don't yet have enough directory info.",
+                n_downloadable, descname);
     } else {
       /* should delay */
Perhaps we can also put the whole if (!directory_fetches_dir_info_early(options)) block into a function called static int should_delay_descriptor_downloads() to make it a bit cleaner too.
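To make the suggested refactoring concrete, here is a minimal, self-contained sketch of what such a helper could look like. It is not tor's actual code: the dl_state_t struct and its fields are hypothetical stand-ins for directory_fetches_dir_info_early(options), router_have_minimum_dir_info() and the time of the last request, the helper returns bool rather than the int proposed above, and the constants are taken from the numbers mentioned in this ticket (16 descriptors, 10 minutes).

```c
#include <stdbool.h>
#include <time.h>

/* Values as described in this ticket; treat them as assumptions. */
#define MAX_DL_TO_DELAY 16
#define MAX_INTERVAL_WITHOUT_REQUEST (10 * 60)

/* Hypothetical stand-in for the state that launch_descriptor_downloads()
 * would consult through tor's real functions and options. */
typedef struct {
  bool fetches_dir_info_early; /* directory_fetches_dir_info_early(options) */
  bool have_minimum_dir_info;  /* router_have_minimum_dir_info() */
  time_t last_request;         /* when we last launched a descriptor request */
} dl_state_t;

/* Return true if we should hold off on fetching the missing descriptors. */
static bool
should_delay_descriptor_downloads(const dl_state_t *st, int n_downloadable,
                                  time_t now)
{
  if (st->fetches_dir_info_early)
    return false; /* e.g. dir mirrors: never delay */
  if (n_downloadable >= MAX_DL_TO_DELAY)
    return false; /* enough descriptors to be worth a request */
  if (!st->have_minimum_dir_info)
    return false; /* proposed change: don't stall bootstrap */
  if (now - st->last_request > MAX_INTERVAL_WITHOUT_REQUEST)
    return false; /* we have already waited long enough */
  return true;
}
```

With this shape, the bootstrap case from the patch becomes one more early return, and the caller only needs a single if (should_delay_descriptor_downloads(...)) check.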
Sebastian points out that this MAX_DL_TO_DELAY delay functionality might also be performed by dirservers, which makes the problem worse, since it means that dirservers will wait even longer before they get to 100% mds.
That n_downloadable >= MAX_DL_TO_DELAY should only be checked if !directory_fetches_dir_info_early().
And in theory (there might be bugs), dir mirrors fetch dir info early.