Our choice of how many microdescs to fetch per directory request is thoroughly irrational by now, since the network has grown so much since we chose those parameters.
{{{
Oct 11 13:32:44.779 [info] launch_descriptor_downloads(): Launching 50 requests for 4574 microdescs, 92 at a time
}}}
Seriously, we're making 50 requests and spreading them over 3 guards? I guess it's nice that we can get partial answers, but generally partial answers don't help us.
We should fetch many more at a time if it's going over a begindir channel.
Right. The original reasons to request no more than X objects per directory connection were:
Because sending a huge list of digests before we get any reply seemed questionable.
Because some HTTP proxies limited the URL length.
Incidentally, we limit the URL length implicitly with the MAX_HEADERS_LEN field.
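As a rough sanity check (my arithmetic; the ~4096-byte proxy-friendly URL budget is an assumption, not a number taken from the code), a limit like that lines up with the "92 at a time" in the log above:

{{{
#include <stdio.h>

/* Back-of-the-envelope only: a 256-bit microdesc digest is 43 base64
 * chars without padding, plus a '-' separator per digest; assume a
 * proxy-friendly URL budget of about 4096 bytes and some slack for the
 * path and hostname. */
int
main(void)
{
  const int url_budget = 4096;   /* assumed proxy limit, not from the code */
  const int overhead   = 80;     /* rough allowance for path + host */
  const int per_digest = 43 + 1; /* unpadded base64 digest + separator */
  printf("~%d digests per request\n", (url_budget - overhead) / per_digest);
  return 0;
}
}}}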
I think we should increase the number of objects we're willing to request per tunneled directory connection. Additionally, we should avoid having multiple simultaneous in-flight connections downloading the same info from the same directory guard.
Would it help if we implemented /tor/micro/all which is mentioned in dir-spec section 5.2 "Downloading router descriptors or microdescriptors" but which is not implemented yet? Of course, then clients would download the bulk of microdescriptors from a single directory.
Do we have to include full digests in requests, or would it be sufficient to ask for the first few digest bytes? Assuming that clients would only accept descriptors matching locally stored full digests. For example, requests could contain only the first 4 (or 8) base64 chars representing the first 3 (or 6) digest bytes. Directories could accept any multiple of 4 base64 chars.
Mixing the two ideas, how about we add a way to ask for 1/2, 1/4, etc. of all microdescriptors in a single request? The request could be /tor/micro/all/<base64-prefix>/<bits>, so that /tor/micro/all/A/1 means all digests starting with 0 binary, /tor/micro/all/w/2 means all digests starting with 11 binary, etc. Clients could decide how many requests to send from the number of descriptors they need, which may change over time.
Each of these ideas requires us to upgrade authorities and caches before clients will be able to use them.
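For the prefix idea in particular, the cache-side test would be cheap. A hypothetical helper (nothing like this exists in Tor today; the name and calling convention are mine) might look like:

{{{
#include <string.h>

/* Hypothetical sketch: would this digest be included in a reply to
 * /tor/micro/all/<prefix>/<bits>?  `prefix` holds the base64 prefix
 * already decoded and left-aligned in its buffer; `bits` says how many
 * leading bits of it are significant. */
static int
digest_matches_prefix(const unsigned char *digest,
                      const unsigned char *prefix, unsigned bits)
{
  const unsigned full_bytes = bits / 8;
  const unsigned rem_bits = bits % 8;
  if (memcmp(digest, prefix, full_bytes) != 0)
    return 0;
  if (rem_bits) {
    const unsigned char mask = (unsigned char)(0xff << (8 - rem_bits));
    if ((digest[full_bytes] & mask) != (prefix[full_bytes] & mask))
      return 0;
  }
  return 1;
}
}}}

With that, /tor/micro/all/A/1 ('A' decodes to 000000) matches digests whose first bit is 0, and /tor/micro/all/w/2 ('w' decodes to 110000) matches digests starting with binary 11, as described above.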
Those are cool ideas that will need new proposals. I'm thinking that the second two ideas are more or less equivalent. I'm opening a new ticket (#10871 (moved)) for this, since the issues are nontrivial, and I'd like this one to be about hunting for things we can do with the existing caches in order to make clients behave less nuttily.
I don't like those complicated conditional expressions. In particular, every time I see something like the following, I need to go look at a table of operator precedence. I think that in this case it is wrong: bitwise OR binds more tightly than ?, so the result of the conditional will always be "MICRODESC_DIRINFO"... but the problem here is probably that conditional expressions are Just Too Error-Prone.
Not all directory requests are tunneled over Tor. (Some connections from nodes to authorities in particular are not tunneled.) Should we perhaps retain this warning in the non-tunneled case?
In initiate_descriptor_downloads:
Changing from 8 to 4 seems wrong; you've left off the space for the terminating NUL character.
Perhaps we should refactor this thing to use a smartlist and smartlist_join_strings(), to avoid the possibility of further such mistakes in the code.
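Something along these lines, using the existing smartlist helpers (a sketch only; variable names, the '+' separator, and the format string are illustrative, not copied from the patch):

{{{
/* Sketch: build the "d/<hex>+<hex>+...+<hex>.z" resource without any
 * hand-counted buffer offsets.  Assumes `digests` is a smartlist of
 * DIGEST_LEN-byte binary digests. */
smartlist_t *digest_strs = smartlist_new();
char *joined, *resource;
SMARTLIST_FOREACH_BEGIN(digests, const char *, d) {
  char hex[HEX_DIGEST_LEN+1];
  base16_encode(hex, sizeof(hex), d, DIGEST_LEN);
  smartlist_add(digest_strs, tor_strdup(hex));
} SMARTLIST_FOREACH_END(d);
joined = smartlist_join_strings(digest_strs, "+", 0, NULL);
tor_asprintf(&resource, "d/%s.z", joined);
SMARTLIST_FOREACH(digest_strs, char *, cp, tor_free(cp));
smartlist_free(digest_strs);
tor_free(joined);
}}}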
In get_min_requests:
This function probably needs a more descriptive name.
Why is the minimum higher now? We want to be launching fewer requests. Two per guard still seems like a high minimum. This feels more like a maximum than a minimum.
Overall:
This needs unit tests; let me know if you need any help writing them.
Are we doing anything to make sure that the requests don't all go to the same guard?
Are we doing anything to lower the maximum number of concurrent requests?
I don't like those complicated conditional expressions. In particular, every time I see something like the following, I need to go look at a table of operator precedence. I think that in this case it is wrong: bitwise OR binds more tightly than ?, so the result of the conditional will always be "MICRODESC_DIRINFO"... but the problem here is probably that conditional expressions are Just Too Error-Prone.
{{{
return V3_DIRINFO |
(resource && !strcmp(resource,"microdesc") ? MICRODESC_DIRINFO :
NO_DIRINFO);
}}}
Ummm, no, the parentheses guard against the situation you described. But I wholeheartedly agree that if you had to think about it, it's already too complicated.
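For what it's worth, a plain if/else would avoid the precedence question entirely; something like this (a sketch, not the actual patch):

{{{
dirinfo_type_t which = V3_DIRINFO;
if (resource && !strcmp(resource, "microdesc"))
  which |= MICRODESC_DIRINFO;
return which;
}}}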
In initiate_descriptor_downloads:
Changing from 8 to 4 seems wrong; you've left off the space for the terminating NUL character.
I only counted 4 fixed characters: the "d/" prefix plus the ".z" suffix.
The NUL terminator is accounted for when we backtrack to overwrite the last separator: the memcpy below copies ".z" plus its NUL over that byte, so the terminator is already covered by the +1 per digest.
{{{
memcpy(cp-1, ".z", 3);
}}}
Why is the minimum higher now? We want to be launching fewer requests. Two per guard still seems like a high minimum. This feels more like a maximum than a minimum.
Hmm, I thought that's what we had agreed on but I admit to being very tired during that talk.
Thanks for reviewing. I'll take your questions back to the drawing board.
Hm. That patch is less complicated than I feared, but it has a few other things in it that are beyond what this ticket needs. I think I should split it up into logical parts before we merge it.
I should check whether there's a faster mechanism here than connection_get_by_type_addr_port_purpose. If there is, we should use it. If not, we shouldn't necessarily add one unless this shows up in profiles.
In router_pick_directory_server_impl, I think n_busy needs to get reset to 0 before the "goto retry_without_exclude", or things get double-counted.
The "busy_out" field for router_pick_directory_server_impl() needs documentation.
I like the changes in initiate_descriptor_downloads()... but I'm not so sure about "digest_len += 1". What "NULL" are we talking about? 'digest' isn't NUL-terminated.
Gotta make sure it compiles with --enable-gcc-warnings on gcc and clang.
How long does it take us to retry the directory fetch after we decide not to launch one because the server is busy? It would be bad to re-check too fast, and it would be bad to re-check too slow. We'd better look at this.
Can max_dl_per_request return a higher number for TunnelDirConns for SERVERDESC too?
Does max_dl_per_request need to look at whether a particular connection will be tunneled, or does options->TunnelDirConns ensure that all these dir conns actually will be tunneled? I should check.
This is still on my plate, unless somebody else wants to pick up some/all of the above.
I think I should split it up into logical parts before we merge it.
I can rebase and split up the patch if it's helpful.
I like the changes in initiate_descriptor_downloads()... but I'm not so sure about "digest_len += 1". What "NULL" are we talking about? 'digest' isn't NUL-terminated.
The difference is that before you were building up one long string so when you base16_encode'd, you didn't want its NUL terminator. In the patch, we're making a smartlist of individual NUL terminated base16_encode'd strings, and then calling smartlist_join_strings on it, which will drop all the NULs.
Hope that makes sense. It's been 6 months since I've looked at this, so still paging it back in.
I think I should split it up into logical parts before we merge it.
I can rebase and split up the patch if it's helpful.
Up to you; I'd be happy to do it too.
I like the changes in initiate_descriptor_downloads()... but I'm not so sure about "digest_len += 1". What "NULL" are we talking about? 'digest' isn't NUL-terminated.
The difference is that before you were building up one long string so when you base16_encode'd, you didn't want its NUL terminator. In the patch, we're making a smartlist of individual NUL terminated base16_encode'd strings, and then calling smartlist_join_strings on it, which will drop all the NULs.
Hope that makes sense. It's been 6 months since I've looked at this, so still paging it back in.
But digest_len is the length of the input, not the length of the output, right? It gets used like this:
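Presumably the usage in question is the base16_encode() call, where digest_len is passed as the source length; roughly:

{{{
/* Reconstructed for illustration, not the exact lines from the patch:
 * digest_len is the length of the raw input digest, so a "+1" on it
 * changes the source length rather than reserving room for a NUL. */
base16_encode(cp, digest_len*2+1, smartlist_get(digests, i), digest_len);
}}}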
Fixed most of the above issues. In response to my questions:
I should check whether there's a faster mechanism here than connection_get_by_type_addr_port_purpose. If there is, we should use it. If not, we shouldn't necessarily add one unless this shows up in profiles.
There isn't a faster one. Let's leave it for now.
How long does it take us to retry the directory fetch after we decide not to launch one because the server is busy? It would be bad to re-check too fast, and it would be bad to re-check too slow. We'd better look at this.
It looks like we do a reasonable job here.
Can max_dl_per_request return a higher number for TunnelDirConns for SERVERDESC too?
I think we can. Making the fix.
Does max_dl_per_request need to look at whether a particular connection will be tunneled, or does options->TunnelDirConns ensure that all these dir conns actually will be tunneled? I should check.
This one is still up in the air. It looks like directory_command_should_use_begindir() can return 0 for several reasons other than having TunnelDirConns==0.
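For context, the shape of the logic under discussion is roughly this (a sketch from memory, not the real function; the 96 and 500 figures are assumptions, only the 92 comes from the log above):

{{{
/* Sketch only; the open question is whether options->TunnelDirConns is a
 * strong enough signal here, or whether we need to know that this
 * particular request really will use begindir. */
static int
max_dl_per_request_sketch(const or_options_t *options, int purpose)
{
  int max = (purpose == DIR_PURPOSE_FETCH_MICRODESC) ? 92 : 96;
  if (options->TunnelDirConns) {
    /* Tunneled requests aren't constrained by proxy URL length, so we
     * can ask for far more objects per request. */
    max = 500;
  }
  return max;
}
}}}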