I've been monitoring the entrynodes.c logs of my Tor client for the past 3 days, running Tor with NumEntryGuards=1 the whole time, and everything seems to be working reasonably well.
However, yesterday I noticed that my Tor skipped my main guard, and started connecting to the second one in the list.
Here is how it happened:
My main guard is not a DirCache. So every now and then, Tor connects to the second guard in my guard list (my dirguard) to fetch directory documents. This means that both guards are usually marked as 'up'.
I had a very short network downtime (only a few seconds), so Tor could not connect to my main guard. Tor then tried to connect to the next 'up' guard node in my list, which is my dirguard. The network was up by that time, so Tor managed to connect to my dirguard, which became my main guard node for that session.
Since my dirguard was not a freshly added guard node, it didn't trigger the first_contact behavior of entry_guard_register_connect_status(), which would have fixed the guard skip (because all the previous guard nodes would have been retried).
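To make the sequence above concrete, here is a toy model of it in C. It is only a sketch: toy_guard_t, toy_pick_guard() and toy_register_connect_success() are hypothetical names invented for illustration, not Tor's real data structures or functions, and the model ignores timed retries of unreachable guards. It only assumes what is described above: guards are tried in list order, a failed connection marks a guard as down, and only a first contact causes the earlier guards to be retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified guard record; NOT Tor's real guard struct. */
typedef struct {
  const char *nickname;
  bool reachable_now;  /* would a connection attempt succeed right now? */
  bool marked_up;      /* does the client currently believe it is up? */
  bool made_contact;   /* has the client ever connected to it before? */
} toy_guard_t;

/* Walk the list in order and return the index of the first guard that is
 * marked up and accepts our connection. Guards that fail are marked down.
 * Returns -1 if no guard could be reached. */
int toy_pick_guard(toy_guard_t *guards, size_t n)
{
  assert(guards != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (!guards[i].marked_up)
      continue;
    if (guards[i].reachable_now)
      return (int)i;
    guards[i].marked_up = false;  /* connection failed: mark as down */
  }
  return -1;
}

/* Model of the first_contact behavior described above: only when we reach
 * a guard for the first time are all earlier guards marked up again so
 * that they get retried. Returns true if earlier guards were re-marked. */
bool toy_register_connect_success(toy_guard_t *guards, size_t n, int idx)
{
  assert(idx >= 0 && (size_t)idx < n);
  bool first_contact = !guards[idx].made_contact;
  guards[idx].made_contact = true;
  if (!first_contact)
    return false;  /* an old guard: nothing gets retried, the skip persists */
  for (int i = 0; i < idx; i++)
    guards[i].marked_up = true;
  return true;
}
```

In this model, a list of {main, dirguard} where both have been contacted before and main is briefly unreachable ends up stuck on dirguard: toy_pick_guard() returns index 1, toy_register_connect_success() returns false, and main stays marked down even after the network recovers.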
We probably need to switch NumEntryGuards to 1 and pump up NumDirectoryGuards to 3, to follow Nick's suggestion. At the same time we need to start thinking of the ideas from here:
https://lists.torproject.org/pipermail/tor-dev/2014-June/006944.html
in case we can eventually bring the number of directory guards to 1 too.
Also, maybe we should add more code to strictly prefer our circuit guard to fetch directory documents, and only resort to the directory guards if our circuit guard is not a DirCache or if it doesn't have the descriptor we asked for.
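A rough sketch of that preference rule, to show how small the decision itself is. Everything here is hypothetical (toy_circuit_guard_t, toy_dir_source_t, toy_choose_dir_source() do not exist in Tor), and the has_document flag stands in for "we asked and the guard had it", which in practice we would probably only learn from a failed request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: where a directory request should be sent. */
typedef enum {
  DIR_SRC_CIRCUIT_GUARD,   /* reuse the guard we build circuits through */
  DIR_SRC_DIRECTORY_GUARD  /* fall back to one of the directory guards */
} toy_dir_source_t;

/* Hypothetical view of the circuit guard for one directory request. */
typedef struct {
  bool is_dir_cache;  /* does the guard serve directory documents at all? */
  bool has_document;  /* assumption: outcome of asking for this document */
} toy_circuit_guard_t;

/* Strictly prefer the circuit guard; use a directory guard only if the
 * circuit guard is not a DirCache or lacks the requested document. */
toy_dir_source_t toy_choose_dir_source(const toy_circuit_guard_t *guard)
{
  assert(guard != NULL);
  if (guard->is_dir_cache && guard->has_document)
    return DIR_SRC_CIRCUIT_GUARD;
  return DIR_SRC_DIRECTORY_GUARD;
}
```

With a rule like this, a client whose circuit guard is a DirCache would rarely touch its directory guards, which is the property we want if we ever bring their number down to 1.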
We should also reconsider the way that compute_frac_paths_available should work in a 1-guard world. Do we want to keep the requirement as "don't build circuits until you have the microdescriptors necessary to build X% of the paths through the network?" Or should we change it to require X% of all paths _through your guard_? Or raise the X?
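To compare the two options, here is a small sketch. It assumes, for illustration only, that the availability figure is the product of the per-position fractions of relays (guard, middle, exit) we have microdescriptors for; the function names and the toy_frac_t struct are hypothetical and are not the real compute_frac_paths_available code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-position fractions (0.0 to 1.0) of usable relays
 * for which we already hold a microdescriptor. */
typedef struct {
  double f_guard;
  double f_mid;
  double f_exit;
} toy_frac_t;

/* Option 1: fraction of all paths through the network that we could
 * build (assumed here to be a simple product of the three fractions). */
double toy_frac_paths_network(const toy_frac_t *f)
{
  assert(f != 0);
  return f->f_guard * f->f_mid * f->f_exit;
}

/* Option 2: fraction of paths through our single guard. The guard
 * position contributes either 1 (we have its descriptor) or 0. */
double toy_frac_paths_through_guard(const toy_frac_t *f,
                                    bool have_guard_descriptor)
{
  assert(f != 0);
  if (!have_guard_descriptor)
    return 0.0;
  return f->f_mid * f->f_exit;
}
```

For example, with f_guard = 0.5, f_mid = 0.75 and f_exit = 0.5, the network-wide figure is 0.1875, while the through-your-guard figure is 0.375 if we have our guard's descriptor and 0 if we don't. So the same X would be reached earlier under the second definition, but only once the guard's own descriptor has arrived.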