hs-v3: Client can re-pick bad intro points
Ok this one took me a while to figure out!
This is sorta related to #25882 (moved) as it is a bug that makes the client retry constantly the same intro point even though it was flagged as having an error.
Problem lies in client_get_random_intro()
which randomly selects an intro point from the descriptor. Sparing the detail, this is where it goes wrong:
ip = smartlist_get(usable_ips, idx);
smartlist_del(usable_ips, idx);
... and then we use ip
to check if usable and if yes, we connect to it.
We can't del()
the value from the list until we are done with the ip
object. The smartlist_get()
returns a pointer to location in the list so if we change the list order right after acquiring the reference, we are accessing bad things, possibly junk.
So basically, junk can be used, the wrong IP can be used even though it would not pass the intro_point_is_usable()
if it was correct pointer.
I was able to find this using a pathological case where the HS pins its intro point to 3 specific nodes. So, even upon a restart or desc rotation, the same IPs are re-used but with different auth key.
If a client had already connected to that service and thus had those IPs in its failure cache, it will fail to eternity to connect to the service because it basically never realize it needs to refetch a new descriptor.
Even though there is a catch all with hs_client_any_intro_points_usable()
before extending to an intro point, the problem lies with the above which can make the code NEVER pick a certain intro point so the catch all always validate to true since there is one intro point in the list that was never tried.
This is very bad for reachability, network load, but also simple "code safety". I strongly recommend backport to our maintained version >= 032.
Fortunately for us, the fix is trivial, we should only del()
when done with the IP object.