Tor keeps opening circuits while waiting for bridge descriptors
Reported by someone in #tor-dev (Tovok7): they migrated an HS from 029 to 034 and some feature broke.
Tor, configured with an HS, starts fine, bootstraps and all is good.
Then, through the control port, setconf UseBridges=1 Bridge=... produced these logs:
2018-10-12 16:18:54.440 I/TorPlugin: Enabling network, using bridges
2018-10-12 16:18:54.456 I/TorPlugin: NOTICE Switching to guard context "bridges" (was using "default")
2018-10-12 16:18:54.608 I/TorPlugin: NOTICE Delaying directory fetches: No running bridges
2018-10-12 16:18:55.031 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed].
2018-10-12 16:18:55.032 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed].
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $02069A3C5362476936B62BA6F5ACC41ABD573A9B
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $5A2D2F4158D0453E00C7C176978D3F41D69C45DB
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $B31A7DAD9AACEDDB9915A16617BB8F06BA429D6B
2018-10-12 16:18:55.642 I/TorPlugin: WARN Hidden service [scrubbed] exceeded launch limit with 13 intro points in the last 13 seconds. Intro circuit launches are limited to 10 per 300 seconds.
2018-10-12 16:18:55.642 I/TorPlugin: WARN Service configured in [EPHEMERAL]:
2018-10-12 16:18:55.642 I/TorPlugin: WARN Intro point 0 at [scrubbed]: no circuit
2018-10-12 16:18:55.643 I/TorPlugin: WARN Intro point 1 at [scrubbed]: no circuit
2018-10-12 16:18:55.643 I/TorPlugin: WARN Intro point 2 at [scrubbed]: no circuit
2018-10-12 16:18:55.643 I/TorPlugin: WARN Intro point 3 at [scrubbed]: no circuit
2018-10-12 16:18:55.643 I/TorPlugin: WARN Intro point 4 at [scrubbed]: no circuit
2018-10-12 16:18:56.652 I/TorPlugin: OR connection CONNECTED $02069A3C5362476936B62BA6F5ACC41ABD573A9B
2018-10-12 16:18:57.013 I/TorPlugin: NOTICE new bridge descriptor 'pointingRespighi' (fresh): [scrubbed]
2018-10-12 16:18:57.014 I/TorPlugin: NOTICE Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6457/6457).
2018-10-12 16:18:57.080 I/TorPlugin: OR connection CONNECTED $5A2D2F4158D0453E00C7C176978D3F41D69C45DB
2018-10-12 16:18:57.184 I/TorPlugin: OR connection CONNECTED $B31A7DAD9AACEDDB9915A16617BB8F06BA429D6B
2018-10-12 16:18:57.586 I/TorPlugin: NOTICE new bridge descriptor 'AlliumGermanicus' (fresh): [scrubbed]
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE Bridge 'nonLinearGeometry' has both an IPv4 and an IPv6 address. Will prefer using its IPv4 address ([scrubbed]) based on the configured Bridge address.
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE new bridge descriptor 'nonLinearGeometry' (fresh): [scrubbed]
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE We now have enough directory information to build circuits.
2018-10-12 16:19:00.201 I/TorPlugin: NOTICE Tor has successfully opened a circuit. Looks like client functionality is working.
It appears that the HS tried to open intro points even though tor didn't have bridge descriptors (guard state got switched).
The HS subsystem is safeguarded by this check (for circuit events):
  if (router_have_consensus_path() == CONSENSUS_PATH_UNKNOWN ||
      !have_completed_a_circuit()) {
    return;
  }
In other words, if we can't open circuits, tor will never proceed with HS service circuits.
The main theory, discussed with armadev, can be deduced from these three early log lines:
2018-10-12 16:18:54.456 I/TorPlugin: NOTICE Switching to guard context "bridges" (was using "default")
2018-10-12 16:18:54.608 I/TorPlugin: NOTICE Delaying directory fetches: No running bridges
2018-10-12 16:18:55.031 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed].
First line: Guard context switched to bridges. All is good.
Second line: router_have_minimum_dir_info() is called from many places, including our mainloop. The point is that within that function we look at should_delay_dir_fetches(), which is what produces that notice. However, because should_delay_dir_fetches() returned 1, we returned early and never reached update_router_have_minimum_dir_info(), which would have marked that we can't complete circuits (with note_that_we_maybe_cant_complete_circuits()).
Third line: Circuit launch failure.
Once the guard context was switched, all circuits were marked as unusable (which is normal), so the HS had to rebuild all its intro points, but have_completed_a_circuit() was still returning true.
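To make the theory concrete, here is a toy model of the suspected sequence. It is not tor's code: every toy_* name is hypothetical and only loosely mirrors the functions named above, and the control flow is a simplification of what this ticket describes, not a copy of the real implementation.

```c
#include <stdbool.h>

/* Toy state (hypothetical names; not tor's real variables). */
static bool toy_have_running_bridges = false;  /* just switched to bridges */
static bool toy_can_complete_circuits = true;  /* left over from the earlier bootstrap */

/* Stand-in for should_delay_dir_fetches(): delay while no bridge is running. */
bool toy_should_delay_dir_fetches(void)
{
  return !toy_have_running_bridges;
}

/* Stand-in for router_have_minimum_dir_info(). */
bool toy_router_have_minimum_dir_info(void)
{
  if (toy_should_delay_dir_fetches()) {
    /* Early return: the "can complete circuits" flag is never cleared here. */
    return false;
  }
  /* Normal path: recompute the flag from the current guard state. */
  toy_can_complete_circuits = toy_have_running_bridges;
  return toy_can_complete_circuits;
}

/* Stand-in for the HS safeguard, reduced to the have_completed_a_circuit() half. */
bool toy_hs_may_launch_intro_circuits(void)
{
  return toy_can_complete_circuits;
}
```

In this model, calling toy_router_have_minimum_dir_info() right after the switch reports that directory info is missing, yet toy_hs_may_launch_intro_circuits() still returns true, which matches the failed intro circuit launches in the log.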
Whatever the cause, there is a clear issue: when we switch guard context, we should always stop circuit creation until the guard state is usable.
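One possible shape for that behavior, as a minimal sketch only. The struct and all sketch_* functions are hypothetical and do not exist in tor; the assumption is that a context switch resets both "guard state usable" and "completed a circuit", and that HS intro circuits stay blocked until both are true again.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical state for illustration; not tor's real data structures. */
struct sketch_state {
  const char *guard_context;   /* e.g. "default" or "bridges" */
  bool guard_state_usable;     /* do we have descriptors for the current guards? */
  bool completed_a_circuit;    /* has a circuit opened in this context? */
};

/* On a guard context switch, forget everything learned in the old context. */
void sketch_switch_guard_context(struct sketch_state *st, const char *new_ctx)
{
  if (strcmp(st->guard_context, new_ctx) == 0)
    return; /* same context, nothing to reset */
  st->guard_context = new_ctx;
  st->guard_state_usable = false;
  st->completed_a_circuit = false;
}

/* Called once the needed descriptors (e.g. bridge descriptors) have arrived. */
void sketch_guard_state_became_usable(struct sketch_state *st)
{
  st->guard_state_usable = true;
}

/* Called when any circuit opens successfully in the current context. */
void sketch_circuit_opened(struct sketch_state *st)
{
  st->completed_a_circuit = true;
}

/* HS gate: only launch intro circuits when both conditions hold. */
bool sketch_hs_may_launch_intro_circuits(const struct sketch_state *st)
{
  return st->guard_state_usable && st->completed_a_circuit;
}
```

With this gate, the HS would stay idle between the "Switching to guard context" notice and the first successfully opened circuit instead of burning through its intro circuit launch limit.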