Dec 11 13:08:59.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [268 similar message(s) suppressed in last 600 seconds]
His network seems to be flaky, so this might just be the result of a poor connection. However, we might want to investigate a bit further, since that message was suppressed over 250 times.
I can imagine situations with very busy hidden services where 32 clients try to access them at the same time, so the service ends up trying to establish 32 circuits at once, which might cause this problem.
I have a feeling that it's also linked with this error (see the first log in the email).
Dec 12 18:10:27.000 [notice] Your Guard SECxFreeBSD64 ($D7DB8E82604F806766FC3F80213CF719A0481D0B) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 199/285. Use counts are 101/101. 253 circuits completed, 0 were unusable, 54 collapsed, and 15 timed out. For reference, your timeout cutoff is 60 seconds.
If the guard is having trouble keeping up with the traffic, that could explain why the HS can be stalled on circuits? Though the 600-second window is a bit worrying; 10 minutes seems like a long time for the Guard to keep failing like that...
I am using Donncha's OnionBalance to scrape the descriptors of 72x Tor Onion Services (spread over 6x machines) for a series of massive bandwidth experiments.
I, too, am getting this message, on a separate, standalone machine/daemon:
Dec 19 12:32:09.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [1675 similar message(s) suppressed in last 600 seconds]
Dec 19 12:42:10.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [1375 similar message(s) suppressed in last 600 seconds]
Dec 19 12:52:10.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [1256 similar message(s) suppressed in last 600 seconds]
Is there a number I can bump, please?
Trac: Sponsor: N/A to N/A; Severity: N/A to Normal; Reviewer: N/A to N/A
Here is the code:
    const int n_pending = count_pending_general_client_circuits();

    /* Do we have too many pending circuits? */
    if (n_pending >= options->MaxClientCircuitsPending) {
      static ratelim_t delay_limit = RATELIM_INIT(10*60);
      char *m;
      if ((m = rate_limit_log(&delay_limit, approx_time()))) {
        log_notice(LD_APP, "We'd like to launch a circuit to handle a "
                   "connection, but we already have %d general-purpose client "
                   "circuits pending. Waiting until some finish.%s",
                   n_pending, m);
        tor_free(m);
      }
      return 0;
    }
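(Side note: the RATELIM_INIT(10*60) above is where the "600 seconds" in the suppression messages comes from; the notice is emitted at most once per ten minutes, and everything in between is just counted as suppressed.)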
You can try bumping MaxClientCircuitsPending from 32 to something bigger.
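For reference, that is just a torrc override; the value below is only an example, and (as noted further down) raising it has network-load tradeoffs:

```
# torrc: raise the cap on simultaneously pending general-purpose client
# circuits (the built-in default is 32).
MaxClientCircuitsPending 64
```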
However, without understanding what these pending circuits are and why they are there, it's hard to fix the root cause of this issue. Perhaps with some tactical logging we can get more information about the nature of these circuits.
We should also consider whether we want to teach count_pending_general_client_circuits() to ignore CIRCUIT_STATE_GUARD_WAIT circuits as well, since post-prop271 we might have a few of those lying around and I'm not sure if we want to consider them pending.
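For concreteness, the tweak being floated might look roughly like the sketch below. This is not the actual body of count_pending_general_client_circuits() from circuituse.c, just an illustration using helpers (circuit_get_global_list(), CIRCUIT_IS_ORIGIN(), CIRCUIT_STATE_GUARD_WAIT) that do exist in tor:

```c
/* Sketch only, not the real circuituse.c code: count general-purpose
 * origin circuits that are still being built, but treat GUARD_WAIT the
 * same as OPEN so that post-prop271 guard-waiting circuits don't count
 * as "pending". */
int
count_pending_general_client_circuits(void)
{
  int count = 0;
  SMARTLIST_FOREACH_BEGIN(circuit_get_global_list(), circuit_t *, circ) {
    if (!CIRCUIT_IS_ORIGIN(circ) || circ->marked_for_close)
      continue;
    if (circ->purpose != CIRCUIT_PURPOSE_C_GENERAL)
      continue;
    if (circ->state == CIRCUIT_STATE_OPEN ||
        circ->state == CIRCUIT_STATE_GUARD_WAIT) /* the proposed addition */
      continue;
    ++count;
  } SMARTLIST_FOREACH_END(circ);
  return count;
}
```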
"Dec 25 19:26:18.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [218775 similar message(s) suppressed in last 600 seconds]"
The service is running maybe 150 domains, is not being DDoSed, and has maybe 50-100 users; CPU usage is up to 5% according to htop (or 10% according to ps), but it is decreasing. This message appeared after a Tor service restart. Some of the domains that worked fine before the restart are not available (yet?).
This message is likely due to the network being overloaded. There's not much you can do about it, we're trying to fix it on the relay side over the next few weeks.
This kind of network overload is one reason people shouldn't increase MaxClientCircuitsPending.
Or oh hey, what about general-purpose circuits to upload new onion descriptors? We launch 6 or 8 of those at a time, and if there are several onion services being managed by this Tor... we can get to 32 right quick?
Now, on 3.2.9, the number of suppressed "we already have 32 general-purpose client circuits" messages is twice what it was on 3.1.9.
Jan 21 10:27:42.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [82275 similar message(s) suppressed in last 600 seconds]
Jan 21 10:37:42.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [215959 similar message(s) suppressed in last 600 seconds]
Jan 21 10:47:53.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [173631 similar message(s) suppressed in last 600 seconds]
Jan 21 10:53:40.000 [warn] Giving up launching first hop of circuit to rendezvous point $9844B981A80B3E4B50897098E2D65167E6AEF127$9844B981A80B3E4B50 at 62.138.7.171 for service eb3w4t.....
Jan 21 10:53:43.000 [warn] Giving up launching first hop of circuit to rendezvous point $ECDC405E49183B2EAF579ACD42B443AEA2CF3729$ECDC405E49183B2EAF at 185.81.109.2 for service eb3w4t.....
Jan 21 10:53:48.000 [warn] Giving up launching first hop of circuit to rendezvous point $C6ED9929EBBD3FCDFF430A1D43F5053EE8250A9B$C6ED9929EBBD3FCDFF at 188.214.30.126 for service eb3w4t.....
Jan 21 10:57:52.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [85573 similar message(s) suppressed in last 600 seconds]
Jan 21 11:00:04.000 [warn] Hidden service g4e42twrg... exceeded launch limit with 10 intro points in the last 206 seconds. Intro circuit launches are limited to 10 per 300 seconds. [350 similar message(s) suppressed in last 300 seconds]
Jan 21 11:00:11.000 [warn] Couldn't relaunch rendezvous circuit to '$AF1D8F02C0949E9755C0DF9C6761FBBF7AAB62C2$AF1D8F02C0949E9755 at 178.62.33.87'.
Jan 21 11:06:01.000 [notice] Your network connection speed appears to have changed. Resetting timeout to 60s after 18 timeouts and 1000 buildtimes.
Jan 21 11:07:52.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [294348 similar message(s) suppressed in last 600 seconds]
Jan 21 11:16:54.000 [warn] Requested exit point '$9AF9554365A51E6CE0804C32C4C4DC513FBFEF4D' is not known. Closing.
Jan 21 11:16:54.000 [warn] Requested exit point '$9AFAD70A59C60A0CEB63E4344E429DB0415FE29C' is not known. Closing.
Jan 21 11:16:54.000 [warn] Requested exit point '$9B2298757C56305D875F24051461A177B542A286' is not known. Closing.
Jan 21 11:16:54.000 [warn] Requested exit point '$43B89E0565B1D628DACB862F99D85B95B43AEAB8' is not known. Closing.
......
Jan 21 11:17:52.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [442355 similar message(s) suppressed in last 600 seconds]
Jan 21 11:27:52.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [348069 similar message(s) suppressed in last 600 seconds]
> Or oh hey, what about general-purpose circuits to upload new onion descriptors? We launch 6 or 8 of those at a time, and if there are several onion services being managed by this Tor... we can get to 32 right quick?
Yes, that is a problem. v2 uses 6 HSDirs, so with 6 configured HSes you reach 32 pending circuits quickly. v3 uses hsdir_spread_store, which is currently 4; with 2 replicas that means 8 HSDirs for every service. Configure 4 services and boom, 32 circuits are launched.
But bumping MaxClientCircuitsPending is not really a good idea just for services.
The thing is that once the services have bootstrapped, that is, the descriptor has been uploaded, they re-upload at timings randomized relative to each other. But that one time at startup, we need the services to upload en masse, because tor should try to make each service reachable as fast as possible.
So could we either:
1. Allow a burst at service startup if you have num_services * num_hsdirs > MaxClientCircuitsPending. I say service startup because one could do 10 ADD_ONION at once ;).
2. Have a special limit just for HS, like MaxHSCircuitsPending, and bump it to something bigger than 32 (see the sketch after this list).
3. Leave everything as is; after a while, once tor is able to launch circuits, the descriptor will get uploaded. The operator just needs to deal with the delay.
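To make option 2 concrete, here is a minimal sketch. Everything in it is hypothetical: MaxHSCircuitsPending is not an existing torrc option and count_pending_hs_service_circuits() is an imagined helper, though ratelim_t, rate_limit_log(), approx_time() and tor_free() are the same real helpers used in the snippet further up.

```c
/* Hypothetical sketch of a separate pending-circuit cap for onion-service
 * circuits (descriptor upload, intro, rendezvous), so that service-side
 * bursts stop competing with MaxClientCircuitsPending.  Neither
 * MaxHSCircuitsPending nor count_pending_hs_service_circuits() exists
 * in tor today. */
static int
hs_circuit_launch_allowed(const or_options_t *options)
{
  const int n_pending = count_pending_hs_service_circuits();
  if (n_pending >= options->MaxHSCircuitsPending) {
    static ratelim_t delay_limit = RATELIM_INIT(10*60);
    char *m;
    if ((m = rate_limit_log(&delay_limit, approx_time()))) {
      log_info(LD_REND, "Already have %d onion-service circuits pending; "
               "delaying further launches.%s", n_pending, m);
      tor_free(m);
    }
    return 0;
  }
  return 1;
}
```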
I think what I would prefer here is for Tor to rate-limit itself when building onion service circuits. Especially so when it has multiple onion services, but maybe even when it has only a single one. So instead of building all its onion circuits (IPs + hsdir circs) at once, it waits a randomized time (around a second?) before building each one.
That will slightly delay the bootup of HSes, but not by too much, and it's better for the health of the network. Not sure if this will be a PITA to engineer, though. I'm not sure if this is isomorphic to your (3) idea above, but if it is, then the warning message is not useful, since the wait is intended.
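Very roughly, the scheduling idea might look like the sketch below. It is purely illustrative: hsdir_target_t and schedule_circuit_launch_in_msec() are invented stand-ins for whatever structures and mainloop timer mechanism would actually be used, while crypto_rand_int() and the smartlist macros are real tor helpers.

```c
/* Illustrative only: spread the initial HSDir/intro circuit launches out
 * over time instead of firing them all at once.  The target type and the
 * scheduling function are made up for this sketch. */
static void
launch_service_circuits_with_jitter(smartlist_t *targets)
{
  int i = 0;
  SMARTLIST_FOREACH_BEGIN(targets, hsdir_target_t *, target) {
    /* Roughly one launch per second, plus up to a second of jitter. */
    const int delay_msec = i * 1000 + crypto_rand_int(1000);
    schedule_circuit_launch_in_msec(target, delay_msec);
    i++;
  } SMARTLIST_FOREACH_END(target);
}
```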
> I think what I would prefer here is for Tor to rate-limit itself when building onion service circuits. Especially so when it has multiple onion services, but maybe even when it has only a single one. So instead of building all its onion circuits (IPs + hsdir circs) at once, it waits a randomized time (around a second?) before building each one.
The problem with adding a random delay at startup is that it won't solve the "32 general purpose circuits are pending" issue. If those circuits are really stuck being built, the delay won't help much, as they will all end up queued and stuck at some point anyway.
A wise rate limit is probably what we want, so we never go above that 32 limit and thus don't need a cryptic warning in the face of which you can't do anything but wait and/or panic.
Now, OK, looking a bit more closely at the logs above, notice:
Jan 21 10:53:40.000 [warn] Giving up launching first hop of circuit to rendezvous point $9844B981A80B3E4B50897098E2D65167E6AEF127~$9844B981A80B3E4B50 at 62.138.7.171 for service eb3w4t.....
The above is a service trying to open a circuit to a rendezvous point... So I think the bigger issue here is that we have 32 circuits stuck in a non-OPEN state that just never expire for some reason? Or they do expire, but we open 32 new ones very quickly and they get stalled again in a non-OPEN state.
My money is on the latter, given the number of suppressed log messages (see below). This looks to me like a service getting a ridiculous number of rendezvous requests; the Guard is choking, so we keep hitting that 32 limit.
Jan 21 10:37:42.000 [notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [215959 similar message(s) suppressed in last 600 seconds]
From a quick skim, I don't see anything in circuit_expire_building() that would make a circuit in the GUARD_WAIT state be ignored, so in theory they should expire even though they are waiting for the guard to become usable?
I'm indeed getting more and more convinced that we need a rate limit on both the client and the service side. That would be a bit like what we do with the DoS mitigation now (#24902 (moved)): some per-second rate with a burst. A busy hidden service will suffer reachability-wise, but at least it won't break the network. The point is that the DoS mitigation will prevent, as much as possible, a client DDoS against a single service, and the service will itself avoid DDoSing the network.
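As a sketch of the "per-second rate with burst" shape being described, here is a generic token bucket. This is not the actual #24902 code, nor an existing tor API; the type and function names are made up for illustration.

```c
/* Generic token-bucket sketch for "N launches per second with a burst":
 * refill `rate` tokens per elapsed second up to `burst`, and spend one
 * token per circuit launch.  Illustrative only. */
#include <stdint.h>
#include <time.h>

typedef struct {
  uint32_t rate;        /* tokens added per second */
  uint32_t burst;       /* maximum bucket size */
  uint32_t tokens;      /* tokens currently available */
  time_t last_refill;   /* when we last refilled */
} launch_bucket_t;

static int
launch_bucket_take(launch_bucket_t *b, time_t now)
{
  if (now > b->last_refill) {
    const uint64_t add = (uint64_t)(now - b->last_refill) * b->rate;
    b->tokens = (add + b->tokens > b->burst) ?
                b->burst : (uint32_t)(b->tokens + add);
    b->last_refill = now;
  }
  if (b->tokens == 0)
    return 0;   /* over the limit: defer this circuit launch */
  --b->tokens;
  return 1;     /* OK to launch now */
}
```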