Opened 23 months ago

Last modified 16 months ago

#22210 new defect

circuit_is_acceptable is slow due to IP and fingerprint parsing

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.2.8.12
Severity: Normal Keywords: tor-hs, 032-unreached
Cc: Actual Points:
Parent ID: #15251 Points: 1
Reviewer: Sponsor:

Description

(I received this bug report from special. Please feel free to contact me (teor) with questions.)

Using Tor version 0.3.1.0-alpha-dev (git-04f1ddaa2a33194b), also applies to at least all 0.2.8.x and above.

Testcase is a ricochet client with a very large number of contacts. Effectively, it’s a client that attempts to build circuits to >100 <500 hidden services almost simultaneously, and retries over time with increasing delays.

This tor client will constantly use 100% of a CPU core for >15 minutes after start. There’s a variety of buggy behavior going on, and a lot of circuits, so it’s not entirely surprising to see some high load. More interesting is where that load actually comes from.

The vast majority of CPU time — 2.1 minutes out of a 3 minute trace — is spent under circuit_is_acceptable, as called from connection_ap_attach_pending (via circuit_get_open_circ_or_launch and circuit_get_best). The breakdown from there is:

2.10 min 100.0% 6.26 s circuit_is_acceptable

58.93 s 46.7% 1.05 s tor_addr_parse
56.34 s 44.7% 1.84 s tor_inet_pton
52.53 s 41.7% 1.68 s tor_inet_aton
50.71 s 40.2% 914.00 ms tor_sscanf
49.74 s 39.4% 13.78 s tor_vsscanf
29.30 s 23.2% 20.11 s scan_unsigned
5.29 s 4.2% 5.29 s TOR_ISDIGIT
1.13 s 0.8% 1.13 s digit_to_num

43.76 s 34.7% 1.30 s connection_ap_can_use_exit
39.51 s 31.3% 574.00 ms node_get_by_nickname
38.93 s 30.9% 764.00 ms node_get_by_hex_id
23.82 s 18.9% 869.00 ms hex_digest_nickname_decode
21.61 s 17.1% 10.95 s base16_decode
10.07 s 7.9% 10.07 s hex_decode_digit_
14.32 s 11.3% 156.00 ms node_get_by_id

16.39 s 13.0% 1.47 s build_state_get_exit_node
14.93 s 11.8% 119.00 ms node_get_by_id

That is super hard to read in this format, but essentially a majority of time is spent on:

  1. Parsing IP addresses from strings out of node descriptors
  2. Looking up nodes by ID
  3. Decoding hex digests to get node IDs

This implies to me that these could be cached in a much more useful way in the node struct, the algorithmic complexity of some of the circuit selection functions is out of control, and I’m also guessing that the circuit attach functions are being called way too often.

Without circuit_is_acceptable being considered, the hottest path is under choose_good_middle_server for node selection, which seems much more reasonable.

The profile quoted here had —enable-expensive-hardening (including ASAN) and slightly older tor, but I’ve rerun with 04f1ddaa and without hardening with the same conclusions, including 2 out of 3 minutes of CPU time being spent under circuit_is_acceptable.

Child Tickets

Change History (6)

comment:1 in reply to:  description ; Changed 23 months ago by arma

Replying to teor:

2.10 min 100.0% 6.26 s circuit_is_acceptable
58.93 s 46.7% 1.05 s tor_addr_parse

  1. Parsing IP addresses from strings out of node descriptors

To be precise, it looks like this one is from

    const int family = tor_addr_parse(&addr, conn->socks_request->address);

which is actually parsing IP addresses from strings in the socks request, not the node descriptor.

For how that code came to be that way, see commit 2b22c0ae from #7582.

comment:2 in reply to:  1 Changed 23 months ago by arma

Replying to arma:

    const int family = tor_addr_parse(&addr, conn->socks_request->address);

It looks like we could pretty easily only do that tor_addr_parse call in the two places below it where we would actually use the value of family -- that is,

  • (a) for a one-hop circuit where we've set chosen_exit_name and we don't have a digest for the exit, which should be rare if it ever happens, and
  • (b) when origin_circ->prepend_policy is set, meaning we've already received a surprising exit policy failure on that circuit and called adjust_exit_policy_from_exitpolicy_failure() on it.

comment:3 in reply to:  description Changed 23 months ago by arma

Replying to teor:

43.76 s 34.7% 1.30 s connection_ap_can_use_exit
39.51 s 31.3% 574.00 ms node_get_by_nickname
38.93 s 30.9% 764.00 ms node_get_by_hex_id

This case happens because they're hidden service descriptor lookups, so they're general (exit) streams yet they have conn->chosen_exit_name set.

That's a surprising amount of time in hex_digest_nickname_decode(), but I guess with all the strlcpy's, etc, that it does, it shouldn't be that surprising.

You're right that it's inefficient here, since the logic is "consider a given circuit, and then reparse what exit node this conn says it's for, and then see if that circuit ends at that exit node", and in theory we don't need to keep reparsing. I would be tempted to try to take a step farther back though and figure out if there's a way to call the broader "how's it going, are any streams ready to be attached to circuits yet" function more sparingly.

comment:4 Changed 22 months ago by teor

I added a hs-intro chutney network in #22599 that might be a test for this.
It currently doesn't verify (see #22600), so it might be a chutney test for this issue.

comment:5 Changed 18 months ago by nickm

Keywords: 032-unreached added
Milestone: Tor: 0.3.2.x-finalTor: unspecified

Mark a large number of tickets that I do not think we will do for 0.3.2.

comment:6 Changed 16 months ago by teor

Parent ID: #15251

This probably blocks #15251

Note: See TracTickets for help on using tickets.