Simplify some costly Tor functions (by profile)
mo attached a few profiles to #7572 (moved), from a fast host with AES-NI. The top functions overall are:
44222784 8.0379 libcrypto.so.1.0.0 libcrypto.so.1.0.0 sha1_block_data_order_avx
39059344 7.0994 nf_conntrack nf_conntrack /nf_conntrack
35552271 6.4620 libcrypto.so.1.0.0 libcrypto.so.1.0.0 bn_sqr4x_mont
31025085 5.6391 libcrypto.so.1.0.0 libcrypto.so.1.0.0 aesni_cbc_sha1_enc_avx
17425081 3.1672 tor tor circuit_get_by_rend_token_and_purpose.constprop.11
17185351 3.1236 libc-2.15.so libc-2.15.so /lib/x86_64-linux-gnu/libc-2.15.so
15106522 2.7458 tor tor circuit_unlink_all_from_channel
13422467 2.4397 libevent-2.0.so.5.1.4 libevent-2.0.so.5.1.4 /usr/lib/libevent-2.0.so.5.1.4
9045536 1.6441 libcrypto.so.1.0.0 libcrypto.so.1.0.0 bn_mul4x_mont_gather5
8295787 1.5078 libcrypto.so.1.0.0 libcrypto.so.1.0.0 aesni_ctr32_encrypt_blocks
7454822 1.3550 e1000e e1000e /e1000e
6667075 1.2118 tor tor circuitmux_find_map_entry
And the top functions, considering Tor only, are:
17545411 13.9182 circuitlist.c:1116 tor circuit_get_by_rend_token_and_purpose.constprop.11
15232931 12.0838 circuitlist.c:1028 tor circuit_unlink_all_from_channel
6729424 5.3382 circuitmux.c:698 tor circuitmux_find_map_entry
3802661 3.0165 buffers.c:2468 tor assert_buf_ok
3344356 2.6530 circuitlist.c:980 tor circuit_get_by_circid_channel
3217962 2.5527 relay.c:2094 tor channel_flush_from_first_active_circuit
2927776 2.3225 buffers.c:520 tor buf_datalen
2367670 1.8782 connection.c:2512 tor connection_bucket_refill
2210529 1.7535 connection.c:2824 tor connection_handle_read
2200952 1.7459 relay.c:168 tor circuit_receive_relay_cell
2095244 1.6621 container.c:167 tor smartlist_isin
1618112 1.2836 crypto.c:1364 tor crypto_cipher_crypt_inplace
The crypto is about what we'd expect, but it could be helpful to see whether we can optimize some of the remaining Tor functions. In particular:
- circuit_get_by_rend_token_and_purpose should have a map backing it; it appears that this might be one of those costly linear searches.
- I wonder if circuit_unlink_all_from_channel could be taught to walk a list of circuits on the channel rather than the list of all circuits.
- I don't see a good way to make circuitmux_find_map_entry faster without tweaking data structures.
- We can call assert_buf_ok() less often.
- circuit_get_by_circid_channel would also need data structure improvements.
- buf_datalen() should just be made into an inline function.
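
To make the first item concrete, here is a rough sketch of what a map-backed lookup by rend token and purpose might look like. It is not Tor's actual code: demo_circ_t, rend_map_t, the rend_map_* functions, the 20-byte token length and the bucket count are all made up for illustration, and a real patch would more likely build on Tor's existing hash-table helpers. The point is only that a lookup touches one bucket instead of every circuit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REND_TOKEN_LEN 20     /* assumption: digest-sized token */
#define REND_MAP_BUCKETS 1024 /* assumption: must be a power of two */

/* Hypothetical stand-in for the circuit fields the lookup needs. */
typedef struct demo_circ_t {
  uint8_t rend_token[REND_TOKEN_LEN];
  int purpose;
  struct demo_circ_t *next_in_bucket; /* chain of circuits in one bucket */
} demo_circ_t;

/* Hypothetical map: an array of bucket heads, zero-initialized when empty. */
typedef struct rend_map_t {
  demo_circ_t *buckets[REND_MAP_BUCKETS];
} rend_map_t;

/* FNV-1a over the token bytes, reduced to a bucket index. This is not
 * collision resistant; if tokens can be chosen by a remote party, a keyed
 * hash would be the safer choice. */
static unsigned
rend_token_bucket(const uint8_t *token)
{
  uint32_t h = 2166136261u;
  size_t i;
  for (i = 0; i < REND_TOKEN_LEN; ++i) {
    h ^= token[i];
    h *= 16777619u;
  }
  return h & (REND_MAP_BUCKETS - 1);
}

/* Insert circ at the head of its bucket. Call when the token is set. */
static void
rend_map_add(rend_map_t *map, demo_circ_t *circ)
{
  unsigned b;
  assert(map && circ);
  b = rend_token_bucket(circ->rend_token);
  circ->next_in_bucket = map->buckets[b];
  map->buckets[b] = circ;
}

/* Unlink circ from its bucket, if present. Call before freeing it. */
static void
rend_map_remove(rend_map_t *map, demo_circ_t *circ)
{
  demo_circ_t **p = &map->buckets[rend_token_bucket(circ->rend_token)];
  for (; *p; p = &(*p)->next_in_bucket) {
    if (*p == circ) {
      *p = circ->next_in_bucket;
      circ->next_in_bucket = NULL;
      return;
    }
  }
}

/* Return the circuit with this token and purpose, or NULL. Only the
 * circuits that share the token's bucket are examined. */
static demo_circ_t *
rend_map_find(const rend_map_t *map, const uint8_t *token, int purpose)
{
  demo_circ_t *c = map->buckets[rend_token_bucket(token)];
  for (; c; c = c->next_in_bucket) {
    if (c->purpose == purpose &&
        memcmp(c->rend_token, token, REND_TOKEN_LEN) == 0)
      return c;
  }
  return NULL;
}
```

The same shape (an index maintained on insert and remove, consulted on lookup) would presumably apply to the per-channel circuit list suggested for circuit_unlink_all_from_channel, with the channel as the key.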