On my Fast Guard, Tor spends about 25% (!) of its user CPU time in _int_malloc and _int_free. I tried switching to jemalloc, but that only gave me significantly worse memory fragmentation.
Compression pegs the CPU (of course), but consensus updates are pretty uncommon. malloc and free, I believe, waste time on every single packet forwarded, probably mainly because AFAICT there is no fast path that avoids memory allocation (or epoll waiting) when the outgoing channel is free.
I tried heaptrack, which seems pretty useful, but it showed no obvious culprits for either the number of allocations or peak memory usage. It looks like a lot of time is spent in memmove through connection_or_process_cells_from_inbuf though, and it seems plausible that this path mallocs buffers. Maybe it's possible to avoid those allocations if the outgoing channel is unblocked? Might be complicated... I can do another heaptrack profile if you want, though.
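To make the fast-path idea concrete, here is a minimal toy sketch in C. It is not Tor code: `toy_channel_t`, `toy_channel_send_cell`, `toy_channel_flush`, the 514-byte `CELL_LEN`, and the four-cell "wire" buffer are all hypothetical names and values chosen for illustration. The only point is the shape of the logic: if nothing is queued and the output has room, copy the cell straight out without calling malloc; otherwise fall back to a heap-allocated queue entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CELL_LEN 514 /* illustrative fixed cell size for this sketch */

/* Hypothetical queue entry used only on the slow path. */
typedef struct queued_cell {
  struct queued_cell *next;
  unsigned char body[CELL_LEN];
} queued_cell_t;

/* Hypothetical stand-in for an outgoing channel; not a Tor structure. */
typedef struct {
  unsigned char wire[4 * CELL_LEN]; /* stands in for the socket/TLS buffer */
  size_t wire_used;
  queued_cell_t *head, *tail;       /* cells waiting for the wire to drain */
  size_t n_queued;
  size_t n_allocs;                  /* counts slow-path allocations */
} toy_channel_t;

/* Unblocked means: nothing queued ahead of us, and room for one more cell. */
static int toy_channel_is_unblocked(const toy_channel_t *ch)
{
  return ch->head == NULL && ch->wire_used + CELL_LEN <= sizeof(ch->wire);
}

/* Returns 1 if the cell took the allocation-free fast path, 0 if it was
 * queued, and -1 if the slow-path allocation failed. */
static int toy_channel_send_cell(toy_channel_t *ch, const unsigned char *cell)
{
  assert(ch && cell);
  if (toy_channel_is_unblocked(ch)) {
    /* Fast path: one copy into the output buffer, no malloc. */
    memcpy(ch->wire + ch->wire_used, cell, CELL_LEN);
    ch->wire_used += CELL_LEN;
    return 1;
  }
  /* Slow path: allocate a queue entry so ordering is preserved. */
  queued_cell_t *q = malloc(sizeof(*q));
  if (!q)
    return -1;
  ch->n_allocs++;
  memcpy(q->body, cell, CELL_LEN);
  q->next = NULL;
  if (ch->tail)
    ch->tail->next = q;
  else
    ch->head = q;
  ch->tail = q;
  ch->n_queued++;
  return 0;
}

/* Simulates the wire draining completely, then moves queued cells onto it. */
static void toy_channel_flush(toy_channel_t *ch)
{
  ch->wire_used = 0;
  while (ch->head && ch->wire_used + CELL_LEN <= sizeof(ch->wire)) {
    queued_cell_t *q = ch->head;
    memcpy(ch->wire + ch->wire_used, q->body, CELL_LEN);
    ch->wire_used += CELL_LEN;
    ch->head = q->next;
    if (!ch->head)
      ch->tail = NULL;
    free(q);
    ch->n_queued--;
  }
}
```

In this model, a relay whose outgoing channels are mostly idle would rarely reach the malloc call at all. Whether the real code can be restructured this way is a separate question, since the fast path must never reorder cells relative to anything already queued.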
Circling around to this ticket again now that 0.3.3 is feature-frozen.
The biggest offender seems to have been channel_rsa_id_group_set_badness, which should have been largely fixed by bug #24119. So that's good.
There are some things I'm surprised to see in the profile:
onion_skin_server_handshake (19.22%)
protocol_list_supports_protocol (10.95%)
outbuf_table_add (3.5%)
Let's have a look and see how much we can do there.
Another improvement in this area would be to make cell handling inside Tor involve fewer copies, but that might be better handled as part of our Rust work.