tor spends a lot of time in malloc/free

See #23722 (moved) as well.

Compression pegs the CPU (of course), but consensus updates are pretty uncommon. malloc and free, on the other hand, waste time on (I believe) every single packet forwarded, probably mainly because, AFAICT, there is no fast path that avoids memory allocation (or epoll waiting) in the case where the outgoing channel is free.

Do you have any stack results on who the main callers of malloc() and free() are in your code? Do you have a more complete profile you can share?

Trac:
Milestone: N/A to Tor: 0.3.3.x-final

I tried heaptrack, which seems pretty useful, but I found no obvious culprits for either the number of allocations or peak memory usage. It looks like a lot of time is spent in memmove through connection_or_process_cells_from_inbuf, though, and it seems plausible that this path mallocs buffers. Maybe it's possible to avoid those allocations if the outgoing channel is unblocked? That might be complicated... I can do another heaptrack profile if you want, though.
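To make the "skip the allocation when the outgoing channel is unblocked" idea concrete, here is a rough standalone sketch in C. None of this is Tor's actual code: sketch_chan_t, queued_cell_t, relay_cell, chan_flushed and the 514-byte SKETCH_CELL_LEN are all hypothetical names and values made up for illustration. The assumption is that a cell can be copied straight into the channel's output buffer when nothing is queued ahead of it and there is room, and that malloc is only needed on the slow path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CELL_LEN 514 /* illustrative fixed cell size (assumption) */

/* Hypothetical queued cell: one heap allocation per cell on the slow path. */
typedef struct queued_cell {
  struct queued_cell *next;
  unsigned char body[SKETCH_CELL_LEN];
} queued_cell_t;

/* Hypothetical outgoing channel: a fixed outbuf plus an overflow queue. */
typedef struct sketch_chan {
  unsigned char outbuf[4 * SKETCH_CELL_LEN];
  size_t outbuf_used;
  queued_cell_t *head, *tail;
  unsigned long n_fast, n_slow; /* counters, only for inspecting the sketch */
} sketch_chan_t;

/* "Unblocked" here means: nothing queued ahead of us, and room in outbuf. */
static int chan_is_unblocked(const sketch_chan_t *ch) {
  return ch->head == NULL &&
         sizeof(ch->outbuf) - ch->outbuf_used >= SKETCH_CELL_LEN;
}

/* Forward one cell. Returns 0 on success, -1 if the slow path's malloc fails. */
static int relay_cell(sketch_chan_t *ch, const unsigned char *cell) {
  assert(ch != NULL && cell != NULL);
  if (chan_is_unblocked(ch)) {
    /* Fast path: a single copy into the output buffer, no malloc/free. */
    memcpy(ch->outbuf + ch->outbuf_used, cell, SKETCH_CELL_LEN);
    ch->outbuf_used += SKETCH_CELL_LEN;
    ch->n_fast++;
    return 0;
  }
  /* Slow path: allocate and append to the queue so cell order is preserved. */
  queued_cell_t *q = malloc(sizeof(*q));
  if (q == NULL)
    return -1;
  q->next = NULL;
  memcpy(q->body, cell, SKETCH_CELL_LEN);
  if (ch->tail != NULL)
    ch->tail->next = q;
  else
    ch->head = q;
  ch->tail = q;
  ch->n_slow++;
  return 0;
}

/* Pretend the socket drained outbuf, then refill it from the queue in order. */
static void chan_flushed(sketch_chan_t *ch) {
  ch->outbuf_used = 0;
  while (ch->head != NULL &&
         sizeof(ch->outbuf) - ch->outbuf_used >= SKETCH_CELL_LEN) {
    queued_cell_t *q = ch->head;
    memcpy(ch->outbuf + ch->outbuf_used, q->body, SKETCH_CELL_LEN);
    ch->outbuf_used += SKETCH_CELL_LEN;
    ch->head = q->next;
    free(q);
  }
  if (ch->head == NULL)
    ch->tail = NULL;
}
```

The fast path is only taken when the queue is empty, so cells can never overtake ones that were queued earlier. Whether something like this maps onto the real channel and connection layers is exactly the "might be complicated" part.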

If there's anything you can share from heaptrack, that would sure be helpful.

Trac:
tor.heaptrack.flamegraph.sanitized.svg.gz

tor middle relay heaptrack allocations flamegraph

Circling around to this ticket again now that 0.3.3 is feature-frozen.

The biggest offender seems to have been channel_rsa_id_group_set_badness, which should have been largely fixed by bug #24119 (moved). So that's good.

There are some things I'm surprised to see in the profile:

onion_skin_server_handshake (19.22%)
protocol_list_supports_protocol (10.95%)
outbuf_table_add (3.5%)

Let's have a look and see how much we can do there.

Another improvement in this area would be to make cell handling inside Tor involve fewer copies, but that might be better handled as part of our Rust work.

I think protocol_list_supports_protocol is fixed by the Rust protover. I think dgoulet is working on outbuf_table_add. I looked at cell handling, but I think a lot of that work overlaps with https://lists.torproject.org/pipermail/tor-dev/2017-October/012536.html.

Excessive malloc/free cost could also be mitigated by switching to jemalloc on Linux (roughly #20424 (moved)).