Opened 5 months ago

Last modified 3 weeks ago

#23777 new enhancement

tor spends a lot of time in malloc/free

Reported by: Hello71
Owned by:
Priority: Medium
Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords:
Cc: starlight@…
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

on my Fast Guard, Tor spends about 25% (!) of its user CPU time in _int_malloc and _int_free. I tried switching to jemalloc, but I just got significantly worse memory fragmentation.

Child Tickets

Ticket  Type         Status  Owner    Summary
#24119  enhancement  closed  Hello71  channel_rsa_id_group_set_badness spends a lot of time in malloc/free
#25007  enhancement  new              See if we can allocate less for HMAC in Tor relays
#25008  defect       closed  nickm    Call protocol_list_supports_protocol less often to save time and allocation
#25009  defect       closed  nickm    I think KIST can use hash tables much less.
#25150  enhancement  closed  nickm    Avoid malloc/free on each server-side ntor handshake

Attachments (1)

tor.heaptrack.flamegraph.sanitized.svg.gz (38.7 KB) - added by Hello71 4 months ago.
tor middle relay heaptrack allocations flamegraph


Change History (11)

comment:1 Changed 5 months ago by cypherpunks

See #23722 as well.

comment:2 Changed 5 months ago by Hello71

compression pegs the CPU (of course), but consensus updates are pretty uncommon. I believe malloc and free waste time on every single packet forwarded, probably mainly because AFAICT there is no fast path that avoids memory allocation (or epoll waiting) when the outgoing channel is free.
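To make the "fast path" idea concrete, here is a minimal sketch in C. It is not Tor's code: `out_channel`, `queued_cell`, `forward_cell`, `CELL_LEN`, and the buffer size are all hypothetical names and values chosen for illustration. The only point is that a cell is copied straight into the write buffer when nothing is queued and there is room, and heap allocation happens only when the channel is blocked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CELL_LEN 514 /* hypothetical fixed cell size for this sketch */

/* Hypothetical queued cell: only ever allocated on the slow path. */
typedef struct queued_cell {
  struct queued_cell *next;
  unsigned char body[CELL_LEN];
} queued_cell;

/* Hypothetical outgoing channel: a small write buffer plus an overflow queue. */
typedef struct out_channel {
  unsigned char outbuf[4 * CELL_LEN];
  size_t outbuf_len;
  queued_cell *head, *tail;
  size_t n_allocs; /* counts slow-path allocations, for illustration only */
} out_channel;

/* Forward one cell. Returns 0 on success, -1 if allocation fails. */
int
forward_cell(out_channel *ch, const unsigned char *cell)
{
  assert(ch && cell);

  /* Fast path: nothing is queued and the write buffer has room,
   * so copy the cell directly and skip malloc entirely. */
  if (ch->head == NULL && ch->outbuf_len + CELL_LEN <= sizeof(ch->outbuf)) {
    memcpy(ch->outbuf + ch->outbuf_len, cell, CELL_LEN);
    ch->outbuf_len += CELL_LEN;
    return 0;
  }

  /* Slow path: the channel is blocked, so queue a heap-allocated copy. */
  queued_cell *qc = malloc(sizeof(*qc));
  if (!qc)
    return -1;
  memcpy(qc->body, cell, CELL_LEN);
  qc->next = NULL;
  if (ch->tail)
    ch->tail->next = qc;
  else
    ch->head = qc;
  ch->tail = qc;
  ch->n_allocs++;
  return 0;
}

/* Free every cell still sitting in the overflow queue. */
void
out_channel_clear(out_channel *ch)
{
  while (ch->head) {
    queued_cell *next = ch->head->next;
    free(ch->head);
    ch->head = next;
  }
  ch->tail = NULL;
}
```

With a zero-initialized `out_channel`, the first four calls to `forward_cell` fit in the buffer and leave `n_allocs` at 0; only the fifth call falls through to `malloc`. Whether a real relay could take such a path depends on scheduling and ordering constraints that this sketch ignores.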

comment:3 Changed 4 months ago by nickm

Milestone: Tor: 0.3.3.x-final

Do you have any stack results on who the main callers of malloc() and free() are in your code? Do you have a more complete profile you can share?

comment:4 Changed 4 months ago by Hello71

I tried heaptrack, which seems pretty useful, but I found that there are no obvious culprits for either number of allocations or peak memory usage. it looks like a lot of time is spent in memmove through connection_or_process_cells_from_inbuf though, and it seems plausible that that mallocs buffers. maybe it's possible to avoid those if the outgoing channel is unblocked? might be complicated... I can do another heaptrack profile if you want though.

comment:5 Changed 4 months ago by nickm

If there's anything you can share from heaptrack, that would sure be helpful

Changed 4 months ago by Hello71

tor middle relay heaptrack allocations flamegraph

comment:6 Changed 4 weeks ago by nickm

Circling around to this ticket again now that 0.3.3 is feature-frozen.

The biggest offender seems to have been channel_rsa_id_group_set_badness, which should have been largely fixed by #24119. So that's good.

There are some things I'm surprised to see in the profile:

  • onion_skin_server_handshake (19.22%)
  • protocol_list_supports_protocol (10.95%)
  • outbuf_table_add (3.5%)

Let's have a look and see how much we can do there.
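For a function like protocol_list_supports_protocol that shows up because it is called often and allocates each time, one generic option is to run the expensive check once, when the list is first stored, and keep the answer as a flag. The sketch below illustrates only that pattern. It is not the change made in any of the child tickets, and `list_supports_slow`, `relay_info`, `relay_info_init`, `relay_supports_relay2`, and the simplified "Name=Version" token format are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Counts how often the allocating check runs, for illustration only. */
unsigned long n_slow_checks = 0;

/* Hypothetical stand-in for an allocation-heavy lookup: copy the
 * space-separated list to the heap, tokenize it, and look for `entry`. */
bool
list_supports_slow(const char *list, const char *entry)
{
  size_t len = strlen(list);
  char *copy = malloc(len + 1);
  bool found = false;

  assert(copy);
  memcpy(copy, list, len + 1);
  n_slow_checks++;

  for (char *tok = strtok(copy, " "); tok; tok = strtok(NULL, " ")) {
    if (strcmp(tok, entry) == 0) {
      found = true;
      break;
    }
  }
  free(copy);
  return found;
}

/* Hypothetical per-relay record that caches the answer as a one-bit flag. */
typedef struct relay_info {
  const char *protocols;        /* borrowed pointer to the protocol list */
  unsigned supports_relay2 : 1; /* computed once in relay_info_init() */
} relay_info;

/* Run the slow check exactly once, when the record is set up. */
void
relay_info_init(relay_info *ri, const char *protocols)
{
  ri->protocols = protocols;
  ri->supports_relay2 = list_supports_slow(protocols, "Relay=2");
}

/* Hot-path query: reads the cached flag, with no parsing or allocation. */
bool
relay_supports_relay2(const relay_info *ri)
{
  return ri->supports_relay2;
}
```

After `relay_info_init(&ri, "Link=4 Relay=2")`, any number of `relay_supports_relay2(&ri)` calls leave `n_slow_checks` at 1. The trade-off is that the set of questions worth caching has to be known in advance, and the flag must be recomputed if the stored list ever changes.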

Another improvement in this area would be to make cell handling inside Tor involve fewer copies, but that might be better handled as part of our Rust work.

comment:7 Changed 4 weeks ago by Hello71

I think protocol_list_supports_protocol is fixed by the Rust protover work. I think dgoulet is working on outbuf_table_add. I looked at cell handling, but I think a lot of that work overlaps with https://lists.torproject.org/pipermail/tor-dev/2017-October/012536.html.

excessive malloc/free could also be mitigated by switching to jemalloc on Linux #20424-ish

comment:8 Changed 4 weeks ago by dgoulet

If I may point to this ticket I opened some days ago: #24914

One dup() we could avoid at each cell right there.

(Oh... for which there is a patch! neat :)

Last edited 4 weeks ago by dgoulet

comment:9 Changed 3 weeks ago by nickm

Milestone: Tor: 0.3.3.x-final → Tor: 0.3.4.x-final

comment:10 Changed 3 weeks ago by starlight

Cc: starlight@… added