Opened 8 years ago

Closed 8 years ago

#7087 closed defect (fixed)

memory leaks in new channel code

Reported by: arma Owned by:
Priority: Medium Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: andrea Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

==2387== 104 bytes in 1 blocks are definitely lost in loss record 21 of 39
==2387==    at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==2387==    by 0x2259F7: _tor_malloc (util.c:144)
==2387==    by 0x227035: _tor_malloc_zero (util.c:170)
==2387==    by 0x18DE19: channel_tls_start_listener (channeltls.c:190)
==2387==    by 0x1D4B54: connection_tls_start_handshake (connection_or.c:1201)
==2387==    by 0x1C48E7: connection_handle_listener_read (connection.c:1303)
==2387==    by 0x1C52E4: connection_handle_read (connection.c:2689)
==2387==    by 0x11CB90: conn_read_callback (main.c:720)
==2387==    by 0x52C9343: event_base_loop (in /usr/lib/libevent-1.4.so.2.1.3)
==2387==    by 0x11A510: do_main_loop (main.c:1986)
==2387==    by 0x11A8CC: tor_main (main.c:2694)
==2387==    by 0x5EFCC8C: (below main) (libc-start.c:228)
==2387==
==2387== 672 (56 direct, 616 indirect) bytes in 1 blocks are definitely lost in loss record 29 of 39
==2387==    at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==2387==    by 0x2259F7: _tor_malloc (util.c:144)
==2387==    by 0x227035: _tor_malloc_zero (util.c:170)
==2387==    by 0x1A220E: circuitmux_alloc (circuitmux.c:361)
==2387==    by 0x18E153: channel_tls_connect (channeltls.c:132)
==2387==    by 0x19795B: circuit_handle_first_hop (circuitbuild.c:159)
==2387==    by 0x19A796: circuit_establish_circuit (circuitbuild.c:1957)
==2387==    by 0x1AB0C2: circuit_launch_by_extend_info (circuituse.c:1364)
==2387==    by 0x1ACD9D: circuit_build_needed_circs (circuituse.c:741)
==2387==    by 0x11DBBD: second_elapsed_callback (main.c:1495)
==2387==    by 0x52C9343: event_base_loop (in /usr/lib/libevent-1.4.so.2.1.3)
==2387==    by 0x11A510: do_main_loop (main.c:1986)
==2387== 1,344 (112 direct, 1,232 indirect) bytes in 2 blocks are definitely lost in loss record 35 of 39
==2387==    at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==2387==    by 0x2259F7: _tor_malloc (util.c:144)
==2387==    by 0x227035: _tor_malloc_zero (util.c:170)
==2387==    by 0x1A220E: circuitmux_alloc (circuitmux.c:361)
==2387==    by 0x18E153: channel_tls_connect (channeltls.c:132)
==2387==    by 0x19795B: circuit_handle_first_hop (circuitbuild.c:159)
==2387==    by 0x19A796: circuit_establish_circuit (circuitbuild.c:1957)
==2387==    by 0x1AB0C2: circuit_launch_by_extend_info (circuituse.c:1364)
==2387==    by 0x1ACA39: circuit_build_needed_circs (circuituse.c:711)
==2387==    by 0x11DBBD: second_elapsed_callback (main.c:1495)
==2387==    by 0x52C9343: event_base_loop (in /usr/lib/libevent-1.4.so.2.1.3)
==2387==    by 0x11A510: do_main_loop (main.c:1986)

on a normal relay

Child Tickets

Attachments (2)

tor-7087-test.valgrind.log.2.cmux-losses.txt (14.6 KB) - added by andrea 8 years ago.
Valgrind output of 7087 repro
debug.log.2.cleanup.txt (29.3 KB) - added by andrea 8 years ago.
Last 250 lines of debug output from repro run, showing cleanup

Download all attachments as: .zip

Change History (13)

comment:1 Changed 8 years ago by nickm

Cc: andrea added

Changed 8 years ago by andrea

Valgrind output of 7087 repro

comment:2 Changed 8 years ago by andrea

I have a repro for this; I've just attached valgrind output, edited to remove non-channel/cmux-related warnings and sort the detect leaks into functionally related groups. To summarize, there are three basic groups of a definite leak and a matching number of plainly related indirect leaks [1]. The first group flows through channel_tls_handle_incoming() (six channels appear to have leaked) and the second leaks in identical fashion but via channel_tls_connect() for seven channels. The third functional group consists of a single leaked channel_listener_t.

[1] All share a common stack trace prefix, except for one set of indirectly leaked blocks each with chanid_circid_muxinfo_map_HT_GROW() instead, apparently from the first circuit attached to each of the affected channels.

Changed 8 years ago by andrea

Attachment: debug.log.2.cleanup.txt added

Last 250 lines of debug output from repro run, showing cleanup

comment:3 Changed 8 years ago by andrea

As can be seen from the attached debug log, channel_free_all() tries to free the following 13 channels remaining active at shutdown time:

Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x9353e20 (global ID 705) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0xa59c4d0 (global ID 763) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0xa627d10 (global ID 753) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x75f1120 (global ID 752) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x9db3980 (global ID 575) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0xa229910 (global ID 704) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x8c59b70 (global ID 574) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x9c351c0 (global ID 790) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x8b76e90 (global ID 789) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0xa67e2a0 (global ID 771) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0xa344a40 (global ID 764) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x9a87c30 (global ID 746) in state open (2)
Oct 15 05:35:42.000 [debug] channel_free_list(): Cleaning up channel 0x9df56c0 (global ID 788) in state open (2)

I doubt the number 13 here is a coincidence; the leaked channels (and the listener, which was changed to state closed earlier in the debug log) were left over at shutdown time, and a bug must exist in channel_free_all(). The good news here, though, is that all the leaks occur at shutdown time and scale with the number of channels active at that moment rather than a progressive leak accumulating over the lifetime of the Tor process.

comment:4 Changed 8 years ago by andrea

Status: newneeds_review

Fix for channel_free_all()/channel_force_free() bug causing the 13 leaked channel_tls_t/circuitmux_t structures in my test run is in my bug7087 branch. Fix for channel_tls_listener_t leak pending.

comment:5 Changed 8 years ago by nickm

Looks okay to me; feel free to merge .

comment:6 in reply to:  5 Changed 8 years ago by andrea

Replying to nickm:

Looks okay to me; feel free to merge .

Okay, will do. Thanks.

comment:7 Changed 8 years ago by andrea

The channel_listener_t leak is caused by channel_tls_listener_close_method() unsetting channel_tls_listener so tbat channel_tls_free_all() then calls channel_listener_free() with NULL; fix is to cache the original value of channel_tls_listener. See patch in my bug7087 branch now.

comment:8 Changed 8 years ago by nickm

Status: needs_reviewneeds_revision

A nit: let's not name variables "l". (Becasue 1 l I wrote once looked like another letter.)

Other than that, looks good to me.

comment:9 in reply to:  8 Changed 8 years ago by andrea

Status: needs_revisionneeds_review

Replying to nickm:

A nit: let's not name variables "l". (Becasue 1 l I wrote once looked like another letter.)

Other than that, looks good to me.

See my bug7087_2 branch.

comment:10 Changed 8 years ago by nickm

Looks good; feel free to merge.

comment:11 Changed 8 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Looks like we merged this at ac227cf587c17fb2bdd10de4b320ea4df3632cd8.

Note: See TracTickets for help on using tickets.