Opened 3 months ago

Last modified 2 months ago

#32103 assigned defect

Subsystem "thread_cleanup" is never called

Reported by: opara Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 043-should, extra-review
Cc: Actual Points:
Parent ID: Points:
Reviewer: nickm Sponsor:

Description

Subsystems implement the interface of struct subsys_fns_t, with one of the optional function pointers being void (*thread_cleanup)(void). This thread_cleanup function is called for all subsystems by the subsystem manager function void subsystems_thread_cleanup(void), but the subsystems_thread_cleanup function is never called anywhere in the code.

At the moment, the only subsystem to implement the thread_cleanup interface is the crypto subsystem, which uses thread_cleanup for freeing the threadlocal crypto_fast_rng_t, as well as freeing the threadlocal error queue on old versions of OpenSSL. As far as I can tell, this is never run.

I think that the subsystems_thread_cleanup function should be run somewhere in the code, but it's not clear to me how this subsystems_thread_cleanup is expected to be used. It seems like there should also be subsystems_thread_init and thread_init functions as well for initializing threadlocal variables. Right now the crypto subsystem does an "initialize on first use" singleton pattern, but it might be useful to add this initialization interface function so that subsystems have the option of initializing all of their threadlocals in one place.

Child Tickets

Change History (13)

comment:1 Changed 3 months ago by nickm

Keywords: 042-should added
Milestone: Tor: 0.4.2.x-final

comment:2 Changed 3 months ago by opara

I made a PR on GitHub with some changes that should fix this problem. Feel free to use them if they're helpful.

This fixes ticket #32103. It extends the threadpool to allow a customizable thread spawn function. This allows us to use our own spawn function which calls subsystems_thread_cleanup and a new subsystems_thread_init. It also calls 'spawn_exit', which negates the need for 1412. Finally the above two functions are called in the main thread after subsystems_init and before subsystems_shutdown respectively.

Last edited 3 months ago by opara (previous) (diff)

comment:3 Changed 3 months ago by nickm

Status: newneeds_review

comment:4 Changed 3 months ago by dgoulet

Reviewer: catalyst

comment:5 Changed 3 months ago by catalyst

Keywords: extra-review added

comment:6 Changed 3 months ago by catalyst

Reviewer: catalystnickm

Thanks for the patches!

Overall, it looks like a good start. There are a few changes that I think should happen before merging. I'm also asking nickm to do some extra review in case there are subtle things related to how we use threads.

The new wrapper in tor_threads.c apparently changes the contract of spawn_func(), which says func should not return, but rather should call spawn_exit(). The documentation of spawn_func() isn't changed accordingly. Maybe the wrapper for the thread-local shutdown should go in spawn_exit(), instead of expecting the thread-start function to return? It seems like this would have been caught by a unit test that tests the new functionality. (I'm not quite sure how difficult such a test would be to write. Please let me know if it would be exceptionally difficult.)

At an architectural level, I would prefer to avoid adding a new general interface that only has one user. It seems to add considerable complexity: new fields and parameters in several places, new files, etc.

Why not call subsystems_thread_init() from spawn_func() and subsystems_thread_cleanup() from spawn_exit()?

Minor: The patches expand tor_run_main() beyond the practracker limit of 100 lines. (This is causing CI to fail.) We can consider refactoring, or add a new exception for that function.

Minor: The loop direction fixup should probably be squashed before merge.

Minor: There should probably be a changes file (see doc/HACKING/CodingStandards.md). We can help with this if you like.

You noticed that the crypto subsystem uses a singleton pattern currently. Your patches don't seem to modify this. Is that intentional?

How will these changes interact with building tor as a library that possibly gets started and stopped, or loaded and unloaded?

comment:7 in reply to:  6 Changed 3 months ago by opara

Replying to catalyst:

Thanks for the patches!

Overall, it looks like a good start. There are a few changes that I think should happen before merging. I'm also asking nickm to do some extra review in case there are subtle things related to how we use threads.

Thanks for taking a look at it! I'll try to explain some of my design reasoning below. Let me know what changes you'd like me to make, and I can do that.


The new wrapper in tor_threads.c apparently changes the contract of spawn_func(), which says func should not return, but rather should call spawn_exit(). The documentation of spawn_func() isn't changed accordingly. Maybe the wrapper for the thread-local shutdown should go in spawn_exit(), instead of expecting the thread-start function to return? It seems like this would have been caught by a unit test that tests the new functionality. (I'm not quite sure how difficult such a test would be to write. Please let me know if it would be exceptionally difficult.)

At an architectural level, I would prefer to avoid adding a new general interface that only has one user. It seems to add considerable complexity: new fields and parameters in several places, new files, etc.

Why not call subsystems_thread_init() from spawn_func() and subsystems_thread_cleanup() from spawn_exit()?

My reason for this was to make the start_tor_thread functionality be an app-level abstraction on top of the original tor threading library. Calling the subsystem manager code (such as subsystems_thread_init()) from the threading library (such as within spawn_func()) causes the subsystem module to become a dependency of the threading library. My goal was to not have this library (in the lib/ directory) depend on application-specific code (in the app/ directory). This approach also means that someone using a thread (created with start_tor_thread()) does not need to worry about running spawn_exit() or any additional cleanup-code since it will happen automatically, hopefully reducing the possibility of bugs. For example, the threadpool already forgets to call spawn_exit() (see https://github.com/torproject/tor/pull/1412/), which means there would be a bug.

Additionally, in the future someone may wish to create a new thread which does something very simple, and may not want that thread interacting with the tor subsystem manager. Integrating the subsystem manager into the threading library would make this impossible.


Minor: The patches expand tor_run_main() beyond the practracker limit of 100 lines. (This is causing CI to fail.) We can consider refactoring, or add a new exception for that function.

Yes, I didn't want to refactor tor's main method in this PR since that could probably be a whole new PR on its own, but also didn't want to reduce the readability of this PR to satisfy the CI system. (Aside: practracker seems to include empty lines in the limit, but it would be nice if it ignored them since empty lines can make the code more readable). Let me know what approach you would prefer and I can make that change.


Minor: The loop direction fixup should probably be squashed before merge.

I'm not sure how GitHub handles push --force when people have left comments on commits (I don't want to overwrite teor's comment), so I haven't done this yet. Please let me know if you'd like me to squash now or later.


Minor: There should probably be a changes file (see doc/HACKING/CodingStandards.md). We can help with this if you like.

Okay thanks, I'll add one and then just let me know if you'd like me to change it.


You noticed that the crypto subsystem uses a singleton pattern currently. Your patches don't seem to modify this. Is that intentional?

I didn't modify that in this PR since I didn't want to make too many changes at once (although I have modified it in my own personal changes to tor).


How will these changes interact with building tor as a library that possibly gets started and stopped, or loaded and unloaded?

I'm not sure, I have no experience building tor as a library. Currently threads are only created when tor is running as a relay (in server mode). Does anyone use tor as a library when running a relay? I feel like this would already break things, such as the threadpool which is never freed, and whose work entries are also not freed when shutting down.

Last edited 3 months ago by opara (previous) (diff)

comment:8 Changed 3 months ago by nickm

Status: needs_reviewneeds_revision

I've done a secondary review at Catalyst's request. Comments are on the PR. I think I agree with Catalyst's suggestions.

comment:9 Changed 3 months ago by opara

Thanks for the review. If you want to call subsystems_thread_cleanup() from spawn_exit(), then I think it's best to close this PR, and make a new patch that does that instead since none of this code will be needed. The problem is that it make linking complicated, since now you need to link the tor code from app/ into the tools in src/tools, or to remove the lib/thread code from these tools. I'm not sure the best way to do this, so I'll let someone else who is more familiar with the tor build system take a look at this.

comment:10 Changed 3 months ago by teor

Keywords: 043-should added; 042-should removed
Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final

This fix is unlikely to make our 0.4.2 release.

Feel free to open a new PR, and put the link in this ticket.

comment:11 Changed 2 months ago by opara

The 'defect' status and milestone should probably be removed from this ticket. When I made the ticket I didn't realize that threads in tor never exit, so there is no reason to clean up any memory in these threads. It might be best just to remove the thread_cleanup function altogether.

Last edited 2 months ago by opara (previous) (diff)

comment:12 Changed 2 months ago by teor

Owner: set to nickm
Status: needs_revisionassigned

Do we need a thread cleanup function, nickm?

comment:13 Changed 2 months ago by nickm

I believe we do. Threads _do_ exit (at shutdown), and thread-local storage is indeed a thing we use.

Note: See TracTickets for help on using tickets.