Opened 5 weeks ago

Last modified 2 weeks ago

#31736 merge_ready defect

Stop using mutex_destroy(), when multiple threads can still access the mutex

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.3.5.1-alpha
Severity: Normal Keywords: consider-backport-after-042-stable, consider-backport-if-needed, diagnostics, 042-should, 035-backport-maybe, 040-backport-maybe, 041-backport-maybe, regression, BugSmashFund
Cc: nickm, dgoulet Actual Points: 0.5
Parent ID: Points: 0.2
Reviewer: nickm Sponsor:

Description

Part of #31614, alternative to #31735.

If we can't join all the threads before destroying a mutex (#31735), and we can't otherwise prevent multiple thread access, we should stop destroying that mutex. (Because destroying a locked thread invokes undefined behaviour.)

There may be some other pattern that helps us destroy all but one mutex. But that involves a "mutex-destruction" mutex. Which is terribly complex.

Child Tickets

Change History (15)

comment:1 Changed 4 weeks ago by teor

I spoke with nickm on IRC, and he agreed:

  • we will stop calling mutex destroy on static pthread mutexes
    • need to check the windows rules
  • we'll have to make sure asan and valgrind don't think it's a leak
  • we should leave a comment on the functions that still need to call mutex destroy, and the places where we remove the mutex destroys

comment:2 Changed 4 weeks ago by teor

Cc: nickm dgoulet added
Status: newneeds_review

Here is a draft pull request, I'd like a quick review before continuing:

Here are my questions:

  • Are the comments correct?
  • Have I missed any calls?
  • I skipped some calls that looked local, am I correct?

I still need to:

  • write a changes file
  • split into multiple commits
  • check for any extra uses of these functions after 0.3.5:
    • pthread_mutex_destroy()
    • tor_mutex_uninit() - non-Windows
    • tor_mutex_free() - non-Windows
    • atomic_counter_destroy() - non-Windows, but with our atomic counter compat implementation

comment:3 Changed 4 weeks ago by teor

Actual Points: 0.2
Points: 0.2

comment:4 Changed 3 weeks ago by nickm

Owner: set to teor
Status: needs_reviewassigned

Hi! This all looks plausible to me at first glance. Here's another question to check on, however:

  • Have you thought about what happens on re-initialization?

(Ideally, if Tor finishes, and then is re-initialized in the same process, we would reuse the existing mutexes rather than creating new ones. Is that achievable?)

comment:5 Changed 3 weeks ago by teor

Is there a unit test for re-initialisation?
Or should I write one?

I can try to reason about the behaviour, but tests are much more reliable :-)

comment:6 Changed 3 weeks ago by nickm

There's some code that could be used for an integration test, but it would need work. See the configure option --enable-restart-debugging and the corresponding environment variable TOR_DEBUG_RESTART, both added in 97d9ba2380e0c894a1b611bdb4f35d0fe98a837a.

comment:7 in reply to:  4 Changed 3 weeks ago by teor

Actual Points: 0.20.4
Status: assignedneeds_review

Replying to nickm:

Hi! This all looks plausible to me at first glance. Here's another question to check on, however:

  • Have you thought about what happens on re-initialization?

We must destroy the mutexes, or re-initialising them triggers undefined behaviour. (On both pthreads and Windows threads.) And there is no way to determine if the mutex has already been initialised.

Therefore, we should continue to use the old code, but:

  • avoid freeing some locked mutexes by acquiring and releasing the lock in tor_mutex_uninit(), and
  • add comments explaining the issue, so we can diagnose any crashes or deadlocks more easily.

Here is my PR:

I think we should leave restart debugging for #31735, because I am out of time on this issue.

comment:8 Changed 3 weeks ago by teor

Reviewer: nickm

comment:9 Changed 3 weeks ago by nickm

Keywords: asn-merge added
Status: needs_reviewmerge_ready

Seems okay. Should we keep this ticket open after we merge this and/or open a new one? I vote for "yes".

comment:10 Changed 3 weeks ago by teor

We should keep this ticket open and consider-backport-if-needed.

I think #31735 covers all the follow-up we need to do, please feel free to open another ticket if it doesn't.

comment:11 Changed 3 weeks ago by teor

Actual Points: 0.40.5
Status: merge_readyneeds_review

I found another fix that might be much safer: using a sentinel variable to only initialise global mutexes once, regardless of the number of times that tor is re-initialised. (We already do this with the logs, so that we can keep seeing logs during shutdown.)

I updated the usage comments in this branch, and added a note to #31735.

comment:12 Changed 3 weeks ago by nickm

Status: needs_reviewmerge_ready

I like the sentinel-variable fix; this branch seems okay to merge at this point. I'll comment more in #31735.

comment:13 Changed 2 weeks ago by asn

Keywords: asn-merge removed

Merged. Leaving open for backports.

comment:14 Changed 2 weeks ago by asn

Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final

comment:15 Changed 2 weeks ago by teor

Parent ID: #31614

This ticket can be merged independently of its parent.

Note: See TracTickets for help on using tickets.