Opened 6 weeks ago

Last modified 6 days ago

#30187 assigned defect

100% cpu usage in winthreads tor_cond_wait

Reported by: bolvan
Owned by: ahf
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor:
Severity: Normal
Keywords: windows 035-backport 042-proposed


For years I have run a relay using a self-compiled win64 build of tor.
Compiler: mingw64.
The relay runs well for some time, but then suddenly starts using 100% CPU on all cores.
I traced where it happens. The following loop never ends:

  do {
    DWORD res;
    res = WaitForSingleObject(cond->event, ms);
    if (cond->n_to_wake &&
        cond->generation != generation_at_start) {
      result = 0;
      waiting = 0;
      goto out;
    } else if (res != WAIT_OBJECT_0) {
      result = (res==WAIT_TIMEOUT) ? 1 : -1;
      waiting = 0;
      goto out;
    } else if (ms != INFINITE) {
      endTime = GetTickCount();
      if (startTime + ms_orig <= endTime) {
        result = 1; /* Timeout */
        waiting = 0;
        goto out;
      } else {
        ms = startTime + ms_orig - endTime;
      }
    }
    /* If we make it here, we are still waiting. */
    if (cond->n_to_wake == 0) {
      /* There is nobody else who should wake up; reset
       * the event. */
      ResetEvent(cond->event);
    }
  } while (waiting);

res = WAIT_OBJECT_0;

This means that no path with "goto out" ever executes.
More than one thread runs this loop, and each one eats a separate core.

Some people I shared binaries with report the same problem.
Please check.
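
Read as a decision table, each pass through the quoted loop body ends in one of five outcomes. The sketch below is a portable model of that logic, not Tor code: model_step and the STEP_* and MODEL_* names are hypothetical, the MODEL_* constants are stand-ins for the Win32 values, and it assumes cond->event is a manual-reset event that stays signaled until something resets it. Under those assumptions, the only state that neither leaves the loop nor resets the event is res == WAIT_OBJECT_0 with a nonzero n_to_wake and an unchanged generation, which would match the reported spin.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Win32 constants, so the model compiles anywhere. */
#define MODEL_WAIT_OBJECT_0 0u
#define MODEL_WAIT_TIMEOUT  258u
#define MODEL_INFINITE      0xFFFFFFFFu

typedef enum {
  STEP_WOKEN,        /* result = 0, loop exits */
  STEP_TIMEOUT,      /* result = 1, loop exits */
  STEP_ERROR,        /* result = -1, loop exits */
  STEP_RESET_SLEEP,  /* event is reset, so the next wait can block */
  STEP_SPIN          /* event left signaled, so the next wait returns at once */
} model_step_t;

/* Hypothetical helper mirroring one pass through the quoted loop body.
 * deadline_passed stands for (startTime + ms_orig <= endTime). */
model_step_t
model_step(uint32_t res, int n_to_wake, int generation,
           int generation_at_start, uint32_t ms, int deadline_passed)
{
  assert(n_to_wake >= 0);

  if (n_to_wake && generation != generation_at_start)
    return STEP_WOKEN;
  if (res != MODEL_WAIT_OBJECT_0)
    return (res == MODEL_WAIT_TIMEOUT) ? STEP_TIMEOUT : STEP_ERROR;
  if (ms != MODEL_INFINITE && deadline_passed)
    return STEP_TIMEOUT;

  /* Still waiting. The event is only reset when n_to_wake is zero. */
  if (n_to_wake == 0)
    return STEP_RESET_SLEEP;

  /* n_to_wake != 0 but the generation has not moved: nothing resets the
   * event, and with ms == INFINITE no timeout ever ends the loop. */
  return STEP_SPIN;
}
```

For example, model_step(MODEL_WAIT_OBJECT_0, 1, 28, 28, MODEL_INFINITE, 0) yields STEP_SPIN, while the same call with generation 29 yields STEP_WOKEN.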

Change History (15)

comment:1 Changed 6 weeks ago by ahf

Keywords: windows added; winthreads tor_cond_wait removed
Owner: set to ahf
Status: new → assigned

Interesting, I have not seen this yet myself, but I also never ran a Tor relay on Windows.

Have you been able to reproduce this with 0.4.x/master?

comment:2 Changed 6 weeks ago by bolvan

For me it has never happened on Linux. The problem seems to be winthreads-specific.
I'll build 0.4.x and check.

comment:3 Changed 6 weeks ago by bolvan

Yes, same bug

comment:4 Changed 6 weeks ago by ahf

Have you been able to debug which call to tor_cond_wait() is the problematic one?

And this is only when running as a relay, right? You have not seen this condition when running as a client?

comment:5 Changed 6 weeks ago by bolvan

Only worker_thread_main calls tor_cond_wait.
Personally I don't run as a client, but another person who does reports that client mode does not cause the problem; only relay mode does.

comment:6 Changed 6 weeks ago by nickm

Keywords: 035-backport added
Milestone: Tor: 0.4.0.x-final
Priority: Medium → High

comment:7 Changed 6 weeks ago by nickm

One possible fix here would be to use ConditionVariable instead; it's been in Windows since Vista.

If we don't go that route, here is a part that looks suspicious to me: the generation count getting stuck at 28 suggests to me that we are using generation wrong. In any case, we should really be either waking up or sleeping each time through the loop, I think.
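
A rough sketch of what the ConditionVariable route could look like, under stated assumptions: sketch_flag_t and the sketch_flag_* functions are hypothetical names for illustration and are not Tor's API or the eventual patch. The Win32 branch uses InitializeConditionVariable, SleepConditionVariableCS and WakeAllConditionVariable; the pthread branch is only there so the sketch also builds off Windows. The point is that the waiter re-checks a predicate under the lock and otherwise sleeps in the kernel, so each pass through the loop either wakes up or sleeps.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>  /* condition variables need Windows Vista or later */
typedef CRITICAL_SECTION sketch_lock_t;
typedef CONDITION_VARIABLE sketch_cv_t;
#else
#include <pthread.h>  /* stand-in so the sketch also builds off Windows */
typedef pthread_mutex_t sketch_lock_t;
typedef pthread_cond_t sketch_cv_t;
#endif

/* Hypothetical type: a flag that one thread sets and others wait for. */
typedef struct {
  sketch_lock_t lock;
  sketch_cv_t cv;
  int ready;  /* the predicate; protected by lock */
} sketch_flag_t;

void
sketch_flag_init(sketch_flag_t *f)
{
  assert(f != NULL);
#ifdef _WIN32
  InitializeCriticalSection(&f->lock);
  InitializeConditionVariable(&f->cv);
#else
  pthread_mutex_init(&f->lock, NULL);
  pthread_cond_init(&f->cv, NULL);
#endif
  f->ready = 0;
}

/* Set the flag and wake every waiter. */
void
sketch_flag_set(sketch_flag_t *f)
{
#ifdef _WIN32
  EnterCriticalSection(&f->lock);
  f->ready = 1;
  LeaveCriticalSection(&f->lock);
  WakeAllConditionVariable(&f->cv);
#else
  pthread_mutex_lock(&f->lock);
  f->ready = 1;
  pthread_mutex_unlock(&f->lock);
  pthread_cond_broadcast(&f->cv);
#endif
}

/* Block until the flag is set. Returns 0 on success, -1 on error. */
int
sketch_flag_wait(sketch_flag_t *f)
{
  int result = 0;
#ifdef _WIN32
  EnterCriticalSection(&f->lock);
  while (!f->ready) {
    /* Releases the lock while asleep and reacquires it before returning. */
    if (!SleepConditionVariableCS(&f->cv, &f->lock, INFINITE)) {
      result = -1;
      break;
    }
  }
  LeaveCriticalSection(&f->lock);
#else
  pthread_mutex_lock(&f->lock);
  while (!f->ready) {
    if (pthread_cond_wait(&f->cv, &f->lock) != 0) {
      result = -1;
      break;
    }
  }
  pthread_mutex_unlock(&f->lock);
#endif
  return result;
}
```

Because the state lives in the ready predicate rather than in a signaled event, a wake-up issued while nobody is waiting leaves nothing behind for a later waiter to spin on. A timed wait would additionally need to recompute the remaining timeout after each wake-up, which this sketch leaves out.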

comment:8 Changed 6 weeks ago by cypherpunks

Can you describe how to trace this down, or link to info about how to do it? I also have a reproducible "100% CPU on all cores" problem. Thanks, and thanks for running a relay.

comment:9 Changed 6 weeks ago by bolvan

You can use gdb. I'm not very good with gdb, so I converted the DWARF debug info to PDB and then used Visual Studio.
It will ask for the source file location first. It was able to set breakpoints, although it could not watch variables. I used the disassembly and register windows to read values. Maybe this problem is caused by gcc optimizations; you could try removing -O2, -O, ... from the Makefile. I haven't checked.

Last edited 6 weeks ago by bolvan

comment:10 Changed 11 days ago by nickm

Keywords: 042-proposed added
Milestone: Tor: 0.4.0.x-final → Tor: unspecified

comment:11 Changed 11 days ago by nickm

(This is worth doing, but it is not in scope for stable.)

comment:12 Changed 6 days ago by bolvan

This bug makes a tor relay unusable under Windows.
All Windows relay operators should stop their nodes.
Is it serious enough?

comment:13 Changed 6 days ago by ahf

It is serious and we plan on fixing this bug. Right now (end of May 2019) we are finishing off a sponsor and trying to get 0.4.1 shipped. Once we are done with that work, we will get to this. The priority of this bug is still considered high.

comment:14 Changed 6 days ago by cypherpunks

Is this reproducible with MinGW-W64 trunk?

comment:15 Changed 6 days ago by bolvan

It doesn't seem to be related to the compiler. It's a bug in the Windows-specific code.
I compiled with a recent mingw-w64 on Windows and the bug was there.
