Opened 2 months ago

Closed 4 weeks ago

#32778 closed defect (fixed)

Initialise pubsub in Windows NT service mode

Reported by: cypherpunks Owned by: nickm
Priority: High Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.1-alpha
Severity: Normal Keywords: consider-backport-after-0431, 041-backport 042-backport extra-review crash regression 043-must BugSmashFund
Cc: Actual Points: .2
Parent ID: Points:
Reviewer: catalyst Sponsor:

Description

i don't know what this error is about. Seen in logs many thousands of many of this log lines (650k), even if they tell to not warn about it again.

Dec 17 01:17:40.000 [warn] {HANDSHAKE} 20190 connections have failed:
Dec 17 01:17:40.000 [warn] {HANDSHAKE}  20190 connections died in state connect()ing with SSL state (No SSL object)
Dec 17 01:17:40.000 [warn] {BUG} tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {CONTROL} Problem bootstrapping. Stuck at 0% (starting): Starting. (Connection timed out [WSAETIMEDOUT ]; TIMEOUT; count 20192; recommendation warn; host CE3FE883C6C9EF475EA097DC3E33A6F32B852DA1 at 78.129.218.56:443)

Child Tickets

Attachments (1)

notices.log (37.9 KB) - added by Vort 8 weeks ago.

Download all attachments as: .zip

Change History (32)

comment:1 Changed 2 months ago by nickm

Milestone: Tor: 0.4.2.x-final
Priority: MediumHigh

comment:2 Changed 2 months ago by nickm

Hm. What platform is this on? Do you have anything unusual in your configuration?

comment:3 Changed 2 months ago by cypherpunks

platform windows/mingw64

comment:4 Changed 2 months ago by nickm

Okay. Does this happen right when you start, or after a while. Does it happen every time you try to run Tor, or only with certain configuration options?

comment:5 Changed 8 weeks ago by Vort

I confirm lots of

Dec 30 12:19:49.000 [warn] tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 30 12:19:49.000 [warn] Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )

lines after upgrading from 0.4.1.7 to 0.4.2.5.

That was for my relay.
When I tried to run the same 0.4.2.5 binaries with clean configuration inside VirtualBox, there was no such errors.
Will try to dig deeper.

Changed 8 weeks ago by Vort

Attachment: notices.log added

comment:6 in reply to:  4 Changed 8 weeks ago by Vort

@nickm: looks like problem is again related to Windows service mode of operation.
I was able to reproduce the problem inside VM with clean data directory: notices.log.

comment:7 Changed 8 weeks ago by nickm

Owner: set to nickm
Status: newaccepted

So, it happens when running as a service, but not otherwise? That might help narrow it down.

comment:8 Changed 8 weeks ago by Vort

Yes, that was happened for me. Please recheck. If this is not the case, I will continue investigation.

comment:9 Changed 7 weeks ago by cypherpunks

as the error was reported, my tor was run as Windows service too and not otherwise yet. this is confirmable by me. But since i never saw this happening again. I sadly cannot tell more about it here.

comment:10 Changed 7 weeks ago by nickm

Actual Points: .1
Keywords: 041-backport 042-backport added
Status: acceptedneeds_review

Likely fix in branch ticket32778_041 with PR at https://github.com/torproject/tor/pull/1634 . Patch is against 0.4.1 but should merge cleanly to master.

This is a minimal fix to avoid possible stability issues. For a longer-term fix, see #32883

comment:11 Changed 7 weeks ago by dgoulet

Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Reviewer: catalyst

comment:12 in reply to:  10 Changed 7 weeks ago by catalyst

Keywords: extra-review added

Replying to nickm:

Likely fix in branch ticket32778_041 with PR at https://github.com/torproject/tor/pull/1634 . Patch is against 0.4.1 but should merge cleanly to master.

This is a minimal fix to avoid possible stability issues. For a longer-term fix, see #32883

Thanks! Looks good to me, but I haven't tested it on Windows. Maybe someone else who knows how to do that could do so? If not, I'm also ok with merging this as-is because it seems like a less-common use case.

comment:13 Changed 7 weeks ago by catalyst

Status: needs_reviewneeds_information

Is anyone who encountered this problem willing to verify that this patch works for them?

comment:14 Changed 7 weeks ago by nickm

From this week's team meeting:

23:15 <+nickm> catalyst: if we can't find a tester for #32778, do you think we 
               should go ahead and merge?
23:15 -zwiebelbot:#tor-meeting- tor#32778: pubsub_pub_ - [needs_information] - 
          https://bugs.torproject.org/32778
23:16 < catalyst> nickm: yeah, it seems reasonable by inspection, and appveyor 
                  seems to be ok with it
23:16 < catalyst> i think we lack automated testing for that functionality

But it would be really good if we could get some testing on this before we ship it.

comment:15 Changed 7 weeks ago by nickm

Keywords: crash added

comment:16 Changed 7 weeks ago by nickm

Keywords: regression added

comment:17 Changed 7 weeks ago by nickm

Keywords: 043-must added

comment:18 Changed 6 weeks ago by catalyst

Status: needs_informationmerge_ready

It's been about a week and no responses. I'm ok with merging this now, but am also happy to delay if people want to wait longer for feedback.

comment:19 Changed 6 weeks ago by teor

Actual Points: .1.2
Keywords: consider-backport-after-043-stable added
Milestone: Tor: 0.4.3.x-finalTor: 0.4.2.x-final
Version: Tor: 0.4.2.5Tor: 0.4.1.1-alpha

There was a conflict with #32883, which replaces the pubsub calls added in this ticket, with calls to tor_run_main() (which then calls the pubsub functions). Therefore, I did an "ours" merge, to avoid taking any changes from this PR in master. (But resolve future conflicts when we backport this change.)

Unfortunately, the "ours" merge means that this code won't be tested in master.

But this PR seems reasonable to me for backport. The pattern of calls in nt_service_main() and nt_service_body() matches the pubsub call pattern used by other platforms in tor_run_main() before #32883. (And used by all platforms after #32883.)

Given the lack of testing, I'm marking it for backport after 0.4.3-stable.

Merged to master with #32705.

comment:20 Changed 6 weeks ago by Vort

4025cbada6ab01247275faec8dd32ae857ef0fd5 does not work at all. Looks like no one tested these fixes, sorry.

comment:21 Changed 6 weeks ago by nickm

Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Status: merge_readyneeds_revision

Can you be more specific, Vort? What happens when you try to use it?

comment:22 Changed 6 weeks ago by Vort

When I try to launch it in service mode, process is created, but nothing happens next: no lines in log file, no ports are opened and very low RAM consumption. Looks like complete hang.

comment:23 in reply to:  20 Changed 6 weeks ago by catalyst

Replying to Vort:

4025cbada6ab01247275faec8dd32ae857ef0fd5 does not work at all. Looks like no one tested these fixes, sorry.

Thanks for the information. That commit seems to be on master and includes the more extensive fix for #32883. Would you please try the more minimal fix at https://github.com/torproject/tor/pull/1634 ?

comment:24 Changed 6 weeks ago by Vort

tor-0.4.2.5 (52d386c9) + nmathewson:ticket32778_041 (54eec534) looks good so far: no pubsub_pub errors in logs.

comment:25 in reply to:  24 Changed 6 weeks ago by catalyst

Replying to Vort:

tor-0.4.2.5 (52d386c9) + nmathewson:ticket32778_041 (54eec534) looks good so far: no pubsub_pub errors in logs.

Thanks for the info! I've reopened #32883 so we can work on the issue in master.

comment:26 Changed 6 weeks ago by catalyst

Status: needs_revisionneeds_information

Let's maybe wait a week to see if there's any more feedback on the smaller patch.

comment:27 in reply to:  26 Changed 5 weeks ago by cypherpunks

Hello privacy friends!
I'm the opener. I was slow to respond on ticket.
Replying to catalyst:

wait a week to see if there's any more feedback on the smaller patch.

I now compiled the linked minimal patch "Initialize publish/subscribe code when running as an NT service. https://github.com/torproject/tor/pull/1634" applied against tor-0.4.2.5 running MINGW64_NT-10.0-18363 and run tor x64 as NT Service mode.

No more log message appears for me so far. Looks good. :)

This is no review in any way. Just a short feedback.

Replying to catalyst:

because it seems like a less-common use case.

Thanks for carrying anyway!

comment:28 Changed 5 weeks ago by teor

Keywords: BugSmashFund added
Milestone: Tor: 0.4.3.x-finalTor: 0.4.2.x-final
Status: needs_informationmerge_ready

We did the master reverts and applied this patch in #32984.
In this ticket, we'll track backporting the change to 0.3.5 and later.

comment:29 Changed 5 weeks ago by teor

Summary: pubsub_pub_Initialise pubsub in Windows NT service mode

comment:30 Changed 4 weeks ago by teor

Keywords: consider-backport-after-0431 added; consider-backport-after-043-stable removed

Since this breaks NT services, let's backport it sooner.

comment:31 Changed 4 weeks ago by teor

Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final
Resolution: fixed
Status: merge_readyclosed

Merged to 0.4.1 and later.

Merged with #29819.

Note: See TracTickets for help on using tickets.