Opened 8 months ago

Closed 6 months ago

Last modified 4 months ago

#32778 closed defect (fixed)

Initialise pubsub in Windows NT service mode

Reported by: cypherpunks Owned by: nickm
Priority: High Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.1-alpha
Severity: Normal Keywords: consider-backport-after-0431, 041-backport 042-backport extra-review crash regression 043-must BugSmashFund
Cc: dgoulet, nickm Actual Points: .2
Parent ID: Points:
Reviewer: catalyst Sponsor:

Description

i don't know what this error is about. Seen in logs many thousands of many of this log lines (650k), even if they tell to not warn about it again.

Dec 17 01:17:40.000 [warn] {HANDSHAKE} 20190 connections have failed:
Dec 17 01:17:40.000 [warn] {HANDSHAKE}  20190 connections died in state connect()ing with SSL state (No SSL object)
Dec 17 01:17:40.000 [warn] {BUG} tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {BUG} Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )
Dec 17 01:17:40.000 [warn] {CONTROL} Problem bootstrapping. Stuck at 0% (starting): Starting. (Connection timed out [WSAETIMEDOUT ]; TIMEOUT; count 20192; recommendation warn; host CE3FE883C6C9EF475EA097DC3E33A6F32B852DA1 at 78.129.218.56:443)

Child Tickets

Attachments (1)

notices.log (37.9 KB) - added by Vort 7 months ago.

Download all attachments as: .zip

Change History (34)

comment:1 Changed 8 months ago by nickm

Milestone: Tor: 0.4.2.x-final
Priority: MediumHigh

comment:2 Changed 8 months ago by nickm

Hm. What platform is this on? Do you have anything unusual in your configuration?

comment:3 Changed 8 months ago by cypherpunks

platform windows/mingw64

comment:4 Changed 8 months ago by nickm

Okay. Does this happen right when you start, or after a while. Does it happen every time you try to run Tor, or only with certain configuration options?

comment:5 Changed 7 months ago by Vort

I confirm lots of

Dec 30 12:19:49.000 [warn] tor_bug_occurred_(): Bug: pubsub_publish.c:37: pubsub_pub_: Non-fatal assertion !(! d) failed. (Future instances of this warning will be silenced.) (on Tor 0.4.2.5 )
Dec 30 12:19:49.000 [warn] Bug: Tor 0.4.2.5: Non-fatal assertion !(! d) failed in pubsub_pub_ at pubsub_publish.c:37. (Stack trace not available) (on Tor 0.4.2.5 )

lines after upgrading from 0.4.1.7 to 0.4.2.5.

That was for my relay.
When I tried to run the same 0.4.2.5 binaries with clean configuration inside VirtualBox, there was no such errors.
Will try to dig deeper.

Changed 7 months ago by Vort

Attachment: notices.log added

comment:6 in reply to:  4 Changed 7 months ago by Vort

@nickm: looks like problem is again related to Windows service mode of operation.
I was able to reproduce the problem inside VM with clean data directory: notices.log.

comment:7 Changed 7 months ago by nickm

Owner: set to nickm
Status: newaccepted

So, it happens when running as a service, but not otherwise? That might help narrow it down.

comment:8 Changed 7 months ago by Vort

Yes, that was happened for me. Please recheck. If this is not the case, I will continue investigation.

comment:9 Changed 7 months ago by cypherpunks

as the error was reported, my tor was run as Windows service too and not otherwise yet. this is confirmable by me. But since i never saw this happening again. I sadly cannot tell more about it here.

comment:10 Changed 7 months ago by nickm

Actual Points: .1
Keywords: 041-backport 042-backport added
Status: acceptedneeds_review

Likely fix in branch ticket32778_041 with PR at https://github.com/torproject/tor/pull/1634 . Patch is against 0.4.1 but should merge cleanly to master.

This is a minimal fix to avoid possible stability issues. For a longer-term fix, see #32883

comment:11 Changed 7 months ago by dgoulet

Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Reviewer: catalyst

comment:12 in reply to:  10 Changed 7 months ago by catalyst

Keywords: extra-review added

Replying to nickm:

Likely fix in branch ticket32778_041 with PR at https://github.com/torproject/tor/pull/1634 . Patch is against 0.4.1 but should merge cleanly to master.

This is a minimal fix to avoid possible stability issues. For a longer-term fix, see #32883

Thanks! Looks good to me, but I haven't tested it on Windows. Maybe someone else who knows how to do that could do so? If not, I'm also ok with merging this as-is because it seems like a less-common use case.

comment:13 Changed 7 months ago by catalyst

Status: needs_reviewneeds_information

Is anyone who encountered this problem willing to verify that this patch works for them?

comment:14 Changed 7 months ago by nickm

From this week's team meeting:

23:15 <+nickm> catalyst: if we can't find a tester for #32778, do you think we 
               should go ahead and merge?
23:15 -zwiebelbot:#tor-meeting- tor#32778: pubsub_pub_ - [needs_information] - 
          https://bugs.torproject.org/32778
23:16 < catalyst> nickm: yeah, it seems reasonable by inspection, and appveyor 
                  seems to be ok with it
23:16 < catalyst> i think we lack automated testing for that functionality

But it would be really good if we could get some testing on this before we ship it.

comment:15 Changed 7 months ago by nickm

Keywords: crash added

comment:16 Changed 7 months ago by nickm

Keywords: regression added

comment:17 Changed 7 months ago by nickm

Keywords: 043-must added

comment:18 Changed 7 months ago by catalyst

Status: needs_informationmerge_ready

It's been about a week and no responses. I'm ok with merging this now, but am also happy to delay if people want to wait longer for feedback.

comment:19 Changed 7 months ago by teor

Actual Points: .1.2
Keywords: consider-backport-after-043-stable added
Milestone: Tor: 0.4.3.x-finalTor: 0.4.2.x-final
Version: Tor: 0.4.2.5Tor: 0.4.1.1-alpha

There was a conflict with #32883, which replaces the pubsub calls added in this ticket, with calls to tor_run_main() (which then calls the pubsub functions). Therefore, I did an "ours" merge, to avoid taking any changes from this PR in master. (But resolve future conflicts when we backport this change.)

Unfortunately, the "ours" merge means that this code won't be tested in master.

But this PR seems reasonable to me for backport. The pattern of calls in nt_service_main() and nt_service_body() matches the pubsub call pattern used by other platforms in tor_run_main() before #32883. (And used by all platforms after #32883.)

Given the lack of testing, I'm marking it for backport after 0.4.3-stable.

Merged to master with #32705.

comment:20 Changed 7 months ago by Vort

4025cbada6ab01247275faec8dd32ae857ef0fd5 does not work at all. Looks like no one tested these fixes, sorry.

comment:21 Changed 7 months ago by nickm

Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Status: merge_readyneeds_revision

Can you be more specific, Vort? What happens when you try to use it?

comment:22 Changed 7 months ago by Vort

When I try to launch it in service mode, process is created, but nothing happens next: no lines in log file, no ports are opened and very low RAM consumption. Looks like complete hang.

comment:23 in reply to:  20 Changed 7 months ago by catalyst

Replying to Vort:

4025cbada6ab01247275faec8dd32ae857ef0fd5 does not work at all. Looks like no one tested these fixes, sorry.

Thanks for the information. That commit seems to be on master and includes the more extensive fix for #32883. Would you please try the more minimal fix at https://github.com/torproject/tor/pull/1634 ?

comment:24 Changed 7 months ago by Vort

tor-0.4.2.5 (52d386c9) + nmathewson:ticket32778_041 (54eec534) looks good so far: no pubsub_pub errors in logs.

comment:25 in reply to:  24 Changed 7 months ago by catalyst

Replying to Vort:

tor-0.4.2.5 (52d386c9) + nmathewson:ticket32778_041 (54eec534) looks good so far: no pubsub_pub errors in logs.

Thanks for the info! I've reopened #32883 so we can work on the issue in master.

comment:26 Changed 7 months ago by catalyst

Status: needs_revisionneeds_information

Let's maybe wait a week to see if there's any more feedback on the smaller patch.

comment:27 in reply to:  26 Changed 7 months ago by cypherpunks

Hello privacy friends!
I'm the opener. I was slow to respond on ticket.
Replying to catalyst:

wait a week to see if there's any more feedback on the smaller patch.

I now compiled the linked minimal patch "Initialize publish/subscribe code when running as an NT service. https://github.com/torproject/tor/pull/1634" applied against tor-0.4.2.5 running MINGW64_NT-10.0-18363 and run tor x64 as NT Service mode.

No more log message appears for me so far. Looks good. :)

This is no review in any way. Just a short feedback.

Replying to catalyst:

because it seems like a less-common use case.

Thanks for carrying anyway!

comment:28 Changed 7 months ago by teor

Keywords: BugSmashFund added
Milestone: Tor: 0.4.3.x-finalTor: 0.4.2.x-final
Status: needs_informationmerge_ready

We did the master reverts and applied this patch in #32984.
In this ticket, we'll track backporting the change to 0.3.5 and later.

comment:29 Changed 7 months ago by teor

Summary: pubsub_pub_Initialise pubsub in Windows NT service mode

comment:30 Changed 7 months ago by teor

Keywords: consider-backport-after-0431 added; consider-backport-after-043-stable removed

Since this breaks NT services, let's backport it sooner.

comment:31 Changed 6 months ago by teor

Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final
Resolution: fixed
Status: merge_readyclosed

Merged to 0.4.1 and later.

Merged with #29819.

comment:32 Changed 4 months ago by cypherpunks

The: https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases/042Retrospective references mention this ticket with this "Bug Could be prevented by ... Dropping support for NT services"
Dropping its feature entirely can't be the final solution?
Don't drop my windows relays please. 😓 I wouldn't report this if i had knew before.

comment:33 in reply to:  32 Changed 4 months ago by teor

Cc: dgoulet nickm added

Replying to cypherpunks:

The: https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases/042Retrospective references mention this ticket with this "Bug Could be prevented by ... Dropping support for NT services"
Dropping its feature entirely can't be the final solution?
Don't drop my windows relays please. 😓 I wouldn't report this if i had knew before.

Don't worry, we're not planning on dropping Windows relay support any time soon.

We brainstorm lots of different ideas during our release retrospectives. If you want to see what we're actually going to do, look at the "POSSIBLE LESSONS TO TAKE FROM THIS RETROSPECTIVE" section.

In particular, we said:

Some parts of our code base are identifiably under-tested or badly held together. They include:

...
Our Windows testing and QA is severely underpowered.

You can help us test our Windows code by running Windows relays, and reporting bugs. If you can, run our alpha releases, so you catch bugs early. (If people don't report bugs in a feature, then we might assume no-one is using it. That makes it more likely to be dropped.)

I didn't do this detailed analysis. I've CC'd the people who did, in case they want to comment more.

Note: See TracTickets for help on using tickets.