Opened 15 months ago

Closed 8 months ago

#31634 closed enhancement (fixed)

Check .may_include order and tor subsystem init order are compatible

Reported by: teor Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: diagnostics, practracker
Cc: gaba Actual Points: 1
Parent ID: Points: 2
Reviewer: teor Sponsor:

Description

In #31615, we discovered that our module init order doesn't match their dependency order.

Let's use practracker and tor to make sure that doesn't happen again. We'll probably need a new make check target, because we'll need a compiled tor and practracker output. See #31615 for details.

Gaba, this is Sponsor 31-can, because it helps us catch refactoring bugs when we create new modules.

Child Tickets

TicketTypeStatusOwnerSummary
#33315defectclosedHave a way, for every subsystem, to list which directory it belongs to
#33316defectclosednickmRe-order early subsystem init order to match dependency order

Change History (21)

comment:1 Changed 15 months ago by teor

We might also want to check against the order in subsystem_list.c.

comment:2 in reply to:  1 ; Changed 15 months ago by nickm

Replying to teor:

We might also want to check against the order in subsystem_list.c.

Yes, but have a look at check_and_setup() in subsysmgr.c -- that causes tor to exit with an assertion failure on startup if the subsystems are not sorted by priority.

comment:3 in reply to:  2 Changed 15 months ago by teor

Replying to nickm:

Replying to teor:

We might also want to check against the order in subsystem_list.c.

Yes, but have a look at check_and_setup() in subsysmgr.c -- that causes tor to exit with an assertion failure on startup if the subsystems are not sorted by priority.

So we don't need to check the order in subsystem_list.c, because tor does it for us. That's good.

comment:4 Changed 15 months ago by nickm

Type: defectenhancement

Mark a number of current 0.4.2.x "defects" as "enhancements."

comment:5 Changed 15 months ago by teor

Parent ID: #31615

comment:6 Changed 14 months ago by ahf

Owner: set to nickm
Status: newassigned

Distributing 0.4.2 tickets between network team members.

comment:7 Changed 14 months ago by nickm

Keywords: 042-should? added; 042-should removed
Reviewer: teor
Status: assignedneeds_review

I have started this, but I believe it is bigger than we should do in 0.4.2, and we should defer part of it till 0.4.3. Here is why:

  • Our subsystems do not currently to our directory structures. Notably, these subsystems are defined in with names that do not match their locations: btrack, network, ocirc_event, oconn_event, relay, threads, tortls, winprocess.
  • Our subsystem order does not currently match the topology very well at all: there are four discontinuities on the ordering. I think they are caused by: btrack, compress, winprocess, threads. We should re-order these, but doing so will require us to re-organize some of our code. That seems like a stability risk.

I have checked in a work-in-progress version of the code to do these checks as ticket31634. Do you think it's reasonable to defer to 0.4.3 based on the above reasoning? If not, please re-assign to me for 0.4.2, but this just got bigger than we anticipated.

comment:8 Changed 14 months ago by nickm

(Right now I am not asking for review on the code per se, since it is not ready. I'm asking for review on the decision to defer this to 0.4.3 when we willing to take more stability risks.)

comment:9 Changed 14 months ago by teor

Keywords: 043-should added; 042-should? removed
Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Status: needs_reviewneeds_revision

Yes, I think we should make big changes like this in 0.4.3.

And in 0.4.2, we should avoid:

  • changes to .may_include
  • changes to the module init order

comment:10 Changed 10 months ago by gaba

Sponsor: Sponsor31-can

No more sponsor 31. All this tickets remained open after sponsor 31 ended.

comment:11 Changed 10 months ago by nickm

Keywords: 043-should removed
Milestone: Tor: 0.4.3.x-finalTor: unspecified

I think this belongs in 'unspecified' for now. The changes we need here are significant refactoring, and don't belong in a cleanup phase.

comment:12 Changed 10 months ago by nickm

I'm making child tickets for 044 for the parts that we can try without too much effort, though.

comment:13 Changed 10 months ago by nickm

Actual Points: 0.5
Milestone: Tor: unspecifiedTor: 0.4.4.x-final
Status: needs_revisionneeds_review

I have a working script to do this that actually passes now, in ticket31634. There is a PR in https://github.com/torproject/tor/pull/1741, based the code in #33316 -- we should review and merge #33316 first.

comment:14 Changed 9 months ago by teor

Status: needs_reviewneeds_revision

Sorry it's taken me a while to review this, I've been busy with some urgent things the past few weeks.

The code looks good, I added comments about a few typos and tweaks.

There are a few conflicts between this PR and master.

I think I ran tor --dbg-dump-subsystem-list and practracker/includes.py --check-subsystem-order the way you expect, and got the expected results. But I'm not sure.

Can we add the relevant commands to make check?
Then I can run it locally, and in CI, and confirm we got the expected results.

comment:15 Changed 8 months ago by nickm

Status: needs_revisionneeds_review

Okay, I've made an updated branch incorporating your suggestions, completely removing a script that apparently isn't what I had working, and rebasing on master to fix merge issues.

It's ticket31634_v2 with PR at https://github.com/torproject/tor/pull/1838

comment:16 Changed 8 months ago by teor

Status: needs_reviewneeds_revision

Travis distcheck CI failed with:

Traceback (most recent call last):
  File "/home/travis/build/torproject/tor/tor-0.4.4.0-alpha-dev/_build/sub/../../scripts/maint/practracker/includes.py", line 348, in <module>
    main(sys.argv)
  File "/home/travis/build/torproject/tor/tor-0.4.4.0-alpha-dev/_build/sub/../../scripts/maint/practracker/includes.py", line 345, in main
    check_subsystem_order=args.check_subsystem_order)
  File "/home/travis/build/torproject/tor/tor-0.4.4.0-alpha-dev/_build/sub/../../scripts/maint/practracker/includes.py", line 309, in run_check_includes
    if check_subsys_file(check_subsystem_order, uses_dirs):
  File "/home/travis/build/torproject/tor/tor-0.4.4.0-alpha-dev/_build/sub/../../scripts/maint/practracker/includes.py", line 275, in check_subsys_file
    if fname in uses_closure[prev]:
KeyError: u'lib/llharden'
FAIL scripts/maint/run_check_subsystem_order.sh (exit status: 1)

Appveyor CI failed with a similar error:

Traceback (most recent call last):
  File "C:/projects/tor/x86_64-w64-mingw32/../scripts/maint/practracker/includes.py", line 348, in <module>
    main(sys.argv)
  File "C:/projects/tor/x86_64-w64-mingw32/../scripts/maint/practracker/includes.py", line 341, in main
    run_check_includes(topdir=args.topdir,
  File "C:/projects/tor/x86_64-w64-mingw32/../scripts/maint/practracker/includes.py", line 309, in run_check_includes
    if check_subsys_file(check_subsystem_order, uses_dirs):
  File "C:/projects/tor/x86_64-w64-mingw32/../scripts/maint/practracker/includes.py", line 275, in check_subsys_file
    if fname in uses_closure[prev]:
KeyError: 'lib/llharden'
FAIL scripts/maint/run_check_subsystem_order.sh (exit status: 1)

Otherwise the changes look good.

comment:17 Changed 8 months ago by nickm

Actual Points: 0.51
Status: needs_revisionneeds_review

Wow, there were a few issues there. I've added on commit to fix out-of-tree builds from git, and another to fix distcheck builds.

comment:18 Changed 8 months ago by teor

Status: needs_reviewneeds_revision

These changes look good, and distcheck now passes.

But I'm still seeing a similar error in appveyor.

I don't entirely understand the script. But I wonder if os.path.relpath() would work in normalize_srcdir(). It seems like the regular expressions depend on unix paths.

See:
https://docs.python.org/3/library/os.path.html#os.path.relpath

That's how add_c_file.py does it:
https://github.com/torproject/tor/blob/master/scripts/maint/add_c_file.py#L41

I'm not sure if you'll need os.path.normpath() here, because it modifies the slashes in the string:
https://docs.python.org/3/library/os.path.html#os.path.normpath

I don't want to block this ticket, so I would also accept:

  • opening an 044-should ticket to fix this check on Windows
  • adding a comment to any Windows-only subsystems, reminding us to double-check their order

comment:19 Changed 8 months ago by nickm

Status: needs_revisionneeds_review

Relpath wasn't quite right -- it wants to look at the actual filesystem and we didn't want that.

I went with a harsher approach, and now CI is passing

comment:20 in reply to:  19 Changed 8 months ago by teor

Status: needs_reviewmerge_ready

Replying to nickm:

Relpath wasn't quite right -- it wants to look at the actual filesystem and we didn't want that.

If you pass "start" to relpath(), it shouldn't look at the filesystem at all:
"This is a path computation: the filesystem is not accessed to confirm the existence or nature of path or start."
https://docs.python.org/3/library/os.path.html#os.path.relpath

I went with a harsher approach, and now CI is passing

Great! The new code looks good.

comment:21 Changed 8 months ago by teor

Resolution: fixed
Status: merge_readyclosed

Squashed and merged to master.

Thanks!

Note: See TracTickets for help on using tickets.