Opened 5 weeks ago

Closed 4 weeks ago

#23581 closed defect (fixed)

Die more helpfully if Schedulers option isn't compatible with platform

Reported by: pastly Owned by: pastly
Priority: Medium Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

KIST isn't supported on OS X. The user is being dumb and setting Schedulers to just KIST, so we can't fallback to KISTLite or even Vanilla. Therefore select_scheduler() doesn't set the_scheduler and thus the_scheduler->init() doesn't work in set_scheduler().

This isn't a problem during a HUP because we already have a scheduler. select_scheduler() doesn't change anything and we keep on running with whatever scheduler we were using before.

cranium:tor mtraudt$ cat torrc
DataDirectory data
PidFile data/tor.pid
Socks5Proxy 127.0.0.1:2343
Schedulers KIST
Log [sched]info [~sched]notice stdout
cranium:tor mtraudt$ ./src/or/tor -f torrc
Sep 19 09:33:26.370 [notice] Tor 0.3.2.1-alpha-dev (git-472858d629252ae8) running on Darwin with Libevent 2.1.8-stable, OpenSSL 1.0.2l, Zlib 1.2.5, Liblzma 5.2.3, and Libzstd N/A.
Sep 19 09:33:26.371 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Sep 19 09:33:26.371 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Sep 19 09:33:26.371 [notice] Read configuration file "/private/tmp/tor/torrc".
Sep 19 09:33:26.373 [warn] Path for DataDirectory (data) is relative and will resolve to /private/tmp/tor/data. Is this what you wanted?
Sep 19 09:33:26.374 [warn] Path for PidFile (data/tor.pid) is relative and will resolve to /private/tmp/tor/data/tor.pid. Is this what you wanted?

============================================================ T= 1505828006
Tor 0.3.2.1-alpha-dev (git-472858d629252ae8) died: Caught signal 11
0   tor                                 0x000000010a4e38d9 crash_handler + 57
1   libsystem_platform.dylib            0x00007fff9307d52a _sigtramp + 26
2   tor                                 0x000000010a4d2a84 set_scheduler + 164
3   tor                                 0x000000010a4d34eb scheduler_init + 347
4   tor                                 0x000000010a346115 options_act_reversible + 453
5   tor                                 0x000000010a345c24 set_options + 68
6   tor                                 0x000000010a350991 options_init_from_string + 1697
7   tor                                 0x000000010a34fa27 options_init_from_torrc + 1399
8   tor                                 0x000000010a42f833 tor_init + 1523
9   tor                                 0x000000010a430147 tor_main + 103
10  tor                                 0x000000010a2fb350 main + 48
11  libdyld.dylib                       0x00007fff9e1765ad start + 1
Abort trap: 6

Child Tickets

Change History (15)

comment:1 Changed 5 weeks ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final

This is something we should try to do in 0.3.2 IMO; crashing is something we should try to avoid.

comment:2 Changed 5 weeks ago by dgoulet

I see two possible behaviors to solve this:

  1. We exit(1) if select_scheduler() can't set a scheduler. So in this case, you tried to put KIST only on a platform that doesn't support it, we stop and ask you to fix your torrc.
  1. We fallback to some default (KISTLite for instance) and warn the user.

The approach in (1) has a bad side effect of if at runtime the new type is not usable well tor will just stop... As for (2), it is very possible the user just don't notice the fallback and end up with a tor scheduler it didn't want (not sure that's really a big issue...)

For now, I'm for (2) ? Objections?

comment:3 Changed 5 weeks ago by pastly

If we've made it to runtime, we must have a the_scheduler. Tor shouldn't just stop.

I don't like (2) because I don't like fallback strategies that violate the Schedulers torrc option. We already do this in one place now[0], and I think that one place is a necessary evil.

What if we implicitly add "Vanilla" to the end of any Schedulers torrc option that doesn't already have "Vanilla" in it somewhere? Then we'd always have a lowest common denominator and guaranteed-to-run-on-your-system option to fallback to if worst comes to worst.

[0] KIST is working as usual, then boom support in the kernel drops out from under us. We log a notice about the sudden issue, but continue to call ourselves "KIST" (not "KISTLite"). We can do this fallback even if we don't have KISTLite in Schedulers

comment:4 Changed 5 weeks ago by nickm

I think that (1) is fine with me; if the user said "KIST or nothing", let's respect their wishes.

comment:5 Changed 5 weeks ago by dgoulet

Status: newneeds_review

(1) is the one.

Branch: bug23581_032_01

You'll see the interesting exit(1) in there because since the scheduler can be changed during runtime at any point, we do have to trigger the cleanup ourselves.

(This will conflicts with #23552 but at least we have a baseline now.)

comment:6 Changed 5 weeks ago by pastly

Tests fine for me on OS X. It does kill the Tor process if I switch to Schedulers KIST and HUP. Making Schedulers a consensus param would give the auths power to kill non-linux relays, so let's not do that. (It would require a ton of work anyway, because we can only do int consensus params currently).

comment:7 Changed 4 weeks ago by nickm

Owner: set to dgoulet
Status: needs_reviewassigned

setting owner

comment:8 Changed 4 weeks ago by nickm

Status: assignedneeds_review

comment:9 Changed 4 weeks ago by nickm

Status: needs_reviewneeds_revision

Minor issues:

  • The "unable to select a scheduler" message should be log_err(), since it is fatal.
  • This is going to conflict with #23552, right?

comment:10 Changed 4 weeks ago by pastly

Status: needs_revisionneeds_review

bug23581_032_02 on https://github.com/pastly/public-tor.git

Changed to log_err and rebased on current master.

Yes it will likely conflict with #23552. I think this should be merged first since it is simpler.

comment:11 Changed 4 weeks ago by pastly

Owner: changed from dgoulet to pastly
Status: needs_reviewassigned

comment:12 Changed 4 weeks ago by pastly

Status: assignedneeds_review

comment:13 Changed 4 weeks ago by dgoulet

Tiny nitpick, extra indent with the last two lines of the text:

+    log_err(LD_SCHED, "Tor was unable to select a scheduler type. Please "
+                       "make sure Schedulers is correctly configured with "
+                       "what Tor does support.");

Apart from that, lgtm. Once this is fixed , you can just merge_ready it. Thanks!

comment:14 Changed 4 weeks ago by pastly

Status: needs_reviewmerge_ready

comment:15 Changed 4 weeks ago by nickm

Resolution: fixed
Status: merge_readyclosed

merged!

Note: See TracTickets for help on using tickets.