Opened 18 months ago

Last modified 6 months ago

#24423 assigned defect

Fix STACK warnings in Tor

Reported by: nickm Owned by: catalyst
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: review-group-27, review-group-29, 033-triage-20180320, 033-removed-20180320
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Georg got STACK ( ​http://css.csail.mit.edu/stack/) to run, and ran it on Tor. Let's fix the warnings.

Child Tickets

Attachments (2)

pstack_tls.txt (1.4 KB) - added by gk 18 months ago.
pstack_test.txt (315 bytes) - added by gk 18 months ago.

Download all attachments as: .zip

Change History (47)

comment:1 Changed 18 months ago by nickm

I have fixes in my branch stack_fixes_032, with each issue getting a separate commit. I have not re-run this branch under STACK.

There seem to be two kinds of issues here:

1) We checked to see whether something that can't possibly be negative is negative. Sometimes this means that we had done the arithmetic wrong; sometimes it means that the check was erroneous.
2) We check for NULL-ness of a pointer that can't possibly be NULL. This usually means that the NULL check was redundant.

I'd like a second opinion on all the arithmetic fixes, especially to know if you think they represent a real bug that needs a separate ticket.

comment:2 Changed 18 months ago by nickm

Status: assignedneeds_review

comment:3 Changed 18 months ago by gk

FWIW: STACK does not like your fix in tortls.c (attached is the new output) all other warnings are gone.

Changed 18 months ago by gk

Attachment: pstack_tls.txt added

comment:4 Changed 18 months ago by gk

There is one warning in test you might want to look at while you are at it (see attached output). What about the code in ext? Is it worthwhile looking at it, too?

Changed 18 months ago by gk

Attachment: pstack_test.txt added

comment:5 Changed 18 months ago by pastly

I do not believe the KIST fix in e250ce27f8ca53003e35be125248b669d0291b30 needs backporting.

edit: Does "backporting" even make sense in the context of KIST? In any case, I don't think the KIST part is a big deal.

Last edited 18 months ago by pastly (previous) (diff)

comment:6 Changed 18 months ago by nickm

Thanks, pastly!

Thanks for looking at test, gk. Yes, I do think it's worthwhile looking at ext -- it's not our code, but we're shipping it, so we ought to make sure it's right.

comment:7 Changed 18 months ago by dgoulet

Commit e250ce27f8ca53003e35be125248b669d0291b30 is actually fixing an overflow I've observed with KIST so ++ in 032.

comment:8 in reply to:  6 Changed 18 months ago by gk

Replying to nickm:

Thanks, pastly!

Thanks for looking at test, gk. Yes, I do think it's worthwhile looking at ext -- it's not our code, but we're shipping it, so we ought to make sure it's right.

No warnings in ext.

comment:9 Changed 18 months ago by teor

They all seem sensible to me, but I haven't run them.
Does the new arithmetic pass --fragile-hardening on x86_64 and i386?

comment:10 Changed 18 months ago by nickm

Does the new arithmetic pass --fragile-hardening on x86_64 and i386?

It does for me on x86_64.

comment:11 Changed 18 months ago by dgoulet

Quick note, this has a very important fix for KIST so I'm "ACK-ing" that one for 032.

comment:12 Changed 18 months ago by nickm

Keywords: review-group-27 added

comment:13 Changed 18 months ago by catalyst

Reviewer: catalyst

comment:14 Changed 18 months ago by catalyst

According to today's meeting the KIST change should get its own ticket.

comment:15 Changed 18 months ago by catalyst

Status: needs_reviewmerge_ready

The code changes look good to me! Possible minor documentation improvements below. Feel free to merge without another round of review after any doc changes.

Spelling ("happend") in changes file.

In d8f0417799f8eaa0865cd8ba24e04f7689c99d78 maybe clarify in the commit message what the STACK warning is about? (Is it a potential signed underflow?)

In 8922986c3190bc3cb0b9036d7e024817c307bd6a "swapping over into unsigned integers" seems like it's mostly an issue for 32-bit platforms; should that be mentioned in the commit message?

In 370916f75e2f00ea4b6f0f1f1ccf3df04504aab7 maybe clarify in the commit message that new_element() can't return NULL because tor_malloc_zero() can't return NULL? Similarly for 4b2f8c35f69744632e14684e76dc5484202dc7ae.

comment:16 in reply to:  14 Changed 18 months ago by dgoulet

Replying to catalyst:

According to today's meeting the KIST change should get its own ticket.

#24590

comment:17 Changed 18 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final
Status: merge_readyaccepted

Actually, gk said that the tortls one didn't work, so I'm not merging that. I've made a new branch with the changes you suggested, and merged it to master. I removed the #24590 commit, since I merged that separately.

Putting back in accepted for the tortls issue. (gk, do I understand correctly that I never figured out how to make stack stop complaining about that?)

comment:18 in reply to:  17 Changed 18 months ago by gk

Replying to nickm:

Putting back in accepted for the tortls issue. (gk, do I understand correctly that I never figured out how to make stack stop complaining about that?)

Correct.

comment:19 Changed 18 months ago by catalyst

It looks like STACK is complaining about the comparison on line 498 if (earliest_start_time >= now) being simplified based on the assumption that the earliest_start_time computation doesn't overflow or underflow, and therefore the algebraic equivalences hold (cancelling the now from the comparison). The added tor_assert(cert_lifetime <= INT_MAX) adds some constraints to cert_lifetime but apparently that's not enough. STACK doesn't seem to know any constraints on now, so maybe as far as it's concerned, the subtraction could still underflow. (or the additions could overflow)

Perhaps the comparison on line 498 should be if (cert_lifetime < min_real_lifetime + start_granularity), with the original earliest_start_time computation moved into an else clause.

comment:20 Changed 18 months ago by catalyst

Hm maybe that should be if (cert_lifetime <= min_real_lifetime + start_granularity || cert_lifetime > now) because the subtraction for earliest_start_time could still underflow. (also fixed the comparison operator that I had wrong before)

Then again you could also start getting mixed signed/unsigned comparison warnings so maybe we need to use a signed intermediate.

comment:21 Changed 18 months ago by nickm

Status: acceptedneeds_review

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

comment:22 in reply to:  21 ; Changed 18 months ago by gk

Replying to nickm:

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

Looks good now. Just the warning in test is still there (see comment:4 and the respective attachment).

comment:23 in reply to:  22 Changed 17 months ago by catalyst

Replying to gk:

Replying to nickm:

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

Looks good now. Just the warning in test is still there (see comment:4 and the respective attachment).

That looks like another instance of tor_malloc_zero() (called from new_fake_channel()) always returning non-NULL (assuming the line numbers on maint-0.3.2 are still correct), so this conditional in test_channel_flushmux() is always true:

    762   if (ch)
    763     circuitmux_free(ch->cmux);

Even better, there's also

    718   ch = new_fake_channel();
    719   tt_assert(ch);

so that's another reason why it can't be NULL, along with other unconditional dereferences.

comment:24 Changed 17 months ago by catalyst

Status: needs_reviewneeds_revision

back to needs_revision for the warning in the test

comment:25 Changed 17 months ago by nickm

merged stack_again_032.

comment:26 Changed 17 months ago by nickm

Status: needs_revisionneeds_review

The test_channel_flushmux() test no longer exists in 0.3.3, since channel queues no longer exist. (See #23709.)

If we want, we can remove all the redundant checks for whether new_fake_channel() has failed -- I have those in a branch called one_more_stack_fix, but I suggest that we don't actually take them, since these don't actually seem to cause stack warnings.

comment:27 Changed 17 months ago by catalyst

Status: needs_reviewneeds_revision

The one_more_stack_fix changes probably aren't necessary on master. I think STACK wouldn't complain about the tt_assert(ch) calls because they don't come after dereferences of ch. The redundant check in the cleanup of test_channel_flushmux() on 0.3.2 probably should still be deleted if we would like to make 0.3.2 STACK-clean.

comment:28 Changed 17 months ago by nickm

I think it's okay for us to leave things as they are then with the fixes on this ticket above. STACK-cleanness can be for master.

GeKo sent me another warning, though: making a _new_ patch for that...

comment:29 Changed 17 months ago by nickm

Here's the remaining issue:

bug: anti-dce
model: |
  sw.default.i:
  %.b.i =3D load i1* @networkstatus_get_flavor_name.warning_logged__, !dbg
!1116
  br i1 %.b.i, label %if.end.i, label %if.then.i, !dbg !1121, !macro !1123
stack:
  -
/home/thomas/Arbeit/hardening/stack/build36/tor_test/../../../../Tor/tor/src/or/networkstatus.c:2052:34
ncore: 1
core:
  -
/home/thomas/Arbeit/hardening/stack/build36/tor_test/../../../../Tor/tor/src/or/networkstatus.c:2049:28
    - buffer overflow

and here's the code, with the lines marked.

  for (i=0; i<N_CONSENSUS_FLAVORS; ++i) {
    consensus_waiting_for_certs_t *waiting = &consensus_waiting_for_certs[i];
    if (!waiting->consensus)
      continue;
    if (networkstatus_check_consensus_signature(waiting->consensus, 0)>=0) {
      char *waiting_body = waiting->body; // 2049
      if (!networkstatus_set_current_consensus(
                                 waiting_body,
                                 networkstatus_get_flavor_name(i), // 2052
                                 NSSET_WAS_WAITING_FOR_CERTS,
                                 source_dir)) {
        tor_free(waiting_body);
      }
    }

What's I think is happening here is that the compiler sees that waiting->body is computed, and so realizes that "i" must be in range 0 <= i < N_CONSENSUS_FLAVORS. This could be used to eliminate the assertion and default case in networkstatus_get_flavor_name() when it's inlined.

comment:30 Changed 17 months ago by nickm

Status: needs_revisionneeds_review

The trivial workaround here is to compute the flavor name first as in my stack_em_up branch. This doesn't actually change anything or fix anything real, so I'm not doing a changes file on it.

comment:31 Changed 17 months ago by catalyst

You're saying STACK is warning about the tor_fragile_assert() in networkstatus_get_flavor_name() being eliminated as dead code as a consequence of a buffer overflow not occurring when looping over N_CONSENSUS_FLAVORS? Is the problem partly that STACK doesn't see tor_fragile_assert() as intrinsically unreachable because by default it's nonfatal?

comment:32 Changed 17 months ago by nickm

I honestly can't tell, but the warning is about the warning_logged__ variable, which does seem to be part of the fragile assert.

Last edited 17 months ago by nickm (previous) (diff)

comment:33 Changed 17 months ago by catalyst

Do we get this STACK warning if we build with ALL_BUGS_ARE_FATAL? If not, I'm inclined to leave it as is and recommend that people run STACK with that defined.

comment:34 Changed 17 months ago by nickm

gk, question for you: (see above)

Last edited 17 months ago by nickm (previous) (diff)

comment:35 Changed 17 months ago by nickm

Keywords: review-group-29 added

comment:36 in reply to:  34 Changed 17 months ago by cypherpunks

Replying to nickm:

gk, question for you: (see above)

Cc him.

comment:37 in reply to:  33 Changed 16 months ago by gk

Replying to catalyst:

Do we get this STACK warning if we build with ALL_BUGS_ARE_FATAL? If not, I'm inclined to leave it as is and recommend that people run STACK with that defined.

No, that warning does not show up anymore. However, a new one pops up (with commit 29784d80faab3d8307916c7a945e89fe082e3db0) in that case:

---
bug: anti-simplify
model: |
  %tobool19 = icmp ne %struct.workerthread_s* %27, null, !dbg !315
  -->  true
stack: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:508:9
ncore: 1
core: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:337:3
    - null pointer dereference
---
bug: anti-simplify
model: |
  %tobool19 = icmp ne %struct.workerthread_s* %27, null, !dbg !315
  -->  true
stack: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:508:9
ncore: 1
core: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:337:3
    - null pointer dereference

The patch in stack_em_up on the other hand solves this problem without creating new warnings. I think that's preferable to keeping a list of false positives (because ALL_BUGS_ARE_FATAL wouldn't be defined in any release build anyway).

comment:38 Changed 16 months ago by nickm

Status: needs_reviewmerge_ready

okay; I'll merge stack_em_up unless somebody objects, and look for solutions for those other issues.

comment:39 Changed 16 months ago by catalyst

I think ALL_BUGS_ARE_FATAL is good to use with STACK because we use it with other static analysis tools for similar reasons. (It helps tools figure out what code is intentionally dead and decreases the noise level.)

I kind of understand why the new error is happening with ALL_BUGS_ARE_FATAL but I can't describe it well yet.

comment:40 Changed 16 months ago by nickm

I've merged stack_em_up. Tossing this one to catalyst for the ALL_BUGS_ARE_FATAL warnings; but please toss it back to me if it's not an easy fix.

comment:41 Changed 16 months ago by nickm

Owner: changed from nickm to catalyst
Status: merge_readyassigned

comment:42 Changed 14 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:43 Changed 14 months ago by nickm

Keywords: 033-removed-20180320 added

Mark all not-already-included tickets as pending review for removal from 0.3.3 milestone.

comment:44 Changed 14 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: unspecified

These tickets were marked as removed, and nobody has said that they can fix them. Let's remember to look at 033-removed-20180320 as we re-evaluate our triage process, to see whether we're triaging out unnecessarily, and to evaluate whether we're deferring anything unnecessarily. But for now, we can't do these: we need to fix the 033-must stuff now.

comment:45 Changed 6 months ago by catalyst

Reviewer: catalyst

Delete myself as reviewer now that this ticket is assigned to me.

Note: See TracTickets for help on using tickets.