Opened 6 months ago

Last modified 2 months ago

#24423 assigned defect

Fix STACK warnings in Tor

Reported by: nickm Owned by: catalyst
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: review-group-27, review-group-29, 033-triage-20180320, 033-removed-20180320
Cc: Actual Points:
Parent ID: Points:
Reviewer: catalyst Sponsor:

Description

Georg got STACK ( ​http://css.csail.mit.edu/stack/) to run, and ran it on Tor. Let's fix the warnings.

Child Tickets

Attachments (2)

pstack_tls.txt (1.4 KB) - added by gk 6 months ago.
pstack_test.txt (315 bytes) - added by gk 6 months ago.

Download all attachments as: .zip

Change History (46)

comment:1 Changed 6 months ago by nickm

I have fixes in my branch stack_fixes_032, with each issue getting a separate commit. I have not re-run this branch under STACK.

There seem to be two kinds of issues here:

1) We checked to see whether something that can't possibly be negative is negative. Sometimes this means that we had done the arithmetic wrong; sometimes it means that the check was erroneous.
2) We check for NULL-ness of a pointer that can't possibly be NULL. This usually means that the NULL check was redundant.

I'd like a second opinion on all the arithmetic fixes, especially to know if you think they represent a real bug that needs a separate ticket.

comment:2 Changed 6 months ago by nickm

Status: assignedneeds_review

comment:3 Changed 6 months ago by gk

FWIW: STACK does not like your fix in tortls.c (attached is the new output) all other warnings are gone.

Changed 6 months ago by gk

Attachment: pstack_tls.txt added

comment:4 Changed 6 months ago by gk

There is one warning in test you might want to look at while you are at it (see attached output). What about the code in ext? Is it worthwhile looking at it, too?

Changed 6 months ago by gk

Attachment: pstack_test.txt added

comment:5 Changed 6 months ago by pastly

I do not believe the KIST fix in e250ce27f8ca53003e35be125248b669d0291b30 needs backporting.

edit: Does "backporting" even make sense in the context of KIST? In any case, I don't think the KIST part is a big deal.

Last edited 6 months ago by pastly (previous) (diff)

comment:6 Changed 6 months ago by nickm

Thanks, pastly!

Thanks for looking at test, gk. Yes, I do think it's worthwhile looking at ext -- it's not our code, but we're shipping it, so we ought to make sure it's right.

comment:7 Changed 6 months ago by dgoulet

Commit e250ce27f8ca53003e35be125248b669d0291b30 is actually fixing an overflow I've observed with KIST so ++ in 032.

comment:8 in reply to:  6 Changed 6 months ago by gk

Replying to nickm:

Thanks, pastly!

Thanks for looking at test, gk. Yes, I do think it's worthwhile looking at ext -- it's not our code, but we're shipping it, so we ought to make sure it's right.

No warnings in ext.

comment:9 Changed 6 months ago by teor

They all seem sensible to me, but I haven't run them.
Does the new arithmetic pass --fragile-hardening on x86_64 and i386?

comment:10 Changed 6 months ago by nickm

Does the new arithmetic pass --fragile-hardening on x86_64 and i386?

It does for me on x86_64.

comment:11 Changed 6 months ago by dgoulet

Quick note, this has a very important fix for KIST so I'm "ACK-ing" that one for 032.

comment:12 Changed 6 months ago by nickm

Keywords: review-group-27 added

comment:13 Changed 5 months ago by catalyst

Reviewer: catalyst

comment:14 Changed 5 months ago by catalyst

According to today's meeting the KIST change should get its own ticket.

comment:15 Changed 5 months ago by catalyst

Status: needs_reviewmerge_ready

The code changes look good to me! Possible minor documentation improvements below. Feel free to merge without another round of review after any doc changes.

Spelling ("happend") in changes file.

In d8f0417799f8eaa0865cd8ba24e04f7689c99d78 maybe clarify in the commit message what the STACK warning is about? (Is it a potential signed underflow?)

In 8922986c3190bc3cb0b9036d7e024817c307bd6a "swapping over into unsigned integers" seems like it's mostly an issue for 32-bit platforms; should that be mentioned in the commit message?

In 370916f75e2f00ea4b6f0f1f1ccf3df04504aab7 maybe clarify in the commit message that new_element() can't return NULL because tor_malloc_zero() can't return NULL? Similarly for 4b2f8c35f69744632e14684e76dc5484202dc7ae.

comment:16 in reply to:  14 Changed 5 months ago by dgoulet

Replying to catalyst:

According to today's meeting the KIST change should get its own ticket.

#24590

comment:17 Changed 5 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final
Status: merge_readyaccepted

Actually, gk said that the tortls one didn't work, so I'm not merging that. I've made a new branch with the changes you suggested, and merged it to master. I removed the #24590 commit, since I merged that separately.

Putting back in accepted for the tortls issue. (gk, do I understand correctly that I never figured out how to make stack stop complaining about that?)

comment:18 in reply to:  17 Changed 5 months ago by gk

Replying to nickm:

Putting back in accepted for the tortls issue. (gk, do I understand correctly that I never figured out how to make stack stop complaining about that?)

Correct.

comment:19 Changed 5 months ago by catalyst

It looks like STACK is complaining about the comparison on line 498 if (earliest_start_time >= now) being simplified based on the assumption that the earliest_start_time computation doesn't overflow or underflow, and therefore the algebraic equivalences hold (cancelling the now from the comparison). The added tor_assert(cert_lifetime <= INT_MAX) adds some constraints to cert_lifetime but apparently that's not enough. STACK doesn't seem to know any constraints on now, so maybe as far as it's concerned, the subtraction could still underflow. (or the additions could overflow)

Perhaps the comparison on line 498 should be if (cert_lifetime < min_real_lifetime + start_granularity), with the original earliest_start_time computation moved into an else clause.

comment:20 Changed 5 months ago by catalyst

Hm maybe that should be if (cert_lifetime <= min_real_lifetime + start_granularity || cert_lifetime > now) because the subtraction for earliest_start_time could still underflow. (also fixed the comparison operator that I had wrong before)

Then again you could also start getting mixed signed/unsigned comparison warnings so maybe we need to use a signed intermediate.

comment:21 Changed 5 months ago by nickm

Status: acceptedneeds_review

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

comment:22 in reply to:  21 ; Changed 5 months ago by gk

Replying to nickm:

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

Looks good now. Just the warning in test is still there (see comment:4 and the respective attachment).

comment:23 in reply to:  22 Changed 5 months ago by catalyst

Replying to gk:

Replying to nickm:

I've tried the first approach you suggested, and didn't get signed/unsigned warnings, so lt's try it out.

gk, does my stack_again_032 branch fix the warning for you?

Looks good now. Just the warning in test is still there (see comment:4 and the respective attachment).

That looks like another instance of tor_malloc_zero() (called from new_fake_channel()) always returning non-NULL (assuming the line numbers on maint-0.3.2 are still correct), so this conditional in test_channel_flushmux() is always true:

    762   if (ch)
    763     circuitmux_free(ch->cmux);

Even better, there's also

    718   ch = new_fake_channel();
    719   tt_assert(ch);

so that's another reason why it can't be NULL, along with other unconditional dereferences.

comment:24 Changed 5 months ago by catalyst

Status: needs_reviewneeds_revision

back to needs_revision for the warning in the test

comment:25 Changed 5 months ago by nickm

merged stack_again_032.

comment:26 Changed 5 months ago by nickm

Status: needs_revisionneeds_review

The test_channel_flushmux() test no longer exists in 0.3.3, since channel queues no longer exist. (See #23709.)

If we want, we can remove all the redundant checks for whether new_fake_channel() has failed -- I have those in a branch called one_more_stack_fix, but I suggest that we don't actually take them, since these don't actually seem to cause stack warnings.

comment:27 Changed 5 months ago by catalyst

Status: needs_reviewneeds_revision

The one_more_stack_fix changes probably aren't necessary on master. I think STACK wouldn't complain about the tt_assert(ch) calls because they don't come after dereferences of ch. The redundant check in the cleanup of test_channel_flushmux() on 0.3.2 probably should still be deleted if we would like to make 0.3.2 STACK-clean.

comment:28 Changed 4 months ago by nickm

I think it's okay for us to leave things as they are then with the fixes on this ticket above. STACK-cleanness can be for master.

GeKo sent me another warning, though: making a _new_ patch for that...

comment:29 Changed 4 months ago by nickm

Here's the remaining issue:

bug: anti-dce
model: |
  sw.default.i:
  %.b.i =3D load i1* @networkstatus_get_flavor_name.warning_logged__, !dbg
!1116
  br i1 %.b.i, label %if.end.i, label %if.then.i, !dbg !1121, !macro !1123
stack:
  -
/home/thomas/Arbeit/hardening/stack/build36/tor_test/../../../../Tor/tor/src/or/networkstatus.c:2052:34
ncore: 1
core:
  -
/home/thomas/Arbeit/hardening/stack/build36/tor_test/../../../../Tor/tor/src/or/networkstatus.c:2049:28
    - buffer overflow

and here's the code, with the lines marked.

  for (i=0; i<N_CONSENSUS_FLAVORS; ++i) {
    consensus_waiting_for_certs_t *waiting = &consensus_waiting_for_certs[i];
    if (!waiting->consensus)
      continue;
    if (networkstatus_check_consensus_signature(waiting->consensus, 0)>=0) {
      char *waiting_body = waiting->body; // 2049
      if (!networkstatus_set_current_consensus(
                                 waiting_body,
                                 networkstatus_get_flavor_name(i), // 2052
                                 NSSET_WAS_WAITING_FOR_CERTS,
                                 source_dir)) {
        tor_free(waiting_body);
      }
    }

What's I think is happening here is that the compiler sees that waiting->body is computed, and so realizes that "i" must be in range 0 <= i < N_CONSENSUS_FLAVORS. This could be used to eliminate the assertion and default case in networkstatus_get_flavor_name() when it's inlined.

comment:30 Changed 4 months ago by nickm

Status: needs_revisionneeds_review

The trivial workaround here is to compute the flavor name first as in my stack_em_up branch. This doesn't actually change anything or fix anything real, so I'm not doing a changes file on it.

comment:31 Changed 4 months ago by catalyst

You're saying STACK is warning about the tor_fragile_assert() in networkstatus_get_flavor_name() being eliminated as dead code as a consequence of a buffer overflow not occurring when looping over N_CONSENSUS_FLAVORS? Is the problem partly that STACK doesn't see tor_fragile_assert() as intrinsically unreachable because by default it's nonfatal?

comment:32 Changed 4 months ago by nickm

I honestly can't tell, but the warning is about the warning_logged__ variable, which does seem to be part of the fragile assert.

Last edited 4 months ago by nickm (previous) (diff)

comment:33 Changed 4 months ago by catalyst

Do we get this STACK warning if we build with ALL_BUGS_ARE_FATAL? If not, I'm inclined to leave it as is and recommend that people run STACK with that defined.

comment:34 Changed 4 months ago by nickm

gk, question for you: (see above)

Last edited 4 months ago by nickm (previous) (diff)

comment:35 Changed 4 months ago by nickm

Keywords: review-group-29 added

comment:36 in reply to:  34 Changed 4 months ago by cypherpunks

Replying to nickm:

gk, question for you: (see above)

Cc him.

comment:37 in reply to:  33 Changed 4 months ago by gk

Replying to catalyst:

Do we get this STACK warning if we build with ALL_BUGS_ARE_FATAL? If not, I'm inclined to leave it as is and recommend that people run STACK with that defined.

No, that warning does not show up anymore. However, a new one pops up (with commit 29784d80faab3d8307916c7a945e89fe082e3db0) in that case:

---
bug: anti-simplify
model: |
  %tobool19 = icmp ne %struct.workerthread_s* %27, null, !dbg !315
  -->  true
stack: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:508:9
ncore: 1
core: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:337:3
    - null pointer dereference
---
bug: anti-simplify
model: |
  %tobool19 = icmp ne %struct.workerthread_s* %27, null, !dbg !315
  -->  true
stack: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:508:9
ncore: 1
core: 
  - /home/thomas/Arbeit/hardening/stack/build36/tor_fatal2/../../../../Tor/tor/src/common/workqueue.c:337:3
    - null pointer dereference

The patch in stack_em_up on the other hand solves this problem without creating new warnings. I think that's preferable to keeping a list of false positives (because ALL_BUGS_ARE_FATAL wouldn't be defined in any release build anyway).

comment:38 Changed 4 months ago by nickm

Status: needs_reviewmerge_ready

okay; I'll merge stack_em_up unless somebody objects, and look for solutions for those other issues.

comment:39 Changed 4 months ago by catalyst

I think ALL_BUGS_ARE_FATAL is good to use with STACK because we use it with other static analysis tools for similar reasons. (It helps tools figure out what code is intentionally dead and decreases the noise level.)

I kind of understand why the new error is happening with ALL_BUGS_ARE_FATAL but I can't describe it well yet.

comment:40 Changed 4 months ago by nickm

I've merged stack_em_up. Tossing this one to catalyst for the ALL_BUGS_ARE_FATAL warnings; but please toss it back to me if it's not an easy fix.

comment:41 Changed 4 months ago by nickm

Owner: changed from nickm to catalyst
Status: merge_readyassigned

comment:42 Changed 2 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:43 Changed 2 months ago by nickm

Keywords: 033-removed-20180320 added

Mark all not-already-included tickets as pending review for removal from 0.3.3 milestone.

comment:44 Changed 2 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: unspecified

These tickets were marked as removed, and nobody has said that they can fix them. Let's remember to look at 033-removed-20180320 as we re-evaluate our triage process, to see whether we're triaging out unnecessarily, and to evaluate whether we're deferring anything unnecessarily. But for now, we can't do these: we need to fix the 033-must stuff now.

Note: See TracTickets for help on using tickets.