Opened 4 months ago

Last modified 2 weeks ago

#29528 needs_revision defect

UndefinedBehaviorSanitizer errors should fail the unit tests

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-ci, tor-test, 041-proposed, 029-backport, 035-backport, 040-backport, 041-should
Cc: Actual Points: 0.2
Parent ID: Points: 2
Reviewer: nickm Sponsor:

Description

In #29527, clang's UndefinedBehaviorSanitizer logs a failure, but the unit test succeeds.

We should fail the unit tests on sanitizer errors. (There might be an environmental variable to fail on errors?)

Child Tickets

TicketStatusOwnerSummaryComponent
#29830newUse UndefinedBehaviorSanitizer when the UBSan configure checks pass, rather than the ASan configure checksCore Tor/Tor
#29831closedteorBackport "enable expensive hardening message is wrong with static library builds"Core Tor/Tor
#29832closedteorUse the correct library names when UBSan isn't supportedCore Tor/Tor

Change History (24)

comment:1 Changed 4 months ago by nickm

Keywords: 041-proposed added

comment:2 Changed 4 months ago by riastradh

FYI, exactly zero of the cases this sanitizer flagged as undefined behaviour in #29527 are actually undefined behaviour. Of the ~50 reports, only two of them are potentially evidence of problems, but since the tests pass with no floating-point invalid-operation exceptions I suspect they might not be problems either.

comment:3 Changed 4 months ago by riastradh

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

comment:4 Changed 4 months ago by nickm

Seems plausible, but could I ask your opinion on the rest of that thread, where people are arguing about what's undefined and what isn't?

comment:5 in reply to:  3 Changed 4 months ago by teor

Replying to riastradh:

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

Ok, that's a workaround for the specific cases in #29527 which are not bugs.
Let's implement it as part of that ticket?

We should still fail tests when they encounter genuinely undefined behaviour.

comment:6 in reply to:  3 Changed 4 months ago by asn

Replying to riastradh:

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

Provided a patch for this particular case in #29527.

comment:7 in reply to:  4 Changed 4 months ago by riastradh

Replying to nickm:

Seems plausible, but could I ask your opinion on the rest of that thread, where people are arguing about what's undefined and what isn't?

There are four main points here:

  1. C99 technically does say of the / operator that 'if the value of the second operand is zero, the behavior is undefined' (C99, Sec. 6.5.5 Multiplicative operators, clause 5, p. 82).
  1. Annex F specifies that all the arithmetic operations on floating-point types have IEEE 754 semantics: 'The +, -, *, and / operators provide the IEC 60559 [another name for IEEE 754, along with ISO/IEC 559] add, subtract, multiply, and divide operations.' (C99, Annex F, F.3 Operators and functions, p. 445)
  1. Strictly speaking, Annex F is optional. Strictly speaking, there may be bugs in the IEEE 754 conformance of clang or gcc. Strictly speaking, not everything is up to the compiler, so if you used clang or gcc but linked against a broken libm that didn't provide IEEE 754 semantics, it might be technically wrong for clang or gcc to advertise IEEE 754 semantics (a.k.a. Annex F support, indicated by the definition of the __STDC_IEC_559__ macro).
  1. Nobody except a disingenuous language lawyer trolling for a point would seriously choose to wittingly make clang or gcc deviate in any substantial way from IEEE 754 semantics. Essentially the entire body of numerical software on the planet of the past quarter century, outside now-obscure platforms like legacy IBM mainframes or VAXen, has been designed under the premise of IEEE 754 semantics.

Even clang UB optimization attorneys, who might delete null pointer checks if they can be proven to follow undefined behaviour in the absence of -fno-delete-null-pointer-checks, don't seem to be inclined to take advantage of the potential room for disagreement between C99 6.5.5 and Annex F. Where clang deviates from IEEE 754 semantics, it's because of a lack of devpower to go through IEEE 754 and clang with a fine-toothed comb to catch all the corner cases (e.g., optimization bugs in nondefault rounding modes), not because they intend to exploit it.

For an illustrative example of the value of IEEE 754 semantics, see the dozens of cases in #29527 that could have gone wrong if we didn't have IEEE 754 divide-by-zero semantics, but that do exactly the right thing even though I wasn't thinking about those cases when I wrote the code (until I wrote the test cases, and again later when I reviewed all the false positives). Failure to support IEEE 754 semantics, particularly extremely basic parts like consistently giving infinities for division by zero, would likely be blamed for billion-dollar mistakes.

Last edited 4 months ago by riastradh (previous) (diff)

comment:8 Changed 3 months ago by teor

Actual Points: 0.2
Keywords: 029-backport 034-backport 035-backport 040-backport added
Milestone: Tor: unspecifiedTor: 0.4.1.x-final
Status: newneeds_review

This ticket implements this fix, and all the child tickets.
They can close when this ticket closes.

Please review my branches for merging:

And extra branches for testing that this fix doesn't fail CI:

I tried to use all the current and legacy command-line arguments. I expect that the compilers will do something sensible if multiple arguments work: all we really need is the failure.

The command-line arguments are listed here:
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#usage
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

The library names are listed here:
https://stackoverflow.com/questions/29392702/missing-libclang-rt-san-x86-64-a-file-for-llvm-compiler-rt
https://bugzilla.redhat.com/show_bug.cgi?id=1303766

comment:9 Changed 3 months ago by teor

I expect 0.4.0 to fail until #29527 is merged. If it isn't failing, then maybe I made a mistake in this patch?

comment:10 Changed 3 months ago by teor

master failed due to #29693.

comment:12 in reply to:  9 Changed 3 months ago by teor

Replying to teor:

I expect 0.4.0 to fail until #29527 is merged. If it isn't failing, then maybe I made a mistake in this patch?

I'd like the reviewer to help me work out what's going on here.

comment:13 Changed 3 months ago by teor

I made a pull request that always shows the test log:
https://github.com/torproject/tor/pull/838

Hopefully that will help us work out what is going on.

comment:14 Changed 3 months ago by asn

Reviewer: nickm

comment:15 Changed 3 months ago by nickm

Status: needs_reviewmerge_ready

This looks okay to me, though I am very leery of merging into 0.2.9 or earlier without a lot of testing in 0.4.1.

comment:16 in reply to:  13 Changed 3 months ago by teor

Status: merge_readyneeds_information

This ticket is not merge_ready, because we expect 0.4.0 to fail on float divide by zero, and it does not:

Replying to teor:

I made a pull request that always shows the test log:
https://github.com/torproject/tor/pull/838

Hopefully that will help us work out what is going on.

comment:17 Changed 3 months ago by teor

Reviewer: nickm

Oops, that didn't work, we only get the test log from make check if a test fails.

Let's try just doing a make test:
https://github.com/torproject/tor/pull/846

It's based on 0.4.0, so it should not have -fno-sanitize-float-divide-by-zero.

comment:18 Changed 3 months ago by teor

Reviewer: nickm

comment:19 Changed 3 months ago by teor

Status: needs_informationneeds_revision

Well, this is truly weird.

When we pass -fsanitize=undefined:
https://travis-ci.org/teor2345/tor/jobs/511294592#L2002
We get:

circuitpadding/circuitpadding_sample_distribution: [forking]  src/lib/math/prob_distr.c:1311:17: runtime error: division by zero
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/lib/math/prob_distr.c:1311:17 in
...

https://travis-ci.org/teor2345/tor/jobs/511294592#L4644

When we pass -fsanitize=undefined -fno-sanitize-recover=undefined -fsanitize-trap=undefined -fsanitize-undefined-trap-on-error:
https://travis-ci.org/teor2345/tor/jobs/511273849#L2002
We get no warnings or errors:

circuitpadding/circuitpadding_sample_distribution: [forking] OK

https://travis-ci.org/teor2345/tor/jobs/511273849#L4648

That's truly weird. I'm going to have to test the new options one by one, to see which one has this unexpected behaviour.

comment:20 Changed 6 weeks ago by nickm

Keywords: 041-should added

comment:21 in reply to:  19 ; Changed 6 weeks ago by cypherpunks

Replying to teor:

When we pass -fsanitize=undefined -fno-sanitize-recover=undefined -fsanitize-trap=undefined -fsanitize-undefined-trap-on-error:

trap ones override non-trap ones and don’t use buggy UBSan run-time support.

comment:22 in reply to:  21 Changed 5 weeks ago by teor

Replying to cypherpunks:

Replying to teor:

When we pass -fsanitize=undefined -fno-sanitize-recover=undefined -fsanitize-trap=undefined -fsanitize-undefined-trap-on-error:

trap ones override non-trap ones and don’t use buggy UBSan run-time support.

So the UBSan runtime reports divisions by zero, but the UBSan trap does not?
That's also really weird. And makes it hard to work out which one we should use.

comment:23 Changed 5 weeks ago by cypherpunks

So the UBSan runtime reports divisions by zero, but the UBSan trap does not? That's also really weird.

Clang has very poor documentation, but a lot of features come from GCC, and Clang's devs recommend to use GCC docs :( plus "read the source" :(
So, here's where it comes from (GCC):
"Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs."
And trap one has the same default behavior:
"The -fsanitize-undefined-trap-on-error option instructs the compiler to report undefined behavior using __builtin_trap rather than a libubsan library routine. The advantage of this is that the libubsan library is not needed and is not linked in, so this is usable even in freestanding environments."

And makes it hard to work out which one we should use.

Should? You should try to ship Tor with production-grade version of UBSan: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#minimal-runtime

comment:24 Changed 2 weeks ago by nickm

Keywords: 034-backport removed

Removing 034-backport from all open tickets: 034 has reached EOL.

Note: See TracTickets for help on using tickets.