UndefinedBehaviorSanitizer errors should fail the unit tests

Trac:
Child Ticket(s): #29832 (moved), #29830 (moved), #29831 (moved)


Trac:
Keywords: N/A deleted, 041-proposed added

FYI, exactly zero of the cases this sanitizer flagged as undefined behaviour in #29527 (moved) are actually undefined behaviour. Of the ~50 reports, only two are potentially evidence of problems, but since the tests pass with no floating-point invalid-operation exceptions, I suspect they might not be problems either.

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

Seems plausible, but could I ask your opinion on the rest of that thread, where people are arguing about what's undefined and what isn't?

Replying to riastradh:

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

Ok, that's a workaround for the specific cases in #29527 (moved) which are not bugs. Let's implement it as part of that ticket?

We should still fail tests when they encounter genuinely undefined behaviour.
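As a contrast with the float-divide-by-zero false positives, here is a minimal sketch of behaviour that is genuinely undefined: signed integer overflow (C99, Sec. 6.5). The function names are hypothetical and not taken from tor. The flags in the comment are only one common way to make the sanitizer abort; they are an assumption for illustration, not necessarily what the branches on this ticket use.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b does not fit in an int.
 * The check itself never overflows. */
bool
add_would_overflow(int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  return a < INT_MIN - b;
}

/* Signed overflow in a + b is undefined behaviour. When built with, e.g.,
 *   -fsanitize=undefined -fno-sanitize-recover=undefined
 * an overflowing call aborts the process, so the unit test fails instead
 * of only printing a runtime error and passing. */
int
unchecked_add(int a, int b)
{
  return a + b;
}

/* Variant that turns the overflow into an assertion failure, so the
 * undefined operation is never executed. */
int
checked_add(int a, int b)
{
  assert(!add_would_overflow(a, b));
  return a + b;
}
```

Calling unchecked_add(INT_MAX, 1) is the kind of report that should still fail the tests; dividing a nonzero double by 0.0 is not.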

Replying to riastradh:

There is an open Clang bug for this: https://bugs.llvm.org/show_bug.cgi?id=19535

Possible workaround: use -fno-sanitize=float-divide-by-zero in addition to -fsanitize=undefined.

Provided a patch for this particular case in #29527 (moved).

Replying to nickm:

Seems plausible, but could I ask your opinion on the rest of that thread, where people are arguing about what's undefined and what isn't?

There are four main points here:

1. C99 technically does say of the / operator that 'if the value of the second operand is zero, the behavior is undefined' (C99, Sec. 6.5.5 Multiplicative operators, clause 5, p. 82).
2. Annex F specifies that all the arithmetic operations on floating-point types have IEEE 754 semantics: 'The +, -, *, and / operators provide the IEC 60559 [another name for IEEE 754, along with ISO/IEC 559] add, subtract, multiply, and divide operations.' (C99, Annex F, F.3 Operators and functions, p. 445)
3. Strictly speaking, Annex F is optional. Strictly speaking, there may be bugs in the IEEE 754 conformance of clang or gcc. Strictly speaking, not everything is up to the compiler, so if you used clang or gcc but linked against a broken libm that didn't provide IEEE 754 semantics, it might be technically wrong for clang or gcc to advertise IEEE 754 semantics (a.k.a. Annex F support, indicated by the definition of the __STDC_IEC_559__ macro).
4. Nobody except a disingenuous language lawyer trolling for a point would seriously choose to wittingly make clang or gcc deviate in any substantial way from IEEE 754 semantics. Essentially the entire body of numerical software on the planet of the past quarter century, outside now-obscure platforms like legacy IBM mainframes or VAXen, has been designed under the premise of IEEE 754 semantics.

Even clang UB optimization attorneys, who might delete null pointer checks if they can be proven to follow undefined behaviour in the absence of -fno-delete-null-pointer-checks, don't seem to be inclined to take advantage of the potential room for disagreement between C99 6.5.5 and Annex F. Where clang deviates from IEEE 754 semantics, it's because of a lack of devpower to go through IEEE 754 and clang with a fine-toothed comb to catch all the corner cases (e.g., optimization bugs in nondefault rounding modes), not because they intend to exploit it.

For an illustrative example of the value of IEEE 754 semantics, see the dozens of cases in #29527 (moved) that could have gone wrong if we didn't have IEEE 754 divide-by-zero semantics, but that do exactly the right thing even though I wasn't thinking about those cases when I wrote the code (until I wrote the test cases, and again later when I reviewed all the false positives). Failure to support IEEE 754 semantics, particularly extremely basic parts like consistently giving infinities for division by zero, would likely be blamed for [mistakes].
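The IEEE 754 divide-by-zero semantics discussed above can be sketched as follows. This is a minimal illustration, assuming a platform whose floating-point arithmetic actually follows IEEE 754 and a build without options like -ffast-math. The function names are hypothetical and do not come from tor.

```c
#include <assert.h>
#include <math.h>

/* Nonzero if the compiler advertises Annex F (IEC 60559) support. */
int
annex_f_advertised(void)
{
#ifdef __STDC_IEC_559__
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: plain floating-point division. */
double
fp_divide(double num, double den)
{
  return num / den;
}

/* Under IEEE 754, dividing by zero is well defined: a nonzero numerator
 * gives an infinity whose sign follows the operands, and 0/0 gives NaN. */
void
check_ieee754_division(void)
{
  assert(isinf(fp_divide(1.0, 0.0)) && fp_divide(1.0, 0.0) > 0);
  assert(isinf(fp_divide(-1.0, 0.0)) && fp_divide(-1.0, 0.0) < 0);
  assert(isinf(fp_divide(1.0, -0.0)) && fp_divide(1.0, -0.0) < 0);
  assert(isnan(fp_divide(0.0, 0.0)));
}
```

A sanitizer build that enables float-divide-by-zero checking would report each of these divisions, even though every result is exactly what IEEE 754 specifies.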

This ticket implements this fix, and all the child tickets. They can close when this ticket closes.

Please review my branches for merging:

0.2.9: https://github.com/torproject/tor/pull/812
0.3.4: https://github.com/torproject/tor/pull/813

And extra branches for testing that this fix doesn't fail CI:

0.3.5: https://github.com/torproject/tor/pull/815
0.4.0: https://github.com/torproject/tor/pull/816
master: https://github.com/torproject/tor/pull/817

I tried to use all the current and legacy command-line arguments. I expect that the compilers will do something sensible if more than one of the arguments is accepted: all we really need is for the sanitizer error to fail the test.

The command-line arguments are listed here:
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#usage
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

The library names are listed here:
https://stackoverflow.com/questions/29392702/missing-libclang-rt-san-x86-64-a-file-for-llvm-compiler-rt
https://bugzilla.redhat.com/show_bug.cgi?id=1303766

Trac:
Keywords: N/A deleted, 029-backport, 040-backport, 035-backport, 034-backport added
Milestone: Tor: unspecified to Tor: 0.4.1.x-final
Status: new to needs_review
Actualpoints: N/A to 0.2

I expect 0.4.0 to fail until #29527 (moved) is merged. If it isn't failing, then maybe I made a mistake in this patch?

master failed due to #29693 (moved).

Link for #29693 (moved) failure: https://travis-ci.org/torproject/tor/jobs/508841104

master failed on appveyor due to #29645 (moved): https://ci.appveyor.com/project/torproject/tor/builds/23216905/job/1q8wji257qffgfs4

Replying to teor:

I expect 0.4.0 to fail until #29527 (moved) is merged. If it isn't failing, then maybe I made a mistake in this patch?

I'd like the reviewer to help me work out what's going on here.

I made a pull request that always shows the test log: https://github.com/torproject/tor/pull/838

Hopefully that will help us work out what is going on.

Trac:
Reviewer: N/A to nickm

This looks okay to me, though I am very leery of merging into 0.2.9 or earlier without a lot of testing in 0.4.1.

Trac:
Status: needs_review to merge_ready

This ticket is not merge_ready, because we expect 0.4.0 to fail on float divide by zero, and it does not:

Replying to teor:

I made a pull request that always shows the test log: https://github.com/torproject/tor/pull/838

Hopefully that will help us work out what is going on.

Trac:
Status: merge_ready to needs_information

Oops, that didn't work: we only get the test log from make check if a test fails.

Let's try just doing a make test: https://github.com/torproject/tor/pull/846

It's based on 0.4.0, so it should not have -fno-sanitize=float-divide-by-zero.

Trac:
Reviewer: nickm to N/A
