Coverage merge failures cause test_process_slow stderr check to fail

changed milestone to %Tor: 0.2.9.x-final

Trac:
Child Ticket(s): #29962 (moved)

added 029-backport 034-backport 035-backport 040-backport 041-accepted-20190115 actualpoints::0.6 component::core tor/tor milestone::Tor: 0.2.9.x-final owner::teor points::0.5 priority::high regression resolution::fixed reviewer::catalyst severity::major status::closed tor-ci tor-ci-fail-sometimes type::defect version::tor 0.2.9.15 labels

Interesting catch! Assigning to myself.

Trac:
Owner: N/A to ahf
Status: new to assigned

We accepted this in the meeting.

Trac:
Keywords: 041-proposed deleted, 041-accepted-20190115 added

This issue caused a master build to fail: https://travis-ci.org/torproject/tor/jobs/507372106

It doesn't seem to happen very often, so I'm dropping the severity.

Trac:
Severity: Critical to Major
Priority: Medium to High

Here are the full logs. I'm going to restart the job to clear the error:

slow/crypto/fuzz_donna/ed25519_donna: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/crypto/fuzz_donna/ed25519_ref10: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/process/callbacks: profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
  FAIL src/test/test_process_slow.c:241: assert(smartlist_len(process_data->stderr_data) OP_EQ 3): 4 vs 3
  [callbacks FAILED]
slow/process/callbacks_terminate: profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_genpareto: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_geometric: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_uniform: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_logistic: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_log_logistic: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
slow/prob_distr/stochastic_weibull: [forking] profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108
OK
1/20 TESTS FAILED. (0 skipped)
profiling:/home/travis/build/torproject/tor/src/trunnel/src_trunnel_libor_trunnel_testing_a-socks5.gcda:Merge mismatch for function 108

I'm not sure if I read the output correctly, but does this error condition mean that the test binaries (and the child-process binary) output "Merge mismatch for function X" on standard error?

We could detect that line as the first line of stderr in the process tests, but I don't think that would be the right solution. Can we either fail earlier when this condition appears, OR figure out a way to reset the state so these "Merge mismatch" warnings disappear?
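Here is why the assertion reports 4 vs 3, assuming (as suggested above) that the child process also writes the libgcov warning to its stderr. In that case the warning becomes one extra captured line. The Python sketch below is illustrative only. The regex is based on the log lines above, and `filter_gcov_warnings` and the sample stderr lines are hypothetical: they are not part of Tor's test suite. Filtering like this would only hide the symptom, because the stale .gcda files are the actual cause.

```python
import re

# Assumption: the warning has the shape seen in the CI log above:
# "profiling:<path>.gcda:Merge mismatch for function <N>"
GCOV_MERGE_WARNING = re.compile(r"^profiling:.*\.gcda:Merge mismatch for function \d+$")


def filter_gcov_warnings(stderr_lines):
    """Return the stderr lines with gcov merge-mismatch warnings removed."""
    return [line for line in stderr_lines if not GCOV_MERGE_WARNING.match(line)]


# Hypothetical captured stderr: three lines the test expects, plus one warning.
captured = [
    "profiling:/tmp/build/src/trunnel/example.gcda:Merge mismatch for function 108",
    "stderr line 1",
    "stderr line 2",
    "stderr line 3",
]

print(len(captured))                        # 4, so a check for 3 lines fails
print(len(filter_gcov_warnings(captured)))  # 3
```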

Replying to ahf:

I'm not sure if I read the output correctly, but does this error condition mean that the test binaries (and the child-process binary) output "Merge mismatch for function X" on standard error?

We could detect that line as the first line of stderr in the process tests, but I don't think that would be the right solution. Can we either fail earlier when this condition appears, OR figure out a way to reset the state so these "Merge mismatch" warnings disappear?

Let's try deleting the cached coverage files before building? Or let's delete the coverage files after a coverage run, so they don't take up space in the cache?

I think it would make the most sense to delete the coverage files so they don't take up space in the cache rather than deleting them from the cache before we (maybe?) update them. It seems like these files should never be cached, right?

Replying to ahf:

I think it would make the most sense to delete the coverage files so they don't take up space in the cache rather than deleting them from the cache before we (maybe?) update them. It seems like these files should never be cached, right?

It doesn't make sense to cache the files, and it's probably a bad thing: coverage can change when unrelated modules change.

If we delete the files before caching, they'll disappear from the cache after the first build on each branch. So the initial build, and old branches, might still have this bug. I think that's ok?

Let's create a "clean-coverage" make target, and add it in the "before_cache:" phase? https://docs.travis-ci.com/user/caching/#before_cache-phase

Trac:
Keywords: N/A deleted, 029-backport, 040-backport, 035-backport, 034-backport added

There's already a target called reset-gcov that might do what you want.

Thanks for the hint with reset-gcov. Based on that I opened: https://github.com/torproject/tor/pull/803

Let's see what CI thinks of this.

Looks like CI was happy. Let's get it reviewed.

Trac:
Status: assigned to needs_review

Trac:
Reviewer: N/A to catalyst

Probably worth noting here: I use after_script for this after having read: https://blog.travis-ci.com/after_script_behavior_changes

Please rebase on maint-0.2.9: we need to fix the process test failure on master, but we also need reliable coverage stats on 0.2.9 and later. And we want smaller caches on all branches to speed up the build.

Replying to ahf:

Probably worth noting here: I use after_script for this after having read: https://blog.travis-ci.com/after_script_behavior_changes

I think you want before_cache, it's documented to run before the cache phase: https://docs.travis-ci.com/user/caching/#before_cache-phase

Trac:
Status: needs_review to needs_revision

ahf and I spoke about this fix on IRC. Here's the full list of changes:

- rebase on maint-0.2.9
- run coverage as the last script step
- change make reset-gcov so it deletes ".gcov" files
  - so we don't accidentally use stale files if the coverage command fails
  - it also makes the cache smaller
- put make reset-gcov in before_cache

Let's see what CI says to: https://github.com/torproject/tor/pull/806

https://github.com/torproject/tor/pull/806 - CI looks happy. Let's get it reviewed.

Trac:
Status: needs_revision to needs_review

Replying to ahf:

https://github.com/torproject/tor/pull/806 - CI looks happy. Let's get it reviewed.

Thanks! That one branch looks good to me. Does it merge forward cleanly?

Trac:
Status: needs_review to needs_information

I'll check that it merges forward cleanly before I push to torproject.org.

Trac:
Status: needs_information to merge_ready

ahf, please remember to fill in actual points?

Trac:
Keywords: N/A deleted, teor-merge added

Trac:
Actualpoints: N/A to 0.5

Hi ahf, catalyst,

Replying to teor:

- change make reset-gcov so it deletes ".gcov" files
  - so we don't accidentally use stale files if the coverage command fails
  - it also makes the cache smaller

I still think we should change reset-gcov so it deletes the .gcov files. They aren't any good in the cache, and they will hide some kinds of coverage build/execution failures. What do you think?

Edit: typos

Trac:
Status: merge_ready to needs_revision

Trac:
Keywords: N/A deleted, tor-ci-fail-sometimes added

#29036 (moved) and #29962 (moved) both add before_cache, so I'd like to backport them and merge them forward together.

Trac:
Owner: ahf to teor
Status: needs_revision to assigned

Trac:
Status: assigned to needs_revision

I added these commits:

- 57e9fe2: deletes the .gcno and .gcov files in make reset-gcov
- eb0bd18: tweaks the changes file so it passes make check-changes on 0.3.5
- clean merge of #29962 (moved) into 0.3.4
- 33be8d8: combine before_cache from #29036 (moved) and #29962 (moved)
- 7014e57 merge: combine the stem lines from maint-0.3.5 with the moved coverage line from #29036 (moved)

catalyst, can you review my extra commits on #29036 (moved), and my extra commits on #29962 (moved)? (You've already reviewed ahf's commits on #29036 (moved), and I reviewed rl1987's commits on #29962 (moved).)

Here are the pull requests:

- 0.2.9: https://github.com/torproject/tor/pull/877
- 0.3.4: https://github.com/torproject/tor/pull/878
  - clean merge to maint-0.3.4, also merge #29962 (moved)
- 0.3.5: https://github.com/torproject/tor/pull/879
  - trivial line-based merge to 0.3.5
- 0.4.0: https://github.com/torproject/tor/pull/881
  - clean merge, testing only
- master: https://github.com/torproject/tor/pull/882
  - clean merge, testing only

Edit: typo

Trac:
Status: needs_revision to needs_review

Replying to teor:

catalyst, can you review my extra commits on #29036 (moved), and my extra commits on #29962 (moved)? (You've already reviewed ahf's commits on #29036 (moved), and I reviewed rl1987's commits on #29962 (moved).)

Thanks! These look good.

Trac:
Status: needs_review to merge_ready

Please merge https://github.com/torproject/tor/pull/879 to 0.4.0 and later.

Once the CI finishes successfully, I'll backport the 0.2.9, 0.3.4, and 0.3.5 branches.

Trac:
Actualpoints: 0.5 to 0.6
Keywords: teor-merge deleted, nickm-merge, asn-merge added

Sorry, I think we need a different target name for these new "reset-gcov" semantics, if that's what we're going for. In the current semantics, running "reset-gcov" after a test run resets the coverage, but doesn't stop you from running "make check" again. But if we change it to remove the gcno files too, that will mean that you need to do a full "make clean" and "make" before you can run the tests for coverage again.

Trac:
Status: merge_ready to needs_revision

Replying to nickm:

Sorry, I think we need a different target name for these new "reset-gcov" semantics, if that's what we're going for. In the current semantics, running "reset-gcov" after a test run resets the coverage, but doesn't stop you from running "make check" again. But if we change it to remove the gcno files too, that will mean that you need to do a full "make clean" and "make" before you can run the tests for coverage again.

I think I understand better now:

- we MUST NOT delete gcno files, because they are created when the code is compiled
- we MUST delete gcda files to remove the warning (and make our coverage accurate)
- we MAY delete gcov files, because they take up space in the cache (and leaving old files around might make our coverage inaccurate)
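The rules above can be sketched as a cleanup routine. This is a minimal Python illustration, not the actual `reset-gcov` Makefile target. The function name `reset_gcov` and the directory layout in the test are hypothetical. The routine deletes runtime counter files (.gcda) and gcov report files (.gcov) anywhere under a build directory. It leaves the compile-time .gcno files in place, so the tests can be rerun for coverage without a rebuild.

```python
from pathlib import Path

# .gcda (runtime counters) and .gcov (gcov reports) are safe to delete.
# .gcno (compile-time notes) is kept: without it, coverage needs a full rebuild.
DELETABLE_SUFFIXES = {".gcda", ".gcov"}


def reset_gcov(build_dir):
    """Delete .gcda and .gcov files under build_dir and return the removed paths."""
    removed = []
    # sorted() collects every path before anything is deleted,
    # so we never unlink files while the directory walk is still running.
    for path in sorted(Path(build_dir).rglob("*")):
        if path.is_file() and path.suffix in DELETABLE_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed
```

If a step like this ran in before_cache, stale counters would be left out of the cache. The next build's instrumented binaries would then write fresh .gcda files instead of merging into mismatched ones.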

I'll amend that commit to delete gcda and gcov files, but leave gcno files.

Ok, I removed the Makefile line that deletes the gcno files.

Please merge https://github.com/torproject/tor/pull/879 to 0.4.0 and later.

Trac:
Status: needs_revision to merge_ready

merged to 040 and forward.

Trac:
Resolution: N/A to fixed
Status: merge_ready to closed

Please don't close tickets that need to be backported: put them in Tor: 0.3.5.x-final instead.

Trac:
Status: closed to reopened
Resolution: fixed to N/A
Milestone: Tor: 0.4.1.x-final to Tor: 0.3.5.x-final
Version: Tor: unspecified to Tor: 0.2.9.15

Trac:
Keywords: nickm-merge, asn-merge deleted, N/A added

Trac:
Status: reopened to merge_ready

Merged #29036 (moved), #30011 (moved), and #30021 (moved).

For this ticket, that's:

- 0.2.9: https://github.com/torproject/tor/pull/877
- 0.3.4: https://github.com/torproject/tor/pull/878
- 0.3.5: https://github.com/torproject/tor/pull/879

Trac:
Resolution: N/A to fixed
Status: merge_ready to closed

Trac:
Milestone: Tor: 0.3.5.x-final to Tor: 0.2.9.x-final

closed

changed time estimate to 4h

added 4h 48m of time spent

mentioned in issue #29962 (moved)

mentioned in issue #30001 (moved)

mentioned in issue #30011 (moved)

mentioned in issue #30021 (moved)

moved to tpo/core/tor#29036 (closed)
