Opened 3 weeks ago

Last modified 7 hours ago

#31538 needs_review defect

Windows bundles based on ESR 68 are not built reproducibly

Reported by: gk Owned by: tbb-team
Priority: Very High Milestone:
Component: Applications/Tor Browser Version:
Severity: Major Keywords: tbb-rbm, ff68-esr, tbb-9.0-must-alpha, TorBrowserTeam201909R, GeorgKoppen201909
Cc: boklm Actual Points:
Parent ID: #30322 Points: 2
Reviewer: Sponsor:

Description

As mentioned in comment:20:ticket:28238 we have files that are not built reproducibly with the new mingw-w64-clang toolchain. It affects not all binaries, though, only firefox.exe, libGLESv2.dll, mozglue.dll, plugin-container.exe, plugin-hang-ui.exe, and xul.dll.

This is still an issue after switching to clang, lld etc. 8.0.0.

Child Tickets

TicketStatusOwnerSummaryComponent
#31621closedtbb-teamFix node bug that makes large writes to stdout failApplications/Tor Browser

Attachments (8)

plugin_hang_ui_diff.bz2 (666.2 KB) - added by gk 3 weeks ago.
tpo_mozglue_analysis (25.4 KB) - added by gk 13 days ago.
uni_mozglue_analysis (25.4 KB) - added by gk 13 days ago.
mozglue_lld.log (73.7 KB) - added by gk 10 days ago.
mozglue_lld2.log (73.7 KB) - added by gk 10 days ago.
mozglue.map (1.4 MB) - added by gk 10 days ago.
mozglue2.map (1.4 MB) - added by gk 10 days ago.
mozglue.map.diff (912.6 KB) - added by gk 10 days ago.

Change History (32)

Changed 3 weeks ago by gk

Attachment: plugin_hang_ui_diff.bz2 added

comment:1 Changed 3 weeks ago by gk

The diff is surprisingly large (see the example attached for the plugin-hang-ui.exe).

comment:2 Changed 3 weeks ago by cypherpunks

libtool?

comment:3 Changed 2 weeks ago by boklm

After building the current nightly for windows x86_64 twice on the same machine, I am getting the same bundles. Is it an issue that mostly happen when building on different machines?

comment:4 in reply to:  3 Changed 2 weeks ago by gk

Replying to boklm:

After building the current nightly for windows x86_64 twice on the same machine, I am getting the same bundles. Is it an issue that mostly happen when building on different machines?

Yeah, that's part of the mystery: it only happens on different machines it seems.

comment:5 Changed 2 weeks ago by cypherpunks

"The builds from the same machine were identical, but different on the second machine." boklm said last time it happened...

comment:6 Changed 2 weeks ago by gk

Keywords: tbb-9.0-must-alpha added; tbb-9.0-must-nightly removed

Move must-nightly items to must-alpha ones.

comment:7 Changed 2 weeks ago by gk

Keywords: TorBrowserTeam201909 added; TorBrowserTeam201908 removed

Moving must-alpha tickets to September.

comment:8 Changed 2 weeks ago by gk

Keywords: GeorgKoppen201909 added; GeorgKoppen201908 removed

First ticket move of mine.

comment:9 Changed 2 weeks ago by pili

Points: 2

comment:10 Changed 13 days ago by gk

It seems at least the table information is not written reproducibly. Taking mozglue.dll as an example:

Name: Import symbols table RVA: 802489 (0xc3eb9) Size: 380
Name: Export symbols table RVA: 792816 (0xc18f0) Size: 9673
Name: Base relocation table RVA: 937984 (0xe5000) Size: 6172
Name: Debugging information RVA: 880640 (0xd7000) Size: 28
Name: Exception table RVA: 901120 (0xdc000) Size: 26388
Name: Thread local storage table RVA: 746264 (0xb6318) Size: 40
Name: Resource table RVA: 933888 (0xe4000) Size: 1200
Name: Import address table RVA: 804752 (0xc4790) Size: 1880

vs:

Name: Import symbols table RVA: 802505 (0xc3ec9) Size: 380
Name: Export symbols table RVA: 792832 (0xc1900) Size: 9673
Name: Base relocation table RVA: 937984 (0xe5000) Size: 6176
Name: Debugging information RVA: 880640 (0xd7000) Size: 28
Name: Exception table RVA: 901120 (0xdc000) Size: 26388
Name: Thread local storage table RVA: 746272 (0xb6320) Size: 40
Name: Resource table RVA: 933888 (0xe4000) Size: 1200
Name: Import address table RVA: 804768 (0xc47a0) Size: 1880

comment:11 Changed 13 days ago by gk

I am adding the files I got while using peanalyis for posterity.

Changed 13 days ago by gk

Attachment: tpo_mozglue_analysis added

Changed 13 days ago by gk

Attachment: uni_mozglue_analysis added

Changed 10 days ago by gk

Attachment: mozglue_lld.log added

Changed 10 days ago by gk

Attachment: mozglue_lld2.log added

comment:12 Changed 10 days ago by gk

Added two log files after compiling with -Wl,-verbose. It seems things start to go wrong when libc++.a is getting consulted.

Changed 10 days ago by gk

Attachment: mozglue.map added

Changed 10 days ago by gk

Attachment: mozglue2.map added

Changed 10 days ago by gk

Attachment: mozglue.map.diff added

comment:13 Changed 10 days ago by gk

Added the map files we get after compiling with -Wl,-Map,map (thanks to Martin Storsjö for for both ideas) and the diff.

comment:14 Changed 10 days ago by gk

The libc++.a files have the same content fwiw (even though the archive is not assembled reproducibly).

comment:15 Changed 10 days ago by cypherpunks

Broken libtool from system gcc?

comment:16 in reply to:  15 Changed 7 days ago by gk

Replying to cypherpunks:

Broken libtool from system gcc?

Hm, why do you think that could be relevant here?

comment:17 in reply to:  14 Changed 7 days ago by gk

Replying to gk:

The libc++.a files have the same content fwiw (even though the archive is not assembled reproducibly).

That's probably the issue here as the linker is taking input from libc++.a as it appears in the .a file (thanks Martin).

Now, we do two things with libc++.a: 1) we build the archive and 2) we have the merge_libs() function. I guess as a next step we try to pin down where exactly the problem occurs (maybe it's just 2) that is problematic here).

comment:18 Changed 7 days ago by gk

Okay, things go already south in 1).

comment:19 Changed 7 days ago by gk

The different order in libc++.a is already visible in the respecting link command:

/var/tmp/dist/mingw-w64-clang/bin/llvm-ar qc libc++.a  CMakeFiles/cxx_objects.dir/__/src/shared_mutex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/exception.cpp.obj CMakeFiles/cxx_objects.dir/__/src/system_error.cpp.obj CMakeFiles/cxx_objects.dir/__/src/charconv.cpp.obj CMakeFiles/cxx_objects.dir/__/src/new.cpp.obj CMakeFiles/cxx_objects.dir/__/src/variant.cpp.obj CMakeFiles/cxx_objects.dir/__/src/optional.cpp.obj CMakeFiles/cxx_objects.dir/__/src/bind.cpp.obj CMakeFiles/cxx_objects.dir/__/src/locale.cpp.obj CMakeFiles/cxx_objects.dir/__/src/mutex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/iostream.cpp.obj CMakeFiles/cxx_objects.dir/__/src/stdexcept.cpp.obj CMakeFiles/cxx_objects.dir/__/src/thread.cpp.obj CMakeFiles/cxx_objects.dir/__/src/random.cpp.obj CMakeFiles/cxx_objects.dir/__/src/ios.cpp.obj CMakeFiles/cxx_objects.dir/__/src/typeinfo.cpp.obj CMakeFiles/cxx_objects.dir/__/src/future.cpp.obj CMakeFiles/cxx_objects.dir/__/src/functional.cpp.obj CMakeFiles/cxx_objects.dir/__/src/any.cpp.obj CMakeFiles/cxx_objects.dir/__/src/utility.cpp.obj CMakeFiles/cxx_objects.dir/__/src/valarray.cpp.obj CMakeFiles/cxx_objects.dir/__/src/hash.cpp.obj CMakeFiles/cxx_objects.dir/__/src/regex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/condition_variable.cpp.obj CMakeFiles/cxx_objects.dir/__/src/chrono.cpp.obj CMakeFiles/cxx_objects.dir/__/src/string.cpp.obj CMakeFiles/cxx_objects.dir/__/src/debug.cpp.obj CMakeFiles/cxx_objects.dir/__/src/algorithm.cpp.obj CMakeFiles/cxx_objects.dir/__/src/memory.cpp.obj CMakeFiles/cxx_objects.dir/__/src/vector.cpp.obj CMakeFiles/cxx_objects.dir/__/src/strstream.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/thread_win32.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/support.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/locale_win32.cpp.obj

vs.

/var/tmp/dist/mingw-w64-clang/bin/llvm-ar qc libc++.a  CMakeFiles/cxx_objects.dir/__/src/typeinfo.cpp.obj CMakeFiles/cxx_objects.dir/__/src/debug.cpp.obj CMakeFiles/cxx_objects.dir/__/src/thread.cpp.obj CMakeFiles/cxx_objects.dir/__/src/hash.cpp.obj CMakeFiles/cxx_objects.dir/__/src/stdexcept.cpp.obj CMakeFiles/cxx_objects.dir/__/src/algorithm.cpp.obj CMakeFiles/cxx_objects.dir/__/src/valarray.cpp.obj CMakeFiles/cxx_objects.dir/__/src/condition_variable.cpp.obj CMakeFiles/cxx_objects.dir/__/src/locale.cpp.obj CMakeFiles/cxx_objects.dir/__/src/functional.cpp.obj CMakeFiles/cxx_objects.dir/__/src/bind.cpp.obj CMakeFiles/cxx_objects.dir/__/src/shared_mutex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/any.cpp.obj CMakeFiles/cxx_objects.dir/__/src/exception.cpp.obj CMakeFiles/cxx_objects.dir/__/src/chrono.cpp.obj CMakeFiles/cxx_objects.dir/__/src/system_error.cpp.obj CMakeFiles/cxx_objects.dir/__/src/ios.cpp.obj CMakeFiles/cxx_objects.dir/__/src/string.cpp.obj CMakeFiles/cxx_objects.dir/__/src/optional.cpp.obj CMakeFiles/cxx_objects.dir/__/src/new.cpp.obj CMakeFiles/cxx_objects.dir/__/src/charconv.cpp.obj CMakeFiles/cxx_objects.dir/__/src/vector.cpp.obj CMakeFiles/cxx_objects.dir/__/src/regex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/strstream.cpp.obj CMakeFiles/cxx_objects.dir/__/src/memory.cpp.obj CMakeFiles/cxx_objects.dir/__/src/random.cpp.obj CMakeFiles/cxx_objects.dir/__/src/iostream.cpp.obj CMakeFiles/cxx_objects.dir/__/src/utility.cpp.obj CMakeFiles/cxx_objects.dir/__/src/mutex.cpp.obj CMakeFiles/cxx_objects.dir/__/src/future.cpp.obj CMakeFiles/cxx_objects.dir/__/src/variant.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/support.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/locale_win32.cpp.obj CMakeFiles/cxx_objects.dir/__/src/support/win32/thread_win32.cpp.obj

Responsible for that is CMake which saves the command to use in a link.txt file

cd /var/tmp/build/libcxx/build/lib && /var/tmp/dist/cmake/bin/cmake -E cmake_link_script CMakeFiles/cxx_static.dir/link.txt --verbose=1

I am not sure yet where sorting that part differently got solved but updating CMake to the latest stable version, 3.15.3, (we are currently using 3.4.3) seems to solve that part of the puzzle. However, other object files are still included non-deterministically after that part. I am not sure yet where they are coming from...

comment:20 in reply to:  19 Changed 5 days ago by gk

Replying to gk:

However, other object files are still included non-deterministically after that part. I am not sure yet where they are coming from...

Okay, I think I am at the bottom of that rabbit hole. Thre reason for that issue is that libc++.a is getting merged with the contents of libc++abi.a (which, by the way, itself is reproducible at least after switching to CMake's 3.5.13). Now, that merging gets done in libc++'s merge_archives.py (https://github.com/llvm/llvm-project/blob/release/8.x/libcxx/utils/merge_archives.py). The problem here is:

files = glob.glob(os.path.join(temp_directory_root, '*.o*'))

From the doc (https://docs.python.org/2/library/glob.html):

"The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order."

So, doing a simple

files = sorted(glob.glob(os.path.join(temp_directory_root, '*.o*')))

gets us finally a reproducible libc++.a and a reproducible 64-bit .exe. I am currently testing the 32-bit version as well and will prepare a proper patch for review tomorrow.

comment:21 Changed 3 days ago by gk

Keywords: TorBrowserTeam201909R added; TorBrowserTeam201909 removed
Status: newneeds_review

bug_31538_v2 (https://gitweb.torproject.org/user/gk/tor-browser-build.git/log/?h=bug_31538_v2) has wo patches for review (on top of #31732), which should fix the problem. I rebuilt the macOS bundles as well and they are still matching. Doing currently Linux.

comment:22 Changed 2 days ago by gk

Linux 64bit builds are still matching. So, we are good, woo!

comment:23 Changed 9 hours ago by tom

We seem to use a different technique for merging libraries than Martin: https://searchfox.org/mozilla-central/rev/d1e33e3e11f559952d7d80e722d26a6cf5dd80ac/taskcluster/scripts/misc/build-clang-8-mingw.sh#125

I don't know if ours is reproducible. But Martin can probably upstream that clang change fairly easily. And I should - if not copy your approach - add a comment about it in our build.

comment:24 in reply to:  23 Changed 7 hours ago by gk

Replying to tom:

We seem to use a different technique for merging libraries than Martin: https://searchfox.org/mozilla-central/rev/d1e33e3e11f559952d7d80e722d26a6cf5dd80ac/taskcluster/scripts/misc/build-clang-8-mingw.sh#125

No, that's a different piece. It merges libc++ with libunwind which is actually okay. The problematic one, which you hit as well, is within the libc++ build step where libc++ and libc++abi are getting merged into libc++.

I don't know if ours is reproducible. But Martin can probably upstream that clang change fairly easily. And I should - if not copy your approach - add a comment about it in our build.

Upstream changed meanwhile, so the patch in question is only usable for clang 8-based toolchains. However, it seems the new changes still have the same issue potentially. I'll get back to Martin with that once this patch lands and we are good on our side.

Note: See TracTickets for help on using tickets.