Opened 4 weeks ago

Closed 8 days ago

#32052 closed defect (duplicate)

Linux32 bundles for Tor Browser 9.0a8 are not reproducible

Reported by: gk Owned by: tbb-team
Priority: Immediate Milestone:
Component: Applications/Tor Browser Version:
Severity: Critical Keywords: tbb-9.0-must, tbb-9.0-issues, tbb-regression, tbb-9.0.1-can, TorBrowserTeam201911
Cc: boklm, manishearth@…, acrichton@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

For some reason, boklm and I got different Linux32 bundles when building our rc for 9.0a8.

Child Tickets

Change History (11)

comment:1 Changed 4 weeks ago by gk

The diff is too large to attach here, but it is confined to libxul.so.

comment:2 Changed 4 weeks ago by gk

I put everything I currently have up at https://people.torproject.org/~gk/builds/9.0a8-build2-test/ (including my linux32 firefox obj dir to figure out which objects actually differ).
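A minimal sketch of how one might figure out which objects actually differ between two obj directories, by comparing SHA-256 digests file by file. The function names and the `*.o` pattern are assumptions made for illustration; they are not part of tor-browser-build.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(obj_a: Path, obj_b: Path, pattern: str = "*.o") -> List[str]:
    """List relative paths matching `pattern` whose contents differ,
    or that exist in only one of the two trees."""
    files_a = {p.relative_to(obj_a).as_posix(): p
               for p in obj_a.rglob(pattern) if p.is_file()}
    files_b = {p.relative_to(obj_b).as_posix(): p
               for p in obj_b.rglob(pattern) if p.is_file()}
    result = []
    for rel in sorted(files_a.keys() | files_b.keys()):
        a, b = files_a.get(rel), files_b.get(rel)
        if a is None or b is None or sha256_of(a) != sha256_of(b):
            result.append(rel)
    return result
```

Calling `differing_files(Path("obj-a"), Path("obj-b"))` on two unpacked obj directories (hypothetical paths) would return the sorted list of object files worth feeding to a byte-level diff.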


comment:3 Changed 4 weeks ago by boklm

After rebuilding the firefox part to copy the obj directory, I now get the same libxul.so as GeKo. So it looks like an intermittent issue.

To find out more about this issue, I think I could run a loop that keeps rebuilding and removing the obj dir until the libxul.so differs.

For now I will rebuild the 9.0a8 linux32 bundles to check that they now match GeKo's bundles.
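The rebuild loop described above could look roughly like the sketch below. It assumes a hypothetical `build` callable that removes the obj dir, runs a clean build, and returns the bytes of the resulting libxul.so; the real build invocation is not shown in this ticket and is left out here.

```python
import hashlib
from typing import Callable, Optional, Tuple


def build_until_mismatch(
    build: Callable[[], bytes], max_builds: int = 20
) -> Tuple[int, Optional[str], Optional[str]]:
    """Run `build` up to `max_builds` times.

    Returns (number of builds that matched the first one,
             SHA-256 of the first build,
             SHA-256 of the first differing build or None).
    """
    reference: Optional[str] = None
    for count in range(max_builds):
        digest = hashlib.sha256(build()).hexdigest()
        if reference is None:
            reference = digest            # first build sets the baseline
        elif digest != reference:
            return count, reference, digest
    return max_builds, reference, None
```

Stopping at the first mismatch keeps the obj dir of the differing build around for inspection, which is the point of the exercise.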

comment:4 Changed 4 weeks ago by boklm

In order to try to reproduce the issue, and get an obj- directory from a build having this issue, I am planning to run a build with this patch:
https://gitweb.torproject.org/user/boklm/tor-browser-build.git/commit/?h=bug_32052&id=6a513d33ad9c3dc03b064f450ca0c1b07509340b

comment:5 Changed 4 weeks ago by gk

Cc: manishearth@… acrichton@… added

Here is what we know so far: while the libxul.so usually matches between different builds, if one tries hard enough (building over and over again) one gets different libraries. The diff shows that gkrust-f4d3d8c9a1eaf037.gkrust.eac5ce9j-cgu.0.rcgu.o differs between non-matching builds.

Looking closer at that one gets us something like

--- /dev/fd/63	2019-10-17 08:43:29.203950618 +0200
+++ /dev/fd/62	2019-10-17 08:43:29.207950653 +0200
@@ -1,6 +1,6 @@
 00000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
 00000010: 0100 0300 0100 0000 0000 0000 0000 0000  ................
-00000020: d08e d100 0000 0000 3400 0000 0000 2800  ........4.....(.
+00000020: 908e d100 0000 0000 3400 0000 0000 2800  ........4.....(.
 00000030: 4c7b 0100 0000 0000 0000 0000 0000 0000  L{..............
 00000040: 5553 5756 83ec 0c8b 4424 20e8 0000 0000  USWV....D$ .....
 00000050: 5b81 c303 0000 008b 7024 85f6 745d 8b4c  [.......p$..t].L
@@ -2879,26 +2879,26 @@
 0000b3e0: 6c6c 2d61 7272 6f77 2d66 6f72 7761 7264  ll-arrow-forward
 0000b3f0: 646f 772d 6275 7474 6f6e 2d63 6c6f 7365  dow-button-close
 0000b400: 2d6d 6f7a 2d77 696e 646f 772d 6275 7474  -moz-window-butt
-0000b410: 6c65 7468 756d 622d 7665 7274 6963 616c  lethumb-vertical
-0000b420: 7363 616c 6574 6875 6d62 2d76 6572 7469  scalethumb-verti
-0000b430: 7466 6965 6c64 2d6d 756c 7469 6c69 6e65  tfield-multiline
-0000b440: 7465 7874 6669 656c 642d 6d75 6c74 696c  textfield-multil
-0000b450: 7262 7574 746f 6e2d 6472 6f70 646f 776e  rbutton-dropdown
-0000b460: 746f 6f6c 6261 7262 7574 746f 6e2d 6472  toolbarbutton-dr
-0000b470: 6568 6561 6465 7273 6f72 7461 7272 6f77  eheadersortarrow
-0000b480: 7472 6565 6865 6164 6572 736f 7274 6172  treeheadersortar
-0000b490: 6963 6174 696f 6e73 2d74 6f6f 6c62 6f78  ications-toolbox
-0000b4a0: 2d6d 6f7a 2d77 696e 2d63 6f6d 6d75 6e69  -moz-win-communi
-0000b4b0: 6572 7461 6262 6172 2d74 6f6f 6c62 6f78  ertabbar-toolbox
-0000b4c0: 2d6d 6f7a 2d77 696e 2d62 726f 7773 6572  -moz-win-browser
-0000b4d0: 756c 6c73 6372 6565 6e2d 6275 7474 6f6e  ullscreen-button
-0000b4e0: 2d6d 6f7a 2d6d 6163 2d66 756c 6c73 6372  -moz-mac-fullscr
-0000b4f0: 6f6e 2d62 6f78 2d6d 6178 696d 697a 6564  on-box-maximized
-0000b500: 2d62 7574 746f 6e2d 6d61 7869 6d69 7a65  -button-maximize
-0000b510: 2d62 7574 746f 6e2d 6d69 6e69 6d69 7a65  -button-minimize
-0000b520: 772d 6275 7474 6f6e 2d72 6573 746f 7265  w-button-restore
-0000b530: 646f 772d 6672 616d 652d 626f 7474 6f6d  dow-frame-bottom
-0000b540: 2d6d 6f7a 2d77 696e 646f 772d 6672 616d  -moz-window-fram
+0000b410: 646f 772d 6672 616d 652d 626f 7474 6f6d  dow-frame-bottom
+0000b420: 2d6d 6f7a 2d77 696e 646f 772d 6672 616d  -moz-window-fram
+0000b430: 6c65 7468 756d 622d 7665 7274 6963 616c  lethumb-vertical
+0000b440: 7363 616c 6574 6875 6d62 2d76 6572 7469  scalethumb-verti
+0000b450: 7466 6965 6c64 2d6d 756c 7469 6c69 6e65  tfield-multiline
+0000b460: 7465 7874 6669 656c 642d 6d75 6c74 696c  textfield-multil
+0000b470: 7262 7574 746f 6e2d 6472 6f70 646f 776e  rbutton-dropdown
+0000b480: 746f 6f6c 6261 7262 7574 746f 6e2d 6472  toolbarbutton-dr
+0000b490: 6568 6561 6465 7273 6f72 7461 7272 6f77  eheadersortarrow
+0000b4a0: 7472 6565 6865 6164 6572 736f 7274 6172  treeheadersortar
+0000b4b0: 6963 6174 696f 6e73 2d74 6f6f 6c62 6f78  ications-toolbox
+0000b4c0: 2d6d 6f7a 2d77 696e 2d63 6f6d 6d75 6e69  -moz-win-communi
+0000b4d0: 6572 7461 6262 6172 2d74 6f6f 6c62 6f78  ertabbar-toolbox
+0000b4e0: 2d6d 6f7a 2d77 696e 2d62 726f 7773 6572  -moz-win-browser
+0000b4f0: 756c 6c73 6372 6565 6e2d 6275 7474 6f6e  ullscreen-button
+0000b500: 2d6d 6f7a 2d6d 6163 2d66 756c 6c73 6372  -moz-mac-fullscr
+0000b510: 2d62 7574 746f 6e2d 6d61 7869 6d69 7a65  -button-maximize
+0000b520: 2d62 7574 746f 6e2d 6d69 6e69 6d69 7a65  -button-minimize
+0000b530: 6f6e 2d62 6f78 2d6d 6178 696d 697a 6564  on-box-maximized
+0000b540: 772d 6275 7474 6f6e 2d72 6573 746f 7265  w-button-restore
 0000b550: 746c 6562 6172 2d6d 6178 696d 697a 6564  tlebar-maximized
 0000b560: 2d6d 6f7a 2d77 696e 646f 772d 7469 746c  -moz-window-titl
 0000b570: 7375 7265 2d62 7574 746f 6e2d 6f70 656e  sure-button-open
@@ -414252,45983 +414252,45983 @@
 006522b0: 4e2c 897e 30c6 0601 83c4 7c5e 5f5b 5dc3  N,.~0.....|^_[].
 006522c0: 0fb6 c8e8 fcff ffff 8b17 e997 fdff ff0f  ................
 006522d0: 0b0f 0b00 0000 0000 0000 0000 0000 0000  ................
-006522e0: 5553 5756 81ec ac00 0000 8bbc 24c0 0000  USWV........$...
-006522f0: 0089 5424 0ce8 0000 0000 5b89 ce81 c305  ..T$......[.....
-00652300: 0000 008b 178b 6a08 8b42 1045 8944 2438  ......j..B.E.D$8
-00652310: 8a47 042b 6a0c c647 0403 3c03 0f85 7c01  .G.+j..G..<...|.

followed by dozens of MiB of differences.

-0000b530: 646f 772d 6672 616d 652d 626f 7474 6f6d  dow-frame-bottom
-0000b540: 2d6d 6f7a 2d77 696e 646f 772d 6672 616d  -moz-window-fram
+0000b410: 646f 772d 6672 616d 652d 626f 7474 6f6d  dow-frame-bottom
+0000b420: 2d6d 6f7a 2d77 696e 646f 772d 6672 616d  -moz-window-fram

might be interesting, as those two lines are the only difference in that particular block: in the first build they are at the end, while in the second one they are at the beginning.
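One way to check whether a hunk like this is only a reordering of the same 16-byte rows is to compare the removed and added rows as multisets, ignoring offsets. This is a small sketch that assumes xxd-style lines prefixed with `-` or `+`, as in the diff above; the helper names are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def hexdump_payload(line: str) -> str:
    """Drop the diff marker and the offset from an xxd-style line
    and return only the hex columns."""
    body = line[1:]                       # strip leading '-' or '+'
    _, _, rest = body.partition(": ")     # strip the "0000b410: " offset
    return rest.split("  ", 1)[0]         # hex part ends at the double space


def is_pure_reordering(diff_lines: Iterable[str]) -> bool:
    """True if removed and added rows carry the same bytes, only reordered."""
    removed: Counter = Counter()
    added: Counter = Counter()
    for line in diff_lines:
        if line.startswith("---") or line.startswith("+++"):
            continue                      # skip the diff file headers
        if line.startswith("-"):
            removed[hexdump_payload(line)] += 1
        elif line.startswith("+"):
            added[hexdump_payload(line)] += 1
    return bool(removed) and removed == added
```

A hunk that passes this check points at ordering nondeterminism (for example in how a string table is emitted) rather than at different content being generated.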

Either way: Alex/Manish: is there anything known on Rust's side that could be causing that? That's with ESR 68 and the self-compiled Rust 1.34.2. Any ideas on what we could try to get a smaller testcase/scenario to reproduce the bug would be highly appreciated as well. :)

Back in #26475 we had been fighting Rust-related reproducibility issues, but I double-checked that this bug is something different.

comment:6 Changed 4 weeks ago by boklm

I uploaded a tarball containing two versions of the file libgkrust-f4d3d8c9a1eaf037.a, as well as the output from diffoscope:
https://people.torproject.org/~boklm/builds/bug_32052/libgkrust.tar.xz
https://people.torproject.org/~boklm/builds/bug_32052/libgkrust.tar.xz.asc

3f89552d3f37c2e1bbe8beab8561cc397561f030a6d8d236516eb3645a2ad63a  libgkrust.tar.xz

comment:7 Changed 4 weeks ago by alexcrichton

Thanks for the cc and for the investigation into this! This looks like it's either a compiler bug or a bug in the build system because the Rust object file is what's changing here. A compiler bug could definitely cause it but there may also be something nondeterministic being fed into rustc (e.g. generated code or something like that).

I don't know of rustc bugs off-hand (although I'm not exactly all-knowing!). In terms of minimization I think the "easiest" way would be to start playing whack-a-mole with code. Basically get to a point where you can edit the gkrust crate and then delete code incrementally until you can't get a compile difference.

If this is a build system bug then you may need to trace the nondeterministic codegen back further from gkrust, since generics may be getting monomorphized into gkrust itself.
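The whack-a-mole minimization suggested above can be sketched as a greedy reduction loop. Everything here is an assumption for illustration: `items` stands for whatever units of code are being deleted (modules, dependencies, functions), and `still_differs` is a hypothetical predicate that rebuilds with only the given items and reports whether a compile difference can still be observed. Since the issue is intermittent, such a predicate would have to rebuild many times before answering "no".

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def minimize(items: Sequence[T],
             still_differs: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop items one at a time for as long as the
    nondeterminism still reproduces without them."""
    kept = list(items)
    index = 0
    while index < len(kept):
        candidate = kept[:index] + kept[index + 1:]
        if still_differs(candidate):
            kept = candidate      # item was not needed to reproduce
        else:
            index += 1            # item is needed; keep it and move on
    return kept
```

This makes one pass and calls the predicate once per item, so it is cheap in rebuilds but does not guarantee the smallest possible reproducer.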

comment:8 Changed 3 weeks ago by gk

Keywords: tbb-9.0-issues tbb-regression tbb-9.0.1-can added

comment:9 Changed 3 weeks ago by boklm

In order to isolate the issue, I made this change to our build script to only build the toolkit/library/rust directory:
https://gitweb.torproject.org/user/boklm/tor-browser-build.git/tree/projects/firefox/build?h=bug_32052_v3#n103

In the first two tests, I always got a non-matching build on the 2nd build. The third time, however, there were 7 matching builds with checksum 6ced1d29d0cf7b34a1a7841e560a8a219e3bce103f6b3a1d0069702f41c00ca2, and an 8th one with checksum a88ee40a2c92ca8fcb48b984f13a5e24ced66b9d1fc32edc60fc4e17860dbcdc.

Then I tried this patch, which removes dependencies on jsrust:
https://gitweb.torproject.org/user/boklm/tor-browser-build.git/tree/projects/firefox/gkrust-disable.patch?h=bug_32052_v3

This seems to make the issue appear less frequently, as on the first test I got 20 matching builds. However, on the second test I only got 12 matching builds (with checksum 2065ceebe53073eea3114690a581486e03b89259aedb333bad77f1aac08595f2), and a 13th one with checksum 4b2b8fb088b5d30b45127612e84050223a73f4fc6a8a2a001367f1291b54890e. On the third try, I got 20 builds matching 2065ceebe53073eea3114690a581486e03b89259aedb333bad77f1aac08595f2.

I am now trying to build with -j1.

comment:10 Changed 9 days ago by pili

Keywords: TorBrowserTeam201911 added; TorBrowserTeam201910 removed

Moving tickets to November 2019

comment:11 Changed 8 days ago by gk

Resolution: duplicate
Status: new → closed

Okay, I looked at it more closely by examining the intermediate output as done in #32053, and it seems both tickets are in fact duplicates: the Linux builds have proper non-optimized bytecode, but the optimization is not done deterministically either. Closing and keeping #32053 as the sole ticket for this issue.
