For some reason boklm and I got different macOS bundles when building our rc for 9.0a8. Linux bundles are affected, too (see #32052 (moved)), and other platforms as well.
I got differences for macOS after a while (but I agree with boklm that this seems to be harder to achieve). libgkrust.a does not match. More specifically gkrust-6f8221aa429c2389.gkrust.41si33dt-cgu.0.rcgu.o. I am not convinced yet that this is a duplicate of #32052 (moved) as the diff looks different enough. And it is considerably larger (way over 1 GiB!). I'll add the first thousand lines in case it helps.
While that's no resurrection of #26475 (moved), my gut tells me LTO might still be involved somehow. So, next I am testing with LTO disabled and checking whether that changes things.
Okay, time to give an update here. bug_32053_v2 (https://gitweb.torproject.org/user/gk/tor-browser-build.git/log/?h=bug_32053_v2) contains two commits that reduce the build time while still being able to reproduce the bug. First of all, I am not 100% sure yet that LTO is not introducing a second reproducibility issue here, but disabling it does not solve the bug I am hunting. It has the nice side effect, though, that without LTO the build time of libgkrust.a goes down by another approx. 2 minutes on my faster machine.
I don't blow the whole obj dir away anymore. Rather, I build everything the first time and, if it's matching, I just remove libstyle-*.rlib. After a while I get different Stylo .rlib files. Keeping those .rlib files and checking whether the geckoservo or even gkrust builds trigger the bug (by just deleting their respective artifacts and checking whether libgkrust.a changes) came back negative. So, I am fairly confident that building Stylo is the problem here.
That moves me to phase 2 in this exciting process: I'll start bisecting the Rust compiler to figure out where this bug started (while avoiding #26475 (moved) :) ) and I'll try to save even a bit more build time by not caring about libgkrust.a but doing the SHA-256 check against the Stylo .rlib directly.
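That SHA-256 check can be scripted; here is a minimal sketch. The iteration count, artifact path, and build command are placeholders for the real tor-browser-build invocation and the Stylo .rlib path, not the actual setup used here.

```shell
# Sketch: rebuild N times and count distinct SHA-256s of the artifact.
# $1 = iterations, $2 = artifact path, remaining args = build command.
repro_check() {
  local n=$1 artifact=$2; shift 2
  local seen="" h i
  for i in $(seq "$n"); do
    "$@" >/dev/null 2>&1 || return 1
    h=$(sha256sum "$artifact" | cut -d' ' -f1)
    # remember each hash only once
    case " $seen " in *" $h "*) ;; *) seen="$seen $h" ;; esac
  done
  # prints the number of distinct hashes; anything > 1 means non-reproducible
  echo "$seen" | wc -w
}
```

Usage would look like `repro_check 10 path/to/libstyle.rlib make build-style` (both the path and the build target are hypothetical).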
FWIW, I tried to do the same for the linux32 build for #32052 (moved), and was able to reproduce the issue with servo.patch (I did not try with no_lto.patch yet) while cleaning only libstyle-* and *.a files after each build, and was able to reproduce the issue too.
Okay, bisecting Rust is hard due to Mozilla's Rust version requirements: I can repro the issue with 1.38.0 and 1.32.0. 1.30 seems to be too old. However, switching to esr60 and trying there does not work either as 1.30 and above are too new.
So, I guess the next plan is to check Firefox commits between esr60 and esr68 to find some that can be compiled with older Rust versions...
Okay, while still bisecting my way down to the Rust commit causing this, I looked a bit closer at where the differences are showing up. It turns out that the libstyle rlib is already the problem, and extracting that archive shows me that
a) rust.metadata.bin matches and
b) the bc.z files differ
c) the .o files differ
Alex, Manish: Does that give a hint in which direction we need to look? Like, is b) an indication that this is a clang issue? Or do the results give some other clues? For instance would it be helpful analyzing the bc.z files, if so how?
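For anyone following along: an .rlib is an ar archive, so the member-by-member comparison described above can be done with binutils alone. A sketch (archive paths are placeholders and should be absolute, since the extraction cd's into temporary directories):

```shell
# Sketch: extract two rlibs and report which members differ byte-wise.
compare_rlibs() {
  local a=$1 b=$2 da db f m
  da=$(mktemp -d); db=$(mktemp -d)
  (cd "$da" && ar x "$a") || return 1
  (cd "$db" && ar x "$b") || return 1
  for f in "$da"/*; do
    m=$(basename "$f")
    # prints the name of every member whose bytes differ (or is missing)
    cmp -s "$f" "$db/$m" || echo "differs: $m"
  done
}
```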
*.bc.z files in archives are a semi-custom compression format for LLVM IR files (we really should just use *.gz...) The *.o files are the codegen'd versions of those. Given that the LLVM IR is changing that means one of a few things:
Something in the source is changing, causing different IR to be produced
Rustc is non-deterministically producing IR
LLVM is non-deterministically optimizing IR
It's great to narrow this down to just one crate! I've found that minimization tends to make bisection much much easier. Given that this is related to rlibs this probably isn't related to LTO since those object files are all pre-LTO. In terms of minimizing this further, are the object files similarly named? If so are there "obvious diffs" within them? Otherwise if the object files have completely different names that'd be more worrisome!
If you can I'd recommend whacking away at the style crate's source code, deleting swaths of it as you can, to see if you can get non-reproducible builds on one compiler. Basically at this point it's just a game of minimization to find the bug. If you've got a set of semi-digestible instructions to reproduce where you're at as well, we could try to pass this around and see if others can help chip in too to diagnose the bug.
The object files have the same name but alas there are no obvious diffs. The diff file I am getting after running
is 300 MiB (!) in size, and skimming it nothing really sticks out.
One thing that's been interesting during all this bisecting is that there is not a wide variety of results one can get when compiling the style crate. In fact, there are only two different .rlib files I've gotten so far per tested Rust version (if the Rust version contained the reproducibility bug).
Thanks. boklm has been working on minimizing the code that gets built when building the style crate. He might have some update on that.
In terms of diffing you may have better mileage diffing the *.bc.z files. While there's no standalone tool to extract those, the format is documented and you may be able to write a small program using flate2 to extract the *.bc file, which you can then feed through llvm-dis. That textual representation may be a bit more diffable (no offsets and whatnot).
Barring that though I suspect more progress will need to be made with further reductions.
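Assuming you have already managed to get plain *.bc files on disk (the extraction from *.bc.z is left aside here), the textual comparison Alex describes might look like this sketch. The $LLVM_DIS variable is an assumed override hook for pointing at a specific toolchain build, not a feature of llvm-dis itself:

```shell
# Sketch: diff two bitcode files as textual LLVM IR.
ir_diff() {
  local dis=${LLVM_DIS:-llvm-dis}
  # llvm-dis with "-o -" writes the disassembly to stdout
  diff <("$dis" "$1" -o -) <("$dis" "$2" -o -)
}
```

Usage would be something like `ir_diff before.rcgu.bc after.rcgu.bc | head` (file names hypothetical); the exit status is diff's, so 0 means the IR matches.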
While struggling with reducing libstyle's size I got to wondering whether there is a way to easily dump the output of those steps. For instance, is there a rustc option I could use to dump the IR before LLVM optimizes it, so that we can narrow down further where in the toolchain the issue lies? I guess if we go the route you mentioned in comment:13 we would get the LLVM-optimized IR? If not, I'd be interested in dumping that as well with some compilation setting, if possible.
If there aren't any such options to dump intermediate output yet, could you point me to the place in the compiler where I could hack this up?
Ah that's a good point! I should probably have mentioned that earlier too... In any case you can set RUSTFLAGS=-Csave-temps and that'll spray a massive amount of files all over the place (*.bc, *.o, etc). You should be able to basically run a diff of all those files between builds, and you can probably pick the smallest one which has a difference in it. The *.bc files should also be natively disassemble-able by llvm-dis
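A sketch of that workflow: after two builds with RUSTFLAGS=-Csave-temps into two separate output directories (the directory names are whatever your setup produces; GNU find is assumed), find the smallest temp file that differs between them:

```shell
# Sketch: print the smallest file that differs between two build trees.
smallest_diff() {
  local d1=$1 d2=$2 size rel
  # list files smallest-first, stop at the first one that differs
  find "$d1" -type f -printf '%s %P\n' | sort -n | while read -r size rel; do
    if [ -f "$d2/$rel" ] && ! cmp -s "$d1/$rel" "$d2/$rel"; then
      echo "$rel"
      break
    fi
  done
}
```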
The important part here is that the style-eb257c29b0562cc6.style.5crbtq6r-cgu.0.rcgu.no-opt.bc files are matching while the style-eb257c29b0562cc6.style.5crbtq6r-cgu.0.rcgu.bc ones do not. Assuming no-opt means not-optimized then the problem happens in the optimization step(?) for the bytecode, so I guess the "LLVM is non-deterministically optimizing IR" step you mentioned above.
So, it seems to be an LLVM problem. Do you know whether we could work around that? Like using no-opt.bc for now? More importantly, though, do you have an idea how a small repro test could look like based on that information? Back in the day when working on #26475 (moved) you saved my day when you came up with a test snippet for an unrelated issue, so I could avoid staring at libstyle. I have the same hope for this issue. :)
I can look at the actual diff in the .bc files tomorrow if you think that would be helpful.
I agree with that conclusion as well in that it looks like LLVM may have a nondeterministic optimization somewhere in it. Can you upload the *.bc files so I could poke around at them? Both the no-opt and optimized versions if you can. Also, what rustc commit are you using? I'll try to get the same set of LLVM tools used on that commit.
In terms of how to keep minimizing, I think the first step is to use 100% pure LLVM tools to reproduce this. For example "run this command 1000 times and I get different results between runs". Given that the next best step would probably be to use bugpoint from LLVM to help reduce the input IR file into something smaller. Historically bugpoint has been a massive pain to use, but http://blog.llvm.org/2015/11/reduce-your-testcases-with-bugpoint-and.html was somewhat helpful to me in the past. The general idea is that you'll write a script which says whether an input module is "interesting", and in this case "interesting" means "I ran some LLVM optimizations a few times and it didn't always produce the same result".
In any case I can try to help with the bugpoint process once I'm able to reproduce with the input LLVM modules.
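A minimal sketch of such an interestingness check, along the lines described above. The $OPT override is an assumption for pointing at different LLVM builds, and note that with genuinely rare nondeterminism a handful of runs can miss a difference:

```shell
# Sketch: a module is "interesting" if optimizing it a few times does not
# always produce byte-identical output.
interesting() {
  local bc=$1 first cur i
  first=$(${OPT:-opt} -O3 "$bc" -o - | sha256sum) || return 1
  for i in 1 2 3 4 5; do
    cur=$(${OPT:-opt} -O3 "$bc" -o - | sha256sum)
    [ "$cur" != "$first" ] && return 0   # output changed: interesting
  done
  return 1                               # stable every time: not interesting
}
```

bugpoint (and newer reduction tools) treat exit status 0 as "interesting", which is what this returns on a mismatch.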
I uploaded the files to https://people.torproject.org/~gk/misc/32053/ with the sha256sums as given in comment:17. We are using the 1.34.2 tag but if things are easier for you I can recreate the .bc files with the currently stable Rust as the issue is still there. BTW: thanks for your help, that's really appreciated.
That's my current plan. I am not familiar with the LLVM tools, both in terms of which to use best and which parameters to deploy, but ideally I'd like to take the no-opt.bc file from above, run just the optimization with some LLVM tool N times, and hit the bug at some point. That should be fast enough to allow a meaningful Rust/LLVM bisect if needed.
Oh, something I forgot and which might be important: so far we only see this issue in a cross-compilation context. Assuming bugs like #32052 (moved) are actually the same problem (I'll verify that one later today) we've essentially seen this issue for any of our platforms we cross-compile for (Linux32, macOS, Windows, Android) but not for doing native Linux64 builds. Might be coincidence, though.
While digging into how the optimization actually works and how Rust uses it, I realized that we might be able to play with the optimization flags to narrow things down further if we don't find a better approach (librustc_codegen_llvm/back/write.rs has the optimize() function, which is a good start:
if config.opt_level.is_some() {
). There are more options that might play a role here (see: with_llvm_pmb() as well).
I've managed to reproduce this and it indeed looks like an LLVM issue! (yay?) I ran opt -O3 style.no-opt.bc -o foo.bc && md5sum foo.bc and I've gotten two different checksums after running a few times. I also just checked out the most recent LLVM trunk and I can see the same issue there.
I don't think there's anything else needed from rustc here, with these bitcode files it should be enough to just run these through LLVM's opt tool to find a reduction that is smaller than 90MB to report :). That being said I suspect that an LLVM bug could go ahead and get opened for this and LLVM folks might be able to help with the reduction here.
It's good to point out the cross-compile aspect, although I suspect that likely just happens to tickle the right portion of LLVM, and it's actually a bug for all platforms. We'll see though!
Do you want me to file the LLVM bug, or would you like to do so?
Awesome, thanks! Would you mind filing the bug, mentioning all the necessary info for the LLVM folks to look at (I am not sure which component to put it in, whom to Cc, etc.)? You can link to my files; I'll keep them there at least until the issue is resolved (not sure if the LLVM bug tracker allows such big files to be attached). Please Cc me, if possible (gk [@] torproject [.] org).
I've opened up https://bugs.llvm.org/show_bug.cgi?id=43909 and will track that, I'm attempting to use LLVM's automatic test case reduction tools but it's likely going to take quite some time due to how large the module is.
Closed #32052 (moved) as an actual duplicate after inspecting the intermediate compilation output of non-matching results.
Trac: Description changed from:
For some reason boklm and I got different macOS bundles when building our rc for 9.0a8.
to:
For some reason boklm and I got different macOS bundles when building our rc for 9.0a8. Linux bundles are affected, too (see #32052 (moved)) and other platforms as well.
Summary changed from "macOS bundles for Tor Browser 9.0a8 are not reproducible" to "Tor Browser bundles based on Firefox 68 ESR are not reproducible (LLVM optimization issue)"
Thanks! Let me know whether/how I can help here. I don't know much about opt and its options/flags, so there is some learning curve for me, but I could spend some cycles tomorrow and the coming days given how important that bug is for us. Maybe I should just start bisecting to figure out where this got introduced. That might help tracking the optimization issue down. Either way, let me know.
I just posted a comment on the bug report with a much more minimal test case (only a few hundred KB!); it only took many CPU hours to extract :)
From here I'm still trying to reduce it further to increase the likelihood that someone from LLVM can help fix (I'm not so good at LLVM internals). This test case is small enough though that it may be pretty reasonable to bisect LLVM itself with. Dealing with a bitcode file across that many LLVM revisions may be pretty difficult though, so bisection likely won't be trivial.
Okay, it seems the optimization that is the problem here is in -O1, which is unfortunate because I had some hope that reducing the current -O2 to -O1 could be a workaround... I am not sure whether -O0 is worth it. But it might be an option if we don't solve the bug before the next planned release.
I'll set up some bisecting in parallel to your efforts and see whether that gets us anywhere. I think I narrowed the problem down Rust version-wise quite a bit before (1.32 is still broken while I think 1.30 is good), which might help. If you get to the problem with bugpoint or some LLVM dev is helping meanwhile even better. :)
The bug seems to be in the -jump-threading pass which I suspect is included in the O1 optimizations, yeah, but this technically only arose during O3 when presumably enough inlining had happened to then trigger the bug. I'm not really sure what the best way to avoid this bug would be unfortunately, but I suspect that an -O1 build should be reproducible (albeit slow).
Actually -O2 is already enough. I can't trigger the issue with -O1 nor with just jump-threading (and I tried pretty hard today). So, from those results I would say "something in -O2 is the problem", which brings me to the thought that we might be hunting different bugs. :) But on the positive side of things I think I now have a setup ready for actually bisecting LLVM, which I will pick up tomorrow.
Okay, so before I speculate further I'll double-check your results using -opt-bisect-limit, at least to figure out which optimization is the culprit for the tests I am currently running.
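One way to automate that search, sketched here: binary-search for the smallest -opt-bisect-limit at which opt's output stops being deterministic. style.no-opt.bc and the $OPT override are placeholders, and since the bug only fires probabilistically, two runs per probe can miss it, so results deserve some suspicion:

```shell
# Probe: does opt produce differing output twice in a row at this limit?
is_nondet() {
  local a b
  a=$(${OPT:-opt} -O3 -opt-bisect-limit="$1" style.no-opt.bc -o - | sha256sum)
  b=$(${OPT:-opt} -O3 -opt-bisect-limit="$1" style.no-opt.bc -o - | sha256sum)
  [ "$a" != "$b" ]
}

# Binary search: $1 = limit known deterministic, $2 = limit known nondeterministic.
bisect_limit() {
  local lo=$1 hi=$2 mid
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(((lo + hi) / 2))
    if is_nondet "$mid"; then hi=$mid; else lo=$mid; fi
  done
  echo "$hi"   # first limit at which the outputs stop matching
}
```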
FWIW: It might have been enough to just patch toolchain.configure but that would result in different opt-level options passed to rustc and I was not exactly sure what would happen in that case.
Trac: Status: new to needs_review Keywords: TorBrowserTeam201911 deleted, TorBrowserTeam201911R added
That sounds like a good idea to test this in the next alpha. And the patch looks good to me.
That strategy does not fly, alas, as using -O1 is causing build bustage on Linux at least (due to the current defense we have against proxy bypasses of Rust code), see: #32426 (moved).
Yeah, I can confirm that this is the -jump-threading operation here, too, good. Then let's get the LLVM bisecting going.
Alex: So, I tried to extract the problematic function name with llvm-extract but I failed so far due to my lack of knowledge of LLVM tools. How do I properly demangle the function name so that llvm-extract likes it? I tried llvm-cxxfilt but no dice. The opt output for the problematic function I get is:
BISECT: running pass (1208271) Jump Threading on function (_ZN83_$LT$style..values..specified..box_..Appearance$u20$as$u20$style..parser..Parse$GT$5parse17ha60227de7ee101e5E)
Oh for llvm-extract I used the -rfunc argument which is a regex instead of an exact name, like so: llvm-extract before.bc -rfunc=17h5949677e2a2fd343E -o before-extract.bc
Thanks, that helped. However, I've tried to repro by running just the -jump-threading pass thousands of times on the same machine (same kernel, glibc, etc.), with the same clang version, and essentially with the same script with which I reproduce the bug when running all the passes up to and including the problematic -jump-threading one: but I don't hit the bug that way, which seems to match my results from comment:30. I wonder what we are missing here.
Oh so for just -jump-threading to work you'll need to do:
Start with foo.bc
Figure out smallest N where opt -O3 foo.bc -opt-bisect-limit is non-deterministic
Run opt -O3 -o input.bc -opt-bisect-limit=N-1 foo.bc
Use llvm-extract on input.bc to extract the function
Run opt -jump-threading over the extracted *.bc file
You won't be able to run -jump-threading over the original module, you'll need to run it over the module just before the output becomes nondeterministic.
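The steps above can be collected into one sketch. The $OPT/$LLVM_EXTRACT overrides are only there for illustration; the limit N and the symbol-hash regex are inputs you determine beforehand (steps 1 and 2), and the intermediate file names are arbitrary:

```shell
# Sketch of steps 3-5: stop just before the bad pass, extract the function,
# then run -jump-threading alone over it.
# $1 = smallest nondeterministic -opt-bisect-limit, $2 = symbol-hash regex,
# $3 = original module (e.g. foo.bc).
extract_nondet_function() {
  local n=$1 frag=$2 in=$3
  # stop the pipeline just before the pass that introduces the difference
  ${OPT:-opt} -O3 -o input.bc -opt-bisect-limit=$((n - 1)) "$in" || return 1
  # pull out only the function of interest (regex match on the symbol hash)
  ${LLVM_EXTRACT:-llvm-extract} input.bc -rfunc="$frag" -o extracted.bc || return 1
  # -jump-threading alone should now reproduce the nondeterminism
  ${OPT:-opt} -jump-threading extracted.bc -o out.bc
}
```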
I think Eli from LLVM found a fix at https://reviews.llvm.org/D70103, or at least that fixes the test cases for me locally. Can y'all patch LLVM locally to test out on your end?
Okay, some status update. Bisecting is getting hard. I am down to building stylo with Rust 1.30.0 and am at LLVM's b1546da0e8849d58fcdcf17fa1f2fab0cdae70a4 and have not reached the bottom yet. While exploring ways to move further here I'll investigate whether we can just move the jump-threading optimization out of -O1 and into, say, -O3 so that we do not actually use it.
Okay, some status update. Bisecting gets hard. I am down to building stylo with Rust 1.30.0 and am at LLVM's b1546da0e8849d58fcdcf17fa1f2fab0cdae70a4 and have not reached the bottom yet.
Building with Rust 1.29.2 I am down to LLVM's 8c59921ca3284ced1c358c4c86ec2c830db0bd70 and still have not reached the bottom...
While exploring ways to move further here I'll investigate whether we can just move the jump-threading optimization out of -O1 and into, say, -O3 so that we do not actually use it.
I tried that; surprisingly enough, it does not fix the issue. I inspected the log of the optimization passes that ran and no jump-threading showed up (so that part worked). However, I still got non-reproducible outcomes. I am going to ask on the LLVM bug whether that's expected and, if not, whether it could give us a lead on where to look closer to understand the bug better.
So, it turns out the issue Alex was hitting and reporting to LLVM is a new determinism problem. I verified that the commit he used plus the patch applied gives deterministic output, which is promising, as we can theoretically bisect our way to the fix for the bug we are hitting. However, Eli over in the LLVM bug said there were some determinism-related fixes and it might be hard to figure out which one is affecting us. :(
So, I am back with bisecting the optimization options with the patch from Eli applied to rule out that the jump-threading issue is affecting us. That takes considerable time on my machines, though, alas. :(