Opened 6 years ago

Closed 5 years ago

#9829 closed defect (fixed)

Firefox ESR 24 does need a newer compiler than gcc 4.2

Reported by: gk Owned by: erinn
Priority: Medium Milestone:
Component: Applications/Tor bundles/installation Version:
Severity: Keywords: tbb-3.0, ff24-esr, MikePerry201312R
Cc: mingwandroid, dcf Actual Points:
Parent ID: #10103 Points:
Reviewer: Sponsor:

Description

Our current Mac toolchain is using gcc 4.2 to cross-compile Firefox for Mac OS X. That is not working anymore without patching.

Child Tickets

Attachments (32)

crosstool.config (1.1 KB) - added by gk 6 years ago.
crosstool.config_real (1.1 KB) - added by gk 6 years ago.
crosstool.config_real2 (1.1 KB) - added by gk 6 years ago.
crosstool.config_real3 (1.1 KB) - added by gk 6 years ago.
clang_error1 (1.1 KB) - added by gk 6 years ago.
clang_error2 (85.3 KB) - added by gk 6 years ago.
clang_error_optimized.log (513.9 KB) - added by gk 6 years ago.
clang_error_optimized_config.log (138.5 KB) - added by gk 6 years ago.
clang_error_not_optimized.log (587.9 KB) - added by gk 6 years ago.
clang_error_not_optimized_config.log (147.8 KB) - added by gk 6 years ago.
0001-clang-compile-patch.patch (2.3 KB) - added by gk 6 years ago.
.mozconfig-mac-test (933 bytes) - added by gk 6 years ago.
clang_error_not_optimized24.log (522.8 KB) - added by gk 6 years ago.
clang_error_not_optimized_config24.log (175.1 KB) - added by gk 6 years ago.
clang33CompileFailureESR17 (8.7 KB) - added by gk 6 years ago.
linkerrori386.tar.bz2 (123.7 KB) - added by gk 6 years ago.
cctoolsfailure.log (18.6 KB) - added by gk 6 years ago.
Mac10_6_crash (27.4 KB) - added by gk 6 years ago.
sqliteTracing32 (20.9 KB) - added by gk 6 years ago.
sqliteTracing64 (14.7 KB) - added by gk 6 years ago.
config32bit (7.1 KB) - added by gk 6 years ago.
config64bit (7.0 KB) - added by gk 6 years ago.
sqliteTracingTiming32 (21.4 KB) - added by gk 6 years ago.
sqliteTracingTiming64 (14.9 KB) - added by gk 6 years ago.
crosstool-ng-and-ESR24-script-configs-patches.tar.xz (4.2 KB) - added by mingwandroid 6 years ago.
Ray's WIP build script/configs/patches for building crosstool-ng and ESR24
130-Bug-933071---add---with-macos-private-frameworks-t.patch (2.8 KB) - added by mingwandroid 6 years ago.
Backported patch by Nathan Froyd
0001-new-.mozconfig-file-for-the-new-cross-compiler-and-E.patch (2.5 KB) - added by gk 5 years ago.
0002-updating-the-Gitian-Firefox-descriptor-for-Mac-OS-X-.patch (3.6 KB) - added by gk 5 years ago.
0003-breakpad.patch (1.6 KB) - added by gk 5 years ago.
0004-ctypes.patch (1.5 KB) - added by gk 5 years ago.
0005-otool.patch (877 bytes) - added by gk 5 years ago.
0006-va-list.patch (6.8 KB) - added by gk 5 years ago.

Change History (116)

comment:1 Changed 6 years ago by nickm

Long-term, I'd advocate moving to a newer gcc anyway. The generated code for our ECC crypto primitives is about 12% faster with a more recent gcc (4.7).

comment:2 Changed 6 years ago by mikeperry

gk - You may want to try the instructions in #9711 to see if that compiler works better? If you need help I can put you in contact with Ray. He's pretty helpful and eager for his patches to see testing.

comment:3 Changed 6 years ago by gk

Yeah, that was my first idea. I'll get back to you if I need Ray's help.

comment:4 Changed 6 years ago by gk

Okay, we should probably start here with crosstool-ng and clang, as clang is the only compiler supported by Mozilla for Mac OS X.

Relevant bugs might be:

https://bugzilla.mozilla.org/show_bug.cgi?id=784029
https://bugzilla.mozilla.org/show_bug.cgi?id=787931

and as kind of a meta-bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=733905

If that does not work out we might want to look at the tenfourfox way

https://code.google.com/p/tenfourfox/issues/detail?id=52

See: https://bugzilla.mozilla.org/show_bug.cgi?id=700524 as well here.

Last edited 6 years ago by gk

comment:5 Changed 6 years ago by gk

Cc: mingwandroid added

comment:6 Changed 6 years ago by mingwandroid

I am looking into this now. If there are some patches or (better) a branch that's set up to use ESR 24, please point me to it.

comment:7 Changed 6 years ago by mingwandroid

I pushed some fixes to crosstool-ng (cctools-llvm branch as usual). I was able to successfully compile a C++ hello-world.cpp:

#include <iostream>
using namespace std;

int main(int argc, char const* argv[])
{
        cout << "Hello World!";
        return 0;
}

.. for both i686 and x86_64 using the following commandline:

i686-apple-darwin11-clang++ -arch x86_64 -arch i686 hello-world.cpp -o hello-world --sysroot $HOME/MacOSX10.7.sdk 

.. in crosstool.config I set:

CT_STATIC_TOOLCHAIN=n
CT_DARWIN_COPY_SDK_TO_SYSROOT=n

Tomorrow I will try to cross-compile Python then, time permitting, ESR 24.
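A quick way to check, on the Linux build host, that the output of the two-arch command line above really is a universal ("fat") binary is to look at its first four bytes. This is only a minimal sketch: `macho_kind` is a hypothetical helper name, and it assumes `od` and `tr` are available. Note that Java class files share the `cafebabe` magic, so this is a sanity check rather than a full Mach-O parser.

```shell
#!/bin/sh
# Classify a file by its first four bytes (Mach-O magic numbers).
# macho_kind is a hypothetical helper, not part of crosstool-ng or cctools.
macho_kind() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        cafebabe) echo "fat" ;;      # universal binary; the fat header is big-endian
        cefaedfe) echo "macho32" ;;  # 0xfeedface as stored by a little-endian 32bit target
        cffaedfe) echo "macho64" ;;  # 0xfeedfacf as stored by a little-endian 64bit target
        *)        echo "unknown" ;;
    esac
}

# Example (assumes hello-world was produced by the command line above):
#   macho_kind hello-world
```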

comment:8 Changed 6 years ago by mingwandroid

Good news. I ran both the 32bit and 64bit variants of hello-world on a Mac and they worked correctly.

comment:9 in reply to:  6 Changed 6 years ago by gk

Replying to mingwandroid:

I am looking into this now. If there are some patches or (better) a branch that's set up to use ESR 24, please point me to it.

The easiest thing is probably to test the current setup with ESR 17, as I don't expect much trouble switching to ESR 24 once we get ESR 17 cross-compiled. So downloading and trying a vanilla ESR 17 from ftp.mozilla.org seems good enough to me, or take the tor-browser repo you already cloned for getting the gitian builds running. That said, there is an ESR 24 branch that contains at least some of the already rebased patches and that built on Linux and Windows last time I tried:

https://git.torproject.org/user/mikeperry/tor-browser.git

I tested the binaries you linked to in one of your last mails with your sample C++ file and the respective arguments but it failed again (it could not find the libstdc++). Not sure what's wrong here. Maybe I should use your newer binaries?

comment:10 Changed 6 years ago by mingwandroid

I tested the binaries you linked to in one of your last mails with your sample C++ file and the respective arguments but it failed again (it could not find the libstdc++). Not sure what's wrong here. Maybe I should use your newer binaries?

Yeah, that won't work as I only fixed the issues in crosstool-ng last night, as I said:

I pushed some fixes to crosstool-ng (cctools-llvm branch as usual).

.. so yeah, you will need new binaries. Mine were built with CT_DEBUGGABLE_TOOLCHAIN=y and are therefore too large to share. Anyway, we probably want to test determinism of crosstool-ng builds at this point so I think we should both build new compilers and compare results. Please post me your crosstool.config when you kick off a build (use the two options I recommended above and also CT_DEBUGGABLE_TOOLCHAIN=n).

Changed 6 years ago by gk

Attachment: crosstool.config added

Changed 6 years ago by gk

Attachment: crosstool.config_real added

comment:11 in reply to:  10 Changed 6 years ago by gk

Replying to mingwandroid:

I tested the binaries you linked to in one of your last mails with your sample C++ file and the respective arguments but it failed again (it could not find the libstdc++). Not sure what's wrong here. Maybe I should use your newer binaries?

Yeah, that won't work as I only fixed the issues in crosstool-ng last night, as I said:

I pushed some fixes to crosstool-ng (cctools-llvm branch as usual).

Ah, okay. I did not realize that these were the fixes for the libstdc++ issue...

.. so yeah, you will need new binaries. Mine were built with CT_DEBUGGABLE_TOOLCHAIN=y and are therefore too large to share. Anyway, we probably want to test determinism of crosstool-ng builds at this point so I think we should both build new compilers and compare results. Please post me your crosstool.config when you kick off a build (use the two options I recommended above and also CT_DEBUGGABLE_TOOLCHAIN=n).

See: the crosstool.config_real. I start building it via gitian...

comment:12 Changed 6 years ago by mingwandroid

Ah, sorry, I just remembered ..
Due to touch --date not working on OSX I had to change the format of
CT_SRC_REFERENCE_DATETIME
so we need to change:
CT_SRC_REFERENCE_DATETIME=="1999-12-31 23:59:59"
to:
CT_SRC_REFERENCE_DATETIME=="19991231235959.59"

Can you make this change and re-attach a new crosstool.config? I could of course do it myself, but I want to use the same file as you, down to the byte if possible here.

comment:13 Changed 6 years ago by mingwandroid

Oh, I didn't mean the ==, just single =

Changed 6 years ago by gk

Attachment: crosstool.config_real2 added

comment:14 in reply to:  12 Changed 6 years ago by gk

Replying to mingwandroid:

Can you make this change and re-attach a new crosstool.config? I could of course do it myself, but I want to use the same file as you, down to the byte if possible here.

Done and re-building...

comment:15 Changed 6 years ago by gk

touch is complaining about the new date format, trying to find a better one...

Last edited 6 years ago by gk

Changed 6 years ago by gk

Attachment: crosstool.config_real3 added

comment:16 Changed 6 years ago by gk

One "59" too much. Attached *real3 and start re-building...
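For reference, the portable `touch -t` option takes a timestamp of the form `[[CC]YY]MMDDhhmm[.SS]`, which is why a value with an extra "59" is rejected. The following is a minimal sketch of converting the original `CT_SRC_REFERENCE_DATETIME` value into that form. `to_touch_stamp` is a hypothetical helper name, and the sketch assumes a `sed` that supports `-E`; it is not taken from crosstool-ng.

```shell
#!/bin/sh
# Convert "YYYY-MM-DD HH:MM:SS" into the "CCYYMMDDhhmm.SS" form accepted by `touch -t`.
# to_touch_stamp is a hypothetical helper for illustration only.
to_touch_stamp() {
    printf '%s\n' "$1" |
        sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})$/\1\2\3\4\5.\6/'
}

# Example:
#   touch -t "$(to_touch_stamp "1999-12-31 23:59:59")" some-source-file
```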

comment:17 Changed 6 years ago by mingwandroid

Thanks, also re-building...

comment:18 Changed 6 years ago by mingwandroid

mingw-w64-svn-snapshot.zip: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
.. any ideas?

comment:19 Changed 6 years ago by mingwandroid

More good news. I cross-compiled Python 2.7.5 for both x86 and x86-64 using clang 3.3 and it runs on OSX. This gives me a good amount of confidence about the quality of the clang toolchain.

Changed 6 years ago by gk

Attachment: clang_error1 added

Changed 6 years ago by gk

Attachment: clang_error2 added

comment:20 in reply to:  18 Changed 6 years ago by gk

Replying to mingwandroid:

mingw-w64-svn-snapshot.zip: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
.. any ideas?

Well, there could have been issues with your download, or changes in the versions file somehow slipped in. But if you are just testing a Mac build you can safely ignore that.

I started compiling Firefox and did not get very far. clang_error1 shows the issue if I pass "--enable-optimize" to configure. I can bypass this issue if I use "--disable-optimize". However, the build then fails after compiling some files, with the errors in clang_error2. I'll look into it tomorrow if you don't beat me to it.
The sha256sum of my i686-apple-darwin10.tar.bz2 file (>800MB!) is 39eec577b9d24bafaebcba80ab593e11ad92acdf47a17d48f176faf8d1e8fbd8

comment:21 Changed 6 years ago by mingwandroid

Are these logs with ESR 17?

Can you give me the full stdout/err logs, the config.log files and also detailed reproduction steps and patches used?

My crosstool-ng build is finishing up so I hope to be able to get the sha256sum soon. When we tar the file, we should pass --exclude='lib/*.a' as the majority of that 800MB will be LLVM static libraries that are not needed.

comment:22 in reply to:  21 Changed 6 years ago by gk

Replying to mingwandroid:

Are these logs with ESR 17?

Yes.

Can you give me the full stdout/err logs, the config.log files and also detailed reproduction steps and patches used?

Attached are the Gitian error logs and config.log files, the patch for Gitian and the custom mozconfig file used by the Firefox build process. It is against master (commit 651110fbf74c17a72c37268638115d6f1ea39b5c).
Steps to reproduce:

1) Make sure you have run fetch-inputs.sh after switching to commit 651110fbf74c17a72c37268638115d6f1ea39b5c.
2) Apply the patch and copy .mozconfig-mac-test and your i686-apple-darwin10.tar.bz2 to gitian-builder/inputs
3) Run Gitian and this should give you the optimized error
4) Comment out "--enable-optimize" and enable "--disable-optimize" in .mozconfig-mac-test
5) Run Gitian and this should give you the error if optimization is disabled

My crosstool-ng build is finishing up so I hope to be able to get the sha256sum soon. When we tar the file, we should pass --exclude='lib/*.a' as the majority of that 800MB will be LLVM static libraries that are not needed.

Okay, did that which reduced the bz2 size to ca 300MB. The new sha256sum is: 363eb0b76819425b38f56ad4a7bf2c6bee5bc26bdae09e2e2a05a89b40c370eb
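Since the point of exchanging these checksums is to compare toolchain builds for determinism, the comparison can be scripted. This is a minimal sketch assuming GNU `sha256sum` is installed; `sha256_matches` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed only if the SHA-256 of file $1 equals the expected hex digest $2.
# sha256_matches is a hypothetical helper for illustration only.
sha256_matches() {
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example, using the digest quoted above for the tarball without lib/*.a:
#   sha256_matches i686-apple-darwin10.tar.bz2 \
#       363eb0b76819425b38f56ad4a7bf2c6bee5bc26bdae09e2e2a05a89b40c370eb \
#       && echo "toolchain tarballs match"
```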

Changed 6 years ago by gk

Attachment: clang_error_optimized.log added

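Changed 6 years ago by gk

Attachment: clang_error_optimized_config.log added

Changed 6 years ago by gk

Attachment: clang_error_not_optimized.log added

Changed 6 years ago by gk

Attachment: clang_error_not_optimized_config.log added

Changed 6 years ago by gk

Attachment: 0001-clang-compile-patch.patch added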

Changed 6 years ago by gk

Attachment: .mozconfig-mac-test added

comment:23 Changed 6 years ago by mingwandroid

Thanks, I will study that a bit later.

My compilers are built now, but I don't have an i686-apple-darwin10.tar.bz2. It seems that my gitian-firefox.yml doesn't tar up this archive. Am I using the wrong one, or is this a step you are doing manually post-build via on-target? If so, please tell me the root folder and also the full command line.

comment:24 Changed 6 years ago by gk

Yes, I did that manually but not via on-target (which is probably working as well). I mount the image via qemu-nbd (see: https://en.wikibooks.org/wiki/QEMU/Images#Mounting_an_image_on_the_host) and change to the /home/ubuntu/build directory and do:

tar cf i686-apple-darwin10.tar i686-apple-darwin10 --exclude="lib/*.a"
bzip2 i686-apple-darwin10.tar
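The packaging step can be wrapped in a small function. This is a minimal sketch assuming GNU tar; `pack_toolchain` is a hypothetical name. It places `--exclude` before the directory operand, because some GNU tar releases only apply an exclude pattern to the names that follow it on the command line. Compression with bzip2 would follow as in the commands above.

```shell
#!/bin/sh
# Create an uncompressed tarball of a toolchain directory, leaving out static
# libraries matched by the pattern 'lib/*.a'.
# pack_toolchain is a hypothetical helper for illustration only.
#   $1: parent directory (e.g. /home/ubuntu/build)
#   $2: toolchain directory name (e.g. i686-apple-darwin10)
#   $3: output tarball path
pack_toolchain() {
    tar -cf "$3" --exclude='lib/*.a' -C "$1" "$2"
}

# Example:
#   pack_toolchain /home/ubuntu/build i686-apple-darwin10 i686-apple-darwin10.tar
#   bzip2 i686-apple-darwin10.tar
```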

I attached the config.log (and build log) for ESR 24. I just ran Gitian (with --disable-optimize in the .mozconfig file) and it now contains a lot of the errors that were in the build log of ESR 17 (+ new exciting ones :) ).

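Changed 6 years ago by gk

Attachment: clang_error_not_optimized24.log added

Changed 6 years ago by gk

Attachment: clang_error_not_optimized_config24.log added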

comment:25 Changed 6 years ago by gk

Actually it seems the optimized/non-optimized versions are complaining about the same issues but are only failing differently...

comment:26 Changed 6 years ago by gk

The preprocessor fallback to /lib/cpp seems to cause the problems. If I set the preprocessor in the .mozconfig-mac-test:

CPP="$HOME/build/i686-apple-darwin10/bin/i686-apple-darwin10-cpp"

then things look much better IMO. I get some undefined symbols when linking libmozglue.dylib. Not sure if I am on the right track, though. Further investigating...

comment:27 Changed 6 years ago by mingwandroid

Maybe:

CPP="$HOME/build/i686-apple-darwin10/bin/i686-apple-darwin10-clang"

.. though maybe not. I will re-join the investigation tomorrow unless you get it fixed before then!

comment:28 Changed 6 years ago by gk

No, that does not help. Turns out the issue is quite subtle. Even though i686-apple-darwin10-clang++ is just a link to i686-apple-darwin10-clang we must use the former when defining "CXX" in .mozconfig-mac-test. There is probably some configure magic that is setting different (and correct) flags in that case. Now, on to the next broken thing...
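Putting comments 26 to 28 together, the toolchain part of .mozconfig-mac-test might look roughly like the sketch below (mozconfig files use shell syntax). `TOOLCHAIN_BIN` is a hypothetical helper variable and the `CC` line is an assumption; only the `CPP` value and the `clang++` requirement come from this ticket. One plausible explanation for the `clang++` requirement, not confirmed here, is that the clang driver selects its C++ mode from the name it is invoked under.

```shell
# Sketch of the compiler settings for .mozconfig-mac-test (an assumption-laden
# summary, not the attached file).
TOOLCHAIN_BIN="$HOME/build/i686-apple-darwin10/bin"  # hypothetical helper variable

# Assumed C compiler setting.
CC="$TOOLCHAIN_BIN/i686-apple-darwin10-clang"
# Must use the clang++ name, even though it is just a link to clang (comment 28).
CXX="$TOOLCHAIN_BIN/i686-apple-darwin10-clang++"
# Without this, configure falls back to /lib/cpp (comment 26).
CPP="$TOOLCHAIN_BIN/i686-apple-darwin10-cpp"
```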

comment:29 Changed 6 years ago by gk

The "SSE instruction set not enabled" error can be fixed by using "-arch i386" instead of "-arch i686". Now I am stuck at "ld: framework not found CoreServices" while linking libnspr4.dylib.

comment:30 in reply to:  29 Changed 6 years ago by gk

Replying to gk:

The "SSE instruction set not enabled" error can be fixed by using "-arch i386" instead of "-arch i686". Now I am stuck at "ld: framework not found CoreServices" while linking libnspr4.dylib.

"-sysrootlib" was set wrong. Make sure that "LDFLAGS" in gitian-firefox.yml has "-isysroot /usr/lib/apple/SDKs/MacOSX10.6.sdk/" as value (it seems I deleted it for some reason on my branch). Moving on to Spidermonkey...

comment:31 Changed 6 years ago by mingwandroid

Thanks for the tips, you made some really good progress. I will kick off some builds again with the changes you mentioned.

comment:32 Changed 6 years ago by gk

Ray: Is it possible to build the cross-compiler with a clang version < 3.3? If so, what do I have to do? It seems ESR17 is broken with clang 3.3 :( But it should work at least with clang 3.1. Will test ESR 24 shortly...

comment:33 Changed 6 years ago by mingwandroid

Ray: Is it possible to build the cross-compiler with a clang version < 3.3?

Yes, but I need to backport some recent fixes from 3.3 to 3.1 (and 3.2 while I'm at it) first.

It's probably worthwhile fixing the ESR17 clang 3.3 issue as clang 3.3 should produce faster binaries. Let me know how ESR24 goes. I will backport the 3.3 fixes tonight and then give you details of how to build it (I think it's just a matter of replacing all 3_3 in crosstool.config with 3_1).

comment:34 Changed 6 years ago by gk

Regarding ESR17: I set additionally

CXXCPP="$HOME/build/i686-apple-darwin10/bin/i686-apple-darwin10-cpp"

as /lib/cpp is wrong for the C++ preprocessor as well. You also need to export the path to the (clang) binaries (e.g. in your gitian-firefox.yml), otherwise there is a problem building the NSPR library. Once you have done this you'll get the error in the clang33CompileFailureESR17 file attached to this bug. Compiling clang 3.1 and using it on Mac OS X 10.6.8 to compile Firefox works, while compiling clang 3.3 and using that breaks with the same error, which leads me to the conclusion that clang 3.3 is not suitable for compiling ESR17 at all.

Regarding ESR24: Apart from two minor but different issues I almost got ESR24 compiled. It fails near the end with:

string_conversion.cc
In file included from /home/ubuntu/build/tor-browser/toolkit/crashreporter/google-breakpad/src/common/mac/arch_utilities.cc:30:0:
/home/ubuntu/build/tor-browser/toolkit/crashreporter/google-breakpad/src/common/mac/../../common/mac/arch_utilities.h:35:25: fatal error: mach-o/arch.h: No such file or directory
compilation terminated.
/home/ubuntu/build/tor-browser/obj-macos/_virtualenv/bin/python /home/ubuntu/build/tor-browser/config/pythonpath.py \
	  -I/home/ubuntu/build/tor-browser/other-licenses/ply \
	  -I/home/ubuntu/build/tor-browser/xpcom/typelib/xpt/tools \
	  /home/ubuntu/build/tor-browser/obj-macos/dist/sdk/bin/typelib.py -I/home/ubuntu/build/tor-browser/toolkit/components/commandlines -I../../../dist/idl /home/ubuntu/build/tor-browser/toolkit/components/commandlines/nsICommandLineValidator.idl -d .deps/nsICommandLineValidator.xpt.pp -o _xpidlgen/nsICommandLineValidator.xpt
make[6]: *** [host_arch_utilities.o] Error 1
make[6]: Leaving directory `/home/ubuntu/build/tor-browser/obj-macos/toolkit/crashreporter/google-breakpad/src/common/mac'
make[5]: *** [crashreporter/google-breakpad/src/common/mac_libs] Error 2
make[5]: /home/ubuntu/build/i686-apple-darwin10/bin/i686-apple-darwin10-clang++ -arch i386 -isysroot /usr/lib/apple/SDKs/MacOSX10.6.sdk -o string_conversion.o -c  -fvisibility=hidden -DNO_NSPR_10_SUPPORT -I/home/ubuntu/build/tor-browser/toolkit/crashreporter/google-breakpad/src/common/.. -I../../../../../dist/include  -fPIC -Qunused-arguments  -Qunused-arguments -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -Wno-c++0x-extensions -Wno-extended-offsetof -Wno-unknown-warning-option -Wno-return-type-c-linkage -Wno-mismatched-tags -isysroot /usr/lib/apple/SDKs/MacOSX10.6.sdk/ -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pthread -DNO_X11 -pipe -DHAVE_MACH_O_NLIST_H  -DNDEBUG -DTRIMMED -g -O3 -fomit-frame-pointer  -Qunused-arguments  -DMOZILLA_CLIENT -include ../../../../../mozilla-config.h -MD -MP -MF .deps/string_conversion.o.pp  /home/ubuntu/build/tor-browser/toolkit/crashreporter/google-breakpad/src/common/string_conversion.cc
*** Waiting for unfinished jobs....

There is hope. But I don't think it is worth fixing ESR17 given that Tor Browser with ESR24 is coming soon.

Changed 6 years ago by gk

Attachment: clang33CompileFailureESR17 added

comment:35 Changed 6 years ago by mingwandroid

I'm not sure why you're having mach-o/arch.h problems. Can you test the following:

find ~/x-tools/i686-apple-darwin10 -name "arch.h" -exec ls -l {} \;
-r--r--r-- 1 ubuntu ubuntu 4231 Oct 14 17:31 /home/ubuntu/x-tools/i686-apple-darwin10/include/mach-o/arch.h

and also:

find /home/ubuntu/MacOSX10.6.sdk -name "arch.h" -exec ls -l {} \;
-rw-r--r-- 1 ubuntu ubuntu 4185 Apr  8  2011 /home/ubuntu/MacOSX10.6.sdk/usr/include/mach-o/arch.h

I also tested compiling just this .cc file outside of the gitian build system (in fact, just a basic test from the command line):

/c/x/dx-i686-3_3/bin/i686-apple-darwin11-clang++ /home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/common/mac/arch_utilities.cc --sysroot ~/MacOSX10.6.sdk -I/home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/ -c

.. and that worked. You'll notice I used a GCC built as darwin11, but that *shouldn't* matter.

I've just finished making clang 3.0-3.2 patches to bring them in-line with 3.3. After some testing I will commit them. I will add a comment when this is done. I think that 3.3 and ESR24 is the best thing to focus on, however for the ESR17 bug I googled it and found:

https://bugzilla.mozilla.org/show_bug.cgi?id=887645

"I'm pretty sure you can enclose that DebugOnly line in #ifdef DEBUG / ... / #endif to fix this."

comment:36 in reply to:  35 ; Changed 6 years ago by gk

Replying to mingwandroid:

I also tested compiling just this .cc file outside of the gitian build system (in fact, just a basic test from the command line):

/c/x/dx-i686-3_3/bin/i686-apple-darwin11-clang++ /home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/common/mac/arch_utilities.cc --sysroot ~/MacOSX10.6.sdk -I/home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/ -c

.. and that worked. You'll notice I used a GCC built as darwin11, but that *shouldn't* matter.

Well, the files are there. The problem is that this particular file is compiled with the host C++ compiler. And passing the proper include path to it seems harder than I thought. I need to look deeper at the problem...

https://bugzilla.mozilla.org/show_bug.cgi?id=887645

"I'm pretty sure you can enclose that DebugOnly line in #ifdef DEBUG / ... / #endif to fix this."

Good catch! For some reason Startpage gave me just an older, similar bug and not that. And using Google with Tor is a PITA.
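To see why the host compiler cannot find mach-o/arch.h while the cross-compiler can, it can help to list which include roots actually provide the header. This is a minimal sketch; `roots_with_header` is a hypothetical helper name, and the example paths are the ones quoted in comment 35 plus the host's default /usr/include.

```shell
#!/bin/sh
# Print every include root (arguments after the header name) that provides the header.
# roots_with_header is a hypothetical helper for illustration only.
roots_with_header() {
    header=$1
    shift
    for root in "$@"; do
        [ -f "$root/$header" ] && echo "$root"
    done
    return 0
}

# Example:
#   roots_with_header mach-o/arch.h \
#       "$HOME/x-tools/i686-apple-darwin10/include" \
#       "$HOME/MacOSX10.6.sdk/usr/include" \
#       /usr/include
```

If only the toolchain and SDK directories are printed, a compiler that searches just the host's default include path will fail in the way shown in comment 34.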

comment:37 in reply to:  36 Changed 6 years ago by gk

Replying to gk:

Replying to mingwandroid:

I also tested compiling just this .cc file outside of the gitian build system (in fact, just a basic test from the command line):

/c/x/dx-i686-3_3/bin/i686-apple-darwin11-clang++ /home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/common/mac/arch_utilities.cc --sysroot ~/MacOSX10.6.sdk -I/home/ubuntu/ESR24-work/mozilla-esr24/toolkit/crashreporter/google-breakpad/src/ -c

.. and that worked. You'll notice I used a GCC built as darwin11, but that *shouldn't* matter.

Well, the files are there. The problem is that this particular file is compiled with the host C++ compiler. And passing the proper include path to it seems harder than I thought. I need to look deeper at the problem...

I found a workaround: I disabled compiling that part of the code. That is fine for now as we already disable building the crashreporter, and the (problematic) parts that are still built in this case are used for profiling, which is not enabled in TBB. If I do manage to include the path to the missing header files, other parts of the build blow up complaining about missing symbols when linking. I will probably talk to some Mozilla build folks about this in order to get it solved properly. But what worries me more is the next failure: linking libxul fails with

ld: in ../../dom/bindings/SVGFilterElementBinding.o, can't map file, errno=12 for architecture i386
i686-apple-darwin10-clang: error: linker command failed with exit code 1 (use -v to see invocation)

First, that fails both in gitian and without a VM with the same error message. Second, I still have enough memory for linking (searching a bit with Startpage suggested that 'errno=12' occurs if memory is missing; not sure about that, though). Third, the build with clang 3.3 on a Mac with half of the memory my build machine has completes properly. Thus, that might be an issue with the cross-compiling toolchain. Anyway, pointers for debugging this further (and fixing it) are greatly appreciated.

comment:38 Changed 6 years ago by gk

I have a suspicion here: I specify "-arch i386" (which might not be a good fix then, see comment 29) but am still using "--target=i686-apple-darwin10". Trying to get both to use "i386" or both to use "i686" now...

comment:39 Changed 6 years ago by gk

Using "-arch i386" and "--target=i386-apple-darwin10" gives the attached link error (missing symbols for i386) if I point ranlib and friends to the respective binaries in i686-apple-darwin10/bin. Without doing that, the build already fails earlier, as Firefox is looking for i386-apple-darwin-ranlib etc., which the cross-compiler does not have.

Changed 6 years ago by gk

Attachment: linkerrori386.tar.bz2 added

comment:40 Changed 6 years ago by mingwandroid

I found a workaround: I disabled compiling that part of the code.

Can you tell me how you disabled this or send me a patch to do that?

The "-arch i386"/"--target=i386-apple-darwin10" stuff should not be needed since commit:

https://github.com/diorcety/crosstool-ng/commit/b8ef79d3dc63b5158ea168bdfba85c489c884b00

You can see that it transforms i?86 to i386 which is then added as an implicit --target= commandline (clang does a lot of this!)

You are using clang 3.3 and not 3.1? And you definitely rebuilt with that commit?
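To illustrate the mapping described above (this is only a sketch of the idea, not the code from the linked commit), the i?86 to i386 transformation and the resulting implicit --target option could be expressed as follows. Both function names are hypothetical.

```shell
#!/bin/sh
# Map any i?86 architecture name to i386 and leave everything else untouched.
normalize_arch() {
    case "$1" in
        i?86) echo "i386" ;;
        *)    echo "$1" ;;
    esac
}

# Build the implicit --target option for a toolchain triplet such as
# i686-apple-darwin10 (hypothetical helper, for illustration only).
implicit_target() {
    arch=${1%%-*}   # part before the first dash
    rest=${1#*-}    # part after the first dash
    echo "--target=$(normalize_arch "$arch")-$rest"
}

# Example:
#   implicit_target i686-apple-darwin10
```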

comment:41 in reply to:  40 Changed 6 years ago by gk

Replying to mingwandroid:

I found a workaround: I disabled compiling that part of the code.

Can you tell me how you disabled this or send me a patch to do that?

In the ESR 24 tor-browser repo in configure.in you change

MOZ_ENABLE_PROFILER_SPS=1

to

MOZ_ENABLE_PROFILER_SPS=

A vanilla Firefox ESR 24 should do the trick as well, I guess.
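The edit to configure.in can also be scripted, for example as part of a build descriptor. This is a minimal sketch: `disable_profiler_sps` is a hypothetical helper, and it assumes the assignment sits on a line of its own, possibly indented.

```shell
#!/bin/sh
# Replace "MOZ_ENABLE_PROFILER_SPS=1" with an empty assignment in the given file.
# disable_profiler_sps is a hypothetical helper for illustration only.
#   $1: path to configure.in
disable_profiler_sps() {
    tmp=$(mktemp) || return 1
    sed 's/^\([[:space:]]*\)MOZ_ENABLE_PROFILER_SPS=1[[:space:]]*$/\1MOZ_ENABLE_PROFILER_SPS=/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"   # write back in place, keeping the file's permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Example:
#   disable_profiler_sps tor-browser/configure.in
```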

The "-arch i386"/"--target=i386-apple-darwin10" stuff should not be needed since commit:

https://github.com/diorcety/crosstool-ng/commit/b8ef79d3dc63b5158ea168bdfba85c489c884b00

You can see that it transforms i?86 to i386 which is then added as an implicit --target= commandline (clang does a lot of this!)

You are using clang 3.3 and not 3.1? And you definitely rebuilt with that commit?

Yes, I am using clang 3.3 and I built the cross-compiler with commit 9a711e316a9c374f815c3d018dd2614fea2382d5 which is a later one (see our discussion above, comment 10ff).

The "-arch i686" and "--target=i686-apple-darwin10" build just "finished" and gives me the same error message as mentioned in comment 37 (including the reference to i386).

If you need further information or have ideas on how to debug this, just ask/write :)

Last edited 6 years ago by gk

comment:42 Changed 6 years ago by mingwandroid

No further ideas just now. I hope to spend some time this weekend investigating though.

comment:43 Changed 6 years ago by gk

I got ESR17 compiled (outside gitian first) with the cross-compiler! That gives me now the chance to bisect my way down to the commit that broke ESR 24 for us (in theory :) ). I am starting with that right now and am trying to get ESR17 built with the gitian way to test the resulting bundle(s)...

comment:44 Changed 6 years ago by gk

Ray, I've got some questions: 1) doing

./i686-apple-darwin10-ld --help

gives me

ld64: For information on command line options please use 'man ld'

But the linker is 32bit, no?
2) How do I build a 64bit toolchain, is that possible at all?
3) If so, can I generate 32bit code with it?

comment:45 Changed 6 years ago by mingwandroid

ld64 is the next generation linker that Apple developed to be 64bit capable (and to interface with clang/llvm for link time optimization). It can handle 32bit just fine though and the old linker (now called ld classic) should be avoided.

These toolchains are multilib-enabled, meaning that irrespective of whether the host compilers are 32bit or 64bit, when targeting (Intel) Darwin, they can generate both i686 and x86_64 object files. In fact, as with native Darwin clang, you can build 'fat' executables that contain binaries for both 32bit and 64bit in a single invocation as follows:

i686-apple-darwin10-clang -arch i386 -arch x86_64 blah.c
or:
i686-apple-darwin10-clang++ -arch i386 -arch x86_64 blah.cpp

(to pass different per-arch flags should you need to, you can use the -Xarch flag)

Is gitian builder 64bit capable? 64bit host toolchain *should* be possible (I built one a few months ago). You need to remove the "-m32" options from crosstool.config for this (and perhaps a few other options).
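When the set of target architectures is kept in a build script, the repeated -arch options from the examples above can be generated rather than typed out. This is a minimal sketch; `arch_flags` is a hypothetical helper name. The per-arch form mentioned in the comment is, as far as I know, spelled `-Xarch_<arch> <option>`, but treat that spelling as an assumption to verify against your clang.

```shell
#!/bin/sh
# Turn a list of architectures into the matching "-arch ..." options.
# arch_flags is a hypothetical helper for illustration only.
arch_flags() {
    flags=""
    for arch in "$@"; do
        flags="$flags${flags:+ }-arch $arch"
    done
    echo "$flags"
}

# Example (the unquoted expansion is intentional so the flags split into words):
#   i686-apple-darwin10-clang $(arch_flags i386 x86_64) blah.c
# Per-arch option, assuming the -Xarch_<arch> spelling:
#   i686-apple-darwin10-clang $(arch_flags i386 x86_64) -Xarch_x86_64 -DONLY_ON_64BIT blah.c
```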

comment:46 in reply to:  45 Changed 6 years ago by gk

Replying to mingwandroid:

Is gitian builder 64bit capable? 64bit host toolchain *should* be possible (I built one a few months ago). You need to remove the "-m32" options from crosstool.config for this (and perhaps a few other options).

Yes, it is. We are already building 64bit TBBs for Linux. As background for my question: I started a thread on Mozilla's dev.build list (which is not indexed yet) asking about the linker problem, and the best answer I have got so far is:

Well, it could still be that the linker mmaps a bunch of stuff and so
while there's free ram the linker is out of address space.  I guess you
need to get a 64 bit linker or find ways to get your linker to use less
memory (maybe see if you can make it mmap less?).

I am currently trying to verify that theory (and one idea is to build everything on a 64bit system).
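For context, errno 12 is ENOMEM, which fits the address-space theory: a 32bit linker process has less than 4 GiB of address space no matter how much RAM is free. A rough way to gauge the pressure is to sum the sizes of the object files that go into the link. This is only a sketch under the assumption that the linker maps its inputs; `objects_total_bytes` is a hypothetical helper, and the total is an indicator, not a measurement of what the linker actually maps.

```shell
#!/bin/sh
# Print the combined size in bytes of all *.o files below a directory.
# objects_total_bytes is a hypothetical helper for illustration only.
objects_total_bytes() {
    find "$1" -type f -name '*.o' -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}

# Example (obj-macos is the object directory seen in the logs above):
#   objects_total_bytes /home/ubuntu/build/tor-browser/obj-macos
```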

comment:47 in reply to:  45 Changed 6 years ago by gk

Replying to mingwandroid:

You need to remove the "-m32" options from crosstool.config for this (and perhaps a few other options).

Which other options do you have in mind? Do you have a working example crosstool.config by chance? And do I use that then with i686-apple-darwin11 or with x86_64-apple-darwin10 or doesn't that matter? Interestingly the latter still has the "-m32" options...

comment:48 Changed 6 years ago by gk

I started with i686-apple-darwin11 and modified the .config file before running |ct-ng build|. I came across an OpenSSL configure bug which is worth reporting: in 310-openssl.sh line 96 the "linux-generic64" option is not working. The build fails like http://openssl.6102.n7.nabble.com/compile-openssl-1-0-1e-failed-on-Ubuntu-12-10-x64-td44699.html Using "linux-x86_64" helps in this case.
Now I am stuck at a build failure in cctools (see the attached cctoolsfailure.log). Are there some headers missing on my system? If so, shouldn't configure fail hard?
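One way a build script could pick the OpenSSL Configure target from the host machine type is sketched below. This is not how 310-openssl.sh is written (its contents are not shown in this ticket); `openssl_target` is a hypothetical helper, and the mapping covers only the x86 hosts discussed here.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to an OpenSSL Configure target.
# openssl_target is a hypothetical helper for illustration only.
openssl_target() {
    case "$1" in
        x86_64) echo "linux-x86_64" ;;
        i?86)   echo "linux-elf" ;;
        *)      echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Example, run from an OpenSSL source tree:
#   ./Configure "$(openssl_target "$(uname -m)")"
```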

Changed 6 years ago by gk

Attachment: cctoolsfailure.log added

comment:49 Changed 6 years ago by gk

That is weird. Looking at the libuuid configure output shows that e.g. getuid and getgid are found there while they are not declared for cctools...

comment:50 Changed 6 years ago by mingwandroid

Thanks for the OpenSSL fix, I tested and applied it:

https://github.com/diorcety/crosstool-ng/commit/23f28d8fc7f6e3f7e5428715f6cb7814d14fc6a4

I can also confirm that I have the same libuuid problem as you; I am investigating now.

comment:51 Changed 6 years ago by mingwandroid

Libuuid -> cctools..

I fixed that. FYI the problem is that cctools is a tricky build as it must include system headers for both host and Darwin.

If you update you should be able to get as far as clang which will fail. So probably worth holding off for now. I will look into this ASAP.

comment:52 Changed 6 years ago by mingwandroid

Looking into the failure it was:

final link failed: No space left on device

Oops .. so it is worth trying to build a new toolchain now.
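Since a full disk only shows up late, at the final link, a build script can check the available space up front. This is a minimal sketch using POSIX `df -Pk`; both function names are hypothetical and the threshold in the example is an arbitrary illustration, not a measured requirement.

```shell
#!/bin/sh
# Print the available space, in KiB, on the filesystem holding the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 KiB are available at path $1.
# Both helpers are hypothetical and for illustration only.
require_free_kb() {
    avail=$(free_kb "$1")
    [ "$avail" -ge "$2" ]
}

# Example (20 GiB is an arbitrary threshold):
#   require_free_kb "$HOME" 20971520 || echo "not enough disk space for a toolchain build" >&2
```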

comment:53 Changed 6 years ago by gk

That worked, thanks!

Last edited 6 years ago by gk

comment:54 Changed 6 years ago by mingwandroid

That worked, thanks!

Great. I've since pushed a few more fixes, mainly because my x86_64 fixes broke x86, so nothing that you'd need urgently.

I managed to build (fairly minimal builds of) Python 2.7.5 for both Darwin-x86 and Darwin-x86_64 using a Linux-x86_64 hosted clang 3.3 and I ran both successfully on OSX.

comment:55 Changed 6 years ago by gk

So, the 64bit toolchain let me link a 32bit libxul and the 32bit (with -arch i686) compilation succeeded. Alas, the build is crashing on my old Mac 10.6 (see Mac10_6_crash), not sure why yet. Two things that might be worth noting:

1) The 64bit toolchain is not capable of compiling a 64bit version of Firefox yet. This error shows up:

/home/firefox64/x-tools/x86_64-apple-darwin10/bin/x86_64-apple-darwin10-clang++ -arch x86_64 -isysroot /usr/lib/apple/SDKs/MacOSX10.6.sdk -msse2 -o jsprf.o -c  -fvisibility=hidden -DNO_NSPR_10_SUPPORT -DEXPORT_JS_API -DJS_HAS_CTYPES -DDLL_PREFIX=\"lib\" -DDLL_SUFFIX=\".dylib\" -DUSE_ZLIB -Ictypes/libffi/include -I.  -I/home/firefox64/Downloads/mozilla-esr24/js/src/../../mfbt/double-conversion -I/home/firefox64/Downloads/mozilla-esr24/js/src/../../intl/icu/source/common -I/home/firefox64/Downloads/mozilla-esr24/js/src/../../intl/icu/source/i18n  -I/home/firefox64/Downloads/mozilla-esr24/js/src -I. -I./../../dist/include  -I/home/firefox64/Downloads/mozilla-esr24/obj-macos/dist/include/nspr      -I/home/firefox64/Downloads/mozilla-esr24/js/src -I/home/firefox64/Downloads/mozilla-esr24/js/src/assembler -I/home/firefox64/Downloads/mozilla-esr24/js/src/yarr  -fPIC -Qunused-arguments  -Qunused-arguments -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Werror=conversion-null -Wsign-compare -Wno-invalid-offsetof -Wno-c++0x-extensions -Wno-extended-offsetof -Wno-unknown-warning-option -Wno-return-type-c-linkage -Wno-mismatched-tags -fno-common -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -pthread -pipe  -DNDEBUG -DTRIMMED -g -O3 -fno-stack-protector -fomit-frame-pointer -DUSE_SYSTEM_MALLOC=1 -DENABLE_ASSEMBLER=1 -DENABLE_JIT=1  -Qunused-arguments  -DMOZILLA_CLIENT -include ./js-confdefs.h -MD -MP -MF .deps/jsprf.o.pp  /home/firefox64/Downloads/mozilla-esr24/js/src/jsprf.cpp
jspropertytree.cpp
/home/firefox64/Downloads/mozilla-esr24/js/src/jsprf.cpp:611:9: error: array type 'va_list' (aka '__builtin_va_list') is not assignable
        VARARGS_ASSIGN(nas[cn].ap, ap);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/firefox64/Downloads/mozilla-esr24/js/src/jsprf.cpp:33:47: note: expanded from macro 'VARARGS_ASSIGN'
#define VARARGS_ASSIGN(foo, bar)        (foo) = (bar)
                                        ~~~~~ ^

2) The 64bit toolchain is slow as hell compared to the 32bit one. While the latter needed about 2 hours to reach the stage where libxul gets linked, the former needs about 12 hours. Is that normal? Do you experience similar differences? Can we do something about it? No VM was involved; it was the same machine with 4 cores and 4GB of RAM, once with a 32bit Ubuntu 12.04 and once with a 64bit Ubuntu 12.04.

Changed 6 years ago by gk

Attachment: Mac10_6_crash added

comment:56 Changed 6 years ago by mingwandroid

I didn't notice the 64 bit python compile being especially slow when I tested it, but then again python compiles very quickly anyway, so it could have taken a lot longer than with a 32 bit compiler without me noticing. Can you use strace or something like that to figure out which part of the compilation is slow, i.e. clang, the assembler or the linker?

I am out of action at the moment unfortunately due to another cold and also the arrival of an SSD drive causing me to reinstall several OSes.

The crash looks like a NULL pointer access to me. Is it possible to enable more complete diagnostic output in Firefox?

I will try to get everything up and running ASAP to check into these issues.

Changed 6 years ago by gk

Attachment: sqliteTracing32 added

Changed 6 years ago by gk

Attachment: sqliteTracing64 added

comment:57 in reply to:  56 Changed 6 years ago by gk

Replying to mingwandroid:

I didn't notice the 64 bit python compile being especially slow when I tested it, but then again python compiles very quickly anyway, so it could have taken a lot longer than with a 32 bit compiler without me noticing. Can you use strace or something like that to figure out which part of the compilation is slow, i.e. clang, the assembler or the linker?

I attached the strace output I got while compiling sqlite.c (I added the time it took + the actually executed command (in mozilla-esr24/db/sqlite/src) above the strace output) on (non-VM) Ubuntu 12.04 (once 32bit, once 64bit). There are differences (mmap vs. mmap2 e.g.) but I am not sure how to map those to the latency of the 64bit build.

I am out of action at the moment unfortunately due to another cold and also the arrival of an SSD drive causing me to reinstall several OSes.

No problem :)

The crash looks like a NULL pointer access to me. Is it possible to enable more complete diagnostic output in Firefox?

Not without compiling it with debug symbols and without optimization. If I don't get the 64bit build running today I'll start a debug Firefox build in the afternoon to have the results tomorrow morning...

I will try to get everything up and running ASAP to check into these issues.

Thanks, that's really appreciated.

comment:58 Changed 6 years ago by gk

Parent ID: #9827 → #10103

comment:59 in reply to:  55 ; Changed 6 years ago by gk

Replying to gk:

1) The 64bit toolchain is not capable of compiling a 64bit version of Firefox yet. This error shows up:

After looking at various debug data I think that is a bug on Mozilla's side. I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=934981. I worked around it and so far the build is going fine...

comment:60 Changed 6 years ago by mingwandroid

Could you add -t to the strace invocations? That will timestamp each line so we can zero in on the slow bit. Sorry, I should have mentioned this previously.

Also, if you still have the .config files from the crosstool build folders that could prove insightful too.
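
One way to read such a timestamped trace is to look for the largest pauses between consecutive lines. The following is only a rough sketch, assuming the time-of-day prefix that `strace -t` writes (HH:MM:SS, or with microseconds for `-tt`) and an optional PID column as produced by `-f`. The function names and the sample trace lines are made up for illustration and are not taken from the attached logs.

```python
import re
from datetime import datetime, timedelta

# Optional PID column (as with `strace -f`), then a time-of-day stamp
# (HH:MM:SS, optionally with a fractional part), then the syscall text.
STAMP = re.compile(r"^(?:\d+\s+)?(\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+(.*)$")


def parse_stamp(text):
    fmt = "%H:%M:%S.%f" if "." in text else "%H:%M:%S"
    return datetime.strptime(text, fmt)


def slowest_gaps(lines, top=3):
    """Return (seconds, syscall_text) for the calls followed by the longest pauses."""
    events = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            events.append((parse_stamp(match.group(1)), match.group(2)))
    gaps = []
    for (t0, call), (t1, _next_call) in zip(events, events[1:]):
        delta = t1 - t0
        if delta < timedelta(0):
            delta += timedelta(days=1)  # the trace crossed midnight
        gaps.append((delta.total_seconds(), call))
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


# Hypothetical trace excerpt, not real data from this ticket.
sample = [
    '10:00:00 execve("/usr/bin/clang", ["clang"], [/* 20 vars */]) = 0',
    '10:00:01 open("sqlite3.c", O_RDONLY) = 3',
    '10:04:31 write(4, "...", 4096) = 4096',
    '10:04:32 exit_group(0) = ?',
]
for seconds, call in slowest_gaps(sample, top=1):
    print(f"{seconds:.0f}s after: {call}")
```

A long pause after a given call only says that nothing was traced in between; it usually means the process was busy in user space, so this narrows down where to look rather than naming the cause.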

comment:61 in reply to:  59 Changed 6 years ago by gk

Replying to gk:

Replying to gk:

1) The 64bit toolchain is not capable of compiling a 64bit version of Firefox yet. This error shows up:

After looking at various debug data I think that is a bug on Mozilla's side. I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=934981. I worked around it and so far the build is going fine...

We are probably in luck here, long term as well, as Mozilla plans the same: cross-compiling all the Mac things on Linux: https://bugzilla.mozilla.org/show_bug.cgi?id=921040 (Mozilla bug 934981 already got fixed on trunk because of that).
Ray: I just skimmed through this bug, but it seems they use no special toolchain. Maybe you can convince them to change that :)

Changed 6 years ago by gk

Attachment: config32bit added

Changed 6 years ago by gk

Attachment: config64bit added

Changed 6 years ago by gk

Attachment: sqliteTracingTiming32 added

Changed 6 years ago by gk

Attachment: sqliteTracingTiming64 added

comment:62 in reply to:  60 Changed 6 years ago by gk

Replying to mingwandroid:

Could you add -t to the strace invocations? That will timestamp each line so we can zero in on the slow bit. Sorry, I should have mentioned this previously.

Also, if you still have the .config files from the crosstool build folders that could prove insightful too.

Added the four files.

comment:63 Changed 6 years ago by mingwandroid

sqlite3.c is > 140k lines of code, I expect that would take an awfully long time to compile. Do people really program this way? I hope some script made this file as an ad-hoc full-program-optimisation replacement! Anyway, that doesn't explain the 12 hours or why it's so much slower when the host tools are 64bit.

In your configs you have:

CT_DEBUGGABLE_TOOLCHAIN=y

I recommend not having this for your final builds (they are built at -O0 -ggdb). If we need to debug a specific problem then toolchains built that way will come in useful, so it's a good idea to back them up if you can.
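
A quick way to catch this before starting a long build is to scan the crosstool-ng .config for the option. This is a minimal sketch, assuming the usual Kconfig text format where an enabled option appears as `NAME=y` and a disabled one as `# NAME is not set`. The helper name and the sample config text are hypothetical; only CT_DEBUGGABLE_TOOLCHAIN comes from the configs discussed here.

```python
def kconfig_enabled(config_text, symbol):
    """Return True if `symbol=y` appears as an active (uncommented) line."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips "# CT_FOO is not set" and blank lines
        key, _, value = line.partition("=")
        if key == symbol:
            return value.strip() == "y"
    return False


# Hypothetical excerpt of a crosstool-ng .config file.
sample_config = """\
CT_TARGET_VENDOR="apple"
CT_DEBUGGABLE_TOOLCHAIN=y
# CT_SOME_OTHER_OPTION is not set
"""

if kconfig_enabled(sample_config, "CT_DEBUGGABLE_TOOLCHAIN"):
    print("warning: this toolchain config builds the tools at -O0 -ggdb")
```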

This could explain some of why you've got caffeine shakes, your flat is so clean and you've caught up on your entire reading list. It still doesn't explain the 6x-slower-for-64bit-host though, but I'd be keen to see how long it takes you with toolchains built with CT_DEBUGGABLE_TOOLCHAIN=n - FWIW, I noticed that mine were also DEBUGGABLE, so I am now rebuilding them; I did manage to get ESR24 to commence compiling though, so that's positive.

I've got a script (with some patches and config files) for Ubuntu that installs all dependencies, downloads source tarballs and builds everything from scratch for crosstool-ng and ESR24, and once that works correctly I will post it to the Mozilla bug report you found. I wonder if you or Mike can put in a good word for our crosstool-ng fork in the meantime? Sounds to me like they are duplicating a lot of effort, more than they realize probably. Personally, I'd like crosstool-ng to become the go-to project for all serious cross compilation tasks (it's traditionally been focussed on embedded and Linux) ..

I will attach an archive of the WIP script/configs/patches to this ticket actually.

Changed 6 years ago by mingwandroid

Ray's WIP build script/configs/patches for building crosstool-ng and ESR24

comment:64 Changed 6 years ago by gk

So, the 64bit build is crashing in the same way :( It seems we need to enable debugging symbols etc. In case you are faster you need to add the following to your .mozconfig file and rebuild everything:

ac_add_options --disable-install-strip
ac_add_options --enable-debug-symbols
ac_add_options --disable-optimize
ac_add_options --enable-debug

--enable-optimize and --disable-debug have to get commented out.
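
Since leaving the old options active next to the new ones is an easy mistake to make, a small check for contradicting switches can help. The sketch below assumes a .mozconfig made of `ac_add_options` lines as shown above; the helper names and the sample file content are illustrative only.

```python
def active_options(mozconfig_text):
    """Collect the arguments of all uncommented ac_add_options lines."""
    options = []
    for raw in mozconfig_text.splitlines():
        line = raw.strip()
        if line.startswith("ac_add_options "):
            options.extend(line.split()[1:])
    return options


def conflicting_features(options):
    """Return features that appear in both --enable-X and --disable-X form."""
    state = {}
    conflicts = set()
    for opt in options:
        for prefix, enabled in (("--enable-", True), ("--disable-", False)):
            if opt.startswith(prefix):
                feature = opt[len(prefix):].split("=", 1)[0]
                if state.setdefault(feature, enabled) != enabled:
                    conflicts.add(feature)
    return sorted(conflicts)


# Hypothetical .mozconfig where --enable-optimize was not commented out.
sample_mozconfig = """\
# ac_add_options --disable-debug
ac_add_options --enable-optimize
ac_add_options --disable-install-strip
ac_add_options --enable-debug-symbols
ac_add_options --disable-optimize
ac_add_options --enable-debug
"""

print(conflicting_features(active_options(sample_mozconfig)))  # ['optimize']
```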

If I dare to build Firefox that way I will probably go to my neighbor for some books :) (good shot in comment 63!). But I will do it in the afternoon, to hopefully have the build tomorrow, if I don't come up with a better idea.

(But maybe the build of yours that I should test makes all this moot and the issue lies within the toolchain I built...)

comment:65 Changed 6 years ago by mingwandroid

Builds using 64bit host (and 32bit target still) crash in the same place as Mac10_6_crash, so it's good that we're consistent.

From Mac10_6_crash:

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   XUL 0x0b4318b7 JSAutoCompartment::JSAutoCompartment(JSContext*, JSObject*) + 23
1   ???                                 0x00167200 0 + 1470976
2   ???                                 0x00167200 0 + 1470976

.. enabling a debug build we get a bit more info:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   XUL 0x0527d61d js::EncapsulatedPtr<js::Shape, unsigned long>::operator js::Shape*() const + 13
1   XUL 0x0527cf07 js::ObjectImpl::lastProperty() const + 39
2   XUL 0x05278127 js::ObjectImpl::compartment() const + 23
3   XUL 0x05252d11 JSAutoCompartment::JSAutoCompartment(JSContext*, JSObject*) + 81
4   XUL 0x05252cb1 JSAutoCompartment::JSAutoCompartment(JSContext*, JSObject*) + 49
5   XUL 0x02f65e93 _ZL24NativeInterface2JSObjectN2JS6HandleIP8JSObjectEEP11nsISupportsP14nsWrapperCachePK4nsIDbPNS_5ValueEPP26nsIXPConnectJSObjectHolder + 211
6   XUL 0x02f6627e nsXPConnect::WrapNativeToJSVal(JSContext*, JSObject*, nsISupports*, nsWrapperCache*, nsID const*, bool, JS::Value*, nsIXPConnectJSObjectHolder**) + 574
7   XUL 0x02f8c143 xpc::WrapperFactory::PrepareForWrapping(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, unsigned int) + 4051
8   XUL 0x052e20bc JSCompartment::wrap(JSContext*, JS::MutableHandle<JS::Value>, JS::Handle<JSObject*>) + 2092
9   XUL 0x052d6b3f JSContext::wrapPendingException() + 223
10  XUL 0x05278109 JSContext::enterCompartment(JSCompartment*) + 89
11  XUL 0x05290e24 js::AutoCompartment::AutoCompartment(JSContext*, JSCompartment*) + 68
12  XUL 0x0527aec1 js::AutoCompartment::AutoCompartment(JSContext*, JSCompartment*) + 49
13  XUL 0x0525a660 JS_NewGlobalObject(JSContext*, JSClass*, JSPrincipals*, JS::CompartmentOptions const&) + 544
14  XUL 0x02f64e2f xpc::CreateGlobalObject(JSContext*, JSClass*, nsIPrincipal*, JS::CompartmentOptions&) + 383
15  XUL 0x02f089c7 XPCWrappedNative::WrapNewGlobal(xpcObjectHelper&, nsIPrincipal*, bool, JS::CompartmentOptions&, XPCWrappedNative**) + 919
16  XUL 0x02f6563d nsXPConnect::InitClassesWithNewWrappedGlobal(JSContext*, nsISupports*, nsIPrincipal*, unsigned int, JS::CompartmentOptions&, nsIXPConnectJSObjectHolder**) + 781
17  XUL 0x02f73904 mozJSComponentLoader::PrepareObjectForLocation(JSCLContextHelper&, nsIFile*, nsIURI*, bool, bool*) + 1028
18  XUL 0x02f70ef2 mozJSComponentLoader::ObjectForLocation(nsIFile*, nsIURI*, JSObject**, char**, bool, JS::MutableHandle<JS::Value>) + 306
19  XUL 0x02f70332 mozJSComponentLoader::LoadModule(mozilla::FileLocation&) + 866
20  XUL 0x043d2777 nsComponentManagerImpl::KnownModule::Load() + 119
21  XUL 0x043d3360 nsFactoryEntry::GetFactory() + 144
22  XUL 0x043d3ef9 nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) + 313
23  XUL 0x043cf385 nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) + 1301
24  XUL 0x043300a3 CallGetService(char const*, nsID const&, void**) + 195
25  XUL 0x0433083b nsGetServiceByContractID::operator()(nsID const&, void**) const + 43
26  XUL 0x0432da79 nsCOMPtr_base::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) + 41
27  XUL 0x043490a9 nsCOMPtr<nsISupports>::nsCOMPtr(nsGetServiceByContractID) + 89
28  XUL 0x04348a40 nsCOMPtr<nsISupports>::nsCOMPtr(nsGetServiceByContractID) + 32
29  XUL 0x043cbf4d NS_CreateServicesFromCategory(char const*, nsISupports*, char const*) + 957
30  XUL 0x0102fe2f nsXREDirProvider::DoStartup() + 991
31  XUL 0x01019a2b XREMain::XRE_mainRun() + 3131
32  XUL 0x0101ae40 XREMain::XRE_main(int, char**, nsXREAppData const*) + 768
33  XUL 0x0101b2ac XRE_main + 108

.. but actually the best clue comes from the console output from a debug build:

WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /home/ray/tbb-work/mozilla-esr24.dbg/js/xpconnect/loader/mozJSComponentLoader.cpp, line 953
System JS : ERROR resource://gre/modules/osfile/osfile_shared_allthreads.jsm:30
                     NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]
WARNING: Cannot create startup observer : service,@mozilla.org/datareporting/service;1: file /home/ray/tbb-work/mozilla-esr24.dbg/embedding/components/appstartup/src/nsAppStartupNotifier.cpp, line 81
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /home/ray/tbb-work/mozilla-esr24.dbg/js/xpconnect/loader/mozJSComponentLoader.cpp, line 953
System JS : ERROR resource://gre/modules/osfile/osfile_shared_allthreads.jsm:30
                     NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import]
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /home/ray/tbb-work/mozilla-esr24.dbg/js/xpconnect/loader/mozJSComponentLoader.cpp, line 953
###!!! ASSERTION: bad param: 'aScopeArg', file /home/ray/tbb-work/mozilla-esr24.dbg/js/xpconnect/src/nsXPConnect.cpp, line 613

unzipping omni.ja, and looking at modules/osfile/osfile_shared_allthreads.jsm:30 we have:

     if (typeof Components != "undefined") {
       const Cu = Components.utils;
       Cu.import("resource://gre/modules/ctypes.jsm");  // <---- This is line 30.
       Components.classes["@mozilla.org/net/osfileconstantsservice;1"].
         getService(Components.interfaces.nsIOSFileConstantsService).init();

       if (typeof exports.OS.Shared.DEBUG !== "undefined") {
         return; // Avoid reading and attaching an observer more than once.
       }

.. so my theory is that the crash is because of the following line in .mozconfig:

#
# Compiling libffi is currently broken. Although we should try with
# --enable-system-ffi and again as we now have the headers installed...
ac_add_options --disable-ctypes

.. now whether disabling ctypes is something that should work or not is something I've got no idea about, but I'm going to investigate what happens if I enable ctypes by removing this option.

comment:66 in reply to:  65 Changed 6 years ago by gk

Replying to mingwandroid:

# Compiling libffi is currently broken. Although we should try with
# --enable-system-ffi and again as we now have the headers installed...
ac_add_options --disable-ctypes
.. now whether disabling ctypes is something that should work or not is something I've got no idea about, but I'm going to investigate what happens if I enable ctypes by removing this option.

Good idea. As the comment above says the build is currently broken without it as libffi has some hiccups when getting cross-compiled. Disabling ctypes is one of my shortcuts to bypass that :) I might find some time tomorrow to fix it... Apart from that I am currently trying Mozilla's cross-compile setup. We'll see how far I get with it.

comment:67 in reply to:  65 Changed 6 years ago by gk

Replying to mingwandroid:

.. now whether disabling ctypes is something that should work or not is something I've got no idea about, but I'm going to investigate what happens if I enable ctypes by removing this option.

On second thought, that should not be the reason for the crash, as the working Mozilla cross-compiled build has ctypes disabled as well...


comment:68 Changed 6 years ago by mingwandroid

Good idea. As the comment above says the build is currently broken without it as libffi has some hiccups when getting cross-compiled.

Initial investigation suggests this won't be too difficult to fix.

Apart from that I am currently trying Mozilla's cross-compile setup. We'll see how far I get with it.

:-(

On second thought, that should not be the reason for the crash, as the working Mozilla cross-compiled build has ctypes disabled as well...

.. this suggests that the javascript (or something) has been re-worked since ESR24 to make things not crash without ctypes.

comment:69 Changed 6 years ago by mingwandroid

I fixed the ctypes build problem and Firefox launches correctly now (under brief testing it seems to work well)

Screen shot:
https://www.dropbox.com/s/98ly6o190rh96up/FirefoxDebug-ScreenShot.png

Script, configs and patches:
https://www.dropbox.com/s/ko48vxijmwjd2z2/crosstool-ng-and-ESR24-script-configs-patches-20131107.tar.xz
b137e6843fbd4a6350710a27a7e3db457e1a646a crosstool-ng-and-ESR24-script-configs-patches-20131107.tar.xz

Firefox builds:
https://www.dropbox.com/s/v6tsfz4u965kk7h/Firefox-darwin-i686.app-20131107-built-on-linux-gnu-x86_64.tar.bz2
2542d98cd7660f0138cd529dbd09b32fec9cd6c5 Firefox-darwin-i686.app-20131107-built-on-linux-gnu-x86_64.tar.bz2
https://www.dropbox.com/s/dmz7e1cs2htmyh2/FirefoxDebug-darwin-i686.app-20131107-built-on-linux-gnu-x86_64.tar.bz2
33aedc90378847b1e421494b995ccc2588e07ece FirefoxDebug-darwin-i686.app-20131107-built-on-linux-gnu-x86_64.tar.bz2

Toolchain I used:
https://www.dropbox.com/s/2f52qrwucs2dlzm/cross-target-x86_64-apple-darwin10-host-x86_64-linux.tar.xz
9ca9c5e9c3a0d990e924f775e3de0cf4897d930e cross-target-x86_64-apple-darwin10-host-x86_64-linux.tar.xz

  • Updated with shasums.
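
The 40-hex-digit values listed above have the length of SHA-1 digests, so a download can be checked against them along the following lines. This is a sketch under that assumption; the helper names are made up, and the demo runs against a throwaway file rather than the real archives.

```python
import hashlib
import os
import tempfile


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_listing(listing, directory="."):
    """Check lines of the form '<sha1-hex> <filename>' against files in directory."""
    results = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 40:
            digest, name = parts
            results[name] = sha1_of(os.path.join(directory, name)) == digest.lower()
    return results


# Self-check on a throwaway file; the SHA-1 of b"abc" is a standard test vector.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.bin"), "wb") as fh:
        fh.write(b"abc")
    listing = "a9993e364706816aba3e25717850c26c9cd0d89d demo.bin"
    print(verify_listing(listing, tmp))  # {'demo.bin': True}
```

For the real files, the same `verify_listing` call would take the digest/filename lines from this comment and the directory the archives were downloaded to.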

comment:70 Changed 6 years ago by gk

\o/ Thanks Ray! I was about to recommend https://bugzilla.mozilla.org/show_bug.cgi?id=932127 which does similar things to get ctypes going but I am glad you found it out earlier. And interesting that it was responsible for the crash at all as we had releases with ctypes disabled which worked fine... I am currently sick and hope to verify your results in the meantime at least. Then on Monday the transformation into gitian can get started...

comment:71 Changed 6 years ago by gk

After 15 hours of building I got something to test, and enabling ctypes fixes it for me as well, nice. I was still skeptical about the ctypes fix and made a native build with ctypes disabled. It does not crash (which worries me slightly) but is not really usable. So, it seems we can't avoid ctypes easily, which is kind of bad news (but will be a different bug), as we would like to disable it in the future due to security concerns.

Note to self: We should backport

https://bugzilla.mozilla.org/show_bug.cgi?id=931043
https://bugzilla.mozilla.org/show_bug.cgi?id=932127 (for proper ctypes support)
https://bugzilla.mozilla.org/show_bug.cgi?id=931053 (for handling the breakpad code sanely)
https://bugzilla.mozilla.org/show_bug.cgi?id=933071 (to avoid those link hacks to get libxul linked)

+ a yet to be written patch to handle otool without link hacks.

comment:72 Changed 6 years ago by mingwandroid

After 15 hours of building

We really need to get to the bottom of this! Please list your equivalent of the following details:

My setup:
Machine: Dell XPS L702x
Host OS: Windows 7 x64.
Host CPU: Intel Core i7-2820QM
Host Memory: 8GB

VirtualBox 4.3.2:
Guest OS: ubuntu-12.04.3-desktop-amd64
Guest Processor(s): 6 (so Windows is somewhat starved CPU-wise)
Guest Execution Cap: 100%
Guest Memory: 4GB

My toolchain:
https://www.dropbox.com/s/2f52qrwucs2dlzm/cross-target-x86_64-apple-darwin10-host-x86_64-linux.tar.xz

My build timing:
Building, to see log, tail -F /home/ray/tbb-work/mozilla-esr24.rel/build.log from another terminal

real 55m37.983s
user 192m5.304s
sys 12m17.456s

.. so I can build it in under 1 hour (actually, around 1 hour as configuring and packaging take a few minutes each).
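
The `time` figures above also show how well the build used the 6 guest processors: dividing CPU time (user + sys) by wall-clock time gives the average number of busy cores, which works out to about 3.7 here. A small sketch of that arithmetic, assuming the `NNmSS.sss` format printed by the shell's `time` builtin (the function names are made up):

```python
import re


def to_seconds(stamp):
    """Convert a value such as '55m37.983s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if not match:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def busy_cores(real, user, sys_):
    """Average number of cores kept busy: CPU time divided by wall-clock time."""
    return (to_seconds(user) + to_seconds(sys_)) / to_seconds(real)


ratio = busy_cores("55m37.983s", "192m5.304s", "12m17.456s")
print(f"average busy cores: {ratio:.2f}")
```

Comparing this ratio between two machines helps separate "the compiler itself is slow" from "the build is not running in parallel".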

So, it seems we can't avoid ctypes easily, which is kind of bad news (but will be a different bug), as we would like to disable it in the future due to security concerns.

Ok, I did think that ctypes would add too much security risk, but I guess it's likely a fairly essential bridge to allow JavaScript to call native code for speed critical things? I can ask the Mozilla guys now that we've made contact.

https://bugzilla.mozilla.org/show_bug.cgi?id=933071 (to avoid those link hacks to get libxul linked)

I will follow this up with the patch as I have already backported and tested it. I might put my build scripts and patches up on github actually, so you and Mozilla can more easily track my progress?

Changed 6 years ago by mingwandroid

Backported patch by Nathan Froyd

comment:73 in reply to:  72 Changed 6 years ago by gk

Replying to mingwandroid:

After 15 hours of building

We really need to get to the bottom of this!

After some measurements it seems the problem was on layer 8. Taking your crosstool configs, I get both 32bit and 64bit non-debug builds finished within 2 hours, and the debug builds both take around 14 hours. What probably happened is that I built a debug version of the cross-compiler on my 32bit system too, but used the non-debug version instead. I realized that the path in my .mozconfig file did not point to $HOME/x-tools, where the debug version was, but to another location, to which I had probably copied the optimized version. Problem solved, thanks.
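
To avoid this kind of mix-up, one can check which compiler a .mozconfig actually points to. The sketch below assumes the compilers are set through plain `CC=` / `CXX=` assignments (optionally with `export`); the paths in the sample are invented and the helper names are hypothetical.

```python
import os
import shlex


def compiler_paths(mozconfig_text):
    """Return {"CC": path, "CXX": path} for the assignments found in the text."""
    found = {}
    for raw in mozconfig_text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        for var in ("CC", "CXX"):
            if line.startswith(var + "="):
                words = shlex.split(line[len(var) + 1:])
                if words:
                    # Drop compiler flags that follow the path inside the quotes.
                    found[var] = words[0].split()[0]
    return found


def outside_prefix(paths, prefix):
    """Return the entries whose compiler does not live below `prefix`."""
    prefix = os.path.join(os.path.normpath(prefix), "")
    return {var: path for var, path in paths.items()
            if not os.path.normpath(path).startswith(prefix)}


# Hypothetical .mozconfig excerpt with two different toolchain locations.
sample_mozconfig = """\
CC="/opt/toolchains/x86_64-apple-darwin10/bin/x86_64-apple-darwin10-clang -arch i386"
export CXX=/home/user/x-tools/x86_64-apple-darwin10/bin/x86_64-apple-darwin10-clang++
"""

print(outside_prefix(compiler_paths(sample_mozconfig), "/home/user/x-tools"))
```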

So, it seems we can't avoid ctypes easily, which is kind of bad news (but will be a different bug), as we would like to disable it in the future due to security concerns.

Ok, I did think that ctypes would add too much security risk, but I guess it's likely a fairly essential bridge to allow JavaScript to call native code for speed critical things? I can ask the Mozilla guys now that we've made contact.

Well, you could ask if it is expected that the Firefox built with the cross-compiler crashes every time on startup if we disable ctypes in the .mozconfig, while this is not happening with a natively built Firefox using the same compiler (+ same version + same .mozconfig) (noting that the latter is unusable though). That might be an interesting thing to know.

https://bugzilla.mozilla.org/show_bug.cgi?id=933071 (to avoid those link hacks to get libxul linked)

I will follow this up with the patch as I have already backported and tested it. I might put my build scripts and patches up on github actually, so you and Mozilla can more easily track my progress?

Thanks. Seems to be a good idea to me, yes.

comment:74 Changed 6 years ago by gk

Okay, the first 3.5 alpha bundle I made fails on start-up with:

Error: Error: Could not open system library: no libc
Source File: resource://gre/modules/osfile/osfile_unix_allthreads.jsm
Line: 55

and a lot of other (follow-up) errors rendering the browser unusable.

comment:75 Changed 6 years ago by gk

Here comes a test bundle for OS X users:
https://people.torproject.org/~mikeperry/builds/3.5-alpha-1pre/TorBrowserBundle-3.5-alpha-1-osx32_en-US.zip
sha256sum: fb67c1d1d47d8f3358231161fde27213920bf4659f228c9805c1e4b274a7145b

comment:76 Changed 6 years ago by mingwandroid

I tried it very briefly on a Mac, looks good.

Congratulations Georg!

Changed 5 years ago by gk

Attachment: 0003-breakpad.patch added

Changed 5 years ago by gk

Attachment: 0004-ctypes.patch added

Changed 5 years ago by gk

Attachment: 0005-otool.patch added

Changed 5 years ago by gk

Attachment: 0006-va-list.patch added

comment:77 Changed 5 years ago by gk

Status: new → needs_review

Okay, to get this going, here are some steps to keep in mind for the review process:

1) I suggest using Ray's cross-compiler he uploaded to Dropbox (see comment 72). I built my test bundle with it.
2) The descriptor patch (0002-*) assumes that the compiler archive is renamed to "x86_64-apple-darwin10.tar.xz".
3) Patches 0001-new* through 0006-* (0002-* excluded) need to get merged into tor-browser.
4) It is not tested whether the compiler works with ESR 17 as-is.
5) In order to get the browser compiled, the patch in #10139 is needed as well.
6) If the second step of the Gitian build process is not failing with the new compiler then the work in this bug is done IMO. For the packaging step (which still contains some things to fix) we have #9858.

comment:78 Changed 5 years ago by mikeperry

Keywords: MikePerry201312R added

comment:79 Changed 5 years ago by mikeperry

Ok, I have merged the firefox patches to origin/tor-browser-24.1.1esr-1. I merged the descriptor update to mikeperry/ff24-staging, along with a couple other fixes I needed to get it to build. It is in the process of building right now, but it seems fine so far. I'll post another update after I try it out on my mac.

We still need to fetch and authenticate the new compiler, though. At this point, fetching the compiler binary from dropbox seems fine (if we can get a stable URI for it). After we get something working reliably, we can then work on building it from a gitian descriptor (preferably an independent one dedicated to just the build tools we need, à la #10120).

comment:80 Changed 5 years ago by mingwandroid

Sounds good.

If you want me to put it up on my Google Code page then just let me know.

comment:81 Changed 5 years ago by mingwandroid

BTW, during the process of helping out on this I submitted some patches to libfaketime which the developers are acting on.

One was to correct the nanoseconds issue, however with that fixed, configure scripts deterministically report that the build system is not 'sane', as they (AFAIR) write a file then another, then check that the 2nd has a later timestamp than the 1st, considering it not sane otherwise.

I think there'll be other issues along the same lines to figure out workarounds for later.
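
The kind of check described above can be imitated in a few lines to see why fully fixed timestamps trip it. This is only a sketch of the behaviour as recalled in this comment, not the actual configure code and not libfaketime itself: the frozen clock is simulated by stamping both files with the same instant via `os.utime`, and the function names are made up.

```python
import os
import tempfile


def second_is_newer(first_path, second_path):
    """The described sanity check: the later-written file must have a later mtime."""
    return os.stat(second_path).st_mtime_ns > os.stat(first_path).st_mtime_ns


def demo_with_frozen_clock(frozen_ns):
    """Write two files, then give both the same timestamp, as a frozen clock would."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        for path in (first, second):
            with open(path, "w") as fh:
                fh.write("x")
            os.utime(path, ns=(frozen_ns, frozen_ns))
        return second_is_newer(first, second)


# An arbitrary fixed instant, in nanoseconds since the epoch.
print(demo_with_frozen_clock(1_383_782_400 * 10**9))  # False: identical timestamps
```

With unspoofed sub-second fields the two files would usually differ by at least a few nanoseconds, which is presumably why the check passed before the nanoseconds fix.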

comment:82 in reply to:  81 Changed 5 years ago by mikeperry

Cc: dcf added

Replying to mingwandroid:

BTW, during the process of helping out on this I submitted some patches to libfaketime which the developers are acting on.

One was to correct the nanoseconds issue, however with that fixed, configure scripts deterministically report that the build system is not 'sane', as they (AFAIR) write a file then another, then check that the 2nd has a later timestamp than the 1st, considering it not sane otherwise.

Awesome. Perhaps the answer here is to add another env var that it listens for, or add the ability to parse additional time fields past the seconds to the existing env var? That way we could change the env var to spoof nanoseconds after we run configure, but allow configure to proceed with only partially spoofed timestamps?

I think there'll be other issues along the same lines to figure out workarounds for later.

Probably. David Fifield has been running into sub-second timestamp issues with building python pluggable transport (traffic obfuscation for censored users) packages in #9444. It sounds like he's got a fix via other mechanisms, but he still might find these patches of some use too.

comment:83 Changed 5 years ago by mingwandroid

The pull request was:

https://github.com/wolfcw/libfaketime/pull/34

there was some discussion amongst the developers that you guys may have some opinions on given your experience with it.

I built a debian package of my fork of libfaketime, see
https://trac.torproject.org/projects/tor/ticket/9711#comment:53 and #comment:54

.. if you need some help with this stuff then I'll try to remember the details; I've been busy trying to finish and upstream crosstool-ng.

comment:84 Changed 5 years ago by gk

Resolution: fixed
Status: needs_review → closed

Fixed, thanks Ray. Building the compiler from source is handled in #9711.
