#26320 closed defect (wontfix)

Orfox 52.8.0esr-7.5-1 Crash

Reported by: sysrqb
Owned by: tbb-team
Priority: Immediate
Severity: Normal
Component: Applications/Tor Browser
Keywords: tbb-mobile
Cc: igt0, gk

Description

The current Orfox build, based on 52.8.0esr, is crashing, seemingly because of a patch uplifted into the JavaScript runtime.

Upstream bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1463741

Change History (15)

comment:1 Changed 16 months ago by sysrqb

Component: changed from "- Select a component" to Applications/Tor Browser
Owner: set to tbb-team

:(

comment:2 Changed 16 months ago by sysrqb

Keywords: tbb-mobile added
Priority: changed from Medium to Immediate
Status: changed from new to needs_information

The crash is reproducible here.

I think there may be two different crash scenarios we're seeing. The first:

F libc    : Fatal signal 11 (SIGSEGV), code 1, fault addr 0xfffffff0 in tid 15134 (JS Helper)
F MOZ_Assert: Assertion failure: [infer failure] Missing type in object [Object * 0x92f15310] _dirty: float, at
             /opt/Orfox/external/tor-browser/js/src/vm/TypeInference.cpp:256
F MOZ_CRASH: Hit MOZ_CRASH() at /opt/Orfox/external/tor-browser/js/src/vm/TypeInference.cpp:257

This crash appears in the JavaScript runtime, and it has been the main focus.

However, I am no longer seeing that crash (and I now doubt that the webpage above actually reproduced it). Below is the crash I currently see, which looks like stack corruption:

adb| [23490] WARNING: Write failed (non-fatal): file /home/sysrqb/Orfox/external/tor-browser/xpcom/io/nsInputStreamTee.cpp, line 179
adb| void mozilla::AndroidBridge::HandleGeckoMessage(JSContext*, JS::HandleObject)
adb| No listeners for PrivateBrowsing:Data in dispatchEvent
[New Thread 23826]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23826]
0xf41ca080 in ?? ()
(gdb) info stack
#0  0xf41ca080 in ?? ()
#1  0xf41cab42 in ?? ()
#2  0xf41cab42 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And from adb:

F/libc    (13049): Fatal signal 11 (SIGSEGV), code 2, fault addr 0x74736e75 in tid 13071 (Gecko)

igt0, can you still reproduce the first crash?
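The two fault addresses above can be read a little further, assuming a 32-bit little-endian process as on the arm-linux-androideabi target. 0xfffffff0 is -16 as a signed 32-bit value, which often indicates an access at a small negative offset from a null pointer. The in-memory bytes of 0x74736e75 spell the ASCII text "unst", which is consistent with (but does not prove) a pointer or return address overwritten by string data. The sketch below only performs this reinterpretation; `decode_fault_addr` is a hypothetical helper, not part of any Mozilla tooling.

```python
import struct


def decode_fault_addr(addr: int) -> tuple[int, bytes]:
    """Reinterpret a 32-bit fault address (hypothetical helper).

    Assumes a little-endian 32-bit target such as arm-linux-androideabi.
    Returns the address read as a signed 32-bit integer, plus its raw
    bytes in memory order.
    """
    raw = struct.pack("<I", addr)         # bytes as they sit in memory
    (signed,) = struct.unpack("<i", raw)  # same bits, read as signed int32
    return signed, raw


if __name__ == "__main__":
    for addr in (0xFFFFFFF0, 0x74736E75):
        signed, raw = decode_fault_addr(addr)
        # Show printable bytes as text and anything else as '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)
        print(f"{addr:#010x}: signed={signed}, bytes={raw!r}, ascii={text!r}")
```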

comment:3 in reply to: 2 Changed 16 months ago by sysrqb

Replying to sysrqb:

> However, I am no longer seeing that crash (and I now doubt that the webpage above actually reproduced it). Below is the crash I currently see, which looks like stack corruption:
>
> [...]
>
> And from adb:
>
> F/libc    (13049): Fatal signal 11 (SIGSEGV), code 2, fault addr 0x74736e75 in tid 13071 (Gecko)
>
> igt0, can you still reproduce the first crash?

Oh, right. This crash is reproducible at the last upstream 52.8.0esr commit:

commit 0ce659a05fd3fefeead88d4ef0b35ec497c07415
Author: Lee Salzman <lsalzman@mozilla.com>
Date:   Sun Apr 29 20:10:51 2018 -0400

    Bug 1454692 - Backport some upstream Skia fixes to ESR52. r=rhunt, a=abillings
    
    --HG--
    extra : histedit_source : 0fcd64cabe6f54a2286083d6518e4e6451183a19%2C37f5e7f9dbbfc01102631c33b23329d2af5aa71b

and that crash results in:

F/libc    ( 6367): Fatal signal 11 (SIGSEGV), code 1, fault addr 0xfffffff0 in tid 6485 (JS Helper)

But I still have a corrupt stack.

With mozconfig:

$ cat .mozconfig
ac_add_options --enable-application=mobile/android
ac_add_options --target=arm-linux-androideabi
ac_add_options --with-android-ndk="$NDK_BASE" # Android NDK location (NDK r10e)
ac_add_options --with-android-sdk="$SDK_BASE" # Android SDK location

ac_add_options --disable-debug
ac_add_options --disable-debug-symbols

comment:4 Changed 16 months ago by igt0

I was still able to reproduce the first crash. When I reverted this commit:

https://hg.mozilla.org/releases/mozilla-esr52/rev/14eab155eaa8

I don't see the crash anymore.

comment:5 Changed 16 months ago by igt0

The backtrace when it crashes:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23125]
0x9393f4ba in ?? ()
Loading symbols... Done
(gdb) bt
#0  0x9393f4ba in js::jit::MacroAssemblerARM::ma_mov(js::jit::ImmGCPtr, js::jit::Register) ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#1  0x939469ee in void js::jit::MacroAssembler::storeUnboxedValue<js::jit::Address>(js::jit::ConstantOrRegister const&, js::jit::MIRType, js::jit::Address const&, js::jit::MIRType) () from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#2  0x938338e8 in js::jit::CodeGenerator::visitStoreSlotT(js::jit::LStoreSlotT*) ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#3  0x93847baa in js::jit::CodeGenerator::generateBody() ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#4  0x93847fde in js::jit::CodeGenerator::generate() () from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#5  0x9385ac22 in js::jit::GenerateCode(js::jit::MIRGenerator*, js::jit::LIRGraph*) ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#6  0x93a3f654 in js::HelperThread::handleIonWorkload(js::AutoLockHelperThreadState&) ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#7  0x93a4001a in js::HelperThread::threadLoop() () from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#8  0x93a4f88e in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) ()
   from /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/bin/libxul.so
#9  0xb6ccea04 in ?? ()
#10 0xb6ccea04 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

comment:6 Changed 16 months ago by igt0

When I add the following option to the .mozconfig file:

ac_add_options --disable-optimize

The build doesn't crash. That is why I suspect the toolchain.

comment:7 Changed 16 months ago by igt0

When I compile in debug mode, but with optimization enabled, it crashes; the backtrace is below.

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 25652]
0x8657ceb4 in ?? ()
(gdb) bt
#0  0x8657ceb4 in ?? ()
#1  0x977f63d8 in MOZ_ReportAssertionFailure (aStr=<optimized out>, aFilename=<optimized out>, aLine=<optimized out>)
    at /opt/tor-browser/obj-tbb-arm-linux-androideabi/dist/include/mozilla/Assertions.h:164
#2  0x00000000 in ?? ()

.mozconfig:

ac_add_options --enable-debug-symbols
ac_add_options --enable-debug

comment:8 in reply to: 7; Changed 16 months ago by sysrqb

Replying to igt0:

> ac_add_options --enable-debug-symbols
> ac_add_options --enable-debug

Ugh, yeah, that's what I meant.

I wonder if we can identify which optimization causes this. After running configure, I have these flags:

'MOZ_OPTIMIZE_FLAGS': '-freorder-blocks -fno-reorder-functions -Os',

Maybe we can inject them one by one and see when it breaks.
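A minimal sketch of the "inject them one by one" idea: generate one mozconfig line per build, starting from plain -Os and adding the other MOZ_OPTIMIZE_FLAGS entries cumulatively (keeping -Os last, as in the configure output), then stop at the first build that crashes on the test page. It assumes `--enable-optimize=...` replaces MOZ_OPTIMIZE_FLAGS, which a later comment in this ticket reports; `cumulative_optimize_lines` is a hypothetical helper that only prints configuration and does not run any build.

```python
BASE_FLAG = "-Os"
# The extra flags reported by configure in MOZ_OPTIMIZE_FLAGS.
EXTRA_FLAGS = ["-freorder-blocks", "-fno-reorder-functions"]


def cumulative_optimize_lines(base: str, extras: list[str]) -> list[str]:
    """Return one mozconfig line per step, adding one extra flag each time.

    Step 0 uses only `base`; the last step uses every extra flag, with
    `base` kept at the end to mirror the configure output order.
    """
    lines = []
    for i in range(len(extras) + 1):
        flags = " ".join([*extras[:i], base])
        lines.append(f'ac_add_options --enable-optimize="{flags}"')
    return lines


if __name__ == "__main__":
    for step, line in enumerate(cumulative_optimize_lines(BASE_FLAG, EXTRA_FLAGS)):
        print(f"build {step}: {line}")
```

The last line reproduces the default MOZ_OPTIMIZE_FLAGS, so the first crashing step points at the flag that was just added.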

comment:9 in reply to: 6; Changed 16 months ago by gk

Replying to igt0:

> When I add the following option to the .mozconfig file:
>
> ac_add_options --disable-optimize
>
> The build doesn't crash. That is why I suspect the toolchain.

So we have a workaround for this issue? Why don't we release a new Orfox with it ASAP to pick up critical security fixes, and then think about ways to track the issue down, and whether that is worth our time at all, given that the switch away from ESR 52 for Android is about a month away?

comment:10 in reply to: 9 Changed 16 months ago by sysrqb

Replying to gk:

> Replying to igt0:
>
>> When I add the following option to the .mozconfig file:
>>
>> ac_add_options --disable-optimize
>>
>> The build doesn't crash. That is why I suspect the toolchain.
>
> So we have a workaround for this issue? Why don't we release a new Orfox with it ASAP to pick up critical security fixes, and then think about ways to track the issue down, and whether that is worth our time at all, given that the switch away from ESR 52 for Android is about a month away?

Good idea. igt0, do you have a non-optimized build available we can smoke-test and then release?

comment:11 Changed 16 months ago by sysrqb

Okay, we have a non-optimized build available. I tested it, and it does not crash, but it is slow; Gecko's rendering in particular is very slow. Still, releasing this:

  1. is important for patching some security holes
  2. is the last release before the first TBA-alpha
  3. will encourage users to switch to TBA because it'll be faster
    1. I suppose this is a side effect of using unmaintained/unsupported code

APK: https://people.torproject.org/~sysrqb/non_optimized_fennec-52.8.0.en-US.android-arm.apk

igt0, thanks for the APK. Let's test it today, and if we don't find any more problems, can you give me an unsigned/unaligned build so we can give it to n8fr8 for signing and upload?

comment:12 Changed 16 months ago by sysrqb

I think we'll need to use the non-optimized build. The short explanation is that Fennec still crashes when built with -O1 (instead of the default optimization level, -Os).

The slightly longer answer is that when I tried compiling with -O1, I hit compile-time errors, so one (or more) of the optimizations enabled by -Os is needed to satisfy the always_inline requirement:

 4:34.47 In file included from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/system_wrappers/string:3:0,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/stl_wrappers/string:44,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/ipc/chromium/src/base/platform_file.h:15,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/ipc/chromium/src/base/platform_file_posix.cc:7,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/ipc/chromium/Unified_cpp_ipc_chromium1.cpp:2:
 4:34.47 /home/sysrqb/.mozbuild/android-ndk-r11b//sources/cxx-stl/llvm-libc++/libcxx/include/string: In member function 'bool base::SharedMemory::FilenameForMemoryName(const wstring&, std::__ndk1::wstring*)':
 4:34.47 /home/sysrqb/.mozbuild/android-ndk-r11b//sources/cxx-stl/llvm-libc++/libcxx/include/string:700:35: error: inlining failed in call to always_inline 'static constexpr bool std::__ndk1::char_traits<wchar_t>::eq(std::__ndk1::char_traits<wchar_t>::char_type, std::__ndk1::char_traits<wchar_t>::char_type) throw ()': indirect function call with a yet undetermined callee
 4:34.47      static _LIBCPP_CONSTEXPR bool eq(char_type __c1, char_type __c2) _NOEXCEPT
 4:34.47
[...]
 4:34.47 In file included from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/system_wrappers/algorithm:3:0,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/stl_wrappers/algorithm:44,
 4:34.47                  from /home/sysrqb/.mozbuild/android-ndk-r11b//sources/cxx-stl/llvm-libc++/libcxx/include/string:439,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/system_wrappers/string:3,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/dist/stl_wrappers/string:44,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/ipc/chromium/src/base/platform_file.h:15,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/ipc/chromium/src/base/platform_file_posix.cc:7,
 4:34.47                  from /home/sysrqb/Orfox/external/tor-browser/obj-arm-linux-androideabi/ipc/chromium/Unified_cpp_ipc_chromium1.cpp:2:
 4:34.47 /home/sysrqb/.mozbuild/android-ndk-r11b//sources/cxx-stl/llvm-libc++/libcxx/include/algorithm:1050:13: error: called from here
 4:34.47              if (__pred(*__first1, *__j))

I added -findirect-inlining (the name was a good hint), and that fixed the compile-time error, so it seems that at least Android NDK r11b requires it. Unfortunately, the app still crashes on the test webpage.

comment:13 Changed 16 months ago by cypherpunks

-O0 for the JIT only, or disable the JIT, no?

comment:14 in reply to: 8 Changed 14 months ago by Unpublished

Replying to sysrqb:

> Replying to igt0:
>
>> ac_add_options --enable-debug-symbols
>> ac_add_options --enable-debug
>
> Ugh, yeah, that's what I meant.
>
> I wonder if we can identify which optimization causes this. After running configure, I have these flags:
>
> 'MOZ_OPTIMIZE_FLAGS': '-freorder-blocks -fno-reorder-functions -Os',
>
> Maybe we can inject them one by one and see when it breaks.

I can't reproduce the crash anymore using

ac_add_options --enable-optimize="Os"

This overrides MOZ_OPTIMIZE_FLAGS, so one of the following optimizations, or the combination of both, causes this crash:

-freorder-blocks -fno-reorder-functions
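With two outcomes already known from this ticket (the full MOZ_OPTIMIZE_FLAGS set crashes, the `--enable-optimize` override above does not), the open question is whether one flag alone or only the pair triggers the crash. A small sketch, assuming the crash depends deterministically on these flags, lists the builds still needed: every non-empty proper subset of the suspect flags, each followed by -Os. `remaining_builds` is a hypothetical helper.

```python
from itertools import combinations

SUSPECTS = ["-freorder-blocks", "-fno-reorder-functions"]


def remaining_builds(suspects: list[str], base: str = "-Os") -> list[str]:
    """Optimization strings for each non-empty proper subset of `suspects`.

    The empty subset (base alone) and the full set are skipped because
    their outcomes are already known from this ticket.
    """
    builds = []
    for size in range(1, len(suspects)):
        for subset in combinations(suspects, size):
            builds.append(" ".join([*subset, base]))
    return builds


if __name__ == "__main__":
    for flags in remaining_builds(SUSPECTS):
        print(f'ac_add_options --enable-optimize="{flags}"')
```

If neither build crashes, the crash needs both flags together; otherwise the crashing build points at the responsible flag.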

comment:15 Changed 14 months ago by gk

Resolution: wontfix
Status: changed from needs_information to closed

Closing this as WONTFIX, as there is no Orfox/Tor Browser for Android release planned based on ESR 52.
