When testing updates from Tor Browser 8.0a8 (ESR52) to the 8.0a9 release candidate (ESR60), Kathy and I encountered a failure when trying to update a Win64 browser that was using an obfs4 transport. The error was 19 (CERT_VERIFY_ERROR). We cannot reproduce this error on other platforms, and the mar file signature does in fact seem to be correct.
When testing with the same mar file and running updater.exe manually via a CMD shell, we saw the same error. However, after several attempts the update succeeded. It seems to fail about 4/5 of the time. Georg pointed out that this sounds a lot like a problem that was reported on tor-talk a earlier this year:
Note that this is a bug in our ESR52-based Tor Browser; we will need to determine if the bug still exists in an ESR60-based browser (and look for the cause).
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
So far, after many attempts, we have not reproduced this bug in an ESR60-based build. However, since it seems to be timing related, maybe we are just getting lucky (that no failures occur).
Investigating the failures with the ESR52-based Tor Browser, we found that it is easy to reproduce in an optimized build. But we found that we cannot reproduce the problem after adding some logging to the libmar code. We are now going to be smarter about the logging, e.g., we will not execute any additional code (and especially we will not log anything) until after an error occurs within libmar's signature verification code.
Adding logging that occurred after the signature verification failed was helpful and allowed Kathy and me to pin down the cause of this bug in an ESR52-based Tor Browser (but so far not the root cause).
The reason the signature verification fails is because a byte from the MAR file that should be skipped when computing the SHA-512 hash is included in the stream of bytes that is provided as input to the NSS hashing code.
The reason the extra byte is included is because (it seems) there is a bug in the stdio functions that are used inside modules/libmar/verify/mar_verify.c to read the MAR file. The bug is that the file offset is incorrect (functions such as ftello() return a value that is off by one), and when the code inside mar_verify_signatures_for_fp() uses fseeko() to skip past the embedded signature, the resulting file position is wrong.
The wrong file position in turn causes one byte from the signature (which should have been skipped) to be used as input to the SHA-512 hashing code. Game over.
My Windows debugging skills are limited, especially when working with optimized builds. Ideally someone who has some experience with mingw-w64 internals would investigate further. I am not sure if this bug occurs with our ESR60-based Tor Browser builds; we could not reproduce it there but it is intermittent. The libmar code has not been modified much at all but maybe something has been fixed inside mingw-w64. Kathy and I implemented a workaround; I will attach the patch (which we can make available in a git repo if we decide to use it).
Adding logging that occurred after the signature verification failed was helpful and allowed Kathy and me to pin down the cause of this bug in an ESR52-based Tor Browser (but so far not the root cause).
The reason the signature verification fails is because a byte from the MAR file that should be skipped when computing the SHA-512 hash is included in the stream of bytes that is provided as input to the NSS hashing code.
The reason the extra byte is included is because (it seems) there is a bug in the stdio functions that are used inside modules/libmar/verify/mar_verify.c to read the MAR file. The bug is that the file offset is incorrect (functions such as ftello() return a value that is off by one), and when the code inside mar_verify_signatures_for_fp() uses fseeko() to skip past the embedded signature, the resulting file position is wrong.
The wrong file position in turn causes one byte from the signature (which should have been skipped) to be used as input to the SHA-512 hashing code. Game over.
My Windows debugging skills are limited, especially when working with optimized builds. Ideally someone who has some experience with mingw-w64 internals would investigate further. I am not sure if this bug occurs with our ESR60-based Tor Browser builds; we could not reproduce it there but it is intermittent. The libmar code has not been modified much at all but maybe something has been fixed inside mingw-w64.
Aewsome work! I think we could test the latter hypothesis by recompiling an ESR52-based Tor Browser with the mingw-w64 toolchain we use for ESR60, no? If so, let's do that.
Aewsome work! I think we could test the latter hypothesis by recompiling an ESR52-based Tor Browser with the mingw-w64 toolchain we use for ESR60, no? If so, let's do that.
That is a good idea. I tried to do it today but got an error during the firefox compile step. I started with the tbb-8.0a8-build1 tag and applied the patch from #25554 (moved) (4c1f78c312aa10ff54e5b07de4ef3278ab2ed01e). Here is a portion of the log; has anyone seen this kind of error before? Maybe I need other changes from the ESR60 era?
x86_64-w64-mingw32-widl: /var/tmp/build/mingw-w64-ee9fc3d0b8c8/mingw-w64-tools/widl/src/typetree.h:274: type_alias_get_aliasee: Assertion `type_is_alias(type)' failed.AbortedMakefile:109: recipe for target 'typelib_done' failedmake[5]: *** [typelib_done] Error 134make[5]: *** Waiting for unfinished jobs....
Aewsome work! I think we could test the latter hypothesis by recompiling an ESR52-based Tor Browser with the mingw-w64 toolchain we use for ESR60, no? If so, let's do that.
That is a good idea. I tried to do it today but got an error during the firefox compile step. I started with the tbb-8.0a8-build1 tag and applied the patch from #25554 (moved) (4c1f78c312aa10ff54e5b07de4ef3278ab2ed01e). Here is a portion of the log; has anyone seen this kind of error before? Maybe I need other changes from the ESR60 era?
{{{
x86_64-w64-mingw32-widl: /var/tmp/build/mingw-w64-ee9fc3d0b8c8/mingw-w64-tools/widl/src/typetree.h:274: type_alias_get_aliasee: Assertion `type_is_alias(type)' failed.
Aborted
Makefile:109: recipe for target 'typelib_done' failed
make[5]: *** [typelib_done] Error 134
make[5]: *** Waiting for unfinished jobs....
}}}
You need to compile with --disable-accessibility as there is a bug in newer mingw-w64 versions that is not figured out yet if accessibility is enabled.
You need to compile with --disable-accessibility as there is a bug in newer mingw-w64 versions that is not figured out yet if accessibility is enabled.
Thanks! Adding --disable-accessibility allowed me to build Tor Browser 8.0a8 with the newer mingw-w64 toolchain. The result is that this bug still occurs. At this point, I think it makes sense to try again to reproduce this problem in an ESR60-based Tor Browser, since I cannot think of a reason why we would not have the same problem there. When I tested before (see comment:6) I may have included extra logging which Kathy and I learned later can cause this bug to not occur.
I can reproduce the problem with my own ESR60-based build. I started with the 8.0a9 tag, changed app.update.url to point to a test server that I control, and replaced the udater's MAR signing certificates with one that Kathy and I generated. So — minimal changes.
Kathy and I think we should merge the workaround that we developed (see comment:7) until a proper fix inside mingw-w64 is available. If we agree on that approach, I will test the workaround in an ESR60-based build and make it available in a git repo.
Is there someone we can ask about the underlying mingw-w64 bug? I am concerned about spending too much time on this ticket at the expense of other tasks (which is not to say that working updates are not important).
I can reproduce the problem with my own ESR60-based build. I started with the 8.0a9 tag, changed app.update.url to point to a test server that I control, and replaced the udater's MAR signing certificates with one that Kathy and I generated. So — minimal changes.
Kathy and I think we should merge the workaround that we developed (see comment:7) until a proper fix inside mingw-w64 is available. If we agree on that approach, I will test the workaround in an ESR60-based build and make it available in a git repo.
Sounds good to me.
Is there someone we can ask about the underlying mingw-w64 bug? I am concerned about spending too much time on this ticket at the expense of other tasks (which is not to say that working updates are not important).
Yes, we should try to get some help. There are three venues I currently can think of. We can mail Jacek and ask, ask on the mingw-w64 mailing list, and ask folks on #mingw-w64 in OFTC's network. I'll start with asking Jacek and might ask around in #mingw-w64.
Could you provide your preprocessed mar_verify.c? Something like:
make -C obj_dir/modules/libmar/verify/ mar_verify.i
should do the trick.
I will do that once I figure out how to get a shell within the container for the "firefox" build.
boklm: can you tell me how to do that with rbm? I know I can get a shell after a build failure, but is there way to get a shell within the container without forcing a build failure, e.g., after a successful build of the firefox component or after the firefox configure step is done? I can probably figure it out, but I suspect you or gk my know how to do this already.
I don't know how to get a shell within the container during the build. What I usually do if I need to get at preprocessed files is compile with -save-temps and do a exit 1 after the compilation step making sure I get a debug shell.
I don't know how to get a shell within the container during the build. What I usually do if I need to get at preprocessed files is compile with -save-temps and do a exit 1 after the compilation step making sure I get a debug shell.
libmar uses in fact _ftelli64/_fseeki64 due to defines in mar_private.h. Those are not available on old msvcrt.dll, so mingw-w64 has its own implementation:
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/stdio/fseeko64.c#L133
I can't see anything obviously wrong there, but in general it does a lot of stuff that really should be done by MS crt in fact. An ideal solution would be to use ucrt-based toolchain (ucrtbase.dll always provides those functions) and disable mingw-w64 implementation for them. I may take a look at mingw-w64 part of it, but changing to ucrt toolchain is an invasive change for Tor build system.
As a less radical solution could be using fseek64 and ftello64. Those are much thinner wrappers in mingw-64 around msvcrt.dll. You could try that by changing defines in mar_private.h
As a less radical solution could be using fseek64 and ftello64. Those are much thinner wrappers in mingw-64 around msvcrt.dll. You could try that by changing defines in mar_private.h
We tested on an up-to-date Windows 10 system. 200 incremental updates and 200 complete updates all completed without any signature verification failures.
Trac: Status: new to needs_review Keywords: TorBrowserTeam201808 deleted, TorBrowserTeam201808R added
Great! Looks good to me. Applied to tor-browser-60.1.0esr-8.0-1 (commit c2720b2274e51eb76b91e5f6a5f2a82bdfaf8013).
I think it is okay to be conservative and apply the fix to 32bit Windows as well, not knowing where exactly the bug lies. We should test the 32bit case however, making sure we did not regress here.
Jacek: big thanks for your help!
Trac: Status: needs_review to closed Resolution: N/Ato fixed
I think it is okay to be conservative and apply the fix to 32bit Windows as well, not knowing where exactly the bug lies. We should test the 32bit case however, making sure we did not regress here.
Kathy and I tested using a 32-bit Windows build and did not find any problems. I think we are done here!
libmar uses in fact _ftelli64/_fseeki64 due to defines in mar_private.h. Those are not available on old msvcrt.dll, so mingw-w64 has its own implementation:
https://github.com/mirror/mingw-w64/blob/master/mingw-w64-crt/stdio/fseeko64.c#L133
I can't see anything obviously wrong there, but in general it does a lot of stuff that really should be done by MS crt in fact.
Especially, giving the fact that msvcr100.dll supports that, actually.