Opened 7 years ago

Closed 7 years ago

#6539 closed defect (fixed)

Image cache isolation causes assert crash in debug builds (and other cases?)

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone:
Component: Firefox Patch Issues Version:
Severity: Keywords: MikePerry201211d tbb-linkability
Cc: gk Actual Points: 10 (mike: 4, pc: 6)
Parent ID: #6528 Points:
Reviewer: Sponsor:

Description

Turns out that the patch in #5742 crashes in debug builds at VerifyCacheSizes() in imgLoader.cpp:1613.

There might also be a crash on certain XUL dialog types that contain chrome images/icons. In non-debug builds I get a refcount issue when pointing my new Firefox build at an old profile, or when running it with -P -no-remote.. Could be the fact that I re-use the same nsIURI pointer for two arguments in the XUL icon codepath, or could be something else.

Child Tickets

Attachments (5)

stacktrace (3.4 KB) - added by gk 7 years ago.
debugbuild_failure (8.3 KB) - added by gk 7 years ago.
0021-Isolate-the-Image-Cache-per-url-bar-domain.patch (36.2 KB) - added by mikeperry 7 years ago.
Current image cache patch against FF14
stacktrace_startup (25.0 KB) - added by gk 7 years ago.
issue6539-patch.txt (7.3 KB) - added by mcs 7 years ago.
additional patch, with Error Console output

Download all attachments as: .zip

Change History (23)

Changed 7 years ago by gk

Attachment: stacktrace added

Changed 7 years ago by gk

Attachment: debugbuild_failure added

comment:1 Changed 7 years ago by gk

As I promised to test your patch and you did not tell me how your dev environment looked like I took a current Firefox stable and applied your patch. Yes, it crashed although I am not sure whether that is the same issue you encountered. I attached the stacktrace. Then I thought I'd like to look a bit deeper into the issue and started a debug build using --enable-debug and --disable-optimize + the mozilla-release source and, well, I got a build failure (attached as well) that is not occurring without your patch. Sounds like more than one issue...

Changed 7 years ago by mikeperry

Current image cache patch against FF14

comment:2 Changed 7 years ago by mikeperry

Actual Points: 23

gk: I can build with that patch against FF14 with ./configure --enable-debug --enable-debug-symbols --disable-crashreporter --disable-optimize.

FWIW, I also have the rest of the patches in https://gitweb.torproject.org/torbrowser.git/tree/maint-2.3:/src/current-patches/firefox/alpha applied as well, several of which fix build issues (but not the one I think I see in your logs).

comment:3 in reply to:  2 ; Changed 7 years ago by gk

Replying to mikeperry:

gk: I can build with that patch against FF14 with ./configure --enable-debug --enable-debug-symbols --disable-crashreporter --disable-optimize.

I get the same build failure.

FWIW, I also have the rest of the patches in https://gitweb.torproject.org/torbrowser.git/tree/maint-2.3:/src/current-patches/firefox/alpha applied as well, several of which fix build issues (but not the one I think I see in your logs).

Let me see if applying all the patches does the trick...

comment:4 in reply to:  3 Changed 7 years ago by gk

Replying to gk:

Replying to mikeperry:

gk: I can build with that patch against FF14 with ./configure --enable-debug --enable-debug-symbols --disable-crashreporter --disable-optimize.

I get the same build failure.

FWIW, I also have the rest of the patches in https://gitweb.torproject.org/torbrowser.git/tree/maint-2.3:/src/current-patches/firefox/alpha applied as well, several of which fix build issues (but not the one I think I see in your logs).

Let me see if applying all the patches does the trick...

No, I still get the same build failure. Note this does not happen if I only apply the patches found on https://gitweb.torproject.org/torbrowser.git/tree/maint-2.3:/src/current-patches/firefox/alpha. Thus, no better stacktrace from me atm and, alas, I currently have no time to look at the patch thoroughly either :-(

comment:5 Changed 7 years ago by mikeperry

Were you building off of the Firefox 15 betas? I just hit the error while rebasing for FF15. I fixed it in torbrowser.git/mikeperry/firefox15:
https://gitweb.torproject.org/mikeperry/torbrowser.git/tree/firefox15:/src/current-patches/firefox/alpha

However, I am still running into this build issue even with vanilla ff15:
http://forums.mozillazine.org/viewtopic.php?f=23&t=2509805

Did you hit that one by chance?

comment:6 in reply to:  5 ; Changed 7 years ago by gk

Replying to mikeperry:

Were you building off of the Firefox 15 betas?

No, I took the release version which was 14.0.1.

I fixed it in torbrowser.git/mikeperry/firefox15:
https://gitweb.torproject.org/mikeperry/torbrowser.git/tree/firefox15:/src/current-patches/firefox/alpha

Ah, good. I'll give it a try later today.

However, I am still running into this build issue even with vanilla ff15:
http://forums.mozillazine.org/viewtopic.php?f=23&t=2509805

Did you hit that one by chance?

Never. And I am quite regularly building nightlies (e.g. to keep the double-key cookie patch up-to-date) and other (pre-)release versions.

comment:7 in reply to:  6 Changed 7 years ago by gk

Replying to gk:

Replying to mikeperry:

Were you building off of the Firefox 15 betas?

No, I took the release version which was 14.0.1.

I fixed it in torbrowser.git/mikeperry/firefox15:
https://gitweb.torproject.org/mikeperry/torbrowser.git/tree/firefox15:/src/current-patches/firefox/alpha

Ah, good. I'll give it a try later today.

It compiles now. But I always get a segfault on start-up (see attachment). I built the test version on the latest and greatest Ubuntu (32Bit) using the FF current release and added only the patch in question.

Changed 7 years ago by gk

Attachment: stacktrace_startup added

comment:8 Changed 7 years ago by mikeperry

If I recall correctly the difference between crashes at startup for me was if the profile directory was for an old FF version that need updating or not. That is what makes me suspect that certain XUL-sourced images are one source of crashes (the images in the update dialog)...

comment:9 Changed 7 years ago by mikeperry

Keywords: MikePerry201209 added; MikePerry201208 removed

comment:10 Changed 7 years ago by mikeperry

Keywords: MikePerry201209d added; MikePerry201209 removed

comment:11 Changed 7 years ago by mikeperry

Keywords: MikePerry201211d tbb-linkability added; MikePerry201209d removed

comment:13 Changed 7 years ago by mcs

Kathy Brade and I did some debugging to try to pin down the cause of the image cache assertion failures (we see a 'ABORT: Proxyless entry's request has cache entry!' failure at line 1644 of imgLoader.cpp during browser startup). Our current theory is that the problem has something to do with the modified cache keys and the SetHasProxies() / SetHasNoProxies() methods. Unfortunately, we are out of time for today... but we will look at this more early next week and try to fix the problem.

comment:14 Changed 7 years ago by mcs

We fixed several issues but some remain. Here is what we fixed:

(1) Added NS_ADDREF() in imgLoader::FindEntryProperties() (accidentally removed?)
(2) Fixed imgLoader::SetHasProxies() to use the new key format.
(3) Modified imgLoader::GetCacheKey() to not use the new partitioned key format for chrome:// images.
(4) Added some assertions and NULL checks for safety.
(5) Renamed some parameters for clarity (e.g., key -> imgURI).

I will attach a patch that shows the above changes. Stability is much improved.

Change (3) is necessary because the firstPartyURI is NULL when loading images referenced by some internal pages such as the bookmarks or history sidebars. Unfortunately, we still see NULL firstPartyURI values for some internal image loads such as:

moz-anno:favicon:http://www.mozilla.org/media/img/firefox/favicon.png

(when referenced by the history sidebar).

So -- is the intent that firstPartyURI will always be non-NULL inside the image cache? If so, we need to determine why ThirdPartyUtil::GetFirstPartyURI() fails in some cases.

comment:15 in reply to:  14 Changed 7 years ago by mikeperry

Replying to mcs:

We fixed several issues but some remain. Here is what we fixed:

(1) Added NS_ADDREF() in imgLoader::FindEntryProperties() (accidentally removed?)

Possibly. This addref confused me. It appeared to me to be a reference leak because all callers of FindEntryProperties also perform an addref. In fact, I still don't understand how it's not a leak. Can you explain?

Though I suppose even if it is a leak, we should leave it in and simply report it to Mozilla.

(2) Fixed imgLoader::SetHasProxies() to use the new key format.
(3) Modified imgLoader::GetCacheKey() to not use the new partitioned key format for chrome:// images.
(4) Added some assertions and NULL checks for safety.
(5) Renamed some parameters for clarity (e.g., key -> imgURI).

I will attach a patch that shows the above changes. Stability is much improved.

Change (3) is necessary because the firstPartyURI is NULL when loading images referenced by some internal pages such as the bookmarks or history sidebars. Unfortunately, we still see NULL firstPartyURI values for some internal image loads such as:

moz-anno:favicon:http://www.mozilla.org/media/img/firefox/favicon.png

(when referenced by the history sidebar).

So -- is the intent that firstPartyURI will always be non-NULL inside the image cache? If so, we need to determine why ThirdPartyUtil::GetFirstPartyURI() fails in some cases.

Yes. Can we log such cases to the Error Console? I think normal PR_LOG/stdout logs are unlikely to be seen.

I think I am not concerned about the favicon case, as it does not seem like it should allow third party tracking, but if there are other weird cases that could be problematic.

comment:16 Changed 7 years ago by mikeperry

In case my reply to #3 wasn't clear, I think we should decide some special key to assign to NULL firstPartyURI schemes, so the patch is at least functional without crashing. We should then use the Error Console logging to determine how common this is in practice.

comment:17 Changed 7 years ago by mcs

The NS_ADDREF() is needed in imgLoader::FindEntryProperties() because callers use getter_AddRefs(). This may help explain:

https://developer.mozilla.org/en-US/docs/Introduction_to_XPCOM_for_the_DOM#QueryInterface.28.29

In a few minutes, we will post a revised patch that logs to the Error Console.

Changed 7 years ago by mcs

Attachment: issue6539-patch.txt added

additional patch, with Error Console output

comment:18 Changed 7 years ago by mikeperry

Actual Points: 310 (mike: 4, pc: 6)
Resolution: fixed
Status: newclosed

I've rebased this for FF17 and merged it with the previous image cache patch in mikeperry/firefox17-esr.

I think we want to deploy this with the FF17-ESR beta series only for now, since we'll want to watch for those error console messages.

Note: See TracTickets for help on using tickets.