Opened 20 months ago

Closed 18 months ago

Last modified 18 months ago

#10703 closed defect (fixed)

Fallback charset enables fingerprinting of bundle localization

Reported by: dcf Owned by: mikeperry
Priority: normal Milestone:
Component: TorBrowserButton Version:
Keywords: tbb-fingerprinting, tbb-pref, MikePerry201402R Cc: gk
Actual Points: Parent ID:
Points:

Description

Torbutton has the spoof_english pref, which changes the value of the Accept-Language header to en-us,en;q=0.5; this hides which localized bundle you are using. But localized bundles still differ in their default (fallback) charset. By checking which characters a given byte sequence decodes to, a page can find out which charset is in use.

The attack goes like this. The web server sends an HTML page with no declared charset, neither in the HTTP header (Content-Type) nor in the HTML (<meta charset=...>). The HTML contains one or more byte sequences that stand for different characters in different charsets. JavaScript in the HTML measures the size of the rendered characters. By including a few different byte sequences, it's probably possible to fingerprint all the possible TBB localizations.
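To make the attack concrete, here is a minimal Python sketch of the server side. It assumes a raw HTTP/1.1 response and a browser that simply applies its fallback charset (no autodetection). The names `PROBE_BYTES` and `build_probe_response` are hypothetical. The essential points are that neither the Content-Type header nor the HTML declares a charset, and that the embedded script reads the decoded characters back.

```python
# HYPOTHETICAL probe page for the attack described above.
PROBE_BYTES = b"\xc3\xa3"  # two bytes that decode differently per charset


def build_probe_response() -> bytes:
    """Build a raw HTTP response that declares no charset anywhere."""
    body = (
        b"<!DOCTYPE html><html><head><title>probe</title></head><body>"
        b'<span id="probe">' + PROBE_BYTES + b"</span>"
        b"<script>"
        # Read the decoded characters back as an ASCII-only string.
        b"document.title = encodeURIComponent("
        b'document.getElementById("probe").innerHTML);'
        b"</script></body></html>"
    )
    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"  # deliberately no "; charset=..."
        + b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
    )
    return head + body
```

The page never names an encoding, so whatever ends up in `document.title` reflects only the browser's fallback.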

It looks like our current bundles may come with any of 6 different default charsets:

I found these by grepping the langpacks' unpacked *.xpi files for "intl.charset.default".

As an example of how byte sequences can be variously decoded, here are decodings of "\xc3\xa3":

  • utf-8: ã
  • iso-8859-1: Ã£
  • iso-8859-2: ĂŁ
  • windows-1251: ГЈ
  • euc-kr: 찾
  • gbk: 茫

That is, an HTML page can contain the sequence "\xc3\xa3" and it will render as different characters depending on the charset in effect.
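The decodings above can be reproduced with Python's built-in codecs. This sketch assumes Python's codec tables match Gecko's for these two bytes; it is only a way to check the list, not how a browser does it.

```python
# Decode the example bytes under each of the six fallback charsets.
PROBE = b"\xc3\xa3"
CHARSETS = ["utf-8", "iso-8859-1", "iso-8859-2", "windows-1251", "euc-kr", "gbk"]


def decodings(probe: bytes = PROBE) -> dict:
    """Map each charset name to the text that `probe` decodes to."""
    return {cs: probe.decode(cs) for cs in CHARSETS}


if __name__ == "__main__":
    for cs, text in decodings().items():
        # ascii() also shows the exact code points, e.g. 'ĂŁ' -> '\u0102\u0141'
        print(f"{cs:>12}: {text}  {ascii(text)}")
```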

A possible solution is simply to force intl.charset.default to UTF-8 in all localizations. Here are some Mozilla bugs I found that are relevant to setting this pref to UTF-8: 910165, 406498, 536506, 910169.

Also see https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings#Specifying_the_fallback_encoding, which indicates that Firefox's behavior with respect to the fallback charset will change:

As of Firefox 28, this section is obsolete, since the preference intl.charset.default no longer exists. The mapping from locales onto fallback encodings is now built into Gecko itself.

In the best case, this means that the spoof_english setting will become sufficient and the fallback will be the same as for en-US. Or it might just mean that the preference has moved somewhere inside Gecko. The relevant bug seems to be 910192: Get rid of intl.charset.default as a localizable pref and deduce the fallback....


Attachments (7)

fonts-en_US.png (24.8 KB) - added by dcf 20 months ago.
en-US fonts dialog.
fonts-pl.png (29.1 KB) - added by dcf 20 months ago.
pl fonts dialog.
firefox-28.0b1-virgin.png (21.0 KB) - added by dcf 19 months ago.
firefox-28.0b1-intl.charset.fallback.override=utf-8.png (39.8 KB) - added by dcf 19 months ago.
firefox-28.0b1-utf-8.png (21.0 KB) - added by dcf 19 months ago.
firefox-28.0b1-intl.charset.fallback.override=iso-8859-1.png (40.1 KB) - added by dcf 19 months ago.
firefox-28.0b1-iso-8859-1.png (20.9 KB) - added by dcf 19 months ago.


Change History (23)

Changed 20 months ago by dcf

en-US fonts dialog.

Changed 20 months ago by dcf

pl fonts dialog.

comment:1 Changed 20 months ago by dcf

Here's where the fallback charset is exposed in the UI. Edit→Preferences→Content→Fonts & Colors→Advanced→Character Encoding for Legacy Content.

en-US ISO-8859-1:
en-US fonts dialog.

pl ISO-8859-2:
pl fonts dialog.

comment:2 Changed 20 months ago by dcf

  • Keywords tbb-fingerprinting added; tbb-fingerprints removed

comment:3 Changed 20 months ago by dcf

  • Status changed from new to needs_review

comment:4 in reply to: ↑ description ; follow-up: Changed 20 months ago by dcf

Replying to dcf:

JavaScript in the HTML measures the size of the rendered characters.

Actually, it's even easier than measuring the size. You can read innerHTML and get a string containing exactly the decoded characters. Here is what encodeURIComponent(elem.innerHTML) looks like for the example sequence "\xc3\xa3":

  • utf-8: ã %C3%A3
  • iso-8859-1: Ã£ %C3%83%C2%A3
  • iso-8859-2: ĂŁ %C4%82%C5%81
  • windows-1251: ГЈ %D0%93%D0%88
  • euc-kr: 찾 %EC%B0%BE
  • gbk: 茫 %E8%8C%AB
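The readback values above can be reproduced, and inverted into a lookup table, with a sketch like the following. It assumes `urllib.parse.quote(text, safe="")` matches `encodeURIComponent` for this input, which holds for non-ASCII text but not for every ASCII punctuation character. `FINGERPRINTS` and `guess_fallback` are illustrative names, not part of the proof-of-concept detector mentioned in this ticket.

```python
from typing import Optional
from urllib.parse import quote

PROBE = b"\xc3\xa3"
CHARSETS = ["utf-8", "iso-8859-1", "iso-8859-2", "windows-1251", "euc-kr", "gbk"]


def uri_component(text: str) -> str:
    """Percent-encode the UTF-8 bytes of `text` with uppercase hex.

    Matches encodeURIComponent for non-ASCII text; the two differ on
    some ASCII punctuation such as "!" and "*".
    """
    return quote(text, safe="")


# Lookup table: value read back by the page -> fallback charset in use.
FINGERPRINTS = {uri_component(PROBE.decode(cs)): cs for cs in CHARSETS}


def guess_fallback(observed: str) -> Optional[str]:
    """Return the charset implied by an observed readback, or None if unknown."""
    return FINGERPRINTS.get(observed)


if __name__ == "__main__":
    for value, cs in FINGERPRINTS.items():
        print(f"{cs:>12}: {value}")
```

A server-side detector would only need to receive the readback string and call `guess_fallback` on it.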

comment:5 follow-up: Changed 20 months ago by gk

  • Cc gk added

What should get reviewed, given your description and the changed behavior in Fx 28?

comment:6 in reply to: ↑ 5 Changed 20 months ago by dcf

Replying to gk:

What should get reviewed, given your description and the changed behavior in Fx 28?

Just the idea of setting intl.charset.default=UTF-8.

That, and figuring out if we'll need to do anything special in the next Firefox.

I have a proof-of-concept implementation of the charset detection that I can send you if you want.

comment:7 Changed 20 months ago by mikeperry

  • Keywords tbb-pref MikePerry201401R added

comment:8 in reply to: ↑ 4 Changed 19 months ago by dcf

The ja (Japanese) bundle uses Shift JIS as its fallback encoding, which makes it distinct as well.

  • utf-8: ã %C3%A3
  • iso-8859-1: Ã£ %C3%83%C2%A3
  • iso-8859-2: ĂŁ %C4%82%C5%81
  • windows-1251: ГЈ %D0%93%D0%88
  • euc-kr: 찾 %EC%B0%BE
  • gbk: 茫 %E8%8C%AB
  • shift_jis: ﾃ｣ %EF%BE%83%EF%BD%A3
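With shift_jis added, the probe is still useful only if no two charsets decode it identically. This small check uses Python's codecs as a stand-in for the browser's decoders (an assumption) and shows that "\xc3\xa3" separates all seven, while a pure-ASCII probe separates none.

```python
from itertools import combinations

CHARSETS = ["utf-8", "iso-8859-1", "iso-8859-2", "windows-1251",
            "euc-kr", "gbk", "shift_jis"]


def collisions(probe: bytes, charsets=CHARSETS) -> list:
    """Pairs of charsets that decode `probe` identically (cannot be told apart)."""
    # errors="replace" mimics a browser substituting U+FFFD for invalid bytes.
    decoded = {cs: probe.decode(cs, errors="replace") for cs in charsets}
    return [(a, b) for a, b in combinations(charsets, 2) if decoded[a] == decoded[b]]


if __name__ == "__main__":
    print(collisions(b"\xc3\xa3"))   # []  (every charset gives distinct text)
    print(len(collisions(b"abc")))   # 21  (ASCII alone reveals nothing)
```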

comment:9 Changed 19 months ago by dcf

There is a question of whether we should set the default to UTF-8 or Windows-1252. I would normally say, "just use UTF-8," but I read https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings#Specifying_the_fallback_encoding:

In order to avoid the problem of Web authors creating new UTF-8 content without declaring that the content uses UTF-8 and in order to maximize the ability of users to read content cross-locale, do not set the fallback encoding to UTF-8 for any newly-introduced localization.

and

When in doubt, use windows-1252 as the fallback encoding.

Still, my recommendation is to use UTF-8. Mozilla wants to encourage authors to declare their encodings, but that's not our goal. The effect of choosing one over the other is probably small anyway: the choice only affects web pages that do not declare their encoding, and UTF-8 and Windows-1252 decode the first 128 byte values (ASCII) identically.
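The point about the first 128 byte values can be checked directly. This sketch compares Python's utf-8 and windows-1252 decoders, using errors="replace" to approximate how a browser substitutes U+FFFD for invalid sequences; that equivalence is an assumption, not a statement about Gecko internals.

```python
def same_decoding(data: bytes, a: str = "utf-8", b: str = "windows-1252") -> bool:
    """True if `data` renders as the same text under charsets `a` and `b`."""
    # errors="replace" approximates a browser inserting U+FFFD for bad bytes.
    return data.decode(a, errors="replace") == data.decode(b, errors="replace")


if __name__ == "__main__":
    print(same_decoding(bytes(range(128))))  # True: the ASCII range is identical
    print(same_decoding(b"\xc3\xa3"))        # False: 'ã' vs 'Ã£'
```

So the choice of fallback only matters for undeclared pages that contain bytes 0x80 and above.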

comment:10 follow-up: Changed 19 months ago by mikeperry

dcf1: Apparently post FF28, the new pref is "intl.charset.fallback.override". Can you check real quick if that still works for your tests, and either update https://bugzilla.mozilla.org/show_bug.cgi?id=967981, or ping back here?

Just trying to save us a little panic around FF31esr if we merge this.

comment:11 Changed 19 months ago by mikeperry

  • Keywords MikePerry201402R added; MikePerry201401R removed

comment:12 in reply to: ↑ 10 Changed 19 months ago by dcf

Replying to mikeperry:

dcf1: Apparently post FF28, the new pref is "intl.charset.fallback.override". Can you check real quick if that still works for your tests, and either update https://bugzilla.mozilla.org/show_bug.cgi?id=967981, or ping back here?

Just trying to save us a little panic around FF31esr if we merge this.

It looks like intl.charset.fallback.override is the right preference, but it doesn't work to set it to utf-8 (it only works to set it to something else like iso-8859-1). More below.

I tried the ru Firefox 28 beta:
http://download-installer.cdn.mozilla.net/pub/firefox/releases/28.0b1/SHA512SUMS
http://download-installer.cdn.mozilla.net/pub/firefox/releases/28.0b1/SHA512SUMS.asc
http://download-installer.cdn.mozilla.net/pub/firefox/releases/28.0b1/linux-x86_64/ru/firefox-28.0b1.tar.bz2

Before changing anything, the detector finds the fallback as windows-1251, as expected for Russian.


I set intl.charset.fallback.override=utf-8:


There was no change in the detection:


I set intl.charset.fallback.override=iso-8859-1:


It caused the fallback to change to iso-8859-1. The same happened with iso-8859-2 and others.


I found some source code that says that utf-8 is specifically blacklisted from being set for this preference. That makes me think we should go with iso-8859-1 (or windows-1252 as recommended) for 24ESR. (Note windows-1251 is Russian and windows-1252 is English/European.)

http://dxr.mozilla.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#51
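The behavior reported so far can be summarized as a toy model: an override is honored unless it is utf-8; otherwise the locale's fallback applies, and unrecognized locales get windows-1252 (per comment 14). This is a hypothetical Python sketch of the observed behavior, not a port of FallbackEncoding.cpp, and its locale table contains only the pairs reported in this ticket.

```python
from typing import Optional

# HYPOTHETICAL model of the behavior reported in this ticket, not Gecko code.
# Only locale/fallback pairs mentioned in the comments are included.
REPORTED_LOCALE_FALLBACKS = {
    "en-US": "windows-1252",
    "pl": "iso-8859-2",
    "ru": "windows-1251",
    "ja": "shift_jis",
}
UNKNOWN_LOCALE_FALLBACK = "windows-1252"


def resolve_fallback(locale: str, override: Optional[str] = None) -> str:
    """Return the fallback charset a page without a declared charset would get."""
    # intl.charset.fallback.override is honored, except utf-8, which is ignored.
    if override and override.lower() != "utf-8":
        return override.lower()
    # Otherwise the built-in locale mapping applies, defaulting to windows-1252.
    return REPORTED_LOCALE_FALLBACKS.get(locale, UNKNOWN_LOCALE_FALLBACK)


if __name__ == "__main__":
    print(resolve_fallback("ru"))                # windows-1251
    print(resolve_fallback("ru", "utf-8"))       # windows-1251 (override ignored)
    print(resolve_fallback("ru", "iso-8859-1"))  # iso-8859-1
```

Under this model, setting the override to windows-1252 gives every locale the same fallback, which is the property the fix needs.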

Changed 19 months ago by dcf

Changed 19 months ago by dcf

Changed 19 months ago by dcf

comment:13 follow-up: Changed 19 months ago by mikeperry

We can also remove the code that blacklists UTF-8, if you still think that is a better choice?

comment:14 in reply to: ↑ 13 Changed 19 months ago by dcf

Replying to mikeperry:

We can also remove the code that blacklists UTF-8, if you still think that is a better choice?

Let's do windows-1252 for all locales. I enhanced my detector script to distinguish iso-8859-1 and windows-1252, and it turns out that the en-US default in 24ESR and 28 beta is windows-1252. windows-1252 is a superset of iso-8859-1, it's what Mozilla recommends for a fallback, and it's what their FallbackEncoding::Get falls back to if it can't understand the locale.
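The bytes that separate iso-8859-1 from windows-1252 are 0x80 through 0x9F. windows-1252 maps most of them to printable characters such as € (0x80), while iso-8859-1 maps them to C1 control codes. The sketch below shows this at the codec level, using Python's codecs. Note that the WHATWG Encoding Standard treats the label iso-8859-1 as windows-1252, so a page-level detector may not see this difference in every browser. How the detector script in this ticket tells the two apart is not described, and this sketch does not claim to reproduce it.

```python
def distinguishing_bytes() -> list:
    """Byte values that Python's iso-8859-1 and windows-1252 codecs decode differently."""
    out = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")  # total: every byte maps to U+0000..U+00FF
        # Python's windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined.
        cp1252 = raw.decode("windows-1252", errors="replace")
        if latin1 != cp1252:
            out.append(value)
    return out


if __name__ == "__main__":
    diffs = distinguishing_bytes()
    print(hex(diffs[0]), hex(diffs[-1]), len(diffs))  # 0x80 0x9f 32
    print(b"\x80".decode("windows-1252"))              # €
```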

comment:15 follow-up: Changed 18 months ago by mikeperry

  • Resolution set to fixed
  • Status changed from needs_review to closed

Ok, I set intl.charset.default to windows-1252 in our prefs override file. This should first appear in 3.6-beta, though a nightly at https://people.torproject.org/~linus/builds/ may have it sooner.

comment:16 in reply to: ↑ 15 Changed 18 months ago by dcf

Replying to mikeperry:

Ok, I set intl.charset.default to windows-1252 in our prefs override file. This should first appear in 3.6-beta, though a nightly at https://people.torproject.org/~linus/builds/ may have it sooner.

Thanks. It occurs to me that support should be told about this change, because it may break the rendering of some pages.
