Opened 3 years ago

Last modified 12 months ago

#20025 new defect

document.characterSet enables fingerprinting of localization (only with HSTS?)

Reported by: dcf
Owned by: tbb-team
Priority: Medium
Milestone:
Component: Applications/Tor Browser
Version:
Severity: Normal
Keywords: tbb-fingerprinting
Cc: xfix
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

At comment:18:ticket:10703, xfix reports on another means of discovering the browser's fallback character encoding, the document.characterSet property (and possibly its aliases document.charset and document.inputEncoding). There is a demo site here:

https://hsivonen.com/test/moz/check-charset.htm

Using tor-browser-linux64-6.5a2_en-US.tar.xz, I get the output

Your fallback charset is: windows-1252

But using tor-browser-linux64-6.0.4_ko.tar.xz, I get the output

Your fallback charset is: EUC-KR

This is a separate issue from #10703. I'll leave a comment with a demo page that shows both techniques, with the one in #10703 giving the same result and document.characterSet giving different results.
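As a rough illustration of what such a demo page reads, the sketch below collects the three properties named in this ticket from a Document-like object. The helper name readEncodingProperties is hypothetical and is not taken from the hsivonen.com demo; the function only assumes that the object may or may not expose each property.

```javascript
// Collect the encoding-related properties discussed in this ticket.
// `readEncodingProperties` is a hypothetical helper name, not part of any demo page.
function readEncodingProperties(doc) {
  const names = ["characterSet", "charset", "inputEncoding"];
  const result = {};
  for (const name of names) {
    // Report a missing property as null instead of throwing.
    result[name] = name in doc ? doc[name] : null;
  }
  return result;
}

// In a browser console, one would call it on the real document:
//   console.log(readEncodingProperties(document));
```

Taking the document as a parameter keeps the helper usable outside a browser, for example with a plain object in a test.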

The really strange thing is that this seems to work only when the server has HSTS (a valid Strict-Transport-Security header). I couldn't reproduce the result of the hsivonen.com demo site with a local web server, nor with an onion service, even when copying the demo and its headers exactly. Only when I put it on an HTTPS server with HSTS could I reproduce it. I'll leave a comment with two demo pages allowing you to compare.
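To check whether a server's header would count as a usable HSTS policy, a simplified parser along these lines may help. It is a sketch under assumptions: the function name parseStrictTransportSecurity is hypothetical, it only looks at the max-age and includeSubDomains directives, and it does not reject duplicate directives or verify that the header arrived over a valid TLS connection, both of which a real browser also takes into account.

```javascript
// Simplified parse of a Strict-Transport-Security header value.
// Returns { maxAge, includeSubDomains } or null when max-age is missing or malformed.
// Hypothetical helper: it ignores unknown directives and does not detect duplicates.
function parseStrictTransportSecurity(headerValue) {
  if (typeof headerValue !== "string") return null;
  const policy = { maxAge: null, includeSubDomains: false };
  for (const part of headerValue.split(";")) {
    const directive = part.trim();
    if (directive === "") continue;
    const eq = directive.indexOf("=");
    // Directive names are case-insensitive.
    const name = (eq === -1 ? directive : directive.slice(0, eq)).trim().toLowerCase();
    const rawValue = eq === -1 ? "" : directive.slice(eq + 1).trim();
    if (name === "max-age") {
      // The value may be given as a quoted string.
      const digits = rawValue.replace(/^"(.*)"$/, "$1");
      if (!/^\d+$/.test(digits)) return null;
      policy.maxAge = Number(digits);
    } else if (name === "includesubdomains") {
      policy.includeSubDomains = true;
    }
  }
  // max-age is the one required directive.
  return policy.maxAge === null ? null : policy;
}
```

Note that a max-age of 0 parses successfully here but tells the browser to forget the policy, so a caller comparing servers would probably want to treat it as "no HSTS".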

Child Tickets

Attachments (4)

en-us-with-hsts.png (8.9 KB) - added by dcf 3 years ago.
tor-browser-linux64-6.5a2_en-US.tar.xz on https://people.torproject.org/~dcf/tor20025/check-charset.html (has HSTS)
en-us-without-hsts.png (8.9 KB) - added by dcf 3 years ago.
tor-browser-linux64-6.5a2_en-US.tar.xz on https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html (no HSTS)
ko-with-hsts.png (14.1 KB) - added by dcf 3 years ago.
tor-browser-linux64-6.0.4_ko.tar.xz on https://people.torproject.org/~dcf/tor20025/check-charset.html (has HSTS)
ko-without-hsts.png (8.9 KB) - added by dcf 3 years ago.
tor-browser-linux64-6.0.4_ko.tar.xz on https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html (no HSTS)


Change History (8)

comment:1 Changed 3 years ago by dcf

I set up a demo page on two servers, one with HSTS and one without. Only the one with HSTS shows a difference in document.characterSet. Note that neither of the servers specifies the encoding in the Content-Type header, so you get a warning in the browser console and the browser has to infer the encoding.
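The condition described above (no encoding in the Content-Type header) can be checked with a small helper like the following. This is a minimal sketch: charsetFromContentType is a hypothetical name, and the parsing is deliberately naive (it does not handle a semicolon inside a quoted parameter value).

```javascript
// Extract the charset parameter from a Content-Type header value.
// Returns null when no charset is declared, which is the situation on both demo servers.
// Hypothetical helper with simplified parameter parsing.
function charsetFromContentType(contentType) {
  if (typeof contentType !== "string") return null;
  // Everything after the first ";" is a list of name=value parameters.
  const params = contentType.split(";").slice(1);
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== "charset") continue;
    // Strip optional surrounding double quotes.
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    return value === "" ? null : value;
  }
  return null;
}
```

A null result means the browser has to fall back to other signals, such as a meta tag or its locale-dependent default, which is where the fingerprinting surface comes from.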

The technique from #10703 always finds iso-8859-1. (I think that technique has trouble distinguishing iso-8859-1 and windows-1252.)
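One possible reason for that ambiguity, offered as an assumption rather than a description of how the #10703 demo actually works: ISO-8859-1 and windows-1252 decode every byte identically except for part of the range 0x80 to 0x9F, and browsers that follow the WHATWG Encoding Standard treat the label iso-8859-1 as an alias for windows-1252 anyway. The hypothetical helper below reports whether a set of probe bytes could tell the two apart at all.

```javascript
// Bytes in 0x80-0x9F that windows-1252 leaves undefined; the WHATWG mapping
// decodes them to the same C1 control characters as ISO-8859-1, so they
// cannot separate the two encodings.
const CP1252_UNDEFINED = new Set([0x81, 0x8d, 0x8f, 0x90, 0x9d]);

// Returns true if at least one probe byte decodes differently under
// ISO-8859-1 (as originally defined) and windows-1252.
// Hypothetical helper name; not taken from either demo page.
function canDistinguishLatin1FromCp1252(bytes) {
  return Array.from(bytes).some(
    (b) => b >= 0x80 && b <= 0x9f && !CP1252_UNDEFINED.has(b)
  );
}
```

For example, a probe made only of bytes such as 0x41 or 0xE9 returns false, while one containing 0x80 (the euro sign in windows-1252) returns true.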

with HSTS

HSTS demo page: https://people.torproject.org/~dcf/tor20025/check-charset.html

document.characterSet is windows-1252 for the en-US bundle and EUC-KR for the ko bundle.

en-US: tor-browser-linux64-6.5a2_en-US.tar.xz on https://people.torproject.org/~dcf/tor20025/check-charset.html (has HSTS); see attachment en-us-with-hsts.png
ko: tor-browser-linux64-6.0.4_ko.tar.xz on https://people.torproject.org/~dcf/tor20025/check-charset.html (has HSTS); see attachment ko-with-hsts.png

without HSTS

non-HSTS demo page: https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html

document.characterSet is windows-1252 for both the en-US and ko bundles.

en-US: tor-browser-linux64-6.5a2_en-US.tar.xz on https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html (no HSTS); see attachment en-us-without-hsts.png
ko: tor-browser-linux64-6.0.4_ko.tar.xz on https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html (no HSTS); see attachment ko-without-hsts.png
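The comparison in this comment can be expressed as a small check: a site reveals the localization if different bundles report different document.characterSet values on it. The sketch below encodes the four observations reported above; the function name sitesRevealingLocale and the record shape are hypothetical.

```javascript
// Given observations of document.characterSet per site and bundle, list the
// sites on which the value differs between bundles.
// Hypothetical helper; the record shape { site, bundle, characterSet } is an assumption.
function sitesRevealingLocale(observations) {
  const bySite = new Map();
  for (const { site, bundle, characterSet } of observations) {
    if (!bySite.has(site)) bySite.set(site, new Map());
    bySite.get(site).set(bundle, characterSet);
  }
  const revealing = [];
  for (const [site, results] of bySite) {
    // More than one distinct value means the bundles are distinguishable.
    const distinct = new Set(results.values());
    if (distinct.size > 1) revealing.push(site);
  }
  return revealing;
}

// Values as reported in this comment.
const observations = [
  { site: "people.torproject.org", bundle: "en-US", characterSet: "windows-1252" },
  { site: "people.torproject.org", bundle: "ko", characterSet: "EUC-KR" },
  { site: "people.eecs.berkeley.edu", bundle: "en-US", characterSet: "windows-1252" },
  { site: "people.eecs.berkeley.edu", bundle: "ko", characterSet: "windows-1252" },
];

// Only the HSTS site, people.torproject.org, is listed.
console.log(sitesRevealingLocale(observations));
```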

comment:2 Changed 3 years ago by dcf

I checked and the same HSTS weirdness happens with stock Firefox 45.3.0. To reproduce, go to Preferences → Content → Fonts & Colors → Advanced → Text Encoding for Legacy Content, and select Korean. Then the HSTS demo page https://people.torproject.org/~dcf/tor20025/check-charset.html will show EUC-KR for document.characterSet. The non-HSTS demo page https://people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html continues to show windows-1252.

Chromium 52.0.2743.116 doesn't appear to distinguish between HSTS and non-HSTS. Go to Settings → Web content → Customize fonts → Encoding and change to Korean. Both demo pages then show EUC-KR.

comment:3 Changed 12 months ago by cypherpunks

Latest Tor Browser:
https://www.bamsoftware.com/people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html

Using ambiguous bytes (#10703) iso-8859-1
document.characterSet (#20025) UTF-8
document.charset UTF-8
document.inputEncoding UTF-8

Does anyone else get the same result?
(Firefox 61 with privacy.resistFingerprinting enabled also reports these values.)

comment:4 in reply to:  3 Changed 12 months ago by dcf

Replying to cypherpunks:

> Latest Tor Browser:
> https://www.bamsoftware.com/people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html
>
> Using ambiguous bytes (#10703) iso-8859-1
> document.characterSet (#20025) UTF-8
> document.charset UTF-8
> document.inputEncoding UTF-8

cypherpunks, please also try https://people.torproject.org/~dcf/tor20025/check-charset.html.

For me, with Tor Browser 8.0a8 en-US, I get:

https://www.bamsoftware.com/people.eecs.berkeley.edu/~fifield/tor20025/check-charset.html

Using ambiguous bytes (#10703) iso-8859-1
document.characterSet (#20025) UTF-8
document.charset UTF-8
document.inputEncoding UTF-8

https://people.torproject.org/~dcf/tor20025/check-charset.html

Using ambiguous bytes (#10703) iso-8859-1
document.characterSet (#20025) windows-1252
document.charset windows-1252
document.inputEncoding windows-1252

I had conjectured that the difference was caused by HSTS, but that appears not to be the case: bamsoftware.com also has HSTS, yet its page reports UTF-8.
