Enhance KeyboardEvent fingerprinting protection for unusual characters

added TorBrowserTeam201711 component::applications/tor browser owner::sysrqb priority::medium severity::normal status::needs-revision tbb-fingerprinting type::enhancement labels

Trac:
Cc: N/A to gk

Trac:
Cc: gk to gk, arthuredelstein

Trac:
Cc: gk, arthuredelstein to gk, arthuredelstein, brade, mcs

Trac:
Severity: N/A to Blocker
Reviewer: N/A to N/A
Owner: tbb-team to sysrqb
Status: new to assigned
Sponsor: N/A to N/A

That's not what I wanted to do

Trac:
Severity: Blocker to Normal

Basically we are implementing a virtual customized keyboard layout. This layout does not contain Right-keys (location 2, keys on right side). It is a QWERTY keyboard based on the "English (US)" layout, therefore any non-English characters will be mapped onto US-centric keys when combined with a modifier. We'll need both shift and AltGr (as the combination of asserting ctrl and alt) for this, else we don't have enough combinations available.

The US-International keyboard layout [0] provides a nice base, so beginning with that we gain:

With AltGr:

¡ ² ³ ¤ € ¼ ½ ¾ ‘ ’ ¥ ×
 ä å é ® þ ü ú í ó ö « »
  á ß ð           ø ¶ ´ ¬
   æ   ©     ñ µ ç   ¿

With Shift-AltGr:

¹     £               ÷
 Ä Å É   Þ Ü Ú Í Ó Ö
  Á § Ð           Ø ° ¨ ¦
   Æ   ¢     Ñ   Ç

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.

[0] https://en.wikipedia.org/wiki/AltGr_key#US-International
[1] https://en.wikipedia.org/wiki/AZERTY

Trac:
Status: assigned to needs_information

Can we map any unrecognized keys to a single keycode? There will certainly be a long tail of characters that aren't explicitly included here, we should handle them gently.

Replying to sysrqb:

Basically we are implementing a virtual customized keyboard layout. This layout does not contain Right-keys (location 2, keys on right side). It is a QWERTY keyboard based on the "English (US)" layout, therefore any non-English characters will be mapped onto US-centric keys when combined with a modifier. We'll need both shift and AltGr (as the combination of asserting ctrl and alt) for this, else we don't have enough combinations available.

The US-International keyboard layout [0] provides a nice base, so beginning with that we gain:

With AltGr:

¡ ² ³ ¤ € ¼ ½ ¾ ‘ ’ ¥ ×
 ä å é ® þ ü ú í ó ö « »
  á ß ð           ø ¶ ´ ¬
   æ   ©     ñ µ ç   ¿

With Shift-AltGr:

¹     £               ÷
 Ä Å É   Þ Ü Ú Í Ó Ö
  Á § Ð           Ø ° ¨ ¦
   Æ   ¢     Ñ   Ç

Hi -- my thinking is, to minimize disruption to usability, we should try to spoof the physical key (KeyboardEvent.code and KeyboardEvent.keyCode) that is most commonly used (roughly) for a given character across different locales' physical keyboard layouts. So for example, I imagine we might want to use either the Spanish or French physical key for the ç character. (Unfortunately they are different, so we have to choose.)

And, I think likely it makes sense for more than one character to spoof the same physical key, or physical key combination. We're not trying to simulate any particular whole keyboard layout, but rather we want to spoof individual keys so they don't reveal the true keyboard.

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

I think so, yes. We could also consider Cyrillic characters (see Russian vs Serbian keyboard layouts), and maybe other kinds of characters, too. Although if that turns out to be too much for one ticket, I think it would be reasonable to open tickets for categories we don't want to cover here.

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.

Currently we're not dropping most keys, to minimize the usability impact. If necessary, in some cases we could simply drop the .code and .keyCode members of KeyboardEvent without suppressing the event itself. But I tend to think we should just aim to gradually expand our range of spoofings. Note we do suppress KeyboardEvents for a few modifier keys because combination key presses can reveal a user's locale when they are typing special characters: See #17009 (moved) and patch at https://gitweb.torproject.org/tor-browser.git/patch/?id=2679132

Replying to arthuredelstein:

Replying to sysrqb: Hi -- my thinking is, to minimize disruption to usability, we should try to spoof the physical key (KeyboardEvent.code and KeyboardEvent.keyCode) that is most commonly used (roughly) for a given character across different locales' physical keyboard layouts. So for example, I imagine we might want to use either the Spanish or French physical key for the ç character. (Unfortunately they are different, so we have to choose.)

And, I think likely it makes sense for more than one character to spoof the same physical key, or physical key combination. We're not trying to simulate any particular whole keyboard layout, but rather we want to spoof individual keys so they don't reveal the true keyboard.

Thanks, okay, so instead of providing a single custom layout (say, based on the US-International keyboard), the result of this will basically overlay most of the existing layouts (QWERTY, QWERTZ, AZERTY, etc) and resolve any conflicting key locations such that there is a proper one-to-one mapping from key to location. I expect I'll choose the wrong keycode for some of them, but hopefully not too many.

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

I think so, yes. We could also consider Cyrillic characters (see Russian vs Serbian keyboard layouts), and maybe other kinds of characters, too. Although if that turns out to be too much for one ticket, I think it would be reasonable to open tickets for categories we don't want to cover here.

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.

Currently we're not dropping most keys, to minimize the usability impact. If necessary, in some cases we could simply drop the .code and .keyCode members of KeyboardEvent without suppressing the event itself. But I tend to think we should just aim to gradually expand our range of spoofings. Note we do suppress KeyboardEvents for a few modifier keys because combination key presses can reveal a user's locale when they are typing special characters: See #17009 (moved) and patch at https://gitweb.torproject.org/tor-browser.git/patch/?id=2679132

Yes, I noticed the suppression both during my testing and in the current fingerprinting resistance code. I expect that'll require some tweaking with the additions we're making here.

I'll do some more research on keyboard layouts and come back with a patch.

Trac:
unicode_keyboard_keys

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts. This does not include Cyrillic characters (or other charsets), yet, although I agree that would be a great addition.

The patch falls back on code "IntlBackslash" and keyCode 220 when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that Unicode provides more than one code point for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð). I am worried some keyboard drivers/platforms use different code points for characters that are visually the same, so this patch may result in slightly strange behavior.
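The fallback described above can be sketched as a table lookup with a default. This is only an illustration in Python, not the patch's C++ code: the `SpoofedKey` type, the `CONSENSUS` table, and its two entries are assumptions made for the example. Only the fallback pair ("IntlBackslash", 220) comes from this comment.

```python
from typing import NamedTuple


class SpoofedKey(NamedTuple):
    code: str      # value reported as KeyboardEvent.code
    key_code: int  # value reported as KeyboardEvent.keyCode


# Fallback pair named in the comment above.
FALLBACK = SpoofedKey(code="IntlBackslash", key_code=220)

# Hypothetical entries for illustration only; the real table covers
# 131 characters and may assign different keys.
CONSENSUS = {
    "a": SpoofedKey("KeyA", 65),
    "é": SpoofedKey("KeyE", 69),
}


def spoof(character: str) -> SpoofedKey:
    """Return the spoofed key for a character, or the fallback if unmapped."""
    return CONSENSUS.get(character, FALLBACK)
```

With this shape, every character outside the table reports the same code and keyCode, so unmapped characters are indistinguishable from one another by those two fields.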

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

$ sort -t, -k 3 unicode_keyboard_keys | sed 's/, /,/g' | awk -F, '{ print $3", "$2", "$5; }' | sort | uniq -c | less
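For readers who prefer not to decode the pipeline, here is a rough Python equivalent. It counts how often each combination of the third, second, and fifth comma-separated fields occurs, which is what the `awk | sort | uniq -c` stages do. The column meanings in the sample rows are assumptions, since the attached file's format is not described in this ticket. Unlike `awk`, the sketch skips lines with fewer than five fields.

```python
from collections import Counter


def count_triples(lines):
    """Count distinct (field 3, field 2, field 5) combinations."""
    counts = Counter()
    for line in lines:
        # Split on commas; strip() drops the blank after each comma,
        # which the sed step removes in the shell pipeline.
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 5:
            continue  # skip short or empty lines
        counts[(fields[2], fields[1], fields[4])] += 1
    return counts


# Hypothetical rows; the real unicode_keyboard_keys columns may differ.
SAMPLE = [
    "US-International, U+00E9, é, AltGr, KeyE",
    "Spanish, U+00E9, é, Dead, KeyE",
    "French, U+00E9, é, None, Digit2",
]

for triple, count in sorted(count_triples(SAMPLE).items()):
    print(count, ", ".join(triple))
```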

[0] https://en.wikipedia.org/wiki/QWERTY
[1] https://en.wikipedia.org/wiki/QWERTZ
[2] https://en.wikipedia.org/wiki/AZERTY
[3] https://github.com/sysrqb/tor-browser/tree/bug16678_1

Trac:
Status: needs_information to needs_review

Trac:
Keywords: N/A deleted, TorBrowserTeam201709R added

https://github.com/sysrqb/tor-browser/tree/bug16678_1a now contains a commit that uses the constant values defined by nsIDOMKeyEvent (dom/interfaces/events/nsIDOMKeyEvent.idl) and replaces the existing magic numbers. Unfortunately, this touches nearly every line in dom/events/KeyCodeConsensus.h, so the code review will be a little tedious.

Replying to sysrqb:

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts.

Thank you for the patch. I think this is a significant enhancement to our previous patch. I wrote some comments and suggested revisions on the github commit at https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#diff-8e201eb85e7d7abe2bb6b78e12c5081aR411

Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.

Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.

The patch falls back on code "IntlBackslash" and keyCode 220 when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that Unicode provides more than one code point for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð). I am worried some keyboard drivers/platforms use different code points for characters that are visually the same, so this patch may result in slightly strange behavior.

I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes or match them for those doppelganger characters?

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

$ sort -t, -k 3 unicode_keyboard_keys | sed 's/, /,/g' | awk -F, '{ print $3", "$2", "$5; }' | sort | uniq -c | less

That's an interesting shell one-liner. Could you post the instructions on what it does and how to reproduce it for future work? :)

Trac:
Status: needs_review to needs_revision
Keywords: TorBrowserTeam201709R deleted, TorBrowserTeam201709 added

Replying to arthuredelstein:

Replying to sysrqb:

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts.

Thank you for the patch. I think this is a significant enhancement to our previous patch. I wrote some comments and suggested revisions on the github commit at https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#diff-8e201eb85e7d7abe2bb6b78e12c5081aR411

Thanks for the review! There are still a few outstanding questions I need to answer. I'll improve the branch as much as possible before the deadline.

Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.

Yes, agreed. I'll add these as I make changes, but there will likely be some that will need updating after the deadline.

Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.

Complicated. On Windows, it seems AltGr (AltRight) is converted into two distinct keydown KeyboardEvents, one with key='Alt' and code='AltRight' and a second with key='Control' and code='ControlLeft', while in GNOME I see key='AltGraph' and code='AltRight'. The Wikipedia page [WIKIALTGR] says the Option key on a Mac keyboard is similar, but I don't have a Mac available for testing, so unfortunately I don't know how that translates into KeyboardEvents in the browser when using different keyboard layouts.

[WIKIALTGR] https://en.wikipedia.org/wiki/AltGr_key
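To make the platform difference concrete, here is a small Python sketch of a check that treats both reported forms as AltGr. It is an assumption-laden illustration: `KeyEvent` is a hypothetical stand-in for the few KeyboardEvent fields involved, and the Ctrl+Alt rule follows the earlier description of AltGr as "the combination of asserting ctrl and alt". A genuine Ctrl+Alt chord would match as well, so this is not a reliable detector.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Minimal stand-in for the KeyboardEvent fields discussed here."""
    key: str
    ctrl_key: bool = False
    alt_key: bool = False


def looks_like_altgr(event: KeyEvent) -> bool:
    """True if the event carries AltGr in either reported form."""
    if event.key == "AltGraph":  # form observed in GNOME
        return True
    # Form observed on Windows: Control and Alt asserted together.
    return event.ctrl_key and event.alt_key
```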

The patch falls back on code "IntlBackslash" and keyCode 220 when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that Unicode provides more than one code point for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð). I am worried some keyboard drivers/platforms use different code points for characters that are visually the same, so this patch may result in slightly strange behavior.

I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes or match them for those doppelganger characters?

I think it's better if every character code point is mapped to a key per language/locale. I was considering mapping all doppelgangers to the same key, but after our discussion this seems like the wrong behavior. Every doppelganger is coupled with a specific set of languages, so if we add an explicit mapping for a character, it should be relative to that language. So, in the case of 'Ð', we can map U+0110 (letter D with stroke) with respect to its location on a Slavic (or similar) keyboard, and we can map U+00D0 (letter Eth) to its location on the Norwegian or Finnish keyboard.
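A short Python sketch of the doppelganger case: the two characters render alike but are distinct code points with distinct Unicode names, so a table keyed by code point can give each its own entry. The key names in `PER_CODEPOINT` are placeholders chosen for illustration, not values from the patch.

```python
import unicodedata

D_WITH_STROKE = "\u0110"
ETH = "\u00d0"

# Hypothetical placements for illustration; not taken from the patch.
PER_CODEPOINT = {
    D_WITH_STROKE: "BracketRight",
    ETH: "BracketLeft",
}


def describe(character: str) -> str:
    """Format a character as 'U+XXXX UNICODE NAME'."""
    return f"U+{ord(character):04X} {unicodedata.name(character)}"


for character in (D_WITH_STROKE, ETH):
    print(describe(character), "->", PER_CODEPOINT[character])
```

Neither character has a Unicode decomposition, so normalization does not merge them. A lookup keyed on the code point keeps them separate without any extra handling.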

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

That's an interesting shell one-liner. Could you post the instructions on what it does and how to reproduce it for future work? :)

I'm adjusting the mappings so they are based on the locale that is most likely to use a key, rather than on the most common key used for that character across all the keyboard layouts. So this one-liner, which counts the number of times a key is mapped to a specific location on the keyboard (using the attached file), will not be very useful in the end.

I'm thinking about where ¢ (U+00A2, cent sign) should be mapped. It's on the French-Canadian, Dutch and Brazilian keyboards, but I don't think any of them is necessarily the most likely to use it. I was attempting to cover a breadth of characters with this patch, but maybe quantity isn't quality, and we may be better off letting some characters fall through to a sane default rather than choosing between a few minor options. This is also difficult because I'm basing decisions on what I see and read on Wikipedia plus a small amount of additional research; natives of the respective locales would likely provide better data points for making a decision.

Okay, following up on the comment Arthur made [0], I think we can mitigate this by suppressing the keydown events on dead keys and tracking these keys as modifier keys. The current behavior when a dead key is pressed is that an event is dispatched with key="Dead". In Firefox, the JavaScript keydown callback's event.code reflects the key pressed (e.g. BracketLeft), with charCode=which=keyCode=location=0 and altKey=ctrlKey=metaKey=false. With this patch, Tor Browser sends key="Dead" and checks the hashmap for the proper code (there is no mapping for it, so it chooses the default). When the next character is pressed, Firefox and Tor Browser dispatch another event that contains the raw (unmodified) character that was pressed (e.g. key='o'); they do not make the substitution. I believe we can use the functionality already available in the TextInputProcessor for tracking a dead key and dispatching an event with the modified character.

I think in the short term, it's safe to suppress keydown events for dead keys. As with shift/alt/altgr, this only filters dead keys from JavaScript keydown callbacks. I confirmed this does not affect input in chrome fields or the use of dead keys on interactive JavaScript websites like Etherpad.
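The short-term approach can be sketched as a filter over the events that content scripts get to see. This Python model is only an illustration: the dictionary-based events and the `SUPPRESSED_KEYS` set are assumptions, and the real filtering happens inside the browser's event dispatch, not in a list comprehension.

```python
# Keys whose keydown events are hidden from content callbacks in this model.
# "Dead" is the addition discussed above; the others stand for shift/alt/altgr.
SUPPRESSED_KEYS = {"Dead", "Shift", "Alt", "AltGraph"}


def visible_to_content(events):
    """Drop keydown events for suppressed keys; pass everything else through."""
    return [
        event
        for event in events
        if not (event["type"] == "keydown" and event["key"] in SUPPRESSED_KEYS)
    ]


# A dead key followed by a letter: only the letter's keydown remains visible.
typed = [
    {"type": "keydown", "key": "Dead"},
    {"type": "keydown", "key": "o"},
]
print(visible_to_content(typed))
```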

[0] https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#commitcomment-24553008

As mentioned on the ML [0], I pushed a final branch [1] for this.

One commit suppresses the keydown event for Dead keys [2], and the other commit is mostly fixups and improvements based on Arthur's feedback [3].

[0] https://lists.torproject.org/pipermail/tbb-dev/2017-September/000620.html
[1] https://github.com/sysrqb/tor-browser/tree/bug16678_2
[2] https://github.com/sysrqb/tor-browser/commit/29d8c9ffec2340e64ad26a0dbc48315b47ac6028
[3] https://github.com/sysrqb/tor-browser/commit/962194b1151768ddc6d3beb30132d833c1a4a81f

Trac:
Status: needs_revision to needs_review

After I read Arthur's comment on the github branch[0] I realized there's a bug in the patch. I pushed a fixup commit that corrects it[1].

So it's in this ticket, the relevant part is:

Every call to KEY/SHIFT/ALTGR updates the mapping in all hashmaps, therefore
calling both ALTGR and SHIFT for the same key results in the ALTGR state being
overwritten. I'll leave these comments above and instead I'll create a fourth
macro for ALTGRSHIFT that correctly inserts the key into the hashmaps. I'll
then replace all occurrences of ALTGR()+SHIFT() with ALTGRSHIFT().

Commit 1e094349cd679d8592411d741512d71bc29185fc does this.
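The overwrite can be reproduced with a toy model in Python. Everything here is hypothetical: the `KeyTable` class, its three maps, and the method names merely mirror the KEY/SHIFT/ALTGR/ALTGRSHIFT macros named in the commit message, and the real macros in KeyCodeConsensus.h may be structured differently. The only behavior carried over is the one described: every call writes all maps.

```python
class KeyTable:
    """Toy model: three maps keyed by character, all written on every insert."""

    def __init__(self):
        self.code_map = {}
        self.shift_map = {}
        self.altgr_map = {}

    def _insert(self, char, code, shift, altgr):
        # Each call writes every map, so a later call replaces earlier state.
        self.code_map[char] = code
        self.shift_map[char] = shift
        self.altgr_map[char] = altgr

    def key(self, char, code):
        self._insert(char, code, shift=False, altgr=False)

    def shift(self, char, code):
        self._insert(char, code, shift=True, altgr=False)

    def altgr(self, char, code):
        self._insert(char, code, shift=False, altgr=True)

    def altgr_shift(self, char, code):
        self._insert(char, code, shift=True, altgr=True)


buggy = KeyTable()
buggy.altgr("Ä", "KeyQ")
buggy.shift("Ä", "KeyQ")  # overwrites the AltGr state set one line earlier

fixed = KeyTable()
fixed.altgr_shift("Ä", "KeyQ")  # records both modifiers in a single insert

print(buggy.altgr_map["Ä"], fixed.altgr_map["Ä"])
```

In the buggy table, the second call leaves the entry marked as Shift only. The combined insert keeps both flags, which is the effect the fourth macro is meant to have.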

[0] https://github.com/sysrqb/tor-browser/commit/bug16678_2#commitcomment-24566167
[1] https://github.com/sysrqb/tor-browser/commit/1e094349cd679d8592411d741512d71bc29185fc

Trac:
Keywords: TorBrowserTeam201709 deleted, TorBrowserTeam201709R added

Moving reviews to October.

Trac:
Keywords: TorBrowserTeam201709R deleted, TorBrowserTeam201710R added

Replying to sysrqb:

After I read Arthur's comment on the github branch[0] I realized there's a bug in the patch. I pushed a fixup commit that corrects it[1].

So it's in this ticket, the relevant part is:

Every call to KEY/SHIFT/ALTGR updates the mapping in all hashmaps, therefore
calling both ALTGR and SHIFT for the same key results in the ALTGR state being
overwritten. I'll leave these comments above and instead I'll create a fourth
macro for ALTGRSHIFT that correctly inserts the key into the hashmaps. I'll
then replace all occurrences of ALTGR()+SHIFT() with ALTGRSHIFT().

Commit 1e094349cd679d8592411d741512d71bc29185fc does this.

[0] https://github.com/sysrqb/tor-browser/commit/bug16678_2#commitcomment-24566167
[1] https://github.com/sysrqb/tor-browser/commit/1e094349cd679d8592411d741512d71bc29185fc

That's a good improvement; thanks.

Moving review to November

Trac:
Keywords: TorBrowserTeam201710R deleted, TorBrowserTeam201711R added

Kathy and I reviewed the bug16678_2 changes. We have not tried to test the code yet, and we have not tried to determine if all of the mappings are optimal. That said, we have a few comments:

Should we remove the "TODO needs more information" mappings? Or do we plan to do more research and include a mapping for them?
Typo in a comment: Italisn
Typo in a comment: FInnish
Do we have automated tests for our key event fingerprinting changes?

Trac:
Status: needs_review to needs_revision

Trac:
Keywords: TorBrowserTeam201711R deleted, TorBrowserTeam201711 added

Replying to sysrqb:

Replying to arthuredelstein:

[snip]

Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.

Complicated. On Windows, it seems AltGr (AltRight) is converted into two distinct keydown KeyboardEvents, one with key='Alt' and code='AltRight' and a second with key='Control' and code='ControlLeft', while in GNOME I see key='AltGraph' and code='AltRight'. The Wikipedia page [WIKIALTGR] says the Option key on a Mac keyboard is similar, but I don't have a Mac available for testing, so unfortunately I don't know how that translates into KeyboardEvents in the browser when using different keyboard layouts.

[WIKIALTGR] https://en.wikipedia.org/wiki/AltGr_key

FWIW: https://bugzilla.mozilla.org/show_bug.cgi?id=900750 (which landed in Firefox 63) might change things on Windows. We should revisit that when working on this bug again.

mentioned in issue #18780 (moved)

mentioned in issue #21390 (moved)

mentioned in issue #31591 (moved)

moved to tpo/applications/tor-browser#16678

mentioned in issue tpo/applications/tor-browser#18780 (closed)
