Opened 4 years ago

Last modified 18 months ago

#16678 needs_revision enhancement

Enhance KeyboardEvent fingerprinting protection for unusual characters

Reported by: arthuredelstein Owned by: sysrqb
Priority: Medium Milestone:
Component: Applications/Tor Browser Version:
Severity: Normal Keywords: tbb-fingerprinting, TorBrowserTeam201711
Cc: gk, arthuredelstein, brade, mcs Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

In #15646, we introduced protection against KeyboardEvent-based fingerprinting of keyboard layout when characters normally found on a US-English keyboard are typed. Let's extend that protection to all characters one might expect from various Latin keyboards, such as German, French, Polish, etc.

Child Tickets

Attachments (1)

unicode_keyboard_keys (21.6 KB) - added by sysrqb 19 months ago.

Change History (25)

comment:1 Changed 4 years ago by gk

Cc: gk added

comment:2 Changed 22 months ago by arthuredelstein

Cc: arthuredelstein added

comment:3 Changed 22 months ago by mcs

Cc: brade mcs added

comment:4 Changed 19 months ago by sysrqb

Owner: changed from tbb-team to sysrqb
Severity: Blocker
Status: new → assigned

comment:5 Changed 19 months ago by sysrqb

Severity: Blocker → Normal

That's not what I wanted to do

comment:6 Changed 19 months ago by sysrqb

Status: assigned → needs_information

Basically we are implementing a virtual customized keyboard layout. This layout does not contain Right-keys (location 2, keys on right side). It is a QWERTY keyboard based on the "English (US)" layout, therefore any non-English characters will be mapped onto US-centric keys when combined with a modifier. We'll need both shift and AltGr (as the combination of asserting ctrl and alt) for this, else we don't have enough combinations available.

The US-International keyboard layout [0] provides a nice base, so beginning with that we gain:

With AltGr:

¡ ² ³ ¤ € ¼ ½ ¾ ‘ ’ ¥ ×
 ä å é ® þ ü ú í ó ö « »
  á ß ð           ø ¶ ´ ¬
   æ   ©     ñ µ ç   ¿

With Shift-AltGr:

¹     £               ÷
 Ä Å É   Þ Ü Ú Í Ó Ö
  Á § Ð           Ø ° ¨ ¦
   Æ   ¢     Ñ   Ç

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.
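A minimal sketch of how the virtual layout proposed in this comment could be represented, assuming a lookup table keyed by character. The `SpoofedKey` record, the `VIRTUAL_LAYOUT` table and `modifier_flags` are hypothetical names for illustration, not taken from any patch. The entries follow the US-International rows listed above (é and ß under AltGr, É under Shift-AltGr), and AltGr is reported as ctrl and alt both asserted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpoofedKey:
    """Hypothetical record: the physical key and modifiers reported for a character."""
    code: str        # KeyboardEvent.code, e.g. "KeyE"
    key_code: int    # KeyboardEvent.keyCode, e.g. 69
    shift: bool = False
    alt_gr: bool = False


# Illustrative subset of the US-International rows listed above.
VIRTUAL_LAYOUT = {
    "e": SpoofedKey("KeyE", 69),
    "\u00e9": SpoofedKey("KeyE", 69, alt_gr=True),              # é
    "\u00c9": SpoofedKey("KeyE", 69, shift=True, alt_gr=True),  # É
    "\u00df": SpoofedKey("KeyS", 83, alt_gr=True),              # ß
}


def modifier_flags(spoofed: SpoofedKey) -> dict:
    """Expand AltGr into ctrl+alt, as the comment proposes."""
    return {
        "shiftKey": spoofed.shift,
        "ctrlKey": spoofed.alt_gr,
        "altKey": spoofed.alt_gr,
    }
```

With this shape, several characters can share one physical key and differ only in the modifier flags that are reported.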

[0] https://en.wikipedia.org/wiki/AltGr_key#US-International
[1] https://en.wikipedia.org/wiki/AZERTY

comment:7 Changed 19 months ago by sysrqb

Can we map any unrecognized keys to a single keycode? There will certainly be a long tail of characters that aren't explicitly included here; we should handle them gently.

comment:8 in reply to:  6 ; Changed 19 months ago by arthuredelstein

Replying to sysrqb:

Basically we are implementing a virtual customized keyboard layout. This layout does not contain Right-keys (location 2, keys on right side). It is a QWERTY keyboard based on the "English (US)" layout, therefore any non-English characters will be mapped onto US-centric keys when combined with a modifier. We'll need both shift and AltGr (as the combination of asserting ctrl and alt) for this, else we don't have enough combinations available.

The US-International keyboard layout [0] provides a nice base, so beginning with that we gain:

With AltGr:

¡ ² ³ ¤ € ¼ ½ ¾ ‘ ’ ¥ ×
 ä å é ® þ ü ú í ó ö « »
  á ß ð           ø ¶ ´ ¬
   æ   ©     ñ µ ç   ¿

With Shift-AltGr:

¹     £               ÷
 Ä Å É   Þ Ü Ú Í Ó Ö
  Á § Ð           Ø ° ¨ ¦
   Æ   ¢     Ñ   Ç

Hi -- my thinking is, to minimize disruption to usability, we should try to spoof the physical key (KeyboardEvent.code and KeyboardEvent.keyCode) that is most commonly used (roughly) for a given character across different locales' physical keyboard layouts. So for example, I imagine we might want to use either the Spanish or French physical key for the ç character. (Unfortunately they are different, so we have to choose.)

And, I think likely it makes sense for more than one character to spoof the same physical key, or physical key combination. We're not trying to simulate any particular whole keyboard layout, but rather we want to spoof individual keys so they don't reveal the true keyboard.

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

I think so, yes. We could also consider Cyrillic characters (see Russian vs Serbian keyboard layouts), and maybe other kinds of characters, too. Although if that turns out to be too much for one ticket, I think it would be reasonable to open tickets for categories we don't want to cover here.

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.

Currently we're not dropping most keys, to minimize the usability impact. If necessary, in some cases we could simply drop the .code and .keyCode members of KeyboardEvent without suppressing the event itself. But I tend to think we should just aim to gradually expand our range of spoofings. Note we do suppress KeyboardEvents for a few modifier keys because combination key presses can reveal a user's locale when they are typing special characters:
See #17009 and patch at https://gitweb.torproject.org/tor-browser.git/patch/?id=2679132

comment:9 in reply to:  8 Changed 19 months ago by sysrqb

Replying to arthuredelstein:

Replying to sysrqb:
Hi -- my thinking is, to minimize disruption to usability, we should try to spoof the physical key (KeyboardEvent.code and KeyboardEvent.keyCode) that is most commonly used (roughly) for a given character across different locales' physical keyboard layouts. So for example, I imagine we might want to use either the Spanish or French physical key for the ç character. (Unfortunately they are different, so we have to choose.)

And, I think likely it makes sense for more than one character to spoof the same physical key, or physical key combination. We're not trying to simulate any particular whole keyboard layout, but rather we want to spoof individual keys so they don't reveal the true keyboard.

Thanks, okay, so instead of providing a single custom layout (say, based on the US-International keyboard), the result of this will basically overlay most of the existing layouts (QWERTY, QWERTZ, AZERTY, etc) and resolve any conflicting key locations such that there is a proper one-to-one mapping from key to location. I expect I'll choose the wrong keycode for some of them, but hopefully not too many.

What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?

I think so, yes. We could also consider Cyrillic characters (see Russian vs Serbian keyboard layouts), and maybe other kinds of characters, too. Although if that turns out to be too much for one ticket, I think it would be reasonable to open tickets for categories we don't want to cover here.

What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.

Currently we're not dropping most keys, to minimize the usability impact. If necessary, in some cases we could simply drop the .code and .keyCode members of KeyboardEvent without suppressing the event itself. But I tend to think we should just aim to gradually expand our range of spoofings. Note we do suppress KeyboardEvents for a few modifier keys because combination key presses can reveal a user's locale when they are typing special characters:
See #17009 and patch at https://gitweb.torproject.org/tor-browser.git/patch/?id=2679132

Yes, I noticed the suppression both during my testing and in the current fingerprinting resistance code. I expect that'll require some tweaking with the additions we're making here.

I'll do some more research on keyboard layouts and come back with a patch.

Last edited 19 months ago by sysrqb

Changed 19 months ago by sysrqb

Attachment: unicode_keyboard_keys added

comment:10 Changed 19 months ago by sysrqb

Status: needs_information → needs_review

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts. This does not include Cyrillic characters (or other charsets), yet, although I agree that would be a great addition.

The patch falls back on code "IntlBackslash" and keycode 220, when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that unicode provides more than one code for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð), so I am worried some keyboard drivers/platforms use different codes for characters that are visually the same, thus this patch may result in slightly strange behavior.
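The fallback described in the first sentence above can be sketched roughly as follows. This is an illustration only: `KNOWN_KEYS` and `spoofed_code` are hypothetical names, and the two table entries are placeholders, not a copy of the patch's mappings. Only the default pair ("IntlBackslash", 220) comes from the comment.

```python
# Default reported when a character has no explicit mapping (values from the comment above).
FALLBACK = ("IntlBackslash", 220)

# Hypothetical, deliberately tiny table; the patch is described as covering 131 characters.
KNOWN_KEYS = {
    "a": ("KeyA", 65),
    "1": ("Digit1", 49),
}


def spoofed_code(char: str) -> tuple:
    """Return the (code, keyCode) pair to report for `char`."""
    return KNOWN_KEYS.get(char, FALLBACK)
```

Every unmapped character then reports the same pair, so the long tail of characters is indistinguishable by `code` and `keyCode` alone.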

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

$ sort -t, -k 3 unicode_keyboard_keys | sed 's/, /,/g' | awk -F, '{ print $3", "$2", "$5; }' | sort | uniq -c | less
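For readers less familiar with the pipeline above, here is a rough Python equivalent. It assumes the attached file is comma-separated with up to five fields per line; the ticket does not say what each field means, so the sketch refers to them only by position. `count_triples` is a hypothetical helper name.

```python
from collections import Counter


def count_triples(lines):
    """Count how often each (field 3, field 2, field 5) combination occurs.

    Mirrors the sed/awk/sort/uniq steps: awk fields are 1-indexed, so
    $3, $2, $5 become indexes 2, 1, 4 here.
    """
    counts = Counter()
    for line in lines:
        # Same effect as sed 's/, /,/g' followed by awk -F,
        fields = line.rstrip("\n").replace(", ", ",").split(",")
        # awk prints empty strings for missing fields, so pad short lines.
        fields += [""] * (5 - len(fields))
        counts[(fields[2], fields[1], fields[4])] += 1
    return counts
```

To run it on the attachment, pass an open file object, for example `count_triples(open("unicode_keyboard_keys", encoding="utf-8"))`, and inspect `most_common()` on the result.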

[0] https://en.wikipedia.org/wiki/QWERTY
[1] https://en.wikipedia.org/wiki/QWERTZ
[2] https://en.wikipedia.org/wiki/AZERTY
[3] https://github.com/sysrqb/tor-browser/tree/bug16678_1

comment:11 Changed 19 months ago by gk

Keywords: TorBrowserTeam201709R added

comment:12 Changed 19 months ago by sysrqb

https://github.com/sysrqb/tor-browser/tree/bug16678_1a now contains a commit that uses the constant values defined by nsIDOMKeyEvent (dom/interfaces/events/nsIDOMKeyEvent.idl) and replaces the existing magic numbers. Unfortunately, this touches nearly every line in dom/events/KeyCodeConsensus.h, so the code review will be a little tedious.

comment:13 in reply to:  10 ; Changed 19 months ago by arthuredelstein

Keywords: TorBrowserTeam201709 added; TorBrowserTeam201709R removed
Status: needs_review → needs_revision

Replying to sysrqb:

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts.

Thank you for the patch. I think this is a significant enhancement to our previous patch. I wrote some comments and suggested revisions on the github commit at
https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#diff-8e201eb85e7d7abe2bb6b78e12c5081aR411

Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.

Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.

The patch falls back on code "IntlBackslash" and keycode 220, when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that unicode provides more than one code for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð), so I am worried some keyboard drivers/platforms use different codes for characters that are visually the same, thus this patch may result in slightly strange behavior.

I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes, or to match them, for those doppelganger characters?

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

$ sort -t, -k 3 unicode_keyboard_keys | sed 's/, /,/g' | awk -F, '{ print $3", "$2", "$5; }' | sort | uniq -c | less

That's an interesting shell one-liner. Could you post the instructions on what it does and how to reproduce it for future work? :)

comment:14 in reply to:  13 Changed 19 months ago by sysrqb

Replying to arthuredelstein:

Replying to sysrqb:

I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 unicode characters, covering most Latin charset-based keyboard layouts.

Thank you for the patch. I think this is a significant enhancement to our previous patch. I wrote some comments and suggested revisions on the github commit at
https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#diff-8e201eb85e7d7abe2bb6b78e12c5081aR411

Thanks for the review! There are still a few outstanding questions I need to answer. I'll improve the branch as much as possible before the deadline.

Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.

Yes, agreed. I'll add these as I make changes, but there will likely be some that will need updating after the deadline.

Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.

Complicated. On Windows, it seems AltGr (AltRight) is converted into two distinct keydown KeyboardEvents, one with key='Alt' and code='AltRight' and a second with key='Control' and code='ControlLeft', while in Gnome I see key='AltGraph' and code='AltRight'. The Wikipedia page [WIKIALTGR] says the Option key on a Mac keyboard is similar, but I don't have a Mac available for testing, so unfortunately I don't know how that translates into KeyboardEvents in the browser when using different keyboard layouts.

[WIKIALTGR] https://en.wikipedia.org/wiki/AltGr_key

The patch falls back on code "IntlBackslash" and keycode 220, when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that unicode provides more than one code for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð), so I am worried some keyboard drivers/platforms use different codes for characters that are visually the same, thus this patch may result in slightly strange behavior.

I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes, or to match them, for those doppelganger characters?

I think it's better if every character codepoint is mapped to a key per language/locale. I was considering mapping all doppelgangers to the same key, but after our discussion this seems like the wrong behavior. Every doppelganger is coupled with a specific set of languages; if we add an explicit mapping for a character, then it should be relative to that language. So, in the case of 'Ð', we can map U+0110 (letter D with stroke) with respect to its location on a Slavic (or similar) keyboard, and we can map U+00D0 (letter Eth) to its location on the Norwegian or Finnish keyboard.
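A short check, using only Python's standard `unicodedata` module, that the two look-alike capitals are distinct code points that normalization does not merge. This is why a table keyed by code point can give each one its own mapping. The helper names `describe` and `same_after_normalization` are made up for this sketch.

```python
import unicodedata

D_WITH_STROKE = "\u0110"  # looks like Ð, used in e.g. South Slavic alphabets
ETH = "\u00D0"            # looks like Ð, the letter Eth


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_after_normalization(a: str, b: str) -> bool:
    """True if NFKC normalization maps both strings to the same text."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)
```

The lowercase forms also differ (đ versus ð), so the two characters only collide visually in their capital form.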

The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.

That's an interesting shell one-liner. Could you post the instructions on what it does and how to reproduce it for future work? :)

I'm adjusting the mappings so they are based on the locale that is most likely to use a key, rather than on the most common key used for that character across all the keyboard layouts. So this one-liner, which counts the number of times a key is mapped to a specific location on the keyboard (using the attached file), will not be very useful in the end.

Last edited 19 months ago by sysrqb

comment:15 Changed 19 months ago by sysrqb

I'm thinking about where ¢ (U+00A2, cent sign) should be mapped. It's on the French-Canadian, Dutch and Brazilian keyboards, but I don't think any of them is necessarily the most likely to use it. I was attempting to cover a breadth with this patch, but maybe quantity isn't quality and we may be better off letting some characters fall through to a sane default rather than choosing between a few minor options. This is also difficult because I'm basing decisions on what I see and read on Wikipedia and a small amount of additional research; natives of the respective locales would likely provide better data points for making a decision.

comment:16 Changed 19 months ago by sysrqb

Okay, following up on the comment Arthur made [0], I think we can mitigate this by suppressing the keydown events on dead keys and tracking these keys as modifier keys. The current behavior when a dead key is pressed is that an event is dispatched with key="Dead". In Firefox, the JavaScript keydown callback's event.code reflects the key pressed (e.g. BracketLeft), with charCode=which=keyCode=location=0 and altKey=ctrlKey=metaKey=false. With this patch, Tor Browser sends key="Dead" and checks the hashmap for the proper code (for which there is no mapping, so it chooses the default). When the next character is pressed, Firefox and Tor Browser dispatch another event that contains the raw (unmodified) character that was pressed (e.g. key='o'). It does not make the substitution. I believe we can use the functionality already available in the TextInputProcessor for tracking a dead key and dispatching an event with the modified character.

I think in the short term, it's safe to suppress keydown events for dead keys. As with shift/alt/altgr, this only filters dead keys from JavaScript keydown callbacks; I confirmed this does not affect input in chrome fields or the use of dead keys on interactive JavaScript websites like etherpad.
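The idea of suppressing the dead-key event and folding it into the next character can be sketched as follows. This is a hypothetical illustration, not the TextInputProcessor API: `DeadKeyTracker`, `DEAD_KEY_MARKS` and the `dead` flag are assumptions made for the example, and composition is done with Unicode NFC normalization from Python's standard library.

```python
import unicodedata

# Hypothetical table: the spacing accent a dead key stands for -> combining mark.
DEAD_KEY_MARKS = {
    "\u00b4": "\u0301",  # acute accent
    "`": "\u0300",       # grave accent
    "\u00a8": "\u0308",  # diaeresis
}


class DeadKeyTracker:
    """Swallow a dead-key press and fold it into the next character."""

    def __init__(self):
        self.pending = None

    def press(self, key, dead=False):
        """Return the character to report, or None if the event is suppressed."""
        if dead:
            self.pending = DEAD_KEY_MARKS.get(key)
            return None  # suppress the keydown for the dead key itself
        mark, self.pending = self.pending, None
        if mark is None:
            return key
        composed = unicodedata.normalize("NFC", key + mark)
        # If no precomposed form exists, report the base character unchanged.
        return composed if len(composed) == 1 else key
```

For example, a dead acute accent followed by `o` would be reported as a single ó, while the dead key itself never reaches the page's keydown handler.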

[0] https://github.com/sysrqb/tor-browser/commit/52b021674c6885d30e851557b14a8d70b5702a75#commitcomment-24553008

comment:17 Changed 19 months ago by sysrqb

Status: needs_revision → needs_review

comment:18 Changed 19 months ago by sysrqb

After I read Arthur's comment on the github branch[0] I realized there's a bug in the patch. I pushed a fixup commit that corrects it[1].

So it's in this ticket, the relevant part is:

Every call to KEY/SHIFT/ALTGR updates the mapping in all hashmaps, therefore
calling both ALTGR and SHIFT for the same key results in the ALTGR state being
overwritten. I'll leave these comments above and instead I'll create a fourth
macro for ALTGRSHIFT that correctly inserts the key into the hashmaps. I'll
then replace all occurrences of ALTGR()+SHIFT() with ALTGRSHIFT().

Commit 1e094349cd679d8592411d741512d71bc29185fc does this.
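The overwrite described above can be reproduced with a small model. The ticket does not show the real C++ macro definitions, so the three dictionaries and the Python functions below are hypothetical stand-ins that share only the macro names and the behavior described: every helper writes to all maps.

```python
# Hypothetical stand-ins for the per-character hashmaps in the patch.
code_map, shift_map, altgr_map = {}, {}, {}


def _insert(char, code, shift, altgr):
    # Every helper writes to all maps, which is what caused the bug.
    code_map[char] = code
    shift_map[char] = shift
    altgr_map[char] = altgr


def KEY(char, code):
    _insert(char, code, shift=False, altgr=False)


def SHIFT(char, code):
    _insert(char, code, shift=True, altgr=False)


def ALTGR(char, code):
    _insert(char, code, shift=False, altgr=True)


def ALTGRSHIFT(char, code):
    # The fix: a single call that records both modifiers at once.
    _insert(char, code, shift=True, altgr=True)


# Buggy pattern: the second call resets the AltGr state set by the first.
ALTGR("\u00c9", "KeyE")
SHIFT("\u00c9", "KeyE")
buggy_state = (shift_map["\u00c9"], altgr_map["\u00c9"])

# Fixed pattern: one call, both flags kept.
ALTGRSHIFT("\u00c9", "KeyE")
fixed_state = (shift_map["\u00c9"], altgr_map["\u00c9"])
```

In this model `buggy_state` loses the AltGr flag, while `fixed_state` keeps both modifiers set.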

[0] https://github.com/sysrqb/tor-browser/commit/bug16678_2#commitcomment-24566167
[1] https://github.com/sysrqb/tor-browser/commit/1e094349cd679d8592411d741512d71bc29185fc

comment:19 Changed 19 months ago by gk

Keywords: TorBrowserTeam201709R added; TorBrowserTeam201709 removed

comment:20 Changed 19 months ago by gk

Keywords: TorBrowserTeam201710R added; TorBrowserTeam201709R removed

Moving reviews to October.

comment:21 in reply to:  18 Changed 19 months ago by arthuredelstein

Replying to sysrqb:

After I read Arthur's comment on the github branch[0] I realized there's a bug in the patch. I pushed a fixup commit that corrects it[1].

So it's in this ticket, the relevant part is:

Every call to KEY/SHIFT/ALTGR updates the mapping in all hashmaps, therefore
calling both ALTGR and SHIFT for the same key results in the ALTGR state being
overwritten. I'll leave these comments above and instead I'll create a fourth
macro for ALTGRSHIFT that correctly inserts the key into the hashmaps. I'll
then replace all occurrences of ALTGR()+SHIFT() with ALTGRSHIFT().

Commit 1e094349cd679d8592411d741512d71bc29185fc does this.

[0] https://github.com/sysrqb/tor-browser/commit/bug16678_2#commitcomment-24566167
[1] https://github.com/sysrqb/tor-browser/commit/1e094349cd679d8592411d741512d71bc29185fc

That's a good improvement; thanks.

comment:22 Changed 18 months ago by gk

Keywords: TorBrowserTeam201711R added; TorBrowserTeam201710R removed

Moving review to November

comment:23 Changed 18 months ago by mcs

Status: needs_review → needs_revision

Kathy and I reviewed the bug16678_2 changes. We have not tried to test the code yet, and we have not tried to determine if all of the mappings are optimal. That said, we have a few comments:

  1. Should we remove the "TODO needs more information" mappings? Or do we plan to do more research and include a mapping for them?
  2. Typo in a comment: Italisn
  3. Typo in a comment: FInnish
  4. Do we have automated tests for our key event fingerprinting changes?

comment:24 Changed 18 months ago by gk

Keywords: TorBrowserTeam201711 added; TorBrowserTeam201711R removed