In #15646 (moved), we introduced protection against KeyboardEvent-based fingerprinting of keyboard layout when characters normally found on a US-English keyboard are typed. Let's extend that protection to all characters one might expect from various Latin keyboards, such as German, French, Polish, etc.
Basically we are implementing a virtual customized keyboard layout. This layout does not contain Right-keys (location 2, keys on right side). It is a QWERTY keyboard based on the "English (US)" layout, therefore any non-English characters will be mapped onto US-centric keys when combined with a modifier. We'll need both shift and AltGr (as the combination of asserting ctrl and alt) for this, else we don't have enough combinations available.
The US-International keyboard layout [0] provides a nice base, so beginning with that we gain:
With Shift-AltGr:
{{{
¹ £ ÷
Ä Å É Þ Ü Ú Í Ó Ö
Á § Ð Ø ° ¨ ¦
Æ ¢ Ñ Ç
}}}
What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?
What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.
Can we map any unrecognized keys to a single keycode? There will certainly be a long tail of characters that aren't explicitly included here, we should handle them gently.
Hi -- my thinking is, to minimize disruption to usability, we should try to spoof the physical key (KeyboardEvent.code and KeyboardEvent.keyCode) that is most commonly used (roughly) for a given character across different locales' physical keyboard layouts. So for example, I imagine we might want to use either the Spanish or French physical key for the ç character. (Unfortunately they are different, so we have to choose.)
And, I think likely it makes sense for more than one character to spoof the same physical key, or physical key combination. We're not trying to simulate any particular whole keyboard layout, but rather we want to spoof individual keys so they don't reveal the true keyboard.
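As a rough illustration of the per-character approach described above, the following Python sketch maps each typed character to the physical key it would report. The `SpoofedKey` type, the `spoof` function, and every table entry are hypothetical and chosen only to show the shape of such a table; they are not taken from the actual patch.

```python
from typing import NamedTuple, Optional


class SpoofedKey(NamedTuple):
    code: str            # value to report as KeyboardEvent.code
    key_code: int        # value to report as KeyboardEvent.keyCode
    shift: bool = False  # whether Shift is reported as held
    altgr: bool = False  # whether AltGr is reported as held


# Hypothetical entries: several characters may share one key combination,
# because the goal is to hide the real layout, not to emulate a whole one.
SPOOF_TABLE = {
    "a": SpoofedKey("KeyA", 65),
    "A": SpoofedKey("KeyA", 65, shift=True),
    "á": SpoofedKey("KeyA", 65, altgr=True),
    "Á": SpoofedKey("KeyA", 65, shift=True, altgr=True),
    "ç": SpoofedKey("Comma", 188, altgr=True),
    "ć": SpoofedKey("Comma", 188, altgr=True),
}


def spoof(char: str) -> Optional[SpoofedKey]:
    """Return the key to report for a character, or None if unmapped."""
    return SPOOF_TABLE.get(char)
```

With a table like this, two users typing the same character on different physical layouts would report the same `code` and `keyCode`.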
What other keys are missing? Some layouts provide 1/8, 3/8, 5/8, 7/8, ™, ˆ. Should these be included?
I think so, yes. We could also consider Cyrillic characters (see Russian vs Serbian keyboard layouts), and maybe other kinds of characters, too. Although if that turns out to be too much for one ticket, I think it would be reasonable to open tickets for categories we don't want to cover here.
What is the expected result if a key is not recognized? Should torbrowser drop it? I'm worried about the impact on usability if torbrowser does something surprising when a user presses a key that "should work". With that said, any keys not included in this custom layout continue to be a potential fingerprint.
Currently we're not dropping most keys, to minimize the usability impact. If necessary, in some cases we could simply drop the .code and .keyCode members of KeyboardEvent without suppressing the event itself. But I tend to think we should just aim to gradually expand our range of spoofings. Note we do suppress KeyboardEvents for a few modifier keys because combination key presses can reveal a user's locale when they are typing special characters:
See #17009 (moved) and patch at https://gitweb.torproject.org/tor-browser.git/patch/?id=2679132
Thanks, okay, so instead of providing a single custom layout (say, based on the US-International keyboard), the result of this will basically overlay most of the existing layouts (QWERTY, QWERTZ, AZERTY, etc) and resolve any conflicting key locations such that there is a proper one-to-one mapping from key to location. I expect I'll choose the wrong keycode for some of them, but hopefully not too many.
Yes, I noticed the suppression both during my testing and in the current fingerprinting resistance code. I expect that'll require some tweaking given the additions we're making here.
I'll do some more research on keyboard layouts and come back with a patch.
I surveyed the different layouts shown on the QWERTY [0], QWERTZ [1], and AZERTY [2] pages on Wikipedia, and I documented (roughly) the different keys (attached). From this, the patch [3] contains 131 Unicode characters, covering most Latin charset-based keyboard layouts. This does not include Cyrillic characters (or other charsets) yet, although I agree that would be a great addition.
The patch falls back on code "IntlBackslash" and keycode 220 when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that Unicode provides more than one code point for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð), so I am worried some keyboard drivers/platforms use different codes for characters that are visually the same, and thus this patch may result in slightly strange behavior.
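To make the fallback and the look-alike problem concrete, here is a small Python sketch. Only the fallback pair ("IntlBackslash", 220) and the two code points come from the comment above; the `KEY_TABLE` entry and the `lookup` helper are hypothetical.

```python
import unicodedata

# Fallback taken from the comment above: code "IntlBackslash", keyCode 220.
FALLBACK = ("IntlBackslash", 220)

D_STROKE = "\u0110"  # LATIN CAPITAL LETTER D WITH STROKE
ETH = "\u00d0"       # LATIN CAPITAL LETTER ETH

# Hypothetical table with an entry for only one of the two look-alikes.
KEY_TABLE = {ETH: ("KeyD", 68)}


def lookup(char):
    """Return (code, keyCode) for a character, or the fallback if unmapped."""
    return KEY_TABLE.get(char, FALLBACK)


# The glyphs look alike, but they are distinct code points, and Unicode
# normalization does not unify them, so each needs its own table entry.
for ch in (D_STROKE, ETH):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), lookup(ch))
```

In this sketch the unmapped look-alike silently receives the fallback key, which is the kind of slightly strange behavior the comment worries about.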
The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.
https://github.com/sysrqb/tor-browser/tree/bug16678_1a now contains a commit that uses the constant values defined by nsIDOMKeyEvent (dom/interfaces/events/nsIDOMKeyEvent.idl) and replaces the existing magic numbers. Unfortunately, this touches nearly every line in dom/events/KeyCodeConsensus.h, so the code review will be a little tedious.
Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.
Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.
The patch falls back on code "IntlBackslash" and keycode 220 when a mapping does not exist for a key. Something unfortunate/annoying I found while working on this is that Unicode provides more than one code point for the same glyph (such as U+0110 (capital letter D with stroke) and U+00D0 (capital letter eth) for Ð), so I am worried some keyboard drivers/platforms use different codes for characters that are visually the same, and thus this patch may result in slightly strange behavior.
I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes or match them for those doppelganger characters?
Thanks for the review! There are still a few outstanding questions I need to answer. I'll improve the branch as much as possible before the deadline.
Additionally (though not necessarily for the deadline) I would suggest adding a comment for each key mentioning which keyboard layout each key came from. (All previous keys came from the US keyboard.) Once the annotations are added, it would be prudent to have another review to carefully check each of the mappings to make sure they are correct.
Yes, agreed. I'll add these as I make changes, but there will likely be some that will need updating after the deadline.
Could you also comment here for the record on AltGr vs Alt vs AltLeft? Is AltGr the expected modifier in KeyboardEvents from most modern keyboards? It doesn't seem to appear on my Mac, if I recall correctly.
Complicated. On Windows, it seems AltGr (AltRight) is converted into two distinct keydown KeyboardEvents, one with key='Alt' and code='AltRight' and a second with key='Control' and code='ControlLeft', while in Gnome I see key='AltGraph' and code='AltRight'. The Wikipedia page [WIKIALTGR] says the Option key on a Mac keyboard is similar, but I don't have a Mac available for testing, so unfortunately I don't know how that translates into KeyboardEvents in the browser when using different keyboard layouts.
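One way to paper over the platform difference described above is to collapse both reports into a single AltGr state before consulting any key table. The Python sketch below assumes the event's `key` value and the Ctrl/Alt flags are available; the function name `effective_modifiers` is hypothetical and not part of the patch.

```python
def effective_modifiers(key: str, ctrl: bool, alt: bool) -> set:
    """Collapse platform-specific AltGr reports into one 'AltGraph' state.

    Assumption: Ctrl and Alt held together are treated as AltGr, matching
    the Windows behavior described above. A genuine Ctrl+Alt shortcut then
    becomes indistinguishable from AltGr, which is the cost of this choice.
    """
    if key == "AltGraph" or (ctrl and alt):
        return {"AltGraph"}
    mods = set()
    if ctrl:
        mods.add("Control")
    if alt:
        mods.add("Alt")
    return mods
```

How the Option key on a Mac should fit into this is left open here, as it is in the comment above.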
I guess we can't do anything about that confusion, correct? Do you think it would help somewhat to block the key codes or match them for those doppelganger characters?
I think it's better if every character code point is mapped to a key per language/locale. I was considering mapping all doppelgangers to the same key, but after our discussion this seems like the wrong behavior. Every doppelganger is coupled with a specific set of languages, so if we add an explicit mapping for a character then it should be relative to that language. So, in the case of 'Ð', we can map U+0110 (letter D with stroke) with respect to its location on a Slavic (or similar) keyboard, and we can map U+00D0 (letter Eth) to its location on the Norwegian or Finnish keyboard.
The key-to-code mappings were decided by taking the results of the survey and choosing the most common keyboard key per character/symbol. There were many symbols that were in a unique location on different layouts, so I chose a key that seemed reasonable.
That's an interesting shell one-liner. Could you post the instructions on what it does and how to reproduce it for future work? :)
I'm adjusting the mappings so they are based on the locale that is most likely to use a key, rather than on the most common key used for that character across all the keyboard layouts. So this one-liner, which counts the number of times a key is mapped to a specific location on the keyboard (using the attached file), will not be very useful in the end.
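The original shell one-liner is not reproduced in this ticket. As a stand-in, the Python sketch below shows the kind of counting it is described as doing: tally how often each physical key carries a given character across surveyed layouts and pick the most frequent one. The `SURVEY` rows are placeholder data, not the attached survey file, and `most_common_key` is a hypothetical name.

```python
from collections import Counter

# Placeholder survey rows: (layout, character, physical key).
# These are illustrative values only, not the contents of the attachment.
SURVEY = [
    ("layout-1", "ç", "Digit9"),
    ("layout-2", "ç", "Backslash"),
    ("layout-3", "ç", "Semicolon"),
    ("layout-4", "ç", "Semicolon"),
]


def most_common_key(char, rows):
    """Return the physical key most often used for char, or None if absent."""
    counts = Counter(key for _, c, key in rows if c == char)
    return counts.most_common(1)[0][0] if counts else None
```

A locale-based choice, as proposed above, would replace the frequency count with a lookup keyed on the locale most likely to type the character.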
I'm thinking about where ¢ (U+00A2, cent sign) should be mapped. It's on the French-Canadian, Dutch and Brazilian keyboards, but I don't think any of them is necessarily the most likely to use it. I was attempting to cover a breadth with this patch, but maybe quantity isn't quality and we may be better off letting some characters fall through to a sane default rather than choosing between a few minor options. This is also difficult because I'm basing decisions on what I see and read on Wikipedia and a small amount of additional research; natives of the respective locales would likely provide better data points for making a decision.
Okay, following up on the comment Arthur made [0], I think we can mitigate this by suppressing the keydown events on dead keys and tracking these keys as modifier keys. The current behavior when a dead key is pressed is that an event is dispatched with key="Dead". In Firefox, the JavaScript keydown callback's event.code reflects the key pressed (e.g. BracketLeft), and charCode=which=keyCode=location=0 and altKey=ctrlKey=metaKey=false. With this patch, Tor Browser sends key="Dead" and checks the hashmap for the proper code (for which there is no mapping, so it chooses the default). When the next character is pressed, Firefox and Tor Browser dispatch another event that contains the raw (unmodified) character that was pressed (e.g. key='o'). It does not make the substitution. I believe we can use the functionality already available in the TextInputProcessor for tracking a dead key and dispatching an event with the modified character.
I think in the short term, it's safe to suppress keydown events for dead keys. As with shift/alt/altgr, this only filters dead keys from JavaScript keydown callbacks; I confirmed this does not affect input in chrome fields or the use of dead keys on interactive JavaScript websites like Etherpad.
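The dead-key idea above (suppress the keydown for the dead key, remember it, and apply it to the next character) can be sketched in Python as follows. This is not the TextInputProcessor API; the `DeadKeyTracker` class and the assumption that a dead key is represented by its Unicode combining mark are both hypothetical.

```python
import unicodedata


class DeadKeyTracker:
    """Track a pending dead key and compose it with the next character."""

    def __init__(self):
        self.pending = None  # combining mark of the last dead key, if any

    def on_keydown(self, key, combining_mark=None):
        """Return the key value to dispatch to content, or None to suppress."""
        if key == "Dead":
            # Suppress the dead key's own keydown; remember it like a modifier.
            self.pending = combining_mark
            return None
        if self.pending is not None:
            # NFC composes base + mark when a precomposed character exists;
            # otherwise the result stays as two code points.
            composed = unicodedata.normalize("NFC", key + self.pending)
            self.pending = None
            return composed
        return key
```

For example, a dead diaeresis (U+0308) followed by `o` yields `ö`, which could then be looked up in the key table like any other character.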
After I read Arthur's comment on the github branch[0] I realized there's a bug in the patch. I pushed a fixup commit that corrects it[1].
So it's in this ticket, the relevant part is:
{{{
Every call to KEY/SHIFT/ALTGR updates the mapping in all hashmaps, therefore
calling both ALTGR and SHIFT for the same key results in the ALTGR state being
overwritten. I'll leave these comments above and instead I'll create a fourth
macro for ALTGRSHIFT that correctly inserts the key into the hashmaps. I'll
then replace all occurrences of ALTGR()+SHIFT() with ALTGRSHIFT().
}}}
Commit 1e094349cd679d8592411d741512d71bc29185fc does this.
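The overwrite bug described in the commit message can be reproduced with a small Python model. The real macros live in dom/events/KeyCodeConsensus.h and are C++; the dictionaries and function bodies below are a guess at their general shape, kept only to show why a combined ALTGRSHIFT entry is needed.

```python
# Hypothetical model: one map per property, all keyed by the typed character.
code_map, shift_map, altgr_map = {}, {}, {}


def _insert(key, code, shift, altgr):
    # Every registration writes to ALL maps, so the last call for a key wins.
    code_map[key] = code
    shift_map[key] = shift
    altgr_map[key] = altgr


def KEY(key, code):
    _insert(key, code, False, False)


def SHIFT(key, code):
    _insert(key, code, True, False)


def ALTGR(key, code):
    _insert(key, code, False, True)


def ALTGRSHIFT(key, code):
    _insert(key, code, True, True)


# Buggy pattern: SHIFT() overwrites the AltGr state set by ALTGR().
ALTGR("Á", "KeyA")
SHIFT("Á", "KeyA")
buggy_state = (shift_map["Á"], altgr_map["Á"])

# Fixed pattern: a single call records both modifiers at once.
ALTGRSHIFT("Á", "KeyA")
fixed_state = (shift_map["Á"], altgr_map["Á"])
```

In this model `buggy_state` loses the AltGr flag, while `fixed_state` keeps both modifiers set.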
Kathy and I reviewed the bug16678_2 changes. We have not tried to test the code yet, and we have not tried to determine if all of the mappings are optimal. That said, we have a few comments:
Should we remove the "TODO needs more information" mappings? Or do we plan to do more research and include a mapping for them?
Typo in a comment: Italisn
Typo in a comment: FInnish
Do we have automated tests for our key event fingerprinting changes?