Opened 5 years ago

Last modified 5 years ago

#8182 new task

Explicitly figure out handling of internationalized domain names

Reported by: schoen Owned by: pde
Priority: High Milestone:
Component: HTTPS Everywhere/EFF-HTTPS Everywhere Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

We should explicitly figure out and document how HTTPS Everywhere works with internationalized domain names (IDN), and make sure that it actually works according to the documented behavior. Do you write rules using UTF-8 internationalized names, punycode-encoded names, or neither? Do the rules actually trigger and rewrite correctly?

A potential problem with this was reported at:

https://mail1.eff.org/pipermail/https-everywhere/2012-May/001435.html

Child Tickets

Attachments (2)

a-uni.xml (948 bytes) - added by mikkoharhanen 5 years ago.
UTF-8 encoding
a-latin.xml (966 bytes) - added by mikkoharhanen 5 years ago.
ISO-8859-15 encoding


Change History (3)

Changed 5 years ago by mikkoharhanen

Attachment: a-uni.xml added

UTF-8 encoding

Changed 5 years ago by mikkoharhanen

Attachment: a-latin.xml added

ISO-8859-15 encoding

comment:1 Changed 5 years ago by mikkoharhanen

I created rulesets for the unused domain 'ä.fi'. In these rulesets, the letter a-umlaut (ä) was written using the following methods:

  • HTML entities
  • Punycode
  • UTF-8 characters
  • ISO-8859-15 characters

The path of each test URL names the encodings used in the rule's 'from' and 'to' fields. For example, for the URL 'http://ä.fi/entity-to-puny/' the 'from' field uses an HTML entity and the 'to' field uses Punycode to represent a-umlaut. If the rule works, the address should be redirected to https.
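
For reference, the four representations of a-umlaut compared here can be reproduced with a short Python sketch. This is only an illustration and is not taken from the attached rulesets; the helper name `representations` is hypothetical, and the Punycode form relies on Python's built-in `idna` codec (IDNA 2003).

```python
# Four ways to represent the host label "ä" (U+00E4), matching the four
# methods listed in this comment. Illustrative only.

def representations(ch: str) -> dict:
    """Return the encodings of a single-character host label."""
    return {
        # Numeric HTML character reference, e.g. as written inside an XML rule
        "html_entity": f"&#{ord(ch)};",
        # ASCII-compatible (Punycode) form produced by the stdlib IDNA codec
        "punycode": ch.encode("idna").decode("ascii"),
        # Raw bytes when the ruleset file is saved as UTF-8
        "utf8_bytes": ch.encode("utf-8"),
        # Raw bytes when the ruleset file is saved as ISO-8859-15
        "iso_8859_15_bytes": ch.encode("iso-8859-15"),
    }

reps = representations("ä")
for name, value in reps.items():
    print(f"{name}: {value!r}")
```

The entity and Punycode forms are plain ASCII, so they do not depend on the encoding of the ruleset file; the two byte forms do.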

a-uni.xml (file encoding: UTF-8)

[[Typed URL]]			[[Resulting URL]]
http://ä.fi/entity-to-entity/	--> [OK] https://ä.fi/entity-to-entity/
http://ä.fi/entity-to-puny/	--> [OK] https://ä.fi/entity-to-puny/
http://ä.fi/entity-to-uni/	--> [FAIL] https://ã¤.fi/entity-to-uni/

http://ä.fi/puny-to-puny/	--> [FAIL] http://www.ä.fi/puny-to-puny/
http://ä.fi/puny-to-entity/	--> [FAIL] http://www.ä.fi/puny-to-entity/
http://ä.fi/puny-to-uni/	--> [FAIL] http://www.ä.fi/puny-to-uni/

http://ä.fi/uni-to-uni/		--> [FAIL] http://www.ä.fi/uni-to-uni/
http://ä.fi/uni-to-entity/	--> [FAIL] http://www.ä.fi/uni-to-entity/
http://ä.fi/uni-to-puny/	--> [FAIL] http://www.ä.fi/uni-to-puny/

[[Typed URL]]				[[Resulting URL]]
http://ä.fi/entity-to-entity/	--> [FAIL] http://www.&.com/#228;.fi/entity-to-entity/
http://ä.fi/entity-to-puny/	--> [FAIL] http://www.&.com/#228;.fi/entity-to-puny/
http://ä.fi/entity-to-uni/		--> [FAIL] http://www.&.com/#228;.fi/entity-to-uni/

http://xn--4ca.fi/entity-to-entity/	--> [OK] https://ä.fi/entity-to-entity/
http://xn--4ca.fi/entity-to-puny/	--> [OK] https://ä.fi/entity-to-puny/
http://xn--4ca.fi/entity-to-uni/	--> [FAIL] https://ã¤.fi/entity-to-uni/

http://xn--4ca.fi/puny-to-puny/		--> [FAIL] http://www.ä.fi/puny-to-puny/
http://xn--4ca.fi/puny-to-entity/	--> [FAIL] http://www.ä.fi/puny-to-entity/
http://xn--4ca.fi/puny-to-uni/		--> [FAIL] http://www.ä.fi/puny-to-uni/

http://ã¤.fi/uni-to-uni/		--> [FAIL] http://www.ã¤.fi/uni-to-uni/
http://ã¤.fi/uni-to-entity/		--> [FAIL] http://www.ã¤.fi/uni-to-entity/
http://ã¤.fi/uni-to-puny/		--> [FAIL] http://www.ã¤.fi/uni-to-puny/
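
The 'ã¤' seen in the failing 'uni' results is what one gets when the UTF-8 bytes of 'ä' are decoded as ISO-8859-1 and then lowercased. Whether that is what actually happens inside the browser or the extension is an assumption, not something established in this ticket; the sketch below only shows that the byte-level arithmetic matches. The function name is hypothetical.

```python
def misdecode_utf8_as_latin1(host: str) -> str:
    """Encode as UTF-8, wrongly decode the bytes as ISO-8859-1, then lowercase."""
    return host.encode("utf-8").decode("iso-8859-1").lower()

# 'ä' is 0xC3 0xA4 in UTF-8; read as ISO-8859-1 those bytes are 'Ã' and '¤',
# and lowercasing 'Ã' gives 'ã'.
garbled = misdecode_utf8_as_latin1("ä.fi")
print(garbled)
```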

*

a-latin.xml (file encoding: ISO-8859-15)

[[Typed URL]]			[[Resulting URL]]
http://ä.fi/latin-to-latin/	--> [OK] https://ä.fi/latin-to-latin/
http://ä.fi/latin-to-entity/	--> [OK] https://ä.fi/latin-to-entity/
http://ä.fi/latin-to-puny/	--> [OK] https://ä.fi/latin-to-puny/

http://ä.fi/entity-to-latin/	--> [OK] https://ä.fi/entity-to-latin/
http://ä.fi/puny-to-latin/	--> [FAIL] http://www.ä.fi/puny-to-latin/

http://xn--4ca.fi/latin-to-latin/	--> [OK] https://ä.fi/latin-to-latin/
http://xn--4ca.fi/entity-to-latin/	--> [OK] https://ä.fi/entity-to-latin/
http://xn--4ca.fi/puny-to-latin/	--> [FAIL] http://www.ä.fi/puny-to-latin/

Conclusions:

  • HTML entities always work
  • ISO-8859-15 characters always work
  • UTF-8 characters never work
  • Punycode works in output ('to') fields but not in input ('from') fields
  • Firefox converts Punycode hostnames before HTTPS Everywhere has the opportunity to redirect them
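
One way to think about the 'from' field mismatch is that the rule and the request may be compared in different forms (Punycode versus Unicode). The following is a minimal sketch of comparing hosts after normalizing both sides to the ASCII form. It is not HTTPS Everywhere's code and does not describe how the extension matches rules; `to_ascii_host` and `hosts_match` are hypothetical helpers built on Python's standard `idna` codec (IDNA 2003).

```python
def to_ascii_host(host: str) -> str:
    """Normalize a hostname to its ASCII (Punycode) form, lowercased."""
    return host.encode("idna").decode("ascii").lower()

def hosts_match(rule_host: str, request_host: str) -> bool:
    """Compare two hostnames after normalizing both to the ASCII form."""
    return to_ascii_host(rule_host) == to_ascii_host(request_host)

# A rule written in Punycode matches a request seen in Unicode form, and
# vice versa, once both sides are normalized.
print(hosts_match("xn--4ca.fi", "ä.fi"))
print(hosts_match("ä.fi", "xn--4ca.fi"))
```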