Opened 3 months ago

Closed 3 months ago

#24467 closed enhancement (implemented)

Enable -Wnormalized=nfkc when available to avoid source code identifier confusion

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: security-low
Cc: Actual Points:
Parent ID: Points: 0.2
Reviewer: Sponsor:


In , nickm asks:

We use -Wnormalized=id now; should we switch?

Yes, we should switch to -Wnormalized=nfkc, as a precaution against patches that are submitted with similar-looking characters. Ideally, we would use -Wnormalized=ban-unicode-in-identifiers, but that's not something gcc has implemented yet.


Some characters in ISO 10646 have distinct meanings but look identical in some fonts or display methodologies, especially once formatting has been applied. For instance \u207F, “SUPERSCRIPT LATIN SMALL LETTER N”, displays just like a regular n that has been placed in a superscript. ISO 10646 defines the NFKC normalization scheme to convert all these into a standard form as well, and GCC warns if your code is not in NFKC if you use -Wnormalized=nfkc. This warning is comparable to warning about every identifier that contains the letter O because it might be confused with the digit 0, and so is not the default, but may be useful as a local coding convention if the programming environment cannot be fixed to display these characters distinctly.

clang hasn't implemented -Wnormalized yet:

Child Tickets

Attachments (1)

0001-Switch-Wnormalized-id-to-Wnormalized-nfkc.patch (1.3 KB) - added by ffmancera 3 months ago.

Download all attachments as: .zip

Change History (5)

comment:1 Changed 3 months ago by ffmancera

If everyone agree with switch, I can work on it.

comment:2 Changed 3 months ago by ffmancera

Status: newneeds_review

Patch ready, needs review!

comment:3 Changed 3 months ago by teor

Status: needs_reviewmerge_ready

Looks fine to me, let's get this merged.
We could even put it in 0.3.2 if you want, nickm, but given how unlikely it is that someone will submit Unicode identifiers in a patch, master should be fine.

comment:4 Changed 3 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

merged to master!

Note: See TracTickets for help on using tickets.