Opened 3 years ago

Last modified 2 years ago

#25729 reopened defect

UTF8 encoded TORRC does NOT parse non-Latin paths

Reported by: Fleming
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: unspecified
Severity: Normal
Keywords: needs-proposal
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


Unpack this Tor archive to C:\

It will create the following hierarchy:
C:\Проверка\Tor (for executables, libraries and torrc)
C:\Проверка\Tor\Data (for data and geoip)

The configuration file, torrc, is encoded as UTF-8.
It has this line: DataDirectory C:\Проверка\Tor\Data

If I run tor.exe -f torrc, the output is as follows:
[warn] Error creating directory C:\Проверка\Tor\Data: No such file or directory
[warn] Failed to parse/validate config: Couldn't create private data directory "C:\Проверка\Tor\Data"

If I replace the UTF-8 encoded torrc with an ANSI-encoded torrc, Tor works as expected.

Child Tickets

Ticket | Status | Owner | Summary | Component
#10416 | assigned | ahf | Tor won't start on Windows when path contains non-ascii characters | Core Tor/Tor
#28256 | new | | Add tests for UTF-8 encoded torrcs on Windows | Core Tor/Tor
#31827 | closed | tbb-team | Tor unexpectedly exited. | Core Tor/Tor

Change History (8)

comment:1 Changed 3 years ago by teor

Component: - Select a component → Core Tor/Tor
Milestone: Tor: unspecified
Resolution: wontfix
Status: new → closed
Version: Tor: unspecified

The Windows fopen() implementation assumes that file paths are in the default system encoding, which is win-1252 on the system that created this archive.

Windows does not support UTF-8 as a default system encoding, because its system encodings are limited to two bytes per character.

To support UTF-8 on Windows, we would need to rewrite all of tor's filesystem code to use the Windows Unicode filesystem APIs. This would be a breaking change for existing configs, and a lot of work for us.

As a workaround, please save your torrc file in your Windows system encoding.
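
To make the mismatch concrete, here is a minimal sketch (not code from tor) of what a byte-oriented file API sees when the torrc bytes are read under a different codepage than the one they were written in. The helper name `as_seen_by_ansi_api` is hypothetical, and `cp1251` is an assumed system codepage for a Cyrillic Windows install.

```python
# Hypothetical illustration; this is not how tor itself is implemented.
UTF8_PATH = "C:\\Проверка\\Tor\\Data"


def as_seen_by_ansi_api(path: str, file_encoding: str, system_codepage: str) -> str:
    """Return the path a byte-oriented (ANSI) file API would look for.

    The torrc stores the path as bytes in `file_encoding`; the API
    interprets those same bytes using `system_codepage`.
    """
    raw = path.encode(file_encoding)
    # errors="replace" keeps the sketch from raising on undefined bytes.
    return raw.decode(system_codepage, errors="replace")


if __name__ == "__main__":
    for file_encoding in ("utf-8", "cp1251"):
        seen = as_seen_by_ansi_api(UTF8_PATH, file_encoding, "cp1251")
        # ascii() avoids console encoding problems when printing.
        print(file_encoding, ascii(seen), "matches:", seen == UTF8_PATH)
```

When the file encoding and the system codepage agree, the path round-trips unchanged. When the file is UTF-8, every Cyrillic letter turns into two unrelated characters, so the directory being looked up is not the one on disk.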

comment:2 Changed 3 years ago by Fleming

Win-1252? The ANSI codepage that was used is Win-1251.
I also tried running “chcp 65001” before starting Tor with the UTF-8 encoded torrc; it didn’t help.
Let other developers take a peek; why the rush?

comment:3 Changed 3 years ago by teor

Keywords: needs-proposal added
Resolution: wontfix (removed)
Status: closed → reopened

comment:4 Changed 3 years ago by Fleming

Keywords: needs-proposal removed

I am not a developer, but from a user’s perspective the DataDirectory path is just a string, right? Check whether it exists on disk as ANSI first; if it does not, internally convert that string to UTF-8 and check again.
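
The fallback suggested here could be sketched roughly as follows. This is only an illustration of the idea, not a proposed patch: the function name `first_existing_interpretation`, the `exists` parameter, and the default codepage `cp1251` are all assumptions made for the example.

```python
import os
from typing import Callable, Optional, Tuple


def first_existing_interpretation(
    raw: bytes,
    system_codepage: str = "cp1251",  # assumed ANSI codepage
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[Tuple[str, str]]:
    """Try the raw torrc bytes as ANSI first, then as UTF-8.

    Returns (path, encoding) for the first interpretation that exists
    on disk, or None if neither does.
    """
    for encoding in (system_codepage, "utf-8"):
        try:
            candidate = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        if exists(candidate):
            return candidate, encoding
    return None
```

One caveat with this kind of guessing: the same bytes can name two different, valid paths under the two encodings, so the path that gets used depends on what happens to exist on disk.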

comment:5 Changed 3 years ago by teor

Keywords: needs-proposal added

This has potential backwards compatibility and security implications, so it needs a proposal.

comment:6 Changed 2 years ago by teor

Parent ID: #27380

comment:7 Changed 2 years ago by teor

Here is one possible scheme:

  • tor defaults to using the default codepage for filenames
  • add a torrc option, extra option argument, or filename prefix that makes tor use the specified codepage or UTF-8
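
A rough sketch of the torrc-option variant of this scheme is below. The option name `FilenameEncoding` is hypothetical (no such torrc option exists), as is the helper `data_directory`; the sketch only shows how an explicit setting could override the default codepage when decoding a path.

```python
from typing import Optional

# Hypothetical option name used for illustration; not a real torrc option.
ENCODING_OPTION = b"FilenameEncoding"


def data_directory(torrc: bytes, default_codepage: str) -> Optional[str]:
    """Decode the DataDirectory value from raw torrc bytes.

    Uses `default_codepage` unless the hypothetical FilenameEncoding
    option names another encoding (for example utf-8).
    """
    encoding = default_codepage
    raw_value = None
    for line in torrc.splitlines():
        # Split on ASCII whitespace only: "<key> <value>".
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == ENCODING_OPTION:
            encoding = value.decode("ascii")
        elif key == b"DataDirectory":
            raw_value = value
    if raw_value is None:
        return None
    return raw_value.decode(encoding)
```

Existing configs that do not set the option keep their current behaviour, which is the backwards-compatibility property the first bullet asks for.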

comment:8 Changed 2 years ago by teor

Parent ID: #27380 (removed)

Unparenting: we do not need to make the torrc UTF-8 yet.
