Opened 4 months ago

Closed 5 weeks ago

#27380 closed defect (wontfix)

require torrc to be UTF-8

Reported by: cyberpunks Owned by:
Priority: Medium Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: rust-wants, prop285
Cc: teor Actual Points:
Parent ID: #24033 Points:
Reviewer: dgoulet Sponsor:

Description


Child Tickets

TicketStatusOwnerSummaryComponent
#27428closedrequire ContactInfo line to be UTF-8Core Tor/Tor

Change History (25)

comment:1 Changed 4 months ago by cyberpunks

branch unicode-torrc1 at https://gitgud.io/onionk/tor.git

comment:2 Changed 4 months ago by teor

Status: newneeds_review

comment:3 Changed 4 months ago by teor

Milestone: Tor: unspecifiedTor: 0.3.6.x-final

comment:4 Changed 3 months ago by teor

I'd like to do a limited version of this check in 0.3.5, then do the full check in 0.3.6.

In 0.3.5:

  • relays check that the ContactInfo line is UTF-8

In 0.3.6:

  • the whole torrc must be UTF-8 for every tor instance

comment:5 Changed 3 months ago by teor

Status: needs_reviewneeds_revision

comment:6 Changed 3 months ago by teor

Milestone: Tor: 0.3.6.x-finalTor: 0.3.5.x-final

comment:7 in reply to:  4 ; Changed 3 months ago by cyberpunks

Replying to teor:

In 0.3.5:

  • relays check that the ContactInfo line is UTF-8

See #27428 for this. Please review.

comment:8 in reply to:  7 ; Changed 3 months ago by teor

Milestone: Tor: 0.3.5.x-finalTor: 0.3.6.x-final

Replying to cyberpunks:

Replying to teor:

In 0.3.5:

  • relays check that the ContactInfo line is UTF-8

See #27428 for this. Please review.

Thanks!

I'm busy fixing bugs in 0.3.4 right now.
I'll try to review these tickets before the 0.3.5 deadline, otherwise they will go in 0.3.6.

comment:9 in reply to:  8 Changed 8 weeks ago by cyberpunks

This branch already made the requested revision on September 4th.

comment:10 Changed 7 weeks ago by nickm

Status: needs_revisionneeds_review

comment:11 Changed 7 weeks ago by dgoulet

Reviewer: dgoulet

comment:12 Changed 6 weeks ago by dgoulet

Status: needs_reviewneeds_information

Created a PR based on 036 branch: https://github.com/torproject/tor/pull/459

Waiting for CI to pass... The code lgtm;

However, how many tor will we break with this? In other words, if bunch of relays update to 036 stable and then tor doesn't start because torrc not in UTF-8... bad?

comment:13 in reply to:  12 ; Changed 6 weeks ago by teor

Replying to dgoulet:

Created a PR based on 036 branch: https://github.com/torproject/tor/pull/459

Waiting for CI to pass... The code lgtm;

However, how many tor will we break with this? In other words, if bunch of relays update to 036 stable and then tor doesn't start because torrc not in UTF-8... bad?

We found one relay that has a ContactInfo that's not UTF-8.

But we also need to fix Windows to use UTF-8 encoded paths before we make this change (see #25729). Otherwise, Windows users with non-ASCII usernames won't be able to use Tor.

comment:14 in reply to:  13 ; Changed 6 weeks ago by dgoulet

Replying to teor:

Replying to dgoulet:

Created a PR based on 036 branch: https://github.com/torproject/tor/pull/459

Waiting for CI to pass... The code lgtm;

However, how many tor will we break with this? In other words, if bunch of relays update to 036 stable and then tor doesn't start because torrc not in UTF-8... bad?

We found one relay that has a ContactInfo that's not UTF-8.

Well that is fixed and merged with #27428

But we also need to fix Windows to use UTF-8 encoded paths before we make this change (see #25729). Otherwise, Windows users with non-ASCII usernames won't be able to use Tor.

That sounds fine.

But ContactInfo and Nickname are unfortunately optional so if they do not appear, I do not see a reason why we have to enforce UTF-8 on the entire torrc...?

If someone then *adds* ContactInfo then we'll be enforcing UTF-8 which is good but before the operator does that, what is the rationale?

comment:15 in reply to:  14 ; Changed 6 weeks ago by teor

Replying to dgoulet:

Replying to teor:

Replying to dgoulet:

Created a PR based on 036 branch: https://github.com/torproject/tor/pull/459

Waiting for CI to pass... The code lgtm;

However, how many tor will we break with this? In other words, if bunch of relays update to 036 stable and then tor doesn't start because torrc not in UTF-8... bad?

We found one relay that has a ContactInfo that's not UTF-8.

Well that is fixed and merged with #27428

But we also need to fix Windows to use UTF-8 encoded paths before we make this change (see #25729). Otherwise, Windows users with non-ASCII usernames won't be able to use Tor.

That sounds fine.

But ContactInfo and Nickname are unfortunately optional so if they do not appear, I do not see a reason why we have to enforce UTF-8 on the entire torrc...?

Nickname is required to be ASCII:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1271

ContactInfo and Platform can be non-ASCII, but platform is set by code, not the torrc.

If someone then *adds* ContactInfo then we'll be enforcing UTF-8 which is good but before the operator does that, what is the rationale?

You're right - there isn't a good reason to make the whole torrc UTF-8.
Directory documents should be UTF-8 for consistency.
But we shouldn't break someone's tor just because their comments are in a legacy encoding.

That said, we will eventually want to do torrc processing with Rust, and Rust's strings are UTF-8. (There are ways around that, but they will make our code more complicated.)

Do you want to downgrade this error to a warning that the torrc may not work with future tor versions?
Or should we defer this task until a later release, and decide then?

comment:16 in reply to:  15 Changed 6 weeks ago by dgoulet

Status: needs_informationneeds_review

Replying to teor:

Do you want to downgrade this error to a warning that the torrc may not work with future tor versions?

This is a good idea. We can notify the user and tell them that UTF-8 is desirable and will get quite important in future versions.

I pushed a fixup commit in ticket27380_036_01.
See the PR: ​https://github.com/torproject/tor/pull/459

Simply warning with a longer message and then continuing.

comment:17 Changed 6 weeks ago by teor

But we also need to fix Windows to use UTF-8 encoded paths before we make this change (see #25729). Otherwise, Windows users with non-ASCII usernames won't be able to use Tor.

Now Windows users with non-ASCII usernames will get warnings.

Do you think we should do the Windows fix before merging this fix?

comment:18 in reply to:  17 Changed 6 weeks ago by dgoulet

Replying to teor:

But we also need to fix Windows to use UTF-8 encoded paths before we make this change (see #25729). Otherwise, Windows users with non-ASCII usernames won't be able to use Tor.

Now Windows users with non-ASCII usernames will get warnings.

Do you think we should do the Windows fix before merging this fix?

Yes most likely. Worst case, we are talking about two warnings about torrc not being UTF-8 and then a second one specific to the Username.

comment:19 Changed 6 weeks ago by teor

When a non-ASCII username appears in a path on Windows, then the torrc won't be UTF-8.
So I think there will only be one warning.

comment:20 in reply to:  19 ; Changed 5 weeks ago by dgoulet

Replying to teor:

When a non-ASCII username appears in a path on Windows, then the torrc won't be UTF-8.
So I think there will only be one warning.

Hmmm... From the branch I proposed above, tor would check the torrc for UTF-8 _before_ the Username is even looked at thus this will lead to a double warning?

comment:21 in reply to:  20 ; Changed 5 weeks ago by teor

Replying to dgoulet:

Replying to teor:

When a non-ASCII username appears in a path on Windows, then the torrc won't be UTF-8.
So I think there will only be one warning.

Hmmm... From the branch I proposed above, tor would check the torrc for UTF-8 _before_ the Username is even looked at thus this will lead to a double warning?

I can't find any code that checks if the Username is UTF-8. I don't think we need a username check for UTF-8.

Do you mean the ContactInfo?

Because I am talking about path to the Tor Browser directory, which can contain the username on Windows.

comment:22 Changed 5 weeks ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:23 in reply to:  21 ; Changed 5 weeks ago by dgoulet

Status: needs_reviewneeds_information

Replying to teor:

Do you mean the ContactInfo?

Yes I think I've mistaken the previous comment using "username". Rather ContactInfo and I'm guessing paths would need to be UTF-8?

If we only err for that, we don't need this branch/ticket imo.

comment:24 in reply to:  23 Changed 5 weeks ago by teor

Replying to dgoulet:

Replying to teor:

Do you mean the ContactInfo?

Yes I think I've mistaken the previous comment using "username". Rather ContactInfo and I'm guessing paths would need to be UTF-8?

At the moment, Windows paths need to be encoded in the default codepage, because we don't use UTF-8 file APIs on Windows.

macOS HFS+ and later support UTC-8, I have no idea about Linux or BSD.

If we only err for that, we don't need this branch/ticket imo.

We might eventually want this ticket for Rust parsing torrc. But we'll have to implement a workaround for non-UTF-8 torrcs in Rust anyway. So I don't think this ticket is needed at the moment.

comment:25 Changed 5 weeks ago by teor

Resolution: wontfix
Status: needs_informationclosed
Note: See TracTickets for help on using tickets.