Opened 6 months ago

Closed 5 months ago

#33042 closed task (fixed)

switch servers to a UTF-8 locale by default

Reported by: anarcat
Owned by: anarcat
Priority: High
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Major
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

probably no one ever noticed this, but our servers all run on a plain "C" locale, which means many programs cannot handle UTF-8 (non-ASCII) characters by default.
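
To illustrate why a plain C locale trips over such content, here is a minimal Python sketch. It assumes the C locale's character set is plain ASCII (typical on glibc systems) and uses Python's codecs as a stand-in; it is not how vim or any other program actually validates its input. The sample string is the listchars value from the .vimrc quoted further down in this ticket, and `decodes_as` is a hypothetical helper name.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The listchars value from the .vimrc, as UTF-8 bytes on disk.
LISTCHARS = "tab:»·,trail:·".encode("utf-8")

# Assumption: a C locale implies ASCII, a C.UTF-8 locale implies UTF-8.
print("ascii:", decodes_as(LISTCHARS, "ascii"))
print("utf-8:", decodes_as(LISTCHARS, "utf-8"))
```

The bytes for » and · fall outside the 7-bit ASCII range, so the ASCII decode fails while the UTF-8 decode succeeds.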

this was mostly not a problem: until very recently, sshd accepted the SSH client's locale variables (LANG and LC_*), so the client's locale propagated to the server and many users had, in effect, a UTF-8 locale. mine, for example, is fr_CA.UTF-8.

but that also meant that error messages would get translated into whatever language the user was using, which made collaboration harder. so accepting those variables was turned off in:

commit 1e1fc35ef105b854456586815dd093e27f80635e
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Tue Jan 21 16:02:34 2020 -0500

    do accept LANG or LC_* variables in sshd
    
    While it's great that Debian GNU/Linux translates its messages for
    users all over the world, the Tor Project is an international
    community where it's preferable to communicate in the current "lingua
    franca", english. By accepting client-specific locales, we are
    breaking that common language and making it harder to collaborate.
    
    This also happens to irritate weasel because my locale is french and
    that leaks out to some error messages and output in my
    copy-pastes. This should solve the problem for him, hopefully,
    eventually.

diff --git a/modules/ssh/templates/sshd_config.erb b/modules/ssh/templates/sshd_config.erb
index c79b3a9b..6b63b429 100644
--- a/modules/ssh/templates/sshd_config.erb
+++ b/modules/ssh/templates/sshd_config.erb
@@ -113,7 +113,7 @@ TCPKeepAlive yes
 #Banner none
 
 # Allow client to pass locale environment variables
-AcceptEnv LANG LC_*
+#AcceptEnv LANG LC_*
 
 # override default of no subsystems
 Subsystem sftp /usr/lib/openssh/sftp-server

Unfortunately, this left my sessions with the system default locale, which dpkg-reconfigure locales reports as "Not set" and which probably means the plain C locale.
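
A rough sketch of why the session falls back to C once sshd stops accepting the client's variables. It follows the usual POSIX precedence (LC_ALL, then the category variable, then LANG, with empty values treated as unset). The final fallback is implementation-defined in POSIX and is assumed to be "C" here; `effective_locale` is a hypothetical helper, not a real library call.

```python
def effective_locale(env: dict, category: str = "LC_CTYPE") -> str:
    """Resolve the locale name a program would use for one category."""
    # Precedence: LC_ALL overrides the category variable, which overrides LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # empty strings count as unset
            return value
    # Assumption: the implementation-defined default is the C locale.
    return "C"


# Before the sshd change: the client's LANG reaches the server.
print(effective_locale({"LANG": "fr_CA.UTF-8"}))
# After the change, with an empty /etc/default/locale: nothing is set.
print(effective_locale({}))
# With a server-side default of LANG=C.UTF-8.
print(effective_locale({"LANG": "C.UTF-8"}))
```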

This meant I started getting weird errors. One of those happened while starting vim, for example:

root@fsn-node-03:~# vim
Error detected while processing /root/.vimrc:
line   13:
E474: Invalid argument: listchars=tab:»·,trail:·
Press ENTER or type command to continue

My first reaction was to remove this line altogether, which I did in commit 0bcad064. But then I realized the problem was triggered by the absence of a UTF-8 locale.

So I'm proposing we switch to a C.UTF-8 locale everywhere by default. That way at least Unicode would display properly everywhere.

I'm marking this as high priority because those weird things could happen to other people and I don't want to let this fuzzy situation go on for too long.

Change History (4)

comment:1 Changed 6 months ago by anarcat

Owner: changed from tpa to anarcat
Status: new → accepted

comment:2 Changed 6 months ago by anarcat

Status: accepted → needs_review

i've pushed this to the default-utf8 branch on puppet, and i'll merge it on monday unless there are any objections:

commit 35a01dc2c0b45a9b5e63a83533f1562cc3520e1f
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Thu Jan 23 15:43:20 2020 -0500

    enable default LANG=C.UTF-8 locale
    
    Ever since this commit:
    
    1e1fc35e do accept LANG or LC_* variables in sshd
    
    i've had those weird warnings when loading the vimrc. I finally
    figured out why that was happening: it's because we don't get a UTF-8
    locale anymore.
    
    To fix this, we set the default locale to the base default locale (C)
    and unicode (so C.UTF-8). The only difference with the default locale
    is the addition of the UTF-8 codepages, which is backwards compatible
    with ASCII.
    
    I don't expect problems to come out of this: until 1e1fc35e various
    locales were in use on the servers and it didn't seem to cause
    significant problems. In particular, I've been using fr_CA.UTF-8 which
    did cause some messages to show up in french, but in general I expect
    software should tolerate locale changes like this without too much
    problems.
    
    Going *back* to a regular C locale could be harder: that could mean
    some characters, exactly like the ones in that .vimrc, could not
    display properly and cause the kind of errors I was seeing and that
    caused me to start going down the path of:
    
    0bcad064 remove listchars parameter
    
    So, in a way, this is complementary to the change that stops accepting
    locales in sshd: we don't accept those locales, but at least we have a
    sensible, UTF-8 ready default locale.

diff --git a/modules/torproject_org/files/root-dotfiles/vimrc b/modules/torproject_org/files/root-dotfiles/vimrc
index cf50c15f..d99e4d68 100644
--- a/modules/torproject_org/files/root-dotfiles/vimrc
+++ b/modules/torproject_org/files/root-dotfiles/vimrc
@@ -10,8 +10,8 @@ set ai
 :syn on
 :set title
 :set pastetoggle=<F10>
-":set listchars=tab:»·,trail:·
-":set list
+:set listchars=tab:»·,trail:·
+:set list
 :nmap <F11> :set invlist<return>
 :imap <F11> <C-O>:set invlist<return>
 :set clipboard^=autoselectml guioptions+=A
diff --git a/modules/torproject_org/manifests/init.pp b/modules/torproject_org/manifests/init.pp
index 568dba81..f19b5605 100644
--- a/modules/torproject_org/manifests/init.pp
+++ b/modules/torproject_org/manifests/init.pp
@@ -33,7 +33,7 @@ class torproject_org {
             source  => "puppet:///modules/torproject_org/etc/cron.weekly/make-dh-params",
             ;
         "/etc/default/locale":
-            content => "",
+            content => "LANG=C.UTF-8\n",
             ;
         "/etc/nsswitch.conf":
             mode   => '0755',
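
One way to sanity-check the result of the change above is to parse the managed file and confirm that LANG names a UTF-8 codeset. This is only a sketch: it assumes /etc/default/locale holds simple KEY=value lines and that locale names follow the usual language[_territory][.codeset][@modifier] layout. `parse_locale_file` and `has_utf8_codeset` are hypothetical helpers, not part of the Puppet module.

```python
def parse_locale_file(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double quotes; strip them.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def has_utf8_codeset(locale_name: str) -> bool:
    """Return True if the locale name carries a UTF-8 codeset suffix."""
    _, dot, rest = locale_name.partition(".")
    if not dot:
        return False  # names like "C" or "POSIX" have no codeset part
    codeset = rest.split("@", 1)[0]
    # Accept both the "UTF-8" and "utf8" spellings.
    return codeset.replace("-", "").lower() == "utf8"


old = parse_locale_file("")                # file content before the change
new = parse_locale_file("LANG=C.UTF-8\n")  # file content after the change
print(has_utf8_codeset(old.get("LANG", "C")))
print(has_utf8_codeset(new.get("LANG", "C")))
```

On a real host the text would come from reading /etc/default/locale; the two literal strings here are the old and new `content` values from the diff.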

comment:3 Changed 5 months ago by anarcat

Status: needs_review → merge_ready

kicking this can now

comment:4 Changed 5 months ago by anarcat

Resolution: fixed
Status: merge_ready → closed

pushed, seems to have fixed the issue on fsn-node-03 at least.