Opened 5 months ago

Last modified 2 weeks ago

#30054 needs_information defect

Special characters are not escaped in translations and break the build

Reported by: gk Owned by: emmapeel
Priority: Medium Milestone:
Component: Community/Translations Version:
Severity: Normal Keywords: TorBrowserTeam201904, GeorgKoppen201904
Cc: emmapeel, igt0, gk, hans@…, ggus, pili Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

As said in comment:14:ticket:27125 there are special characters (like the ') that need to get escaped in strings, otherwise the build breaks. While we did that in en-US that does not happen in translations (yet).

Child Tickets

TicketStatusOwnerSummaryComponent
#31350newtor-gitadmtest the locales for Android with a githook after updating from transifexInternal Services/Service - git

Change History (14)

comment:1 Changed 5 months ago by gk

I agree we should not require translators to know which entities to escape and which not. However, importing the strings as they currently are breaks the build. Could we have a script running over the translated strings adding necessary escaping if that is missed before the final strings are available in the translations repo?

I wonder how other projects are dealing with that problem as I guess we are hardly the only ones getting hit by it...

comment:2 Changed 5 months ago by emmapeel

I will talk with erinm and find out if that can be a feature request for Transifex

comment:4 Changed 5 months ago by emmapeel

I think i cleaned up a bit, let me know if it is still broken.

i also added a note for the translators, but it does not force the escaping.

comment:5 in reply to:  4 Changed 5 months ago by gk

Parent ID: #30016

Replying to emmapeel:

I think i cleaned up a bit, let me know if it is still broken.

i also added a note for the translators, but it does not force the escaping.

Thanks, this works for me. I'll unparent the ticket. Not sure if you want to keep it open for having a less ad hoc way of dealing with this problem implemented on the server-/repository-side. (Assuming that's the place where we want to make sure the escaping is properly done).

comment:6 Changed 5 months ago by emmapeel

Yes, I'd consider this ticket done when transifex can escape the chars themselves and present translators with a normal string, doing the escaping on the background.

This will prevent future problems, because right now we have to review this 'by hand'.

comment:7 in reply to:  6 Changed 4 months ago by gk

Replying to emmapeel:

Yes, I'd consider this ticket done when transifex can escape the chars themselves and present translators with a normal string, doing the escaping on the background.

This will prevent future problems, because right now we have to review this 'by hand'.

Or maybe there is some other automated step we can do, keeping Transifex out of the loop. I'll ask around what Mozilla is doing.

comment:8 Changed 4 months ago by gk

Keywords: TorBrowserTeam201904 GeorgKoppen201904 added

comment:9 Changed 4 months ago by gk

Okay, I talked to Pike who helped me a lot (thanks for that). Mozilla does not have an automatic way to escape strings properly. However, they have a tool called compare-locales that is used to make sure to check for issues before a build breaks. See: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/compare-locales for usage info. Moreover, I got told that we can run their tool by creating an l10n.toml to run against the l10n repo, with the android-dtd tests as defined in
https://searchfox.org/mozilla-central/rev/d143f8ce30d1bcfee7a1227c27bf876a85f8cede/mobile/android/locales/l10n.toml#131-136.

Thus, I guess instead of using their l10n repo we could point to a location of our choice to check for Android issues.

emmapeel: Is that enough to get you started?

comment:10 Changed 4 months ago by emmapeel

well there are other types of documents that escape characters, like PO files I believe. It would be a matter of creating a different type of document. I need to check with erinm on that.

comment:11 Changed 4 months ago by emmapeel

regarding how other projectss do it, they do checks by themselves, see for example in Tails:

https://git-tails.immerda.ch/jenkins-tools/tree/slaves/check_po

transifex does not provide any kind of checks (Weblate does check some stuff tho, tsck tsck).

Last edited 4 months ago by emmapeel (previous) (diff)

comment:12 Changed 4 months ago by eighthave

I think it is important to have a CI test that ensures that the translations aren't breaking anything. For F-Droid, I wrote an emulator test that loads all strings in all locales. It has found many crasher bugs. Feel free to take it under the license you need, it will need some tweaking to work in a different app (it is GPLv3+ in F-Droid):
https://gitlab.com/fdroid/fdroidclient/blob/master/app/src/androidTest/java/org/fdroid/fdroid/LocalizationTest.java

IMHO, the translation tool should properly escape the translator's input whenever possible. I believe Weblate does a pretty good job of this. Weblate checks are also quite useful for this kind of thing

comment:13 Changed 7 weeks ago by gk

Cc: pili added
Status: newneeds_information

emmapeel: what do we need to get this bug resolved/unblocked? Luckily, I ran a test build before starting the build for 8.5.4 in order to figure whether we could use the latest translations for mobile. But it turns out we can't as 21677307d87d3216eede51a6ad36bf26e7937667 is the last good commit before ''s get added unescaped to ca. That alone is unfortunate but it's even more so that this blocks other unaffected locales from getting the latest translations.

comment:14 Changed 2 weeks ago by emmapeel

The people of transifex says they cannot make a different escaping than the one on the dtd specs, so we should create a githook for the files at https://gitweb.torproject.org/translation.git/tree/en-US/torbrowser_strings.dtd?h=tba-torbrowserstringsdtd to be tested with the compare-locales mentioned by geko.

Note: See TracTickets for help on using tickets.