#5733 closed defect (fixed)
default.rulesets is obfuscated by not having line terminators
Reported by: | cypherpunks | Owned by: | |
---|---|---|---|
Priority: | Low | Milestone: | |
Component: | HTTPS Everywhere/EFF-HTTPS Everywhere | Version: | |
Severity: | Keywords: | lf cr crlf linebreak newline | |
Cc: | sunspyre@… | Actual Points: | |
Parent ID: | Points: | ||
Reviewer: | Sponsor: |
Description
HTTPS Everywhere 2012.5.1_1, running in Chromium on linux. I wanted to alter a rule without making a wholly new one.
default.rulesets is impossible to parse with human eyes. The lines have no line terminators so it's all in just one incredibly long string; *literally* a wall of text.
As an exception a small number of lines end in the long-deprecated CR terminator (carriage return), as for instance Macs used pre-OSX before changing to the Unix standard LF (linefeed). DOS/Windows use CRLF, which is why files saved under Unix-like OSs don't work well with notepad. Please see the Wikipedia page on newlines for more information.
File description;
2012.5.1_1/rules$ file default.rulesets default.rulesets: UTF-8 Unicode text, with very long lines, with no line terminators
Count of lines and characters;
2012.5.1_1/rules$ wc -lc < default.rulesets 1 1107900 # ergo 1 line of more than a million charactes
Count of occurences of CR and LF respectively;
2012.5.1_1/rules$ od -a < default.rulesets | grep -o cr | wc -l 145 # 145 CR occurences *total* 2012.5.1_1/rules$ od -a < default.rulesets | grep -o lf | wc -l 0 # 0 LF
Regardless of which scheme to follow (Unix LF, DOS CRLF, old-old CR) there should naturally be one linebreak per line.
(Incidentally I used BR ? to break lines in this post, so if it ends up looking really weird let's all chuckle at the irony!)
Child Tickets
Attachments (1)
Change History (9)
comment:1 Changed 8 years ago by
Component: | HTTPS Everywhere: Chrome → EFF-HTTPS Everywhere |
---|
comment:2 Changed 8 years ago by
comment:4 Changed 7 years ago by
Something like that?
It flattens the entire thing into some unholy cosmic superstring, then adds appropriate linebreaks and some basic indentation on a per-tag setting.
Builds for me, at least.
Example: http://paste.ubuntu.com/1316689 (unsure of how long Ubuntu pastes stay alive)
comment:5 Changed 7 years ago by
This looks great. We should give it a review for possible bugs and the commit to master.
comment:6 Changed 7 years ago by
Owner: | changed from pde to git@… |
---|---|
Status: | new → assigned |
FauxFaux, any chance you could review this fancier upgrade of your earlier script?
Changed 7 years ago by
Attachment: | merge-rulesets.sh added |
---|
minor improvements but really just cosmetics at this point
comment:7 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Okay, I have merged this in git master. I also had an ugly hack translation of the old script into Python lying around, which is a lot faster and therefore useful for rapid-iteration debugging of HTTPS Everywhere things. I tweaked that to do one line per ruleset, but the shell version is much more elegant ATM.
comment:8 Changed 7 years ago by
Just a followup note: the pretty merge-rulesets.sh script is actually buggy, so we can't use it, but our current merge-rulesets.py script still addresses the underlying problem (not as prettily, mind) and is 5x faster.
This is a hilarious bug report :).
The reason there are some CRs in there is because ruleset authors committed rules containing them, but the script that strips comments and whitespace from default.rulesets didn't anticipate that.
One solution to your problem is to edit rulesets by a different method:
<edit the ruleset in src/chrome/content/rules>
<uninstall the upstream .crx file; install the one you just built>
The other solution would be to send a patch against the ruleset library compression script to make it preserve a small number of judiciously selected whitespaces. Maybe one per ruleset? It's can be found here:
https://gitweb.torproject.org/https-everywhere.git/blob/b50b89f917497996409f87f2469237c9016237e0:/merge-rulesets.sh