Opened 9 years ago

Closed 5 years ago

#1614 closed enhancement (wontfix)

list

Reported by: bee
Owned by: pde
Priority: Very Low
Milestone:
Component: HTTPS Everywhere/EFF-HTTPS Everywhere
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Hi!!!!!!!!!

So far, this is how I understand HTTPS Everywhere to work!!! But I'm not sure I've got everything right!!!
It loads all rules into one array, "this.rules", of the "HTTPSRules" class!!!
Every rule is an object of the "RuleSet" class!!
So, when you need to apply a filter, you walk through every rule in "this.rules" to test whether the selected address (which is stored in a URI object) matches one of the filters of a "RuleSet"!!
That should be slow if there are plenty of filters!!!!!!!!! Even if you have only a few rules!!!
This is why you added a thing called "match_rule"!!! Yeah, it's because every rule can have more than one filter, and having a "generic" matching rule speeds up the search a bit!!!! It lets you do only one test per rule!!!
Anyway, I think you can improve it further!!!!
I think the XML match_rule should contain only the host name of a server!!! Yeah, something like "server.com", and it should also cover all subdomains, like "this.server.com"!!
Then you build the list of rules and sort it by host name!!! So, you'll have "aaa.com", "bee.net", "eff.org", "paypal.com", "server.com", "zzz.com" and so on!!!!
When you need to look up a main rule, you don't need to execute any slow regexp!!!!!!!!!!! All you have to do is a binary search on the list!!! I think it'll speed up the addon a lot, at least once the rule set grows!!!!!
http://en.wikipedia.org/wiki/Binary_search_algorithm
I even found some demo code: http://www.nczonline.net/blog/2009/09/01/computer-science-in-javascript-binary-search/ !!!!! This is funny!! There's also a link to an example of a so-called "binary search tree"!! You could build a tree in which each node is keyed by host-name length!!!!! So every node holds a list of host names of the same length!!!! Though that seems too complex to me!!!!! But it could be useful for splitting lists with millions of items into groups!!!!!!!!!!!!!!! yeah!!
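
A minimal sketch of the lookup described above, assuming the rules are indexed by plain host name in a sorted array. The names "hosts", "binarySearch" and "findRuleHost" are made up for illustration and are not taken from HTTPS Everywhere. Subdomains are handled by stripping the leftmost label and searching again, which is only one possible way to make "this.server.com" hit the "server.com" entry:

```javascript
// Hypothetical sorted index of host names (ascending, plain string order).
const hosts = ["aaa.com", "bee.net", "eff.org", "paypal.com", "server.com", "zzz.com"];

// Classic binary search: returns the index of target, or -1 if it is absent.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Try the full host first, then drop the leftmost label on each pass,
// so "this.server.com" falls back to "server.com". No regexp is executed.
function findRuleHost(sorted, host) {
  let candidate = host.toLowerCase();
  while (candidate.includes(".")) {
    if (binarySearch(sorted, candidate) !== -1) return candidate;
    candidate = candidate.slice(candidate.indexOf(".") + 1);
  }
  return null;
}
```

Each lookup costs one binary search per label in the host name, so the work grows with the logarithm of the list size rather than with the number of rules.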

~bee!!!!!!!!!!


Change History (8)

comment:1 Changed 9 years ago by bee

Almost done!!!!!!!!!!!!! You'll have to wait a bit longer for the code, but it's coming!!!!!!!!!!!!!!!!!!!! yeah!!!

~bee!!!!!!!!!

comment:2 Changed 9 years ago by bee

Done!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

It's here: http://honeybeenet.altervista.org/fun/https-everywhere/HoneyEverywhere.tar.bz2 !!!!!!!!!!!!!!!!

To install it, go to:
/home/$USERNAME/.mozilla/firefox/$PROFILENAME/extensions/https-everywhere@eff.org/chrome/content
And remove the folder named "rules" (you have to remove it because I renamed a file, GoogleServices -> GMail.xml, so with a plain copy/paste the old file wouldn't be deleted or replaced!!)
Now unpack my patch in that directory!!!!!!! It restores the rules (since I changed the format, I had to rewrite them all), and it overwrites the file named "HTTPSRules.js"!!!! I didn't make a DIFF file for it, because it's mostly rewritten!! So a diff file would be just useless hahahah!!!!!!!!!!!!!!! this is funny!!!!!!!!

The new system to look-up rules is very super fast!!!!!!!!!!!!!!!!!!!!!! it's fast like a bee!!!!!!!! (or almost!!!) It uses multiple binary lists!!!!!!
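
The patch source itself is not shown in this ticket, so the following is only a guess at what "multiple binary lists" could look like. It follows the host-name-length idea from the description: one sorted list per length, with a binary search inside the matching list. "buildBuckets" and "hasHost" are hypothetical names:

```javascript
// Build a hypothetical index: host-name length -> sorted array of host names.
function buildBuckets(hostNames) {
  const buckets = new Map();
  for (const name of hostNames) {
    if (!buckets.has(name.length)) buckets.set(name.length, []);
    buckets.get(name.length).push(name);
  }
  // Sort every bucket once so it can be binary-searched later.
  for (const list of buckets.values()) list.sort();
  return buckets;
}

// Pick the bucket by length, then binary-search only that bucket.
function hasHost(buckets, host) {
  const list = buckets.get(host.length);
  if (!list) return false;
  let lo = 0;
  let hi = list.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (list[mid] === host) return true;
    if (list[mid] < host) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}
```

Splitting by length keeps each list short, but a single sorted list is already logarithmic, so the gain is probably small unless the rule set becomes very large.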

I tested it and it works!!!!!!! Tell me what you think of it!!!!!!!!!!!!

bye!!!!!!!!!!
~bee!!!!!!!

comment:3 Changed 9 years ago by bee

BUMP!!!!!!!

I just updated my patch!!!!!! It's now complete!!! I added support for "RULES" that have an empty "match_rule" value in their XML files!!!!!!!!!!
Yeah!!!! Well, I think this is a very good patch, and I think you can apply it!!! It isn't backward compatible for filters, because the "match_rule" field in the XML files must not be a regexp!!!!!! BTW I adjusted all the rules!!!!!!!! yeah! lol!!!!!!!!!!!!

bye!!!!!!!!!!!!
~bee!!!!

comment:4 Changed 9 years ago by bee

Updated!!!!

YEAH! I forgot to add wiktionary to the filters list, but it's OK now, and I also added a ruleset for WSWS.org!!!!!!! I also patched the code, adding the workaround to avoid crashes!!!! (see the other ticket, "Segmentation fault while ..."!!)

bye!!!!!!

comment:5 Changed 9 years ago by mikeperry

Can you please create a git patch of your changes so it is easier to review?

See: https://www.eff.org/https-everywhere/development#git-howto

comment:6 Changed 9 years ago by bee

Hi!!!!!!!!

Thank you mikeperry!!! I'm very happy you're asking me that!!!!!! And with a please too!!!!!!!!!!!!!!!
Now that you're interested in this and have finally acknowledged my patch!!!!, I'll certainly rework it so it works with the newest version of EFF-HTTPS Everywhere!!!!!
Anyway, I can't work on it right now!!! I have too many things to do these days, this weekend!!! A lot of housework!!!! Things to learn!!!! Things to write by hand that I can't fake!!!!!!!!!!!!!
Yeah!!! Because of my super busy honey bee life, I can't play much with the PC at the moment!!!!!!!!!!

bye!!!!!!!!!!
~bee!!!!!

comment:7 Changed 9 years ago by bee

Hi!!!!!!!!!! Good morning!!!!!!!!!!!!!!!!!!!!!!!!!!! YEAH!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Late yesterday evening I updated the code, and I've now uploaded it!!!!!!!!!!
Anyway, I didn't have time to test it!!!! But I'm very sure it works like it did before!!!!!!!!!!

Same place!!! http://honeybeenet.altervista.org/fun/https-everywhere/HoneyEverywhere.tar.bz2
but it now also includes the "diff" files!!!!!!!
"HTTPSRules.diff" is the diff between "HTTPSRules-or.js" (the original file) and "HTTPSRules.js" (the new file!!!!!)
I also rewrote some rules!!!!! "rules.diff" shows the differences!!!!! So you can see how to update the rules (it's very easy!!!) to get them into the binary list/index!!!!!!!!!

The only two things I really did: first, I renamed "match_rule" to "match_host"!!! That keeps backward compatibility, because rules with a regexp in the "match_rule" field have it ignored and are indexed in a normal list (not the binary list, but a separate list, so they still work!!!). Second, I added the JavaScript "rewrittenURI" functions, because I noticed they were added in the newest versions of HTTPS Everywhere!!!! (though I can't use it, because it crashes Firefox!!!!!)
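
A rough sketch of how such a two-list index might behave, assuming rules are plain objects. The property names "matchHost" and "matchRule" and the functions "buildIndex" and "candidateRules" are hypothetical and do not come from the patch. Rules with a host name go into a sorted list for binary search; everything else lands in a separate list that is always checked:

```javascript
// Split rules into a binary-searchable host index and a plain fallback list.
function buildIndex(rules) {
  const byHost = [];   // rules carrying a plain host name, kept sorted
  const fallback = []; // legacy rules (regexp or empty match), scanned every time
  for (const rule of rules) {
    if (typeof rule.matchHost === "string" && rule.matchHost !== "") byHost.push(rule);
    else fallback.push(rule);
  }
  byHost.sort((a, b) => (a.matchHost < b.matchHost ? -1 : a.matchHost > b.matchHost ? 1 : 0));
  return { byHost, fallback };
}

// Return the rules worth testing for a host: at most one indexed hit,
// plus every fallback rule (their old regexp is ignored, as described above).
function candidateRules(index, host) {
  const found = [];
  let lo = 0;
  let hi = index.byHost.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const key = index.byHost[mid].matchHost;
    if (key === host) {
      found.push(index.byHost[mid]);
      break;
    }
    if (key < host) lo = mid + 1;
    else hi = mid - 1;
  }
  return found.concat(index.fallback);
}
```

The fallback list stays linear, so the speed-up depends on how many rules are converted to the host-name form.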
I also made some cosmetic improvements!!!!!! yeah!! I'm that good!!!!!
I replaced this ugly thing!!!

var match_rl = null;
var dflt_off = null;
if (xmlrules.@match_rule.length() > 0) match_rl = xmlrules.@match_rule;
if (xmlrules.@default_off.length() > 0) dflt_off = xmlrules.@default_off;
var ret = new RuleSet(xmlrules.@name, match_rl, dflt_off);

with this beautiful honey bee code!!!!!!!!!

var ret = new RuleSet(xmlrules.@name, 
    xmlrules.@match_host.length()  > 0 ? xmlrules.@match_host  : null, 
    xmlrules.@default_off.length() > 0 ? xmlrules.@default_off : null);

YEAH!!!!!! Test if everything works and merge the code!!!!!!!!!!!!!!!!!!!!!!
SUPER!!!!!!!!!!!!

~bee!!!!!!!!

comment:8 Changed 5 years ago by jsha

Resolution: wontfix
Status: new → closed