Opened 6 years ago

Closed 5 years ago

#10815 closed enhancement (wontfix)

4.dev.15 ruleset db is not a proper db?

Reported by: Faziri Owned by: pde
Priority: Medium Milestone:
Component: HTTPS Everywhere/EFF-HTTPS Everywhere Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

First off, thank you for resolving https://trac.torproject.org/projects/tor/ticket/10174

However, the implementation looks wrong. The database does not appear to be made the way an SQL db is supposed to be made. There could be a reason that'd warrant an exception, so I'm mostly just asking for a clarification as to why this odd choice was made, not complaining that the db is bad.

http://puu.sh/6M0HH.png

The db contains a column "id" to link domains and a column "contents", which simply contains the lines of XML. You've essentially replaced the text file that contains lines of XML by a db that contains lines of XML.

Normally speaking, the db should have a column for each of the properties of a rule.

This allows for lazy loading of the ruleset, so the problem was solved on the filesystem level, but it was not solved on the per-rule level since it's still just XML that needs to be parsed. And now, you have a database that violates database design rules. Using the db to the fullest by transforming the XML rules into db columns would further improve the system.

So why was the choice made to do it this way?
Cheers

Child Tickets

Change History (5)

comment:1 Changed 6 years ago by Faziri

Allow me to rephrase:
"This allows for lazy loading of the ruleset, so the problem was solved on the filesystem level, but it was not solved on the per-rule level since it's still just XML that needs to be parsed."

It does solve the problem of having to parse a huge XML file at startup, thus fixing the RAM usage and startup slowdown, so thanks for that. But creating a db without normalizing it and restructuring the data to work as a proper db is kind of a hacky solution, it's not proper development. I understand it'd be a lot of work to replace the XML rule-parser by an sql equivalent, but not doing so is not a solution either.

Version 0, edited 6 years ago by Faziri (next)

comment:2 Changed 5 years ago by zyan

Forwarding this to jsha by email; not sure if he has a trac.torproject.org account yet. Will keep you updated.

comment:3 Changed 5 years ago by zyan

Status: newneeds_information

comment:4 Changed 5 years ago by jsha

You make a good point, Faziri: It would be cleaner to fully transform the XML into a database schema. The main reason I didn't do that initially was that it was the minimal change necessary to speed up DB loading to the point where we could unblock the 4.0 beta Firefox release.

Additionally, someone later pointed out that some users take advantage of the 'custom ruleset' feature, where they can author their own rulesets and put them in their profile directory. Since the format of these custom rulesets is also XML, we'd have to continue to maintain an XML-ingesting code path in addition to a newer, more SQL-native DB-ingesting code path, which seems like a recipe for bugs.

Besides conceptual cleanliness, what do you see as the advantages of transforming the XML schema more fully into a DB schema?

comment:5 Changed 5 years ago by pde

Resolution: wontfix
Status: needs_informationclosed

It doesn't seem useful to make the DB schema richer, since we don't query it in any rich ways, and if it had its own internal semantics we'd just be parsing out of that back into JS datastructures.

Perhaps one could argue that each row of the database should be a JSON blob, since those map a little more cleanly into JS than XML does. But unless someone has an engineering argument that there's a payoff for the work that change would involve, I'm inclined to leave this as wontfix for now.

Note: See TracTickets for help on using tickets.