Opened 4 years ago

Last modified 6 months ago

#12031 needs_revision enhancement

Create a Key-Value database system for simple/flat datatypes in BridgeDB

Reported by: isis
Owned by: isis
Priority: High
Milestone:
Component: Obfuscation/BridgeDB
Version:
Severity: Normal
Keywords: bridgedb-1.0.x, bridgedb-db, proposal-226, isis2015Q1Q2, isisExB, isisExC, TorCoreTeam201608
Cc: isis, sysrqb, wfn
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

BridgeDB will likely need some rather high-level code for interacting with Redis through Twisted in order to complete Section 2 of proposal #226.

We'll probably want to start by storing the (emailAddress, timestamp) pairs used by bridgedb.Storage.getEmailTime and bridgedb.Storage.getWarnedEmail inside Redis. Other simple datatypes which can be stored in Redis should eventually get listed here (and eventually in bridgedb-spec.txt).
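As a rough illustration of what storing (emailAddress, timestamp) pairs in a key-value store could look like, here is a minimal sketch. It is not BridgeDB code: `InMemoryKeyValueStore`, `EmailTimeStore`, `setEmailTime`, and the `bridgedb:emailed` hash name are all hypothetical, the backend is a plain dict that only imitates Redis-style hash get/set, and the Twisted (asynchronous) side is left out so that the example stays self-contained.

```python
import time


class InMemoryKeyValueStore(object):
    """Hypothetical stand-in for a Redis connection (hash get/set only)."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, field, value):
        # Store value under field inside the hash called name.
        self._hashes.setdefault(name, {})[field] = value

    def hget(self, name, field):
        # Return the stored value, or None if the hash or field is missing.
        return self._hashes.get(name, {}).get(field)


class EmailTimeStore(object):
    """Keeps (emailAddress, timestamp) pairs in a single hash."""

    HASH_NAME = "bridgedb:emailed"  # hypothetical key name

    def __init__(self, backend):
        self.backend = backend

    def setEmailTime(self, address, when=None):
        # Default to the current time; values are stored as strings,
        # as they would be in Redis.
        when = time.time() if when is None else when
        self.backend.hset(self.HASH_NAME, address, repr(float(when)))

    def getEmailTime(self, address):
        # Return the timestamp as a float, or None for unknown addresses.
        raw = self.backend.hget(self.HASH_NAME, address)
        return None if raw is None else float(raw)


store = EmailTimeStore(InMemoryKeyValueStore())
store.setEmailTime("alice@example.com", 1400000000.0)
```

One hash keyed by email address keeps the datatype flat, which is the kind of simple structure this ticket is about. A real implementation would swap the dict-backed class for an actual Redis client.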

Change History (10)

comment:1 Changed 4 years ago by wfn

Cc: wfn added

fwiw, I can vouch for Redis being an excellent product.

Set intersections are crazy fast. Main datatypes (if you could call them that) (sets, hashtables) work very well indeed.

Very minor nitpick: a key-value store is not an RDBMS. :) this probably doesn't matter, but it could confuse some onlookers / people trying to help (or maybe not.)

comment:2 in reply to:  1 Changed 4 years ago by isis

Status: new → needs_revision

Replying to wfn:

fwiw, I can vouch for Redis being an excellent product.

Set intersections are crazy fast. Main datatypes (if you could call them that) (sets, hashtables) work very well indeed.

Very minor nitpick: a key-value store is not an RDBMS. :) this probably doesn't matter, but it could confuse some onlookers / people trying to help (or maybe not.)

Derp. I meant "an RDBMS for interacting with key-value stores in Redis".

I intended there to be a database management system which, on one side, would have an API for taking/answering queries from bridge distributors and, on the other side, would handle talking to whatever the databases on the backend are.

I decided to go with using `txredis`, even though it has some problems. Work so far on this is in my fix/12031-redis branch. Tests are totally not passing yet. See the commit messages in that branch for a couple of major design choices and some problems encountered so far.
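A minimal sketch of that two-sided layout, under stated assumptions: the names `Backend`, `DictBackend`, `DatabaseManager`, `store`, and `query` are invented for illustration and do not come from the fix/12031-redis branch, and the in-memory backend merely stands in for Redis or any other database.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """What the manager expects from any backend database (hypothetical)."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key, or None."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key."""


class DictBackend(Backend):
    """In-memory backend, standing in for Redis or any other store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class DatabaseManager:
    """Answers distributor queries without exposing the backend in use."""

    def __init__(self, backends):
        # Maps a datatype name (e.g. "emailTimes") to the Backend holding it.
        self._backends = dict(backends)

    def store(self, datatype, key, value):
        self._backends[datatype].put(key, value)

    def query(self, datatype, key):
        return self._backends[datatype].get(key)


manager = DatabaseManager({"emailTimes": DictBackend()})
manager.store("emailTimes", "alice@example.com", 1400000000.0)
```

Distributors would only ever call `store` and `query`, so changing which database holds a given datatype would not touch distributor code.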

comment:3 Changed 4 years ago by isis

Summary: Create a Key-Value RDBMS system for simple/flat datatypes in BridgeDB → Create a Key-Value database system for simple/flat datatypes in BridgeDB

comment:4 Changed 4 years ago by isis

My work on this so far is going faster than expected, but it's still in no way ready to merge. You can see my latest branch, fix/12031-redis_r1 here.

So far I have finished:

  • I've tested my implementation so far by using the bridgedb.parse.descriptors module (from my fix/9380-stem_r2 branch, which is not merged yet) to parse 250 @type bridge-networkstatus documents with Stem, serialise them, and store them in Redis. That entire process takes a mean time of 12ms and scales linearly with additional descriptors (as opposed to the linearithmic scaling which BridgeDB is currently dealing with).
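For readers who want to try a similar measurement, the following is a rough sketch of a serialise-and-store timing harness. It does not reproduce the test described above: a plain dict stands in for Redis, the records are made up rather than parsed with Stem, and `storeDescriptors`, `meanStoreTime`, and the record field names are all hypothetical.

```python
import json
import time


def storeDescriptors(store, descriptors):
    """Serialise each flat record to JSON and store it under its fingerprint."""
    for desc in descriptors:
        store[desc["fingerprint"]] = json.dumps(desc, sort_keys=True)


def meanStoreTime(descriptors, repeats=5):
    """Return the mean wall-clock seconds taken to store all descriptors."""
    elapsed = []
    for _ in range(repeats):
        store = {}  # plain dict standing in for Redis
        start = time.perf_counter()
        storeDescriptors(store, descriptors)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


# 250 made-up records; the field names are illustrative only.
fakeDescriptors = [
    {"fingerprint": "%040X" % n, "address": "192.0.2.%d" % (n % 256), "orPort": 9001}
    for n in range(250)
]

meanSeconds = meanStoreTime(fakeDescriptors)
```

Running `meanStoreTime` on lists of increasing length is one simple way to check whether the cost grows linearly with the number of descriptors.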


There is a tiny bit of entanglement with my branch for #12029, fix/12029-dist-api_r1: some commits at the beginning of the fix/12031-redis_r1 branch are also needed by fix/12029-dist-api_r1. These will need to be separated properly, depending on which branch is merged first. The commits which are necessary for both branches are:

5f1243d75af619b8c39fe98be48ba1bc43f63944 Make `answerParameters` be a @property of `Distributor`s.
2e0f705e0edbdea0c4db3ad91e5e7bbed9b670e5 Move `b.Bridges.BridgeRingParameters` → `b.bridgerequest.AnswerParameters`.
5495af3b0877b58af1005caaa14655f32b20d084 Rename `b.hashring.BridgeRing` to `b.hashring.Hashring` and cleanup docstrings.
f9632e97c1ca0bed825def9e5dee503e6d87849d Make `b.hashring.BridgeRing` an implementer of `b.hashring.IHashring`.
fb28ea8b9dfa9cc3e0d04bd0a405c5058d125f4f Move `b.Bridges.BridgeRing` → `b.hashring.BridgeRing`.
8b4b08c0f38ea3dae723eff2b27e7b338bf9edb7 Add IHashring interface specification.
60814f6b7f6d7f952cd6dfd62eb1bfdd7db6d769 Move old `b.Dist.Distributor` doctest to `b.Dist.IPBasedDistributor`.
b93b8410b075f09576984d5819a92e5dfa5eba3f Remove old bridgedb.Dist.Distributor class.
36069a806bd3f0aa5bef27fb7a14e59745f43ecc Add basic implementation of the IDistribute interface.
d956f1011af69ea6e0513d0907e4d7d440812eda Add initial IDistribute interface specification for a distributor.
99fd8189785917dd09d9122270692a89f896b061 Fix old Tests.py; networkstatus.parseALine returns a string.
5c8bf76f2379d0039637877ef9ae4bdf11a72877 Support requiring distribution of bridges with the "Running" flag.
ca25134def3ac064cdd1f5d21ce583013d7318fd Rip out `BridgeHolder.assignmentsArePersistent()`, it's never used.
541f927475e8574a96a03c94fc99f74b37b4d6b4 Make `isValidIP()` backwards compatible with deprecated `is_valid_ip()`.
6a59652077571d700266c5ffa01ead8c55d58325 Move old b.Bridges.is_valid_ip() doctest to b.p.addr.isValidIP().
20ddda862453494bfbc98c34ff70f4e00ace3bae Add note that bridgedb.Bridges.ID_LEN isn't used.
fbde0bf1702347bf524452401672ac263c2ace04 Remove bridgedb.Bridges.HEX_DIGEST_LEN; it's completely unused.
4d7a9f34d66d71f71c6814f219a3e3c207431d8a Deprecate bridgedb.Bridges.is_valid_fingerprint().
afa33faaf90982c68e4f501c0d44646d3437eb04 Add bridgedb.interfaces module which simply collects all interfaces.
d78e56e424f644da7d9f8da99ff7012b2cc922c2 Rename IBridgeRequest → IRequestBridges. It's cuter.
0ab4fbc2a531b9f7895655a18e702e3d48647f51 Add Bridges.is_valid_ip doctests to b.p.addr.isValidIP docstring.


...and some of those may even be general enough to warrant inclusion into the develop branch before anything gets merged.

I estimate that this ticket is somewhere between 30-50% complete.

comment:5 Changed 4 years ago by isis

This is currently blocked on #9380, #12029, and #12505. Otherwise we'll have to redesign schemas twice.

comment:6 Changed 4 years ago by isis

I closed #13578 as a duplicate of this ticket. I'm not actually sure if it's a duplicate, but it requested that we store PT bridge information in the database separate from the vanilla bridge which is running the PT, so that we can query for PT information specifically. Since this is extra work to do for the old databases and the new, we should probably only do it for the new databases.

#13570 was depending on #13578, so now #13570 is dependent on this ticket.

comment:7 Changed 3 years ago by hellais

Has there been any progress on this? Is there something I can help out with?

We would like to have this be done by the end of the year so that we can publish the bridge reachability study and have it be well integrated with bridge_db.

comment:8 Changed 3 years ago by isis

Keywords: isis2015Q1Q2 isisExB isisExC added

comment:9 Changed 22 months ago by isis

Keywords: TorCoreTeam201608 added

Adding to my August tickets.

comment:10 Changed 6 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"
