Opened 5 years ago

Closed 4 years ago

#12029 closed defect (fixed)

Redesign BridgeDB's class inheritance to make designing new distributors easier

Reported by: isis Owned by: isis
Priority: High Milestone:
Component: Obfuscation/BridgeDB Version:
Severity: Keywords: bridgedb-dist, bridgedb-0.2.4, bridgedb-0.2.x, isis2014Q3Q4, isis2015Q3Q4, isisExB, isisExC
Cc: isis, sysrqb, wfn, mikeperry, gk Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Everything is piled in lib/bridgedb/Dist.py right now, with a lack of organisation and documentation, making it rather overwhelming for others to design new distributors.

Here are some of the things which should be revised:

  1. Distributor Class Interface The class inheritance design of current distributors is all within one file and doesn't have any clear indication of what methods/attributes are necessary to create a new distributor. There should probably be a zope.interface.Interface class so that implementations of new distributors can be easily checked to see that they meet the specification.
  1. Distributor Class Inheritance The current Distributor "base class" in lib/bridgedb/Dist.py inherits from bridgedb.Bridges.BridgeHolder which is one of the hashring classes. I've never looked into why this is done, but these should likely be distinct and separate classes. Either way the code in lib/bridgedb/Bridges.py is a complete mess as well, and will likely need some cleanup.
  1. Hide Database Transactions There should probably be some sort of getBridgesFromDBManager() method for distributors, so that they can send a bridgedb.bridgerequest.BridgeRequest (or subclass) to a database manager.
    • This method should expect to interact with a system, called the DatabaseManager, that doesn't exist yet because I'm currently building it, and also expect to receive some JSON in return which contains appropriate bridges for the client's request.
    • This is so that the DatabaseManager can handle BridgeRequest queuing and database transactions, and the distributor can continue going about its business without having to worry about interacting with hashrings or databases or any of the complicated backend stuff.
    • This should permit the distributors to be run in separate processes (or on separate systems), so that they can interact with the DatabaseManager remotely, which will help later when/if we wish to run future distributors with a larger potential attack surface (such as an OTR/XMPP distributor).

There might be more things which will be necessary to change which will be discovered either while hacking on the revision, or while a new distributor is being created through subclassing and working with the new API; these things should be added to this ticket (or a new ticket, then mentioned here).

We might want some sort of DistributorTests mixin class for creating twisted.trial.unittest.TestCases from, so that they have some default checks.

Child Tickets

Change History (12)

comment:1 Changed 5 years ago by isis

Status: newneeds_information

I have a design question for BridgeDB that I've been thinking about for a couple weeks, and I can't decide which is the more secure/better thing to do.

I want there to be a separation between the actual databases, and the bridge Distributors. Hopefully, it can be in a way, such that the bridge Distributors could be run on separate machines (as has been discussed plenty before in #tor-dev on IRC).

In my head, I had this idea that there would be a DatabaseManager that the Distributors could talk to, either locally, or through an API where they send it some simply-formatted (probably JSON) requests for bridges from the databases.

Currently, a Distributor is the one who "owns" and accesses its own hashring. This has been this way for years (it is because the bridgedb.Dist.Distributor classes actually inherits from bridgedb.Bridges.BridgeHolder, the later of which is the base class for all hashrings), this choice was made because the threat model specifies that the compromise of one hashring shouldn't effect another distributor. This requires the Distributor to hold its hashring in memory (it can be substantially large), and (unless each Distributor is going to run its own Redis instance) this is much slower (in terms of CPU/memory overhead).

So the design question is this: who "owns" the hashrings with bridges in them? the DatabaseManager? or the Distributor?

Here is the layout of the Pros and Cons of each design (at least those I've been able to think of so far):

  1. The DatabaseManager "owns" a Distributor's hashring.

Pros

  • Enables less complication for people implementing new Distributors. (Implementers wouldn't need to deal with hashrings at all; they'd just need to have a way to authenticate to and ask the DatabaseManager for bridges.)
  • Requires substantially less CPU and memory for a new Distributor, whether it's on the same machine as the databases or otherwise.

Cons

  • Requires potentially significantly more network overhead.
  • If the Distributor is run remotely, we still can't reliably determine if the Distributor was compromised, and so we should still cling to the threat model that a compromised Distributor may obtain/hand out all bridges under its control.
  1. A Distributor owns its own hashring.

Pros

  • No additional network overhead each time the Distributor wants to hand out bridges.

Cons

  • There is still the network overhead of receiving the bridges from the DatabaseManager in the first place.
  • Requires large CPU/memory requirements for each new Distributor.
  • Implementers of new Distributors will potentially have additional complication, due to needing to deal with hashrings and storing their own bridges. (Though this might be simplified by hiding the functionality in a "private" methods in the parent class.)

Feel free to add Pros/Cons or argue for whichever design, or propose another one, because I'm not sure what to do yet.

comment:2 in reply to:  1 Changed 5 years ago by isis

Cc: mikeperry gk added
Status: needs_informationaccepted

Replying to isis:

I have a design question for BridgeDB that I've been thinking about for a couple weeks, and I can't decide which is the more secure/better thing to do.

I want there to be a separation between the actual databases, and the bridge Distributors. Hopefully, it can be in a way, such that the bridge Distributors could be run on separate machines (as has been discussed plenty before in #tor-dev on IRC).

In my head, I had this idea that there would be a DatabaseManager that the Distributors could talk to, either locally, or through an API where they send it some simply-formatted (probably JSON) requests for bridges from the databases.

Currently, a Distributor is the one who "owns" and accesses its own hashring. This has been this way for years (it is because the bridgedb.Dist.Distributor classes actually inherits from bridgedb.Bridges.BridgeHolder, the later of which is the base class for all hashrings), this choice was made because the threat model specifies that the compromise of one hashring shouldn't effect another distributor. This requires the Distributor to hold its hashring in memory (it can be substantially large), and (unless each Distributor is going to run its own Redis instance) this is much slower (in terms of CPU/memory overhead).

So the design question is this: who "owns" the hashrings with bridges in them? the DatabaseManager? or the Distributor?

Here is the layout of the Pros and Cons of each design (at least those I've been able to think of so far):

  1. The DatabaseManager "owns" a Distributor's hashring.

Pros

  • Enables less complication for people implementing new Distributors. (Implementers wouldn't need to deal with hashrings at all; they'd just need to have a way to authenticate to and ask the DatabaseManager for bridges.)
  • Requires substantially less CPU and memory for a new Distributor, whether it's on the same machine as the databases or otherwise.

Cons

  • Requires potentially significantly more network overhead.
  • If the Distributor is run remotely, we still can't reliably determine if the Distributor was compromised, and so we should still cling to the threat model that a compromised Distributor may obtain/hand out all bridges under its control.
  1. A Distributor owns its own hashring.

Pros

  • No additional network overhead each time the Distributor wants to hand out bridges.

Cons

  • There is still the network overhead of receiving the bridges from the DatabaseManager in the first place.
  • Requires large CPU/memory requirements for each new Distributor.
  • Implementers of new Distributors will potentially have additional complication, due to needing to deal with hashrings and storing their own bridges. (Though this might be simplified by hiding the functionality in a "private" methods in the parent class.)

Feel free to add Pros/Cons or argue for whichever design, or propose another one, because I'm not sure what to do yet.


This was discussed in the "Bridging the Gap" breakout session at the Summer 2014 Tor Developer Meeting in Paris, between myself, wfn, mikeperry, gk, and some others.

We decided that, in order for future client-side distributor mechanisms to easily integrate with Tor Browser, it is necessary to implement case #1 described above, such that the Distributor API can be reused in a language-agnostic manner. I'm going to proceed with implementing #1.

comment:3 Changed 5 years ago by isis

Status: acceptedneeds_revision

FWIW, and this will very likely change, my current progress is in my fix/12029-dist-api_r1 branch, and there is another fix/12029-hashring-refactor branch based on it which I've been doing the hashring code cleanup in.

I believe that I can likely create the rest of the Distributor API without needing to refactor the hashrings immediately, so I'm putting the hashring work on hold for a bit.

comment:4 Changed 5 years ago by isis

By the way, here's the BridgeDB interface design outline from the developer meeting "Bridging the Gap". It includes some bits of where the Distributor API needs to go and which systems it needs to interact with.

comment:5 Changed 5 years ago by isis

I separated some of the commits which were generally for refacting the bridgedb.Bridges module, which were in the fix/12029-dist-api_r1 branch, into a new fix/12029-refactor-Bridges branch, because I would prefer that no one else need to duplicate refactoring I've already finished (as is now happening with joelanders in #9380).

The commits were all chopped off the bottom of the fix/12029-dist-api_r1 branch, and were these:

99fd8189785917dd09d9122270692a89f896b061 Fix old Tests.py; networkstatus.parseALine returns a string.
5c8bf76f2379d0039637877ef9ae4bdf11a72877 Support requiring distribution of bridges with the "Running" flag.
ca25134def3ac064cdd1f5d21ce583013d7318fd Rip out `BridgeHolder.assignmentsArePersistent()`, it's never used.
541f927475e8574a96a03c94fc99f74b37b4d6b4 Make `isValidIP()` backwards compatible with deprecated `is_valid_ip()`.
6a59652077571d700266c5ffa01ead8c55d58325 Move old b.Bridges.is_valid_ip() doctest to b.p.addr.isValidIP().
20ddda862453494bfbc98c34ff70f4e00ace3bae Add note that bridgedb.Bridges.ID_LEN isn't used.
fbde0bf1702347bf524452401672ac263c2ace04 Remove bridgedb.Bridges.HEX_DIGEST_LEN; it's completely unused.
4d7a9f34d66d71f71c6814f219a3e3c207431d8a Deprecate bridgedb.Bridges.is_valid_fingerprint().
afa33faaf90982c68e4f501c0d44646d3437eb04 Add bridgedb.interfaces module which simply collects all interfaces.
d78e56e424f644da7d9f8da99ff7012b2cc922c2 Rename IBridgeRequest ? IRequestBridges. It's cuter.
0ab4fbc2a531b9f7895655a18e702e3d48647f51 Add Bridges.is_valid_ip doctests to b.p.addr.isValidIP docstring.


In addition, in order to get the tests to pass, I added one commit, and fixup'd another one.

I'm merging fix/12029-refactor-Bridges now (for BridgeDB-0.2.4) because it's needed for work on other tickets.

comment:6 Changed 5 years ago by isis

Okay, latest changes for this ticket are in my fix/12029-dist-api_r2 branch. The commits mentioned above are, of course, no longer there.

Additionally, the following commits from the original fix/12029-dist-api_r1 branch have been moved to a new fix/12031-redis_r2 branch because they are specific to #12031 (which might need some changes from this ticket):

5d587c1dc860374355b429116a606aa26c7cd805
fba0575322e5a89096df8b8fa70561846eaf68b2
540871d2f5a84f1ff04598603c9188d3853b500a
2c4c4d09aeeddaeda452d8b151bd1e2d84440758
8ac3957ba3b88a065b978a3bff258e0c6d2f3c80
7ced3853144bc6eb20ccf3b5a7b033c197022c97
e7d01c6e567e31a7d5f216084fa4be56999282d0
df908da250a74099b09a5d5823bd454184c3e552
ab740a36fb56740525a01ca8b0c24c9c99bd1cf1
e1c7e2906696744cafd4f83120ae90b31a853894
a1ad026d51a8741c1157df15ebd41f019e16720e
fd37e68a153f3cf200e91d23eb7b1f87c8203e93
28f678f08bf0b9054106ae3c35fceb2711e4349c
160949d0e62390922f061708be807dd3a36078dc


So now the changes (so far) for this ticket are considerably smaller.

comment:7 Changed 5 years ago by isis

This is currently blocked on #9380.

comment:8 Changed 4 years ago by isis

Keywords: bridgedb-0.2.4 added

My fix/12029-refactor-Bridges branch was merged into develop for the release of bridgedb-0.2.4, which contained changes necessary before finishing #9380. This ticket is partly done, but there will be more to do after #9380 is finished.

comment:9 Changed 4 years ago by isis

Keywords: isis2014Q3Q4 added

comment:10 Changed 4 years ago by isis

Keywords: isis2015Q3Q4 added

comment:11 Changed 4 years ago by isis

Keywords: isisExB isisExC added

comment:12 Changed 4 years ago by isis

Resolution: fixed
Status: needs_revisionclosed

While working on #12505, I realised that all of the work for #12505, #12506 (this ticket), #11330, and #12029 are all closely coupled, i.e. something which should be committed as part of the work for one ticket would often depend on a specific commit from the changes for another ticket, and then depend on something else from one of the other tickets, and so on. Because of this, I did all my development for all four tickets in one branch.

I've finally separated out the changes for just this ticket, and put them in their own fix/12029-distribute-modulebranch, which is now being merged into develop for BridgeDB-0.3.3.

As part of the remaining work on #12505, there may still be slight changes make to the API for distributor creation, but overall it should be considered mostly stable. I'm looking forward to hearing feedback on this the next time someone creates a distributor, and suggestions for improvements are welcome.

Note: See TracTickets for help on using tickets.