Opened 8 years ago

Closed 5 years ago

#5232 closed defect (fixed)

Import bridges into BridgeDB in a separate thread and database transaction

Reported by: karsten Owned by: sysrqb
Priority: High Milestone:
Component: Circumvention/BridgeDB Version:
Severity: Keywords: bridgedb-email, bridgedb-db, bridgedb-https, bridgedb-0.1.7
Cc: nickm, isis, sysrqb Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Last week, kaner briefly mentioned that BridgeDB doesn't accept client connections during startup. I then asked aagbsn whether he can look at the relevant code parts to see if this also applies to reloading bridges, which happens twice an hour. He said it does.

To be clear: if BridgeDB takes 1 minute to load bridges (20K bridges on aagbsn's linode VM, 8K bridges on my desktop machine), it won't give out any bridges to clients for 2 minutes per hour. That's a maximum availability of 96.6 %. This is going to get worse the more bridges we add.

aagbsn and I think that we could make BridgeDB import bridges in a separate thread and in a single database transaction. There should probably be a check that it doesn't serve too old bridges (e.g., not more than a few hours old) during startup if it hasn't run for a while.

Child Tickets

Attachments (3)

reload_profile_master.out (48.0 KB) - added by sysrqb 6 years ago.
reload_profile_master_no_stability.out (20.0 KB) - added by sysrqb 6 years ago.
telnet-an-email (2.5 KB) - added by isis 6 years ago.
Adding my embarrassing pexpect + raw SMTP/telnet script that I use for testing the email server. (I didn't want to commit it to the repo.) To use it, do: ./telnet-an-email $USER

Download all attachments as: .zip

Change History (25)

comment:1 Changed 8 years ago by aagbsn

After some coffee:

Bridges are placed in BridgeRings in memory; so this will be more complicated than just a db transaction.

One possible solution could be to create a new copy of these datastructures and then swap atomically.

It might look something like this:

(in Main.py, load())

  1. create a new thread
  2. create new splitter(s) and rings
  3. get a db transaction
  4. insert bridges
  5. get a lock
  6. swap the datastructures
  7. commit the transaction
  8. release the lock

Things to check when testing:

  1. BridgeDB should serve up bridges during the update process
  2. The email distributor touches the database. Make sure it works properly during the update.
  3. Additional HUPs should not deadlock.
  4. Make sure that any references held by distributors are updated properly.

Anything else I forgot?

comment:2 Changed 8 years ago by karsten

Cc: nickm added

If you need to get a lock to swap the data structures and commit the transaction, does that mean readers also need to get a lock to get the data structure they're supposed to use? Also, you may want to commit the transaction before swapping the data structures, as the commit may fail.

This change sounds plausible to fix the problem, but it's also kinda invasive. Are there things we could refactor before implementing this change, e.g., create a clean interface how distributors access the bridge copies in memory and in the database? Ideally, the email distributor should not touch the database directly, and distributors shouldn't hold any references that we need to worry about when updating bridges. I'm worried that this complexity is going to bite us.

Speaking of invasive changes: do we have good ways to test BridgeDB? If not, is it a good time to add a bunch of unit/integration tests before starting to code on this change?

We should also ask nickm for his opinion. I don't know the BridgeDB code well enough to brainstorm about code changes like this. I cc'ed him in this ticket.

comment:3 in reply to:  2 Changed 8 years ago by aagbsn

Replying to karsten:

If you need to get a lock to swap the data structures and commit the transaction, does that mean readers also need to get a lock to get the data structure they're supposed to use? Also, you may want to commit the transaction before swapping the data structures, as the commit may fail.

good catch, thanks. Yes, readers will need to be multithreading-aware.

This change sounds plausible to fix the problem, but it's also kinda invasive. Are there things we could refactor before implementing this change, e.g., create a clean interface how distributors access the bridge copies in memory and in the database? Ideally, the email distributor should not touch the database directly, and distributors shouldn't hold any references that we need to worry about when updating bridges. I'm worried that this complexity is going to bite us.

the Email distributor holds a reference to the database as part of the email rate-limiting support. BridgeDB temporarily stores hashes of email addresses caught aggressively requesting bridges. These emails are stored in a separate table, however, sqlite may lock the entire database which could block if the transaction takes a long time to commit. Further investigation needed.

If we think re-implementing the rate limiting support to avoid having a reference to the database we'll need to make sure state persists through a reload (HUP) of BridgeDB.

And any multithreaded

Speaking of invasive changes: do we have good ways to test BridgeDB? If not, is it a good time to add a bunch of unit/integration tests before starting to code on this change?

BridgeDB actually has a number of tests (Tests.py), and I have been adding new tests. The email distributor needs better test coverage.

We should also ask nickm for his opinion. I don't know the BridgeDB code well enough to brainstorm about code changes like this. I cc'ed him in this ticket.

comment:4 Changed 8 years ago by karsten

Parent ID: #4499

Removing parent ticket relationship to #4499 which I cannot close while this ticket is still open.

comment:5 Changed 6 years ago by sysrqb

I thought I was close to having a patch for this, but it's slightly more complicated than I anticipated.

for example:

ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 3929742227200 and this is thread id 3929574946560

It appears that the majority of the time is spent during sqlite calls within the bridge weighting procedure. Attached will be a profiling of reload() during startup followed by a profiling after sending SIGHUP. I used a set of 1000 randomly generated descriptors and I updated the publication time of 187 descriptors prior to sending the SIGHUP (just so the stability code actually updated some bridge stats).

I'm also attaching a profile of reload() with the call to the Stability code commented out. On my computer this reduces startup time from 124.377 seconds to 4.889 seconds. I'm curious if we should comment out the call to addOrUpdateBridgeHistory() in load() for the time being until we are able to move the processing to a background thread and we have a mechanism for reachability testing implemented.

We're currently not using the BridgeHistory information, but it has occurred to me that long-term bridge history will be useful. I'm unsure which is more important.

Changed 6 years ago by sysrqb

Attachment: reload_profile_master.out added

Changed 6 years ago by sysrqb

comment:6 Changed 6 years ago by sysrqb

Keywords: important added

comment:7 Changed 6 years ago by isis

Status: newneeds_review

Loading the bridges in a separate thread is fixed in this commit on my develop branch.

comment:8 Changed 6 years ago by isis

Owner: changed from aagbsn to isis
Status: needs_reviewassigned

comment:9 Changed 6 years ago by isis

Status: assignedneeds_review

comment:10 Changed 6 years ago by isis

Ignore my "fix" above. It doesn't fix anything, it just breaks things.

Sysrqb fixed this in their branch, bug5232_r2_based_on_fix/9462_r9462C_r2. CI build passes and is here, however, some of the old unittests in lib/bridgedb/Tests.py fail. The output follows. All tests which fail due to Bridges.parseORAddressLine() missing are my fault, that functionality was moved to the rewritten parsers in the bridgedb.parse.networkstatus module for #9462. I'm more concerned about the errors saying:

exceptions.ValueError: database parameter must be string or APSW Connection object

My local test run produces the following results:

(bridgedb)∃!isisⒶwintermute:(testing/sysrqb/bug5232_r2_based_on_fix/9462_r9462C_r2 *$=)~/code/torproject/bridgedb ∴ bridgedb test
bridgedb.test.test_EmailServer
  EmailGnuPGTest
    test_getGPGContext_bad_keyfile ...                                     [OK]
    test_getGPGContext_good_keyfile ...                               [SKIPPED]
    test_getGPGContext_missing_keyfile ...                                 [OK]
bridgedb.Tests
  BridgeStabilityTests
    testAddOrUpdateSingleBridgeHistory ...                              [ERROR]
    testDeletingSingleBridgeHistory ...                                 [ERROR]
    testDiscountAndPruneBridgeHistory ...                               [ERROR]
    testFamiliar ...                                                    [ERROR]
    testLastSeenWithDifferentAddressAndPort ...                         [ERROR]
    testTOSA ...                                                        [ERROR]
  DictStorageTests
    testComplexDict ...                                                    [OK]
    testSimpleDict ...                                                     [OK]
  EmailBridgeDistTests
    testEmailRateLimit ...                                                 [OK]
    testUnsupportedDomain ...                                              [OK]
  IPBridgeDistTests
    testBasicDist ...                                                      [OK]
    testDistWithCategories ...                                             [OK]
    testDistWithFilterAll ...                                           [ERROR]
    testDistWithFilterBlockedCountries ...                              [ERROR]
    testDistWithFilterBlockedCountriesAdvanced ...                      [ERROR]
    testDistWithFilterBoth ...                                          [ERROR]
    testDistWithFilterIP4 ...                                           [ERROR]
    testDistWithFilterIP6 ...                                           [ERROR]
  ParseDescFileTests
    testConvolutedOrAddress ...                                            [OK]
    testMultipleOrAddress ...                                              [OK]
    testParseCountryBlockFile ...                                          [OK]
    testSimpleDesc ...                                                     [OK]
    testSingleOrAddress ...                                                [OK]
  SQLStorageTests
    testBridgeStorage ...                                                  [OK]
bridgedb.test.test_Tests
  TrialAdaptedOldUnittests
    test_allOldUnittests ...                                            [ERROR]
bridgedb.test.test_bridgedb
  BridgeDBCliTest
    test_bridgedb_commands ...
Copying config:
  '/home/isis/code/torproject/bridgedb/bridgedb.conf'
                ↓ ↓ ↓
  '/home/isis/code/torproject/bridgedb/_trial_temp/rundir/bridgedb.conf'

Copying GPG test key:
  '/home/isis/code/torproject/bridgedb/gnupghome/TESTING.subkeys.sec'
                ↓ ↓ ↓
  '/home/isis/code/torproject/bridgedb/_trial_temp/rundir/gnupghome/TESTING.subkeys.sec'

Running subcommands from directory:
  '/home/isis/code/torproject/bridgedb/_trial_temp/rundir'
Running '/home/isis/code/torproject/bridgedb/scripts/make-ssl-cert'...
Generating RSA private key, 4096 bit long modulus
.............................++
.................................................................................................................................................................................................................................++
e is 65537 (0x10001)
writing RSA key
Signature ok
subject=/C=DE/ST=Berlin-Brandenburg/O=PWND Consulting
Getting Private key
Done. Your private key was saved in /home/isis/code/torproject/bridgedb/privkey.pem
and your certificate is in /home/isis/code/torproject/bridgedb/cert
Copying certificate:
  '/home/isis/code/torproject/bridgedb/cert'
                ↓ ↓ ↓
  '/home/isis/code/torproject/bridgedb/_trial_temp/rundir/cert'

Copying SSL private key:
  '/home/isis/code/torproject/bridgedb/privkey.pem'
                ↓ ↓ ↓
  '/home/isis/code/torproject/bridgedb/_trial_temp/rundir/privkey.pem'

Running bridgedb script '/home/isis/.virtualenvs/bridgedb/bin/bridgedb'...
Running `bridgedb mock' to generate mock bridge descriptors...
WARNING: Can't import PyNaCl. NTOR key generation is disabled.
Generating 50 bridge descriptors...
..................................................
Writing descriptors to files...Done.
Sucessfully generated 50 descriptors.
`bridgedb mock' exited with status code 0
Running `bridgedb' to test server startups...
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 781, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/threadpool.py", line 191, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Main.py", line 212, in updateBridgeHistory
    bridge, timestamp)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object
`bridgedb' exited with status code 0
                                            [OK]
bridgedb.test.test_crypto
  CryptoTest
    test_getKey_keyexists ...                                              [OK]
    test_getKey_nokey ...                                                  [OK]
    test_getKey_tmpfile ...                                                [OK]
bridgedb.test.test_parse_addr
  ParseAddrIsIPAddressTests
    test_filehandle ...                                                    [OK]
    test_isIPAddress_IP4DefaultRoute ...                                   [OK]
    test_isIPAddress_IP4LimitedBroadcast ...                               [OK]
    test_isIPAddress_IP4LinkLocal ...                                      [OK]
    test_isIPAddress_IP4Localhost ...                                      [OK]
    test_isIPAddress_IP4Loopback ...                                       [OK]
    test_isIPAddress_IP4Multicast_224 ...                                  [OK]
    test_isIPAddress_IP4Multicast_239 ...                                  [OK]
    test_isIPAddress_IP4ReservedRFC1700 ...                                [OK]
    test_isIPAddress_IP4ReservedRFC1918_10 ...                             [OK]
    test_isIPAddress_IP4ReservedRFC1918_172_16 ...                         [OK]
    test_isIPAddress_IP4ReservedRFC1918_192_168 ...                        [OK]
    test_isIPAddress_IP4Unspecified ...                                    [OK]
    test_isIPAddress_IP6DefaultRoute ...                                   [OK]
    test_isIPAddress_IP6LinkLocal ...                                      [OK]
    test_isIPAddress_IP6SiteLocal ...                                      [OK]
    test_isIPAddress_IP6UniqueLocal ...                                    [OK]
    test_isIPAddress_IP6Unspecified ...                                    [OK]
    test_isIPAddress_randomIP4 ...                                         [OK]
    test_isIPAddress_randomIP6 ...                                         [OK]
    test_isIPAddress_withNonIP ...                                         [OK]
    test_returnUncompressedIP ...                                          [OK]
    test_unicode ...                                                       [OK]
  ParseAddrIsIPv4Tests
    test_isIPv4_IP4DefaultRoute ...                                        [OK]
    test_isIPv4_IP4LimitedBroadcast ...                                    [OK]
    test_isIPv4_IP4LinkLocal ...                                           [OK]
    test_isIPv4_IP4Localhost ...                                           [OK]
    test_isIPv4_IP4Loopback ...                                            [OK]
    test_isIPv4_IP4Multicast_224 ...                                       [OK]
    test_isIPv4_IP4Multicast_239 ...                                       [OK]
    test_isIPv4_IP4ReservedRFC1700 ...                                     [OK]
    test_isIPv4_IP4ReservedRFC1918_10 ...                                  [OK]
    test_isIPv4_IP4ReservedRFC1918_172_16 ...                              [OK]
    test_isIPv4_IP4ReservedRFC1918_192_168 ...                             [OK]
    test_isIPv4_IP4Unspecified ...                                         [OK]
    test_isIPv4_IP6DefaultRoute ...                                        [OK]
    test_isIPv4_IP6LinkLocal ...                                           [OK]
    test_isIPv4_IP6Localhost ...                                           [OK]
    test_isIPv4_IP6SiteLocal ...                                           [OK]
    test_isIPv4_IP6UniqueLocal ...                                         [OK]
    test_isIPv4_IP6Unspecified ...                                         [OK]
    test_isIPv4_randomIP4 ...                                              [OK]
    test_isIPv4_randomIP6 ...                                              [OK]
    test_isIPv4_withNonIP ...                                              [OK]
    test_isIPv4_withValidIPv4 ...                                          [OK]
    test_isIPv4_withValidIPv4_2 ...                                        [OK]
    test_isIPv4_withValidIPv4_3 ...                                        [OK]
    test_isIPv4_withValidIPv6 ...                                          [OK]
  ParseAddrIsIPv6Tests
    test_isIPv6_IP4DefaultRoute ...                                        [OK]
    test_isIPv6_IP4LimitedBroadcast ...                                    [OK]
    test_isIPv6_IP4LinkLocal ...                                           [OK]
    test_isIPv6_IP4Localhost ...                                           [OK]
    test_isIPv6_IP4Loopback ...                                            [OK]
    test_isIPv6_IP4Multicast_224 ...                                       [OK]
    test_isIPv6_IP4Multicast_239 ...                                       [OK]
    test_isIPv6_IP4ReservedRFC1700 ...                                     [OK]
    test_isIPv6_IP4ReservedRFC1918_10 ...                                  [OK]
    test_isIPv6_IP4ReservedRFC1918_172_16 ...                              [OK]
    test_isIPv6_IP4ReservedRFC1918_192_168 ...                             [OK]
    test_isIPv6_IP4Unspecified ...                                         [OK]
    test_isIPv6_IP6DefaultRoute ...                                        [OK]
    test_isIPv6_IP6LinkLocal ...                                           [OK]
    test_isIPv6_IP6Localhost ...                                           [OK]
    test_isIPv6_IP6SiteLocal ...                                           [OK]
    test_isIPv6_IP6UniqueLocal ...                                         [OK]
    test_isIPv6_IP6Unspecified ...                                         [OK]
    test_isIPv6_randomIP4 ...                                              [OK]
    test_isIPv6_randomIP6 ...                                              [OK]
    test_isIPv6_withNonIP ...                                              [OK]
    test_isIPv6_withValidIPv4 ...                                          [OK]
    test_isIPv6_withValidIPv4_2 ...                                        [OK]
    test_isIPv6_withValidIPv4_3 ...                                        [OK]
    test_isIPv6_withValidIPv6 ...                                          [OK]
  PortListTest
    test_contains ...                                                      [OK]
    test_getitem_long ...                                                  [OK]
    test_getitem_shouldContain ...                                         [OK]
    test_getitem_shouldNotContain ...                                      [OK]
    test_getitem_string ...                                                [OK]
    test_invalidPortNumber ...                                             [OK]
    test_invalidStringArgs ...                                             [OK]
    test_iter ...                                                          [OK]
    test_mixedArgs ...                                                     [OK]
    test_str ...                                                           [OK]
    test_tooFewPorts ...                                                   [OK]
    test_tooManyPorts ...                                                  [OK]
bridgedb.test.test_parse_networkstatus
  ParseNetworkStatusALineTests
    test_IPv4 ...                                                          [OK]
/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/parse/networkstatus.py:192: FutureWarning: Got IPv4 address in networkstatus 'a'-line! Networkstatus document format may have changed!
    test_IPv4BadPortSeparator ...                                          [OK]
    test_IPv4InvalidAddress ...                                            [OK]
    test_IPv4MissingPort ...                                               [OK]
    test_IPv6 ...                                                          [OK]
    test_IPv6BadPortSeparator ...                                          [OK]
    test_IPv6BadPortlist ...                                               [OK]
    test_missingPrefix ...                                                 [OK]
  ParseNetworkStatusRLineTests
    test_invalidDescriptorDigest ...                                       [OK]
    test_invalidDescriptorDigest_invalidBase64 ...                         [OK]
    test_invalidDescriptorDigest_singleQuoteChar ...                       [OK]
    test_invalidDescriptorDigest_withBase64padding ...                     [OK]
    test_invalidIPAddress ...                                              [OK]
    test_invalidIdentBase64 ...                                            [OK]
    test_invalidIdentSingleQuoteChar ...                                   [OK]
    test_invalidIdent_withBase64padding ...                                [OK]
    test_invalidNicknameNonAlphanumeric ...                                [OK]
    test_invalidNicknameTooLong ...                                        [OK]
    test_invalidTimestamp ...                                              [OK]
    test_invalidTimestampMissingDate ...                                   [OK]
    test_missingAfterDesc ...                                              [OK]
    test_missingAfterIdent ...                                             [OK]
    test_missingPrefix ...                                                 [OK]
    test_valid ...                                                         [OK]
    test_wrongFieldOrder ...                                               [OK]
    test_wrongNumberOfFields ...                                           [OK]
  ParseNetworkStatusSLineTests
    test_makeBelieveFlag ...                                               [OK]
    test_missingPrefix ...                                                 [OK]
    test_noFlags ...                                                       [OK]
bridgedb.test.test_persistent
  StateTest
    test_STATEFILE ...                                                     [OK]
    test_before_useChangedSettings_config ...                              [OK]
    test_before_useChangedSettings_state ...                               [OK]
    test_before_useChangedSettings_stateConfig ...                         [OK]
    test_configCreation ...                                                [OK]
    test_docstring_persistent ...                                          [OK]
    test_docstring_persistentState ...                                     [OK]
    test_existsLoad ...                                                    [OK]
    test_existsSave ...                                                    [OK]
    test_getStateFor ...                                                   [OK]
    test_optionsCreation ...                                               [OK]
    test_persistent_getState ...                                           [OK]
    test_persistent_state ...                                              [OK]
    test_stateCreation ...                                                 [OK]
    test_state_init ...                                                    [OK]
    test_useChangedSettings ...                                            [OK]
bridgedb.test.test_persistentSaveAndLoad
  StateSaveAndLoadTests
    test_load ...                                                          [OK]
    test_save ...                                                          [OK]
    test_stateLoadTempfile ...                                             [OK]
    test_stateSaveAndLoad ...                                              [OK]
    test_stateSaveTempfile ...                                             [OK]

===============================================================================
[SKIPPED]
See #5463 for why this test fails when it should pass

bridgedb.test.test_EmailServer.EmailGnuPGTest.test_getGPGContext_good_keyfile
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 655, in testAddOrUpdateSingleBridgeHistory
    bhe = bridgedb.Stability.addOrUpdateBridgeHistory(b, timestamp)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testAddOrUpdateSingleBridgeHistory
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 664, in testDeletingSingleBridgeHistory
    bhe = bridgedb.Stability.addOrUpdateBridgeHistory(b, timestamp)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testDeletingSingleBridgeHistory
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 753, in testDiscountAndPruneBridgeHistory
    bridgedb.Stability.addOrUpdateBridgeHistory(b, i)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testDiscountAndPruneBridgeHistory
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 721, in testFamiliar
    [ bridgedb.Stability.addOrUpdateBridgeHistory(b, i) for i in time_series ]
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testFamiliar
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 698, in testLastSeenWithDifferentAddressAndPort
    [ bridgedb.Stability.addOrUpdateBridgeHistory(b, t) for t in ts ]
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testLastSeenWithDifferentAddressAndPort
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 682, in testTOSA
    bridgedb.Stability.addOrUpdateBridgeHistory(b,t)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Stability.py", line 137, in addOrUpdateBridgeHistory
    if not db: db = bridgedb.Storage.getDB(True)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 512, in getDB
    return Database(db_filename) if newHandle else _THE_DB
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 209, in __init__
    self._conn = openDatabase(sqlite_fname)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Storage.py", line 428, in openDatabase
    conn = sqlite3.Connection(sqlite_file)
exceptions.ValueError: database parameter must be string or APSW Connection object

bridgedb.Tests.BridgeStabilityTests.testTOSA
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 336, in testDistWithFilterAll
    d.insert(fakeBridge6(or_addresses=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterAll
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 347, in testDistWithFilterBlockedCountries
    d.insert(fakeBridge6(or_addresses=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterBlockedCountries
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 373, in testDistWithFilterBlockedCountriesAdvanced
    d.insert(fakeBridge6(or_addresses=True, transports=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterBlockedCountriesAdvanced
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 312, in testDistWithFilterBoth
    d.insert(fakeBridge6(or_addresses=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterBoth
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 296, in testDistWithFilterIP4
    d.insert(fakeBridge6(or_addresses=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterIP4
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 280, in testDistWithFilterIP6
    d.insert(fakeBridge6(or_addresses=True))
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/Tests.py", line 136, in fakeBridge6
    address,portlist = bridgedb.Bridges.parseORAddressLine(
exceptions.AttributeError: 'module' object has no attribute 'parseORAddressLine'

bridgedb.Tests.IPBridgeDistTests.testDistWithFilterIP6
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.0.1_305_gc590861_dirty-py2.7.egg/bridgedb/test/test_Tests.py", line 47, in test_allOldUnittests
    testSuite.run(testresult, debug=True)
exceptions.NameError: global name 'testresult' is not defined

bridgedb.test.test_Tests.TrialAdaptedOldUnittests.test_allOldUnittests
-------------------------------------------------------------------------------
Ran 167 tests in 125.998s

FAILED (skips=1, errors=13, successes=153)
(bridgedb)∃!isisⒶwintermute:(testing/sysrqb/bug5232_r2_based_on_fix/9462_r9462C_r2 *$=)~/code/torproject/bridgedb ∴

comment:11 Changed 6 years ago by sysrqb

I finally found some time to rebase my branch onto master. Travis seems to like it also [0]. bug5232_rebased_1 in my public repo. It's not great, but it's a slightly-hacky improvement to what we have now. The short description of this patch is that we allow one database connection per thread. Whenever we open a connection, it is added to a dict that is indexed by TIDs. Storage.getDB() provides an interface for opening a new connection or retrieving an existing connection.

[0] https://travis-ci.org/sysrqbci/bridgedb/builds/17997258

comment:12 Changed 6 years ago by sysrqb

Cc: isis added
Owner: changed from isis to sysrqb
Status: needs_reviewassigned

comment:13 Changed 6 years ago by sysrqb

Status: assignedneeds_review

(and back again)

comment:14 Changed 6 years ago by sysrqb

I think I'm actually happier with bug5232_newdesign_dev. It's more more conservative than bug5232_rebased_1, but I think it accomplishes exactly what bridgedb needs. The main difference between the two branches is that rebased_1 is based around the problem that a database connection can not be used in multiple threads. newdesign_dev relies on twisted's threading implementation and tries to make threading in bridgedb as simple as possible while handling SIGHUP.

[0] https://travis-ci.org/sysrqbci/bridgedb/builds/18390186

comment:15 Changed 6 years ago by isis

Cc: sysrqb added
Keywords: bridgedb-email bridgedb-db bridgedb-https bridgedb-0.1.x added; important removed
Status: needs_reviewneeds_revision

Sweet. I had to deal with a bit of merge conflicts to get it into master... do you mind if I separate the additions to the unittest in 2a21dfcb55e659775fcde9dd4f668b98f41d0fd6 into another unittest? If I do it, then you won't have to deal with the merge conflicts too.

So, this seems to work great, the parsing is done in a separate thread! However, the call which takes longer, especially at start up time, is the call to bridgedb.Stability.addOrUpdateBridgeHistory(). However, after start up, the HTTPS distributor continues to function and hand out bridges while the new descriptors are being parsed.

For 10,000 bridge descriptors, with addOrUpdateBridges():

 * Starting the servers took:      1h 6m 58s
 * Restarting (SIGHUP) took:          2m 13s
 * Dumping buckets (SIGUSR1) took:       11s

Same test, only 1,000 descriptors:

 * Starting the servers took:         3m 02s
 * Restarting (SIGHUP) took:              3s
 * Dumping buckets (SIGUSR1) took:        1s

However, this breaks the email distributor. It doesn't work (at any time) because it's still trying to use an old cursor. This error is unhandled, and it gets sent to stdout rather than the logfile:

Unhandled Error
Traceback (most recent call last):
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 601, in _doReadOrWrite
    why = selectable.doRead()
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 215, in doRead
    return self._dataReceived(data)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 221, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/protocols/basic.py", line 454, in dataReceived
    self.lineReceived(line)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/mail/smtp.py", line 568, in lineReceived
    return getattr(self, 'state_' + self.mode)(line)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/twisted/mail/smtp.py", line 795, in dataLineReceived
    m.eomReceived() for m in self.__messages
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.1.4_34_gc253a64-py2.7.egg/bridgedb/EmailServer.py", line 336, in eomReceived
    replyToMail(self.lines, self.ctx)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.1.4_34_gc253a64-py2.7.egg/bridgedb/EmailServer.py", line 243, in replyToMail
    sendToUser, response = getMailResponse(lines, ctx)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.1.4_34_gc253a64-py2.7.egg/bridgedb/EmailServer.py", line 175, in getMailResponse
    bridgeFilterRules=bridgeFilterRules)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.1.4_34_gc253a64-py2.7.egg/bridgedb/Dist.py", line 471, in getBridgesForEmail
    wasWarned = db.getWarnedEmail(emailaddress)
  File "/home/isis/.virtualenvs/bridgedb/local/lib/python2.7/site-packages/bridgedb-0.1.4_34_gc253a64-py2.7.egg/bridgedb/Storage.py", line 365, in getWarnedEmail
    " email = ?", (addr,))
sqlite3.ProgrammingError: Cannot operate on a closed cursor.

Changed 6 years ago by isis

Attachment: telnet-an-email added

Adding my embarrassing pexpect + raw SMTP/telnet script that I use for testing the email server. (I didn't want to commit it to the repo.) To use it, do: ./telnet-an-email $USER

comment:16 in reply to:  15 Changed 5 years ago by sysrqb

Replying to isis:

Sweet. I had to deal with a bit of merge conflicts to get it into master... do you mind if I separate the additions to the unittest in 2a21dfcb55e659775fcde9dd4f668b98f41d0fd6 into another unittest? If I do it, then you won't have to deal with the merge conflicts too.

Nope, that's fine by me. I have some more, but I've made them shorter this time.

So, this seems to work great, the parsing is done in a separate thread! However, the call which takes longer, especially at start up time, is the call to bridgedb.Stability.addOrUpdateBridgeHistory(). However, after start up, the HTTPS distributor continues to function and hand out bridges while the new descriptors are being parsed.

Indeed. This was a tradeoff between complexity and availability. In theory this branch should significantly increase the latter with a small amount of the former.

For 10,000 bridge descriptors, with addOrUpdateBridges():

 * Starting the servers took:      1h 6m 58s
 * Restarting (SIGHUP) took:          2m 13s
 * Dumping buckets (SIGUSR1) took:       11s

10,000 bridges, time taken until normal operation resumed
(time in parenthesis describe additional time taken for stability calculations)

  * Starting the server took:      32s (3s + 11s (for cleanup))
  * Restarting (SIGHUP) took:      43s (4s + 10s (for cleanup))

I also checked the availability of the email and http distributors during reload and found that they do become unavailable for a few seconds (which is much better than the current situation, but still not good). There's only one blocking operation during reload which is when we overwrite the current data structures (with the new ones we just created in a background thread) in the main thread, so this seems like the obvious place for the bottleneck.

More testing and a comparison of startup timing compared to master will follow.

comment:17 Changed 5 years ago by sysrqb

Status: needs_revisionneeds_review

Timings from another run:

1,000 bridges:

  * Starting the servers took:      6s
  * Restarting (SIGHUP) took:       7s

10,000 bridges:

  * Starting the servers took:      1m 17s
  * Restarting (SIGHUP) took:          54s 

Master, 10,000

  * Starting the server took:      56m 44s

These numbers seem...inconsistent(?).

Pushed bug5232_adding_concurrent_processing_squashed. Please review

https://travis-ci.org/sysrqbci/bridgedb/builds/21812112

comment:18 in reply to:  17 ; Changed 5 years ago by isis

Keywords: bridgedb-0.1.7 added; bridgedb-0.1.x removed
Resolution: fixed
Status: needs_reviewclosed

Replying to sysrqb:

Timings from another run:

1,000 bridges:

  * Starting the servers took:      6s
  * Restarting (SIGHUP) took:       7s

10,000 bridges:

  * Starting the servers took:      1m 17s
  * Restarting (SIGHUP) took:          54s 

Master, 10,000

  * Starting the server took:      56m 44s

These numbers seem...inconsistent(?).


They do... that 56m 44s is from a second run? I didn't timeit, but I got something like ~15s startup on your bug5232_adding_concurrent_processing_squashed branch just now, for 250 descriptors.

Pushed bug5232_adding_concurrent_processing_squashed. Please review

https://travis-ci.org/sysrqbci/bridgedb/builds/21812112


Here goes!

1) For 43c6f8778b244ca4b19628677a293a83ae1d1e09

Reparse descriptors in background thread on signal:


In Main.updateBridgeHistory():

+                for ID in bridges:
+                    bridge = bridges[ID]

I'm pretty sure we had this code like this previously, but I don't know why we're not using:

+                for bridge in bridges.values():

instead. But this is shouldn't make any functional difference, it was probably done like this way back when, probably to solve some weird Python2.5 bug. Not really a problem, just nitpickings and archaeological ponderings.


In the same function:

+            with bridgedb.Storage.getDB() as db:
[...]
+                        for timestamp in ts:
+                            logging.debug(
+                                "Updating BridgeHistory timestamps for %s: %s"
+                                % (bridge.fingerprint, timestamp))
+                            reactor.callFromThread(
+                                bridgedb.Stability.addOrUpdateBridgeHistory,
+                                bridge, timestamp)
+            logging.debug("Stability calculations complete")
+
+    reactor.callInThread(updateBridgeHistory, bridges, timestamps)

Calling reactor.callFromThread will run the call inside the main reactor loop... unless the with bridgedb.Storage.getDB() as db is doing some context magic to put it back into a separate thread again. Unless I'm misunderstanding something, which I probably am. Or, does it call it in the parent thread, i.e. the reactor.callInThread(_reloadFn) thread? Because the t.i.interfaces.IReactorThreads docs make it seem like reactor.callFromThread() will always call the main reactor thread.


Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.

2) For e2275120ea8d95e0b2c9f25f1f42a293957363dc

Add unit tests for bridgedb.Storage:


I like this function:

+    def _runAndDie(self, timeout, func):
+        with func():
+            sleep(timeout)


Just thought I'd say that somewhere. :)


Thanks for adding unittests!

3) The context wrapping for the with bridgedb.Storage.getDB() as db idiom is some serious hacking. I'm impressed. It works. :D

4) In b67862d317a282d81e2f53726e4dbff8bb8cb1f1

More unit tests for Email Server:

Why from StringIO import StringIO? Isn't that the old, deprecated one?

5) In 91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6

Rewrite database lock acquisition function:
A typo got accidentally added:

-    Always return a :class:`bridgedb.Storage.Database` that is
+    lways return a :class:`bridgedb.Storage.Database` that is

6) In 3c9e04ac9006ff21f30aeb033f6e2560f09541f6

Make sure leekspin created our descriptors:
Sorry for the /usr/share/dict problem in leekspin, it's a good idea to add these checks.

7) Super! CHANGELOG entries! Awesome! :D

Overall, the context manager wrapping for the semaphores on db accesses is a bit of madness, but doing threading in Twisted Python is a bit of madness, and I'm amazed this is working. Threading in Twisted is seriously no small feat. You did a really brilliant job on this!

Unless someone has something more to add, I'm merging this now for 0.1.7. :D

comment:19 in reply to:  18 ; Changed 5 years ago by sysrqb

Replying to isis:

Replying to sysrqb:

Pushed bug5232_adding_concurrent_processing_squashed. Please review

https://travis-ci.org/sysrqbci/bridgedb/builds/21812112


Here goes!


I decided to rewrite history, again because they were relatively minor changes.

1) For 43c6f8778b244ca4b19628677a293a83ae1d1e09

Reparse descriptors in background thread on signal:
In Main.updateBridgeHistory():

+                for ID in bridges:
+                    bridge = bridges[ID]

I'm pretty sure we had this code like this previously, but I don't know why we're not using:

+                for bridge in bridges.values():

instead. But this is shouldn't make any functional difference, it was probably done like this way back when, probably to solve some weird Python2.5 bug. Not really a problem, just nitpickings and archaeological ponderings.


Fixed.


In the same function:

+            with bridgedb.Storage.getDB() as db:
[...]
+                        for timestamp in ts:
+                            logging.debug(
+                                "Updating BridgeHistory timestamps for %s: %s"
+                                % (bridge.fingerprint, timestamp))
+                            reactor.callFromThread(
+                                bridgedb.Stability.addOrUpdateBridgeHistory,
+                                bridge, timestamp)
+            logging.debug("Stability calculations complete")
+
+    reactor.callInThread(updateBridgeHistory, bridges, timestamps)

Calling reactor.callFromThread will run the call inside the main reactor loop... unless the with bridgedb.Storage.getDB() as db is doing some context magic to put it back into a separate thread again. Unless I'm misunderstanding something, which I probably am. Or, does it call it in the parent thread, i.e. the reactor.callInThread(_reloadFn) thread? Because the t.i.interfaces.IReactorThreads docs make it seem like reactor.callFromThread() will always call the main reactor thread.


Nope, you're completey correct. I *think* that was a typo and I wanted to use callInThread(), but I don't actually know why I wanted to do that either, it's already in a different thread than reactor loop. I can't justify spawning a couple thousand threads for this so I deleted that invocation and now it directly calls addOrUpdateBridgeHistory().


Wow. This distributor swapping. Is just wow. OMGWTFBBQ, Python.

3) The context wrapping for the with bridgedb.Storage.getDB() as db idiom is some serious hacking. I'm impressed. It works. :D


Yeah. I know, me too!

4) In b67862d317a282d81e2f53726e4dbff8bb8cb1f1

More unit tests for Email Server:

Why from StringIO import StringIO? Isn't that the old, deprecated one?

100% correct. That was my mistake. But then fixing that caused some breakage because we used StringIO.StringIO in EmailServer.py. When I switched that to io.StringIO it broke because io.StringIO only accepts unicode and we used MimeWriter.MimeWriter which used string.

Traceback (most recent call last):
  File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-packages/bridgedb-0.1.5_78_gca67179_dirty-py2.7.egg/bridgedb/test/test_EmailServer.py", line 215, in test_getMailResponseMailContent
    ret = EmailServer.getMailResponse(lines, self.ctx)
  File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-packages/bridgedb-0.1.5_78_gca67179_dirty-py2.7.egg/bridgedb/EmailServer.py", line 213, in getMailResponse
    gpgContext=ctx.gpgContext)
  File "/home/sysrqb/.virtualenvs/virt-bridgedb/lib/python2.7/site-packages/bridgedb-0.1.5_78_gca67179_dirty-py2.7.egg/bridgedb/EmailServer.py", line 443, in composeEmail
    mailbody = w.startbody("text/plain")
  File "/usr/lib64/python2.7/MimeWriter.py", line 141, in startbody
    self.flushheaders()
  File "/usr/lib64/python2.7/MimeWriter.py", line 125, in flushheaders
    self._fp.writelines(self._headers)
exceptions.TypeError: unicode argument expected, got 'str'

As a result, #11370. Which should be resolved now.

5) In 91dc3a6e079b318a3af1bbd4bb63cf249e2bbdb6

Rewrite database lock acquisition function:
A typo got accidentally added:

-    Always return a :class:`bridgedb.Storage.Database` that is
+    lways return a :class:`bridgedb.Storage.Database` that is


Nice catch

6) In 3c9e04ac9006ff21f30aeb033f6e2560f09541f6

Make sure leekspin created our descriptors:
Sorry for the /usr/share/dict problem in leekspin, it's a good idea to add these checks.


I debated deleting this commit but I decided to leave it because it caused no harm to check for the files.

7) Super! CHANGELOG entries! Awesome! :D


And now there's another one!

Overall, the context manager wrapping for the semaphores on db accesses is a bit of madness, but doing threading in Twisted Python is a bit of madness, and I'm amazed this is working. Threading in Twisted is seriously no small feat. You did a really brilliant job on this!


This comment made it all worth it. Thanks! :)

Unless someone has something more to add, I'm merging this now for 0.1.7. :D

yay! Ok, so after the changes,

5092364f7e3ce42b61b0336a95519639d0e59308 needs review for #11370
bd05227c5eefc528b72d57d53ac3f139fc2f0d5e (was 91dc3a6e) contains the re-added letter (5)
0556e7daa8504756b34384a0f9b5dfa753556c09 (was b67862d3) now uses io.StringIO (4)
bd652aa560cfaf15b9b37979aba73e9f8f548b4e (was 43c6f877) fixes callFromThread() and iterator (1)

And I added 8dfcd5d7f77950828d3c95f93417baf7ea70c6b7 for a doc and to fix long lines.

Pushed bug5232_adding_concurrent_processing_squashed_r1 for rereview!

comment:20 Changed 5 years ago by sysrqb

Resolution: fixed
Status: closedreopened

one last review? :)

comment:21 Changed 5 years ago by sysrqb

Status: reopenedneeds_review

comment:22 in reply to:  19 Changed 5 years ago by isis

Resolution: fixed
Status: needs_reviewclosed

Replying to sysrqb:

Replying to isis:

Replying to sysrqb:

1) For 43c6f8778b244ca4b19628677a293a83ae1d1e09

[...]

In the same function:

+            with bridgedb.Storage.getDB() as db:
[...]
+                        for timestamp in ts:
+                            logging.debug(
+                                "Updating BridgeHistory timestamps for %s: %s"
+                                % (bridge.fingerprint, timestamp))
+                            reactor.callFromThread(
+                                bridgedb.Stability.addOrUpdateBridgeHistory,
+                                bridge, timestamp)
+            logging.debug("Stability calculations complete")
+
+    reactor.callInThread(updateBridgeHistory, bridges, timestamps)

Calling reactor.callFromThread will run the call inside the main reactor loop... unless the with bridgedb.Storage.getDB() as db is doing some context magic to put it back into a separate thread again. Unless I'm misunderstanding something, which I probably am. Or, does it call it in the parent thread, i.e. the reactor.callInThread(_reloadFn) thread? Because the t.i.interfaces.IReactorThreads docs make it seem like reactor.callFromThread() will always call the main reactor thread.


Nope, you're completey correct. I *think* that was a typo and I wanted to use callInThread(), but I don't actually know why I wanted to do that either, it's already in a different thread than reactor loop. I can't justify spawning a couple thousand threads for this so I deleted that invocation and now it directly calls addOrUpdateBridgeHistory().


Okay, my server startup and restart timings came out to the same as with the callFromThread()s in there, this makes more sense now. It seems to be roughly as fast, and adding thousands of threads sounds a bit frightening with how little memory ponticum has.

4) In b67862d317a282d81e2f53726e4dbff8bb8cb1f1

More unit tests for Email Server:

Why from StringIO import StringIO? Isn't that the old, deprecated one?

100% correct. That was my mistake. But then fixing that caused some breakage because we used StringIO.StringIO in EmailServer.py. When I switched that to io.StringIO it broke because io.StringIO only accepts unicode and we used MimeWriter.MimeWriter which used string.
As a result, #11370. Which should be resolved now.


Cool, I've reviewed #11370 and am merging it too (they are both part of the same branch of yours anyway).

6) In 3c9e04ac9006ff21f30aeb033f6e2560f09541f6

Make sure leekspin created our descriptors:
Sorry for the /usr/share/dict problem in leekspin, it's a good idea to add these checks.


I debated deleting this commit but I decided to leave it because it caused no harm to check for the files.


Yeah, at some point it took me almost an hour to figure out where the TravisCI boxes were putting the files. Definitely doesn't hurt to have some extra checks.

7) Super! CHANGELOG entries! Awesome! :D


And now there's another one!


Thanks!

Overall, the context manager wrapping for the semaphores on db accesses is a bit of madness, but doing threading in Twisted Python is a bit of madness, and I'm amazed this is working. Threading in Twisted is seriously no small feat. You did a really brilliant job on this!


This comment made it all worth it. Thanks! :)


Still seriously impressed.

If anyone ever doubts your Python skills, tell them to call me. :)

5092364f7e3ce42b61b0336a95519639d0e59308 needs review for #11370


Review is on the #11370 ticket.

bd05227c5eefc528b72d57d53ac3f139fc2f0d5e (was 91dc3a6e) contains the re-added letter (5)
0556e7daa8504756b34384a0f9b5dfa753556c09 (was b67862d3) now uses io.StringIO (4)
bd652aa560cfaf15b9b37979aba73e9f8f548b4e (was 43c6f877) fixes callFromThread() and iterator (1)


All good.

And I added 8dfcd5d7f77950828d3c95f93417baf7ea70c6b7 for a doc and to fix long lines.


Good too.

Pushed bug5232_adding_concurrent_processing_squashed_r1 for rereview!


Merged! Thanks sysrqb! You are awesome!

Note: See TracTickets for help on using tickets.