Opened 15 months ago

Last modified 15 months ago

#23251 new defect

Parsing a networkstatus-bridges with flags only causes BridgeDB to hang

Reported by: isis
Owned by: isis
Priority: Medium
Milestone:
Component: Obfuscation/BridgeDB
Version:
Severity: Major
Keywords: bridgedb-parsers
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor: SponsorM-can

Description

The following file:

bridgedb@polyanthum /srv/bridges.torproject.org/from-bifroest
 % cat networkstatus-bridges
published 2017-08-15 18:58:10
flag-thresholds stable-uptime=0 stable-mtbf=0 fast-speed=0 guard-wfu=0.000% guard-tk=0 guard-bw-inc-exits=0 guard-bw-exc-exits=0 enough-mtbf=1 ignoring-advertised-bws=0

causes the bridgedb process to hang. The last log lines output are:

bridgedb@polyanthum ~ % tail -f /srv/bridges.torproject.org/log/bridgedb.log
19:46:33 INFO     L465:Bridges.insert()         BridgeSplitter placing bridge $$C44836BF2F42DB5B1AD3CF6085626056593D846A~Shizuokalibelous into hashring https (via n=5, pos=0).
19:46:33 DEBUG    L547:Bridges.insert()         Inserting $$C44836BF2F42DB5B1AD3CF6085626056593D846A~Shizuokalibelous into hashring...
19:46:33 DEBUG     L78:geo.getCountryCode()     Looked up country code: NL
19:46:33 INFO     L465:Bridges.insert()         BridgeSplitter placing bridge $$2CC6A05D7F52D7B936ABEE13C782780E4B23B64F~conglutinatoreri into hashring email (via n=14, pos=1).
19:46:33 DEBUG    L547:Bridges.insert()         Inserting $$2CC6A05D7F52D7B936ABEE13C782780E4B23B64F~conglutinatoreri into hashring...
19:46:33 INFO     L174:Main.load()              Done inserting 1959 bridges into hashring.
19:46:33 DEBUG    L208:persistent.save()        Saving state to:        '/srv/bridges.torproject.org/run/bridgedb.state'
19:46:33 INFO      L80:Main.load()              Processing descriptors in ../from-bifroest directory...
19:46:33 INFO      L86:Main.load()              Opening networkstatus file: /srv/bridges.torproject.org/from-bifroest/networkstatus-bridges
19:46:33 INFO     L124:descriptors.parseNetwo() Parsing networkstatus file: /srv/bridges.torproject.org/from-bifroest/networkstatus-bridges

Further, and this might be a separate issue: when BridgeDB hangs in this state, the cronjob which calls bridgedb --reload launches an entirely new bridgedb process without killing the old one. The old process is stuck doing blocking IO, and the signal handlers live in the async code that it never gets back to. I don't think there's really any way to fix this, since Stem is doing the IO there, and Stem isn't async aware/capable.

Change History (3)

comment:1 Changed 15 months ago by atagar

Hi Isis. Added a quick test to see if that line's problematic for Stem. Seems to be just fine...

https://gitweb.torproject.org/stem.git/commit/?id=30990ab

As for Stem not being 'async aware' I'm not sure what you mean or why that's relevant. You're just using Stem as a descriptor parser. That has nothing to do with its controller functionality.

comment:2 in reply to:  1 Changed 15 months ago by isis

Replying to atagar:

> Hi Isis. Added a quick test to see if that line's problematic for Stem. Seems to be just fine...
>
> https://gitweb.torproject.org/stem.git/commit/?id=30990ab

Right, that should be fine for Stem. Also, BridgeDB's parsers actually skip the flags/headers entirely and tell Stem to start reading after them, so it wouldn't be a Stem issue anyway. :)
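
As an illustration of the header-skipping step mentioned above, here is a minimal sketch of how such a step can be written so that it terminates on a file with no router entries. It is not BridgeDB's actual code, and the ticket does not establish where the hang occurs; find_first_router_offset is a hypothetical name. The only assumption about the format is that router entries begin with a line starting with "r ". The point of the sketch is that readline() returns an empty string at end-of-file, so a loop that only checks for "r " would never exit on a header-only file.

```python
import io


def find_first_router_offset(fh):
    """Return the offset of the first line starting with 'r ', or None.

    Hypothetical helper for illustration only. Returns None when the file
    contains headers but no router entries, so the caller can skip parsing.
    """
    while True:
        position = fh.tell()
        line = fh.readline()
        if not line:
            # End of file: readline() keeps returning '' from here on,
            # so this check is what guarantees the loop terminates.
            return None
        if line.startswith("r "):
            return position


# A header-only file, shaped like the one in the description.
header_only = io.StringIO(
    "published 2017-08-15 18:58:10\n"
    "flag-thresholds stable-uptime=0 stable-mtbf=0\n"
)
offset = find_first_router_offset(header_only)
if offset is None:
    print("no router entries; nothing to parse")
```

A caller that gets an offset back can seek to it and hand the file to the parser; a caller that gets None can log the empty file and return early.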

> As for Stem not being 'async aware' I'm not sure what you mean or why that's relevant. You're just using Stem as a descriptor parser. That has nothing to do with its controller functionality.

Oh, I mean that BridgeDB is Twisted Python, so its main loop is asynchronous. But while BridgeDB is running its bridgedb.parse.descriptors.* functions (which call Stem), the file handling is blocking, which isn't how Twisted wants IO to be done (see the FileWriter(proto.Protocol) class in the bridgedb.git/scripts/get-tor-exits script for an example of how Twisted wants it). Because this IO is all blocking, and because of the hang bug, control never returns to BridgeDB's main loop, which is where the handlers for the SIGUSR1 and SIGHUP signals live. Those handlers are required to restart BridgeDB when it receives new descriptors from the BridgeAuth.

To be clear, none of this is a bug in Stem. It's all BridgeDB.
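
The blocking-IO problem described above can be illustrated with a small sketch. It uses asyncio from the Python standard library instead of Twisted, purely so that it runs without extra dependencies; Twisted's analogous helper is twisted.internet.threads.deferToThread. blocking_parse is a hypothetical stand-in for a blocking parser call, and none of the names below come from BridgeDB. The sketch shows that when the blocking call runs in a worker thread, the event loop keeps running other callbacks (here a heartbeat) while the parse is in progress.

```python
import asyncio
import time


def blocking_parse(seconds):
    """Hypothetical stand-in for a blocking parser call; it just sleeps."""
    time.sleep(seconds)
    return "parsed"


async def heartbeat(ticks, interval):
    """Record a timestamp every interval to show the loop is still alive."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def load_descriptors(parse_seconds=0.3, interval=0.05):
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks, interval))
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread. The event loop stays free
    # to service other callbacks while the thread is busy.
    result = await loop.run_in_executor(None, blocking_parse, parse_seconds)
    beat.cancel()
    return result, len(ticks)


def run_demo():
    """Return the parse result and how many heartbeats ran during it."""
    return asyncio.run(load_descriptors())
```

Calling blocking_parse directly inside load_descriptors, without run_in_executor, would stall the heartbeat for the whole duration of the parse. Note that offloading does not cure a hang by itself: a parse that never returns would leave the worker thread stuck, but the main loop would remain able to react to a reload request.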

comment:3 Changed 15 months ago by atagar

Ahhh, got it. Thanks Isis. :P

If ya need anything from me just let me know.
