Opened 6 years ago

Closed 6 years ago

#10796 closed defect (fixed)

Bridgedb became unresponsive

Reported by: sysrqb
Owned by: isis
Priority: Medium
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity:
Keywords:
Cc: isis
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

BridgeDB stopped responding to requests. The last log entries before the restart came from a web request, but they give no indication of an error or exception.

Feb 03 01:50:39 [DEBUG] Accept-Language: ['en-US']
Feb 03 01:50:39 [DEBUG] Languages: []
Feb 03 09:23:11 [INFO] Logger Started.

Sending SIGHUP had no effect either.

Change History (2)

comment:1 Changed 6 years ago by isis

Owner: set to isis
Status: new → accepted

The old BridgeDB was started via the run-bridgedb script, which did:

PYTHONPATH=$PYTHONPATH python -m TorBridgeDB \
  -c bridgedb.conf < /dev/null > /dev/null 2>&1 & disown

This meant that unhandled exceptions in the templating system were displayed to the client (#6127), and that any other unhandled exceptions in the main reactor loop were sent to /dev/null and lost.

I think I've handled catching any templating error in my branches fix/6127-web-server-tracebacks, fix/6127-render_GET-traceback, and fix/6127-simple-error-page, though there may still be more exceptions that I couldn't foresee.
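
The general pattern behind that kind of fix, where a rendering failure gives the client a generic page while the traceback goes to the server log, might look roughly like the sketch below. This is not the code from the fix/6127 branches: `render_page`, `safe_render`, `ERROR_PAGE`, and the logger name are hypothetical, and the template is a plain `string.Template` standing in for BridgeDB's actual templating system.

```python
import logging
from string import Template

log = logging.getLogger("bridgedb.example")  # hypothetical logger name

# Generic page shown to the client; it deliberately contains no traceback.
ERROR_PAGE = "<html><body><p>Sorry, something went wrong.</p></body></html>"


def render_page(template_text, **values):
    """Render a template; raises KeyError if a placeholder has no value."""
    return Template(template_text).substitute(**values)


def safe_render(template_text, **values):
    """Render a template, falling back to ERROR_PAGE on any failure."""
    try:
        return render_page(template_text, **values)
    except Exception:
        # Record the full traceback server-side instead of sending it out.
        log.exception("Unhandled error while rendering template")
        return ERROR_PAGE
```

With this shape, `safe_render("Hello $name")` returns the generic page because `name` is missing, and the traceback is only visible to whoever reads the log.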

We need better exception handling in general in BridgeDB; I've generally been working on it at the same time as other tickets, adding catches and actions for previously unhandled errors when I find them. We could probably modify the above shell script to pipe unhandled errors to another file, but I'd rather fix it the real way and also have better tests. Maybe a fuzzer would be nice to try to dig these unhandled exceptions out.
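
As one illustration of handling this inside the program rather than in the shell script, a process can install a hook that writes otherwise-unhandled exceptions to a log file. The sketch below uses only the Python standard library; `install_exception_logger` and the logger name are hypothetical and are not part of BridgeDB. It also only covers exceptions that reach the top level of the main thread. An event loop may catch exceptions raised in its callbacks itself, in which case they never reach this hook and need the loop's own logging.

```python
import logging
import sys


def install_exception_logger(logfile):
    """Send otherwise-unhandled exceptions to `logfile` with a traceback."""
    logger = logging.getLogger("bridgedb.unhandled")  # hypothetical name
    logger.setLevel(logging.ERROR)
    logger.addHandler(logging.FileHandler(logfile))

    def hook(exc_type, exc_value, exc_tb):
        # exc_info makes the logging module format the full traceback.
        logger.error("Unhandled exception",
                     exc_info=(exc_type, exc_value, exc_tb))

    # sys.excepthook is called for exceptions that reach the top level
    # of the main thread.
    sys.excepthook = hook
    return hook
```

Calling this once, early in startup, would mean that a crash leaves a traceback on disk even when stdout and stderr are redirected to /dev/null.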

As for why BridgeDB became unresponsive, I don't know. I looked at the machine and poked around briefly to see whether something had caused this, but found nothing. The error was probably piped to /dev/null, like all the other ones. :(

To fix it, I redeployed BridgeDB-0.1.0. After 55 minutes of BridgeDB running through the addOrUpdateBridgeHistory() code in bridgedb.Storage, I realised that if BridgeDB took 55 minutes to process descriptors after every SIGHUP, it would never come online (see #5232). I then decided to quickly tag and deploy BridgeDB-0.1.1, without staging it first, because it contains the patches for #10724.

comment:2 Changed 6 years ago by isis

Resolution: fixed
Status: accepted → closed