BridgeDB currently hands out plenty of bridges (in all flavours) that are offline. We need to understand why this is the case, and stop it from doing that.
For example, I just got the obfs4 bridge 4C480695650EDB6BAB006DB9FD81F6173122E973 over HTTPS. Nothing responds on its obfs4 port, and Metrics says that it is offline, or at least was a few hours ago. The bridge's IP address is part of Serge's most recent networkstatus-bridges file, but the bridge does not have the Running flag and should not have been given out. Also, the bridge's fingerprint isn't part of BridgeDB's latest assignments.log file. According to all of this, I should not have been given that bridge.
In d15fe16c and 978f9be8 we improved log messages to get a better understanding of what's going on. The latest run produced these log messages:
Trying to insert 1291 bridges into hashring, 1062 of which have the 'Running' flag...
Tried to insert 1280 bridges into hashring. Resulting hashring is of length 1061.
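To make the numbers in these log messages easier to cross-check, here is a minimal sketch that counts how many entries in a networkstatus-style file carry the 'Running' flag. It is not BridgeDB's actual parsing code; the function name `count_running` is hypothetical, and it assumes the usual layout of one "r" line per bridge followed by an "s" line listing its flags.

```python
def count_running(networkstatus_text):
    """Return (total, running) bridge counts from networkstatus-style text.

    Assumption: every bridge is described by one "r " line, followed by
    one "s " line that lists the bridge's flags separated by spaces.
    """
    total = 0
    running = 0
    for line in networkstatus_text.splitlines():
        if line.startswith("r "):
            total += 1
        elif line.startswith("s ") and "Running" in line.split()[1:]:
            running += 1
    return total, running


# Made-up sample entries, for illustration only.
sample = "\n".join([
    "r Bridge1 AAAA BBBB 2019-05-31 00:00:00 10.0.0.1 443 0",
    "s Running Stable Valid",
    "r Bridge2 CCCC DDDD 2019-05-31 00:00:00 10.0.0.2 443 0",
    "s Valid",
])
print(count_running(sample))  # (2, 1)
```

Running something like this against the same file that BridgeDB loaded would show whether the "have the 'Running' flag" count in the log matches the input.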
We discussed this on IRC and figured that the ~330 snap bridges may be partly to blame. There's quite a bit of churn among them, so Serge may deem a snap bridge running at hour t, but by the time a user tries to use it at hour t+1 it may already be offline.
Roger got a non-snap obfs4 bridge from BridgeDB that was also offline. Its vanilla port worked (and hence it had the 'Running' flag and was distributed by BridgeDB) but its obfs4 port would just reset connections. It may be that the problem of "BridgeDB hands out offline bridges" is really just a lot of smaller problems that come together.
What is the plan here? Any updates? This is still a frequently reported issue on our blog.
We need to understand if this affects all bridge types, or if it is limited to obfs4.
In parallel, we should test if the TCP port of all of our obfs4 bridges is reachable. For those that aren't, we should contact the operator, or, as a last resort, remove them from BridgeDB.
Make it easier for bridge operators to test if their obfs4 port is reachable. #30472 (moved) will help with this.
I'll try to make progress with this in the coming days.
> In parallel, we should test if the TCP port of all of our obfs4 bridges is reachable. For those that aren't, we should contact the operator, or, as a last resort, remove them from BridgeDB.
I built a tool that takes Serge's bridge files as input and scans the TCP port of obfs4 bridges: https://github.com/NullHypothesis/bridgeauth-obfs4-scanner
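The kind of check such a scan performs can be sketched in a few lines of Python. This is not the code of the linked tool; `tcp_reachable` and `unreachable_endpoints` are hypothetical names, and the sketch only tests whether a TCP handshake completes. A bridge whose obfs4 port accepts the connection and then resets it would still count as reachable here.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def unreachable_endpoints(endpoints, timeout=5.0):
    """Return the (host, port) pairs for which the TCP handshake failed."""
    return [(host, port) for host, port in endpoints
            if not tcp_reachable(host, port, timeout)]
```

The list of `(host, port)` pairs would come from the obfs4 transport lines in the bridge authority's extrainfo files; the timeout is a judgment call and may need tuning for slow bridges.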
I believe one problem is that Serge's cached-extrainfo and cached-extrainfo.new do not contain all bridges that are in networkstatus-bridges, so the results only represent a lower bound of unreachable obfs4 bridges.
Here's the output for a Serge dump from 2019-05-31 00:34:50:
[+] 1,304 bridges in network status; 1,024 (78.5%) have 'Running' flag.
[+] 581 (56.7%) of 1,024 bridges with 'Running' flag support obfs4.
[+] 75 (12.9%) of 581 running obfs4 bridges fail to establish TCP connection.
[+] 47 (62.7%) of 75 unreachable obfs4 bridges have contact info.
I will send an email to the operators of these bridges and periodically re-run the script to catch new obfs4 bridges that are unreachable.
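To get a feeling for how large the gap between networkstatus-bridges and the cached extrainfo files is, one could compare the fingerprints found in each. The following is a rough sketch under stated assumptions, not part of the scanner: all function names are hypothetical, "r" lines are assumed to carry the identity as unpadded base64 in their third field, and "extra-info" lines are assumed to carry the hex fingerprint in their third field.

```python
import base64


def identity_to_fingerprint(identity_b64):
    """Convert an unpadded base64 identity from an "r" line to uppercase hex."""
    padded = identity_b64 + "=" * (-len(identity_b64) % 4)
    return base64.b64decode(padded).hex().upper()


def networkstatus_fingerprints(text):
    """Collect hex fingerprints from the "r" lines of a network status."""
    fingerprints = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "r":
            fingerprints.add(identity_to_fingerprint(parts[2]))
    return fingerprints


def extrainfo_fingerprints(text):
    """Collect hex fingerprints from the "extra-info" lines of extrainfo text."""
    fingerprints = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "extra-info":
            fingerprints.add(parts[2].upper())
    return fingerprints


def missing_extrainfo(networkstatus_text, extrainfo_text):
    """Return fingerprints that are in the network status but lack extrainfo."""
    return (networkstatus_fingerprints(networkstatus_text)
            - extrainfo_fingerprints(extrainfo_text))
```

The size of the returned set is the number of bridges whose transports the scan cannot see, which is why the unreachable count above is only a lower bound.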
I sent an email to approximately 40 bridge operators whose obfs4 port is not reachable. About a dozen replied and took care of the issue. I will send another round of emails in a week or so. If we still don't hear back, we may have to add these bridges to BridgeDB's blacklisted-bridges file, along with the other ~50 unreachable obfs4 bridges that don't have contact information.
> I believe one problem is that Serge's cached-extrainfo and cached-extrainfo.new do not contain all bridges that are in networkstatus-bridges, so the results only represent a lower bound of unreachable obfs4 bridges.
In tor, extrainfo descriptors are only created when statistics are on.
But we could change that so we create extrainfo descriptors that just contain the PT lines, even when statistics are off.
That would be a relatively easy fix to tor.
Would you like us to open a ticket for it?
Does this mean that when a, say, obfs4 bridge turns off its statistics, we wouldn't know that it runs obfs4 because we never received the transport line in its extrainfo document? If so, this seems worth fixing.
Also, what config option controls these statistics?
> Does this mean that when a, say, obfs4 bridge turns off its statistics, we wouldn't know that it runs obfs4 because we never received the transport line in its extrainfo document?
Yes, we made this change in #29018 (moved) in 0.4.1.1-alpha, so it's quite a recent change.
In 0.4.0 and earlier, ServerTransportPlugin lines and bridge statistics were unconditionally published in extrainfo documents.
> If so, this seems worth fixing.
> Also, what config option controls these statistics?
ExtraInfoStatistics. Some statistics also have their own options.
I blacklisted 53 bridges whose obfs4 port was unreachable. We should also try to reject them at the bridge authority, because a rejection causes scary log messages to appear in the bridge's log file. Hopefully some operators will then get back to us.
Blacklisting these bridges won't make things worse because after implementing #28655 (moved) we don't hand out these bridges' vanilla line.
I added a log message to BridgeDB that tells us how many bridge requests resulted in 0, 1, 2, and 3 bridge lines. Here are the results for a few hours' worth of logs:
(Interestingly, all requests that resulted in 0 bridges were HTTPS requests for obfs2, coming from Tor exit relays. BridgeDB no longer supports obfs2, which is why it responds with 0 bridges.)
Assuming that these numbers are correct, BridgeDB should be returning at least one bridge for every request it has seen over the last few hours. That clearly wasn't the case a few days ago but I wonder if it's the case now. The only thing that changed is that I added debug log messages and restarted BridgeDB a few times.
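For reference, the per-request tally behind that log message can be sketched as follows. This is an illustration rather than BridgeDB's actual code; `tally_bridge_lines` is a hypothetical name and each response is assumed to be a list of bridge lines.

```python
from collections import Counter


def tally_bridge_lines(responses):
    """Count how many requests were answered with 0, 1, 2, ... bridge lines.

    Assumption: each element of `responses` is the list of bridge lines
    that was returned for one request.
    """
    return Counter(len(lines) for lines in responses)


# Made-up responses, for illustration only.
example = [[], ["bridge a"], ["bridge a", "bridge b", "bridge c"]]
tally = tally_bridge_lines(example)
for n in sorted(tally):
    print(f"{tally[n]} request(s) answered with {n} bridge line(s)")
```

Tracking the count for 0 separately per requested transport would make it easier to tell unsupported requests, such as obfs2, apart from cases where BridgeDB ran out of usable bridges.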