There seems to be a problem with BridgeDB writing transport lines to its bridge pool assignment files:
$ grep -c transport *
2013-07-13-00-01-22:123
2013-07-13-00-30-22:200
2013-07-13-01-00-23:275
2013-07-13-01-30-22:44
2013-07-13-02-00-25:121
2013-07-13-02-30-27:198
2013-07-13-03-00-33:281
2013-07-13-03-30-29:73
Numbers shouldn't go up and down so fast.
Also, I saw Isis mention this the other day, but I'm not sure if there's already a ticket for it:
This might also be related to #9174 (moved). We found that the bridge from #9174 (moved) sometimes published extrainfo descriptors that didn't include a transport line.
That bridge has AssumeReachable 1, because it always fails the reachability check without it. I do not know whether that could be related to it sometimes not publishing transports.
asn just found that BridgeDB might parse only cached-extrainfo.new and ignore cached-extrainfo. That would explain the pattern above, where the number of bridges with "transport" lines goes up for two hours and then starts at zero again.
The fix is to parse both cached-extrainfo* files in BridgeDB.
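To make that concrete, here is a rough sketch of what "parse both files" could look like. This is not BridgeDB's actual parser; the file names, the crude line-based splitting, and the newest-descriptor-wins merge policy are all assumptions on my part.

# A minimal sketch, not BridgeDB's real parser: read every extra-info
# descriptor from both cache files and keep the most recently published
# one per bridge fingerprint. The file names, the crude line-based
# splitting, and the newest-wins policy are assumptions on my part.

def parse_extrainfo_file(path):
    """Return (fingerprint, published, descriptor_text) tuples from one file."""
    try:
        fh = open(path)
    except IOError:
        return []                       # e.g. cached-extrainfo.new was just deleted
    blocks, current = [], []
    with fh:
        for line in fh:
            if line.startswith("extra-info ") and current:
                blocks.append(current)
                current = []
            current.append(line)
    if current:
        blocks.append(current)
    parsed = []
    for block in blocks:
        header = next((l for l in block if l.startswith("extra-info ")), None)
        if header is None or len(header.split()) < 3:
            continue                    # skip annotations or malformed blocks
        fingerprint = header.split()[2]
        published = next((l.split(" ", 1)[1].strip()
                          for l in block if l.startswith("published ")), "")
        parsed.append((fingerprint, published, "".join(block)))
    return parsed

def load_all_extrainfo(paths=("cached-extrainfo", "cached-extrainfo.new")):
    newest = {}
    for path in paths:
        for fingerprint, published, text in parse_extrainfo_file(path):
            # "published YYYY-MM-DD HH:MM:SS" sorts correctly as a string
            if fingerprint not in newest or published > newest[fingerprint][0]:
                newest[fingerprint] = (published, text)
    return newest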
BridgeDB was incorrectly using only the new (cached-extrainfo.new) descriptors.
I have reverted to using the cached-extrainfo file, but as noted above we don't learn about the extra-info descriptors (and pluggable transports) until two hours later. sysrqb's fix looks like it should address this (thanks!).
From Karsten's response in the email thread, clarifying extrainfo behavior:
No. Tonga writes descriptors to cached-extrainfo roughly every two hours. It then writes incoming descriptors to cached-extrainfo.new without touching cached-extrainfo. Only two hours later, it rewrites cached-extrainfo and puts all descriptors in it that are still valid. It also deletes cached-extrainfo.new.
Note the duplicate entry for 112f02b648f19502831103fa50715ce31f8a4a2e. The first line shouldn't be there, I guess.
Setting priority to major, because the currently produced bridge pool assignments won't be useful for analysis.
Are those entries adjacent, or was this output produced with sort? I noticed that for each assignment period a line is written three times for each bridge (but the lines are not consecutive). That suggests the bug might have something to do with the function BridgeSplitter.dumpAssignments() in Bridges.py:923, but it isn't clear to me yet why that happens.
Good point. Lines are not adjacent in the original assignments.log file, I think. The sanitizer in metrics-db sorts lines, and what I pasted above is from sanitized assignments.
Still no progress? Again, we're losing data here. The July pool assignment archives will be mostly useless for analysis, and Onionoo serves useless data to bridge operators. Please fix ASAP. Thanks.
As a hint on what might be going on, I read some BridgeDB logs with sysrqb and we noticed that when dumpAssignments() is run, we get multiple "%s supports %d transports" messages for each bridge.
This probably means that the list self.bridges contains duplicate bridges.
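If that is the case, one way to guard against it would be to deduplicate by fingerprint before the assignment lines get written. The following is a sketch only, not a tested patch; that Bridge objects carry a .fingerprint attribute and that dumpAssignments() walks self.bridges are assumptions on my part.

# Sketch only, not a tested patch: drop repeated bridges before the
# assignment lines are written. That Bridge objects carry a .fingerprint
# attribute and that dumpAssignments() walks self.bridges are assumptions.

def unique_by_fingerprint(bridges):
    """Return bridges in their original order, skipping repeated fingerprints."""
    seen = set()
    unique = []
    for bridge in bridges:
        if bridge.fingerprint in seen:
            continue                    # duplicate entry; skip (or log) it
        seen.add(bridge.fingerprint)
        unique.append(bridge)
    return unique

# Inside dumpAssignments() one would then iterate over
# unique_by_fingerprint(self.bridges) instead of self.bridges.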
I read through some of the settings in the running BridgeDB instance's bridgedb.conf file. There it specifies:
$ cat bridgedb.conf | grep descriptors
# Files from which we read bridge descriptors, on start and on SIGHUP.
BRIDGE_FILES = [ "../from-authority/bridge-descriptors" ]
and in Main.py, in the global CONFIG variable which is used for testing, these are:
In the from-authority directory I found all three of these files, all with recent modification times, and the running BridgeDB process was not called with the '--testing' option, so it should only be reading the 'bridge-descriptors' file.
I'm fairly sure that if I change BRIDGE_FILES in the running instance's config to ["./cached-descriptors", "./cached-descriptors.new"] that it'll fix this, but I am nervous to change things without someone else who knows BridgeDB really well being available to help (I don't know how maintenance and deployments are done yet).
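For reference, the edit I have in mind is just this one line in the running instance's bridgedb.conf (BridgeDB config files are plain Python); I have not applied it:

# Files from which we read bridge descriptors, on start and on SIGHUP.
# Proposed change, not yet applied to the running instance.
BRIDGE_FILES = ["./cached-descriptors", "./cached-descriptors.new"]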
It appears that whatever is causing this was put into production on 2013/08/14. Does ponticum have any sort of log that will be able to tell us what changed that day? I'm still going through commits that were made to bridgedb prior to that date. Does the bridgedb logfile go back that far?
I determined the date by doing a lovely binary search through the bridge-pool-assignments on metrics and running
for i in ./*/*; do echo -n "$i: "; cat $i | awk '{ print $1; }' | uniq -d | wc -l; done | grep -v ': 0$' | head
over the month. August 14 was the first day that contained duplicates.
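In case it is useful for re-checking later, here is a Python sketch that should be equivalent to the shell pipeline above; the one-subdirectory-per-month layout of the sanitized files is an assumption about my local copy.

# Count duplicated fingerprints (the first whitespace-separated field) in
# each sanitized bridge-pool-assignments file. Sketch only; the ./*/*
# layout mirrors the shell pipeline above and is an assumption.
import glob
from collections import Counter

for path in sorted(glob.glob("./*/*")):
    with open(path) as fh:
        fields = [line.split()[0] for line in fh if line.strip()]
    duplicated = sum(1 for count in Counter(fields).values() if count > 1)
    if duplicated:
        print("%s: %d" % (path, duplicated))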
I've written an ugly hack that should solve this problem temporarily until we're able to determine what is causing it. It's branch bug9264_temporary_fix on https://github.com/sysrqb/bridgedb.git
Did I mention this is a temporary hack?
Got it. Looks good, thanks, sysrqb! I merged this here into my master branch, to go into the next redeploy that I am about to do.
Trac: Owner: N/A to isis; Status: needs_review to accepted
I can confirm that bridge pool assignment files now contain sane numbers of bridges with transports:
$ grep -c transport *
2013-08-12-00-04-05:474
2013-08-12-00-34-08:475
2013-08-12-01-03-51:475
2013-08-12-01-34-01:475
2013-08-12-02-04-01:490
2013-08-12-02-34-03:494
2013-08-12-03-04-09:493
2013-08-12-03-33-57:494
2013-08-12-04-03-56:493
2013-08-12-04-33-55:493
2013-08-12-05-03-53:492
2013-08-12-05-34-08:492
2013-08-12-06-03-57:490
2013-08-12-06-33-57:491
2013-08-12-07-03-51:490
2013-08-12-07-34-32:459
2013-08-12-08-05-22:465
2013-08-12-08-34-03:476
2013-08-12-09-03-52:491
2013-08-12-09-33-56:490
2013-08-12-10-03-38:490
2013-08-12-10-33-59:490
2013-08-12-11-05-10:510
2013-08-12-11-33-38:509
So, I think this concludes the ticket. Optimistically closing. If this was wrong, please re-open. Thanks everyone!
Trac: Resolution: N/A to fixed; Status: needs_information to closed