bridgedb's email responder should fuzzy match email addresses within time periods
tl;dr: We're getting trolled hardcore. We should have some sort of fuzzy matching on email addresses within a time limit.
While looking into #9277 (moved), I noticed several problems in the directory where BridgeDB stores its logfiles.
One of them is that BridgeDB's email response distributor is incredibly naive and susceptible to massive trolling. Setting aside the fact that there are five days' worth of logfiles which include the full text of the response emails, including the client email addresses, it is actually lucky that I saw these addresses, because there is a definite pattern to them.
There were 200 occurrences of 'gmail.com':
$ grep -Er '@gmail\.com' | awk -Pe '{"From "} ; { print $2 }' | grep gmail\.com | wc -l
200
120 of which were unique:
$ grep -Er '@gmail\.com' | awk -Pe '{"From "} ; { print $2 }' | grep gmail\.com | sort | uniq | wc -l
120
The problem is that there are multiple addresses making requests in a row which are not only quite clearly related (i.e. <static_username>+<incremental_integer>@gmail.com, or <base32_80bit_hash>@gmail.com) but also are quite obviously snark/trolling from various adversaries.
For example, one of the usernames with incremental integers was 'feidanchaoren', and I saw it incremented 34 times, i.e.
feidanchaoren00001@
feidanchaoren00002@
[...]
feidanchaoren00034@
There were multiple requests (though at minimum 30 minutes apart) from precisely the same username+integer.
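To put a number on a family like this, something along these lines could collapse the counter and any '+tag' suffix into one key. This is only a sketch: `canonical_address` is a hypothetical helper, not anything that exists in BridgeDB, and the sample list just regenerates the 34 addresses described above. Note that it would also lump honest users such as bob1@ and bob2@ together.

```python
import re
from collections import Counter

# Hypothetical helper for illustration; not part of BridgeDB.
_TAG = re.compile(r"\+.*$")            # 'user+anything' -> 'user'
_TRAILING_DIGITS = re.compile(r"\d+$")  # 'user00034'     -> 'user'


def canonical_address(address):
    """Collapse '+tag' suffixes and trailing counters into one key."""
    address = address.strip().lower()
    if "@" not in address:
        return address
    local, _, domain = address.rpartition("@")
    local = _TAG.sub("", local)
    stripped = _TRAILING_DIGITS.sub("", local)
    # If the local part was digits only, keep it rather than emptying it.
    return "%s@%s" % (stripped or local, domain)


# The 34 addresses from the logs, regenerated from the pattern above.
requests = ["feidanchaoren%05d@gmail.com" % n for n in range(1, 35)]
families = Counter(canonical_address(a) for a in requests)
```

Here `families` ends up with a single key, 'feidanchaoren@gmail.com', with a count of 34.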
Also, 'fei dan' is pinyin for 飞蛋, which means 'flying egg' in English. It is from a Confucian parable which, if I understood it correctly (and I am well-versed in neither Traditional Chinese nor Confucianism), is about a man who pays so much attention to a bunch of eggs, trying to ensure that they hatch, that he pays no attention to what to do afterwards. The eggs hatch, and the chickens fly away. Roughly, it means: "if you pay too much attention to details and not enough to the bigger picture, you are made of #fail". And 'chao ren' (超人) is 'superman' in English, or more accurately Nietzsche's 'Übermensch' in German. I would assume we're being trolled pretty hard.
One way to fix this might be to take the time period which we currently wait between responses and, in addition to rejecting emails from precisely the same username within it, also reject anything which fuzzy matches. However, writing clever regexes to match things like the email addresses that look like fake .onion addresses, on top of all the other patterns which are obvious to a human, sounds like a good way to either write unreadable code or accidentally block honest users.
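As a rough sketch of the non-regex route, a similarity ratio from Python's standard `difflib` could be checked against addresses seen within the wait period. Everything here is an assumption for illustration: `FuzzyRateLimiter` is a made-up name, the 1800 second window and 0.85 threshold are guesses that would need tuning against real traffic, and this would not catch the random base32-looking usernames at all.

```python
import time
from difflib import SequenceMatcher


class FuzzyRateLimiter(object):
    """Hypothetical sketch; not BridgeDB's actual distributor logic."""

    def __init__(self, window=1800, threshold=0.85):
        self.window = window        # seconds; assumed wait period
        self.threshold = threshold  # similarity ratio; a guess
        self.recent = []            # (timestamp, local part, domain)

    def allow(self, address, now=None):
        now = time.time() if now is None else now
        local, _, domain = address.strip().lower().rpartition("@")
        # Forget requests which have aged out of the window.
        self.recent = [r for r in self.recent if now - r[0] < self.window]
        for _, seen_local, seen_domain in self.recent:
            if seen_domain != domain:
                continue
            ratio = SequenceMatcher(None, seen_local, local).ratio()
            if ratio >= self.threshold:
                return False  # too similar to a recent requester
        self.recent.append((now, local, domain))
        return True


limiter = FuzzyRateLimiter()
first = limiter.allow("feidanchaoren00001@gmail.com", now=0)
second = limiter.allow("feidanchaoren00002@gmail.com", now=60)
other = limiter.allow("alice@gmail.com", now=120)
```

With these values `first` and `other` are allowed and `second` is rejected. Rejected requests are not recorded, so they do not extend the window; whether they should is a design question. The same false-positive risk applies: two honest users with near-identical usernames on the same domain would block each other for the duration of the window.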