Let's use this ticket to coordinate the future of BridgeDB's CAPTCHA. BridgeDB currently uses gimp-captcha to generate CAPTCHAs.
We believe that the GFW operates a bot (which, ironically, uses Tor) that successfully crawls BridgeDB and solves our CAPTCHAs at a rate that easily outperforms people. Not only does our CAPTCHA harm usability (see also #10831 (moved)), it also fails in the face of a real-world adversary.
Google provides a reCAPTCHA v3 API, which returns an anomaly score in the interval [0, 1] for each request, without any kind of friction. Ignoring for now that this is a Google service, it may be an option for BridgeDB's HTTPS distributor but not for moat or email.
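For reference, server-side verification against the v3 API is a single HTTPS request to Google's documented siteverify endpoint. Here's a minimal Python sketch of what such a check might look like in the HTTPS distributor; the function name and threshold are made up, not an actual design:

{{{#!python
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
SCORE_THRESHOLD = 0.5  # made-up cut-off; would need tuning against real traffic


def is_probably_human(secret_key, token, remote_ip=None):
    """Ask Google's siteverify endpoint to score a reCAPTCHA v3 token.

    Returns True if the token is valid and the returned anomaly score
    (in [0, 1], higher = more human-like) clears our made-up threshold.
    """
    data = {"secret": secret_key, "response": token}
    if remote_ip is not None:
        data["remoteip"] = remote_ip
    result = requests.post(VERIFY_URL, data=data, timeout=10).json()
    return result.get("success", False) and result.get("score", 0.0) >= SCORE_THRESHOLD
}}}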
There is plenty of research on new CAPTCHA schemes, sometimes leveraging more complex domains like video or adversarial examples, which are meant to confuse classifiers. None of these systems seems likely to make a difference in the long term.
We are in a particularly difficult situation because our CAPTCHA needs to work for a highly diverse set of people.
Cecylia had a chat with isis, who helpfully pointed out that BridgeDB serves a static set of pre-compiled CAPTCHAs. This set currently contains 10,000 CAPTCHAs, which were last updated in February 2014. (On a related note, there's a good chance that the GFW isn't in fact using a classifier to break our CAPTCHAs (see #32117 (moved)); it may simply have solved these 10,000 CAPTCHAs once and kept recycling the solutions.)
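For illustration, serving from a static, pre-compiled set boils down to picking a random image from a directory. This is only a simplified sketch of that idea, not BridgeDB's actual GimpCaptcha code (which, if I remember correctly, also protects the expected answer with an HMAC instead of keeping server-side state):

{{{#!python
import os
import random


def fetch_precompiled_captcha(cache_dir):
    """Pick one CAPTCHA at random from a directory of pre-generated images.

    Simplifying assumption: each file is named <solution>.png, so the
    expected answer can be recovered from the filename.  (BridgeDB's real
    implementation differs; this only illustrates the static-cache idea.)
    """
    files = [f for f in os.listdir(cache_dir) if f.endswith(".png")]
    filename = random.choice(files)
    solution = os.path.splitext(filename)[0]
    with open(os.path.join(cache_dir, filename), "rb") as fh:
        image_bytes = fh.read()
    return image_bytes, solution
}}}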
I revisited gimp-captcha, the tool that was last used to generate BridgeDB's CAPTCHAs. I got the code working on Debian buster and GIMP 2.10, and then experimented with making the CAPTCHAs easier to solve. In particular, I made the following changes (illustrated roughly in the sketch after this list):
 * Increased the spacing between letters.
 * Reduced the maximum tilt angle of the letters.
 * Made the letters darker.
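gimp-captcha itself is a GIMP Script-Fu script, so the snippet below is not its actual code; it's just a rough Pillow sketch to illustrate the three knobs I tweaked (spacing, tilt, darkness). All names, values, and the font path are made up:

{{{#!python
import random

from PIL import Image, ImageDraw, ImageFont

# Made-up knobs mirroring the three tweaks above; gimp-captcha's actual
# parameters live in its Script-Fu code and are named differently.
LETTER_SPACING = 12           # extra pixels between letters (increased)
MAX_TILT_DEG = 15             # maximum rotation per letter (reduced)
LETTER_COLOR = (20, 20, 20)   # near-black letters (darker)


def render_captcha(text, font_path="DejaVuSans-Bold.ttf", font_size=48):
    """Render each letter on its own tile, tilt it slightly, and paste the
    tiles side by side with extra spacing."""
    font = ImageFont.truetype(font_path, font_size)
    tiles = []
    for letter in text:
        tile = Image.new("RGBA", (font_size + 20, font_size + 20), (0, 0, 0, 0))
        ImageDraw.Draw(tile).text((10, 10), letter, font=font, fill=LETTER_COLOR)
        tiles.append(tile.rotate(random.uniform(-MAX_TILT_DEG, MAX_TILT_DEG),
                                 expand=True))
    width = sum(t.width for t in tiles) + LETTER_SPACING * (len(tiles) - 1)
    height = max(t.height for t in tiles)
    canvas = Image.new("RGB", (width + 20, height + 20), "white")
    x = 10
    for tile in tiles:
        canvas.paste(tile, (x, 10), tile)  # use the tile's alpha as the mask
        x += tile.width + LETTER_SPACING
    return canvas

# Example: render_captcha("K7FP3W").save("captcha.png")
}}}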
Here are three examples:
And here are two CAPTCHAs as they are currently used by BridgeDB:
In the long term, we should move away from CAPTCHAs, but in the short term we can generate a new set that's easier for users to solve. Our BridgeDB metrics reveal (an approximation of) the rate at which our users successfully solve CAPTCHAs. We should deploy a new batch and then refine the CAPTCHAs if the success rate doesn't improve significantly.
Our BridgeDB metrics should soon reveal 1) whether users are now more successful at solving CAPTCHAs and 2) whether the GFW now fails more often, either because its classifier (if it has one) does poorly on these new CAPTCHAs, or because it previously solved all 10,000 old CAPTCHAs and is unable to solve the new ones.
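To make the computation behind the tables below explicit, here's a small sketch that aggregates success rates per metrics key. The line format in the sketch is made up for illustration; BridgeDB's real metrics format differs:

{{{#!python
import collections


def success_rates(metrics_lines):
    """Aggregate CAPTCHA success rates per metrics key.

    Assumes a made-up line format for illustration:
        <distributor>.<transport>.<country> <success|fail> <count>
    e.g. "https.vanilla.us success 170".
    """
    counts = collections.defaultdict(lambda: {"success": 0, "fail": 0})
    for line in metrics_lines:
        key, outcome, count = line.split()
        counts[key][outcome] += int(count)
    return {key: c["success"] / (c["success"] + c["fail"])
            for key, c in counts.items()
            if c["success"] + c["fail"] > 0}

# Example with the Jan 29 bot numbers: 3,700 / (3,700 + 1,120) ≈ 0.77.
print(success_rates(["https.vanilla.zz success 3700",
                     "https.vanilla.zz fail 1120"]))
}}}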
I took a look at our recent BridgeDB metrics to get an idea of how our new CAPTCHAs affected users and bots. Here's what we believe to be bot requests for vanilla bridges over HTTPS (i.e., https.vanilla.zz):
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 4,060 || 640 || 4,700 || 86% ||
|| 2020-01-29 || 3,700 || 1,120 || 4,820 || 77% ||
|| 2020-01-30 || 510 || 4,550 || 5,060 || 10% ||
And here's what we believe to be user requests from the U.S. for vanilla bridges over HTTPS (i.e., https.vanilla.us):
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 170 || 160 || 330 || 52% ||
|| 2020-01-29 || 280 || 290 || 570 || 49% ||
|| 2020-01-30 || 300 || 70 || 370 || 81% ||
Recall that we deployed the new CAPTCHAs on January 29. This is when the success rate of bots began to decline and the success rate of users began to increase. I expect the bots to improve over time, but we managed to increase the success rate of users, which is what matters most. (Granted, I'm only looking at requests from the U.S. because most of our requests come from this region. Users in regions where English isn't the native language may not be doing as well.)
Here's the number of moat requests over time (i.e., moat.obfs4.??). Note that we don't know the proportion of user and bot requests comprising all moat requests:
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 3,780 || 2,780 || 6,560 || 58% ||
|| 2020-01-29 || 3,800 || 2,830 || 6,630 || 57% ||
|| 2020-01-30 || 3,910 || 590 || 4,500 || 87% ||
I find it surprising that the number of failed requests decreased on Jan 30 while the number of successful requests didn't increase. This may be a bug in our metrics collection. Another possibility is that bots requested CAPTCHAs, weren't sufficiently confident in their classification, and therefore never sent a POST request with a solution to BridgeDB.
As mentioned above, reCAPTCHA v3 may be an option for the HTTPS distributor. In theory, we may even be able to set up a reverse proxy for Google's CAPTCHA API, so Google doesn't get to see requests directly from our users. The NSA apparently set up such a system.
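To make the reverse-proxy idea a bit more concrete, here's a purely speculative Flask sketch that forwards reCAPTCHA traffic upstream without exposing client IPs. The paths and header handling are assumptions; a real deployment would have to proxy the full widget flow, cookies, and more:

{{{#!python
import requests
from flask import Flask, Response, request

app = Flask(__name__)
UPSTREAM = "https://www.google.com"


@app.route("/recaptcha/<path:subpath>", methods=["GET", "POST"])
def proxy(subpath):
    """Forward a reCAPTCHA request upstream without the client's IP or
    identifying headers.  (Sketch only; paths and headers are assumptions.)"""
    resp = requests.request(
        method=request.method,
        url=f"{UPSTREAM}/recaptcha/{subpath}",
        params=request.args,
        data=request.get_data(),
        headers={"Content-Type": request.headers.get("Content-Type", "")},
        timeout=10,
    )
    return Response(resp.content, status=resp.status_code,
                    content_type=resp.headers.get("Content-Type"))


if __name__ == "__main__":
    app.run(port=8080)
}}}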