Let's use this ticket to coordinate the future of BridgeDB's CAPTCHA. BridgeDB currently uses gimp-captcha to generate CAPTCHAs.
We believe that the GFW operates a bot (which, ironically, uses Tor) that successfully crawls BridgeDB and solves our CAPTCHAs at a rate that easily outperforms people. Not only does our CAPTCHA harm usability (see also #10831 (moved)), it also fails in the face of a real-world adversary.
Google provides a reCAPTCHA v3 API, which returns an anomaly score in the interval [0, 1] for each request, without any kind of friction. Ignoring for now that this is a Google service, it may be an option for BridgeDB's HTTPS distributor but not for moat or email.
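For reference, server-side verification against the v3 API is a single HTTPS request to Google's documented siteverify endpoint. Here's a minimal Python sketch of what such a check might look like in the HTTPS distributor; the function name and threshold are made up, not an actual design:

{{{#!python
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
SCORE_THRESHOLD = 0.5  # made-up cut-off; would need tuning against real traffic


def is_probably_human(secret_key, token, remote_ip=None):
    """Ask Google's siteverify endpoint to score a reCAPTCHA v3 token.

    Returns True if the token is valid and the returned anomaly score
    (in [0, 1], higher = more human-like) clears our made-up threshold.
    """
    data = {"secret": secret_key, "response": token}
    if remote_ip is not None:
        data["remoteip"] = remote_ip
    result = requests.post(VERIFY_URL, data=data, timeout=10).json()
    return result.get("success", False) and result.get("score", 0.0) >= SCORE_THRESHOLD
}}}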
There is plenty of research on new CAPTCHA schemes, sometimes leveraging more complex domains like video or adversarial examples, which are meant to confuse classifiers. None of these systems seems likely to make a difference in the long term.
We are in a particularly difficult situation because our CAPTCHA needs to work for a highly diverse set of people.
Cecylia had a chat with isis, who helpfully pointed out that BridgeDB serves a static set of pre-compiled CAPTCHAs. This set currently contains 10,000 CAPTCHAs, which were last updated in February 2014. (On a related note, there's a good chance that the GFW isn't in fact using a classifier to break our CAPTCHAs (see #32117 (moved)); it may simply have solved these 10,000 CAPTCHAs once and kept recycling the solutions.)
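For illustration, serving from a static, pre-compiled set boils down to picking a random image from a directory. This is only a simplified sketch of that idea, not BridgeDB's actual GimpCaptcha code (which, if I remember correctly, also protects the expected answer with an HMAC instead of keeping server-side state):

{{{#!python
import os
import random


def fetch_precompiled_captcha(cache_dir):
    """Pick one CAPTCHA at random from a directory of pre-generated images.

    Simplifying assumption: each file is named <solution>.png, so the
    expected answer can be recovered from the filename.  (BridgeDB's real
    implementation differs; this only illustrates the static-cache idea.)
    """
    files = [f for f in os.listdir(cache_dir) if f.endswith(".png")]
    filename = random.choice(files)
    solution = os.path.splitext(filename)[0]
    with open(os.path.join(cache_dir, filename), "rb") as fh:
        image_bytes = fh.read()
    return image_bytes, solution
}}}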
I revisited gimp-captcha, the tool that was last used to generate BridgeDB's CAPTCHAs. I got the code working on Debian buster and GIMP 2.10, and then experimented with making the CAPTCHAs easier to solve. In particular, I made the following changes (illustrated roughly in the sketch after this list):
 * Increased the spacing between letters.
 * Reduced the maximum tilt angle of the letters.
 * Made the letters darker.
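gimp-captcha itself is a GIMP Script-Fu script, so the snippet below is not its actual code; it's just a rough Pillow sketch to illustrate the three knobs I tweaked (spacing, tilt, darkness). All names, values, and the font path are made up:

{{{#!python
import random

from PIL import Image, ImageDraw, ImageFont

# Made-up knobs mirroring the three tweaks above; gimp-captcha's actual
# parameters live in its Script-Fu code and are named differently.
LETTER_SPACING = 12           # extra pixels between letters (increased)
MAX_TILT_DEG = 15             # maximum rotation per letter (reduced)
LETTER_COLOR = (20, 20, 20)   # near-black letters (darker)


def render_captcha(text, font_path="DejaVuSans-Bold.ttf", font_size=48):
    """Render each letter on its own tile, tilt it slightly, and paste the
    tiles side by side with extra spacing."""
    font = ImageFont.truetype(font_path, font_size)
    tiles = []
    for letter in text:
        tile = Image.new("RGBA", (font_size + 20, font_size + 20), (0, 0, 0, 0))
        ImageDraw.Draw(tile).text((10, 10), letter, font=font, fill=LETTER_COLOR)
        tiles.append(tile.rotate(random.uniform(-MAX_TILT_DEG, MAX_TILT_DEG),
                                 expand=True))
    width = sum(t.width for t in tiles) + LETTER_SPACING * (len(tiles) - 1)
    height = max(t.height for t in tiles)
    canvas = Image.new("RGB", (width + 20, height + 20), "white")
    x = 10
    for tile in tiles:
        canvas.paste(tile, (x, 10), tile)  # use the tile's alpha as the mask
        x += tile.width + LETTER_SPACING
    return canvas

# Example: render_captcha("K7FP3W").save("captcha.png")
}}}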
Here are three examples:
And here are two CAPTCHAs as they are currently used by BridgeDB:
In the long term, we should move away from CAPTCHAs, but in the short term we can generate a new set that's easier for users to solve. Our BridgeDB metrics reveal (an approximation of) the rate at which our users successfully solve CAPTCHAs. We should deploy a new batch and then refine the CAPTCHAs if the success rate doesn't improve significantly.
Our BridgeDB metrics should soon reveal 1) whether users are now more successful at solving CAPTCHAs and 2) whether the GFW now fails more often, either because its classifier (if it has one) does poorly on these new CAPTCHAs, or because it previously solved all 10,000 old CAPTCHAs and is unable to solve the new ones.
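To make the computation behind the tables below explicit, here's a small sketch that aggregates success rates per metrics key. The line format in the sketch is made up for illustration; BridgeDB's real metrics format differs:

{{{#!python
import collections


def success_rates(metrics_lines):
    """Aggregate CAPTCHA success rates per metrics key.

    Assumes a made-up line format for illustration:
        <distributor>.<transport>.<country> <success|fail> <count>
    e.g. "https.vanilla.us success 170".
    """
    counts = collections.defaultdict(lambda: {"success": 0, "fail": 0})
    for line in metrics_lines:
        key, outcome, count = line.split()
        counts[key][outcome] += int(count)
    return {key: c["success"] / (c["success"] + c["fail"])
            for key, c in counts.items()
            if c["success"] + c["fail"] > 0}

# Example with the Jan 29 bot numbers: 3,700 / (3,700 + 1,120) ≈ 0.77.
print(success_rates(["https.vanilla.zz success 3700",
                     "https.vanilla.zz fail 1120"]))
}}}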
I took a look at our recent BridgeDB metrics to get an idea of how our new CAPTCHAs affected users and bots. Here's what we believe to be bot requests for vanilla bridges over HTTPS (i.e., https.vanilla.zz):
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 4,060 || 640 || 4,700 || 86% ||
|| 2020-01-29 || 3,700 || 1,120 || 4,820 || 77% ||
|| 2020-01-30 || 510 || 4,550 || 5,060 || 10% ||
And here's what we believe to be user requests from the U.S. for vanilla bridges over HTTPS (i.e., https.vanilla.us):
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 170 || 160 || 330 || 52% ||
|| 2020-01-29 || 280 || 290 || 570 || 49% ||
|| 2020-01-30 || 300 || 70 || 370 || 81% ||
Recall that we deployed the new CAPTCHAs on January 29. This is when the success rate of bots began to decline and the success rate of users began to increase. I expect the bots to improve over time, but we managed to increase the success rate of users, which is what matters most. (Granted, I'm only looking at requests from the U.S. because most of our requests come from this region. Users in regions where English isn't the native language may not be doing as well.)
Here's the number of moat requests over time (i.e., moat.obfs4.??). Note that we don't know the proportion of user and bot requests comprising all moat requests:
||= Date =||= # success =||= # failed =||= # total =||= Success rate =||
|| 2020-01-28 || 3,780 || 2,780 || 6,560 || 58% ||
|| 2020-01-29 || 3,800 || 2,830 || 6,630 || 57% ||
|| 2020-01-30 || 3,910 || 590 || 4,500 || 87% ||
I find it surprising that the number of failed requests decreased on Jan 30 while the number of successful requests didn't increase. This may be a bug in our metrics collection. Another possibility is that bots requested CAPTCHAs, weren't sufficiently confident in their classification, and therefore never sent a POST request with a solution to BridgeDB.
As mentioned above, reCAPTCHA v3 may be an option for the HTTPS distributor. In theory, we may even be able to set up a reverse proxy for Google's CAPTCHA API, so Google doesn't get to see requests directly from our users. The NSA apparently set up such a system.
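To make the reverse-proxy idea a bit more concrete, here's a purely speculative Flask sketch that forwards reCAPTCHA traffic upstream without exposing client IPs. The paths and header handling are assumptions; a real deployment would have to proxy the full widget flow, cookies, and more:

{{{#!python
import requests
from flask import Flask, Response, request

app = Flask(__name__)
UPSTREAM = "https://www.google.com"


@app.route("/recaptcha/<path:subpath>", methods=["GET", "POST"])
def proxy(subpath):
    """Forward a reCAPTCHA request upstream without the client's IP or
    identifying headers.  (Sketch only; paths and headers are assumptions.)"""
    resp = requests.request(
        method=request.method,
        url=f"{UPSTREAM}/recaptcha/{subpath}",
        params=request.args,
        data=request.get_data(),
        headers={"Content-Type": request.headers.get("Content-Type", "")},
        timeout=10,
    )
    return Response(resp.content, status=resp.status_code,
                    content_type=resp.headers.get("Content-Type"))


if __name__ == "__main__":
    app.run(port=8080)
}}}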