Captchas are not accessible for blind users

added anti-censorship-roadmap-2020 bridgedb-reportbug bridgedb-ui component::circumvention/bridgedb owner::juggy parent::31279 points::5 priority::medium reporter::PZajda s30-o22a2 severity::normal sponsor::30-can status::assigned type::enhancement labels

There are currently two ways to give CAPTCHAs to a BridgeDB user:

1. Request a CAPTCHA from a reCaptcha API server using either BridgeDB's IP or a random fake IP, steal the image and the 'recaptcha_challenge_string' form field from the response (the code for this is here), and then serve it to the client. The client's CAPTCHA solution is then sent back to the reCaptcha API server for verification.
2. There is a branch for #10809 (moved) which changes to using a local cache of CAPTCHAs, which are created with Gimp. I think we intend to go the latter route of using homebrewed CAPTCHAs, and adding audio CAPTCHA support would be excellent. The scripts which generate the CAPTCHAs cannot be run on BridgeDB, because Gimp requires X to be installed. The script produces a directory of image files which are named for the CAPTCHA answer, e.g. aT2bXvw7.jpg.
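
For the second approach, serving a CAPTCHA from the pre-generated directory comes down to picking a random file and treating its name as the answer. The sketch below assumes that naming convention and a flat directory of .jpg/.png files; `pick_cached_captcha` and `check_solution` are hypothetical helpers, not BridgeDB's code, and how the expected answer is kept between the two HTTP requests is left out.

```python
import secrets
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pick_cached_captcha(cache_dir):
    """Pick a random pre-generated CAPTCHA image from cache_dir.

    Assumes every image is named after its answer (e.g. aT2bXvw7.jpg),
    so the answer is the file name without its extension.
    Returns (image_bytes, answer).
    """
    images = sorted(p for p in Path(cache_dir).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    if not images:
        raise FileNotFoundError(f"no CAPTCHA images in {cache_dir}")
    chosen = secrets.choice(images)  # CSPRNG, so choices are unpredictable
    return chosen.read_bytes(), chosen.stem


def check_solution(expected, submitted):
    """Compare a submitted solution with the answer in constant time."""
    return secrets.compare_digest(expected.encode(),
                                  submitted.strip().encode())
```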

Option 2 is the better way to go, I think, as BridgeDB is switching to it, though having support for reCaptcha's audio CAPTCHAs (option 1) in BridgeDB would be good too.

For option 2, I am uncertain of the best way to do this.

One idea would be to convert the image filenames to audio by extending the gimp-captcha scripts to also produce audio files. I have not looked into Python modules wrapping TTS engines lately, so I have little advice to give there.
Another idea, which might be more resource-friendly, would be to ignore the filename completely, generate a random string, and then use some TTS module to create the CAPTCHA (doing all this only if a user has requested an audio CAPTCHA).
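
The second idea (generate the answer first, synthesize audio only on demand) could be sketched as follows. The alphabet and the `synthesize` callback are assumptions standing in for whichever TTS module is eventually chosen; `LazyAudioCaptcha` is a hypothetical name.

```python
import secrets
import string

# Assumed answer alphabet; the real one is still an open question.
ALPHABET = string.ascii_lowercase + string.digits


def new_answer(length=8, alphabet=ALPHABET):
    """Generate a random CAPTCHA answer using a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


class LazyAudioCaptcha:
    """Holds an answer and only synthesizes audio when it is requested.

    `synthesize` is a placeholder for the TTS backend: it takes the
    answer string and returns audio bytes.
    """

    def __init__(self, synthesize, length=8):
        self.answer = new_answer(length)
        self._synthesize = synthesize
        self._audio = None

    def audio(self):
        if self._audio is None:  # generate at most once, on first request
            self._audio = self._synthesize(self.answer)
        return self._audio
```

Users who never ask for audio cost only the random string, which is the resource saving the idea is after.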

Trac:
Type: defect to enhancement
Status: new to assigned

Trac:
Keywords: N/A deleted, isisExB, isisExC, isis2015Q3Q4 added

Trac:
Keywords: N/A deleted, bridgedb-ui added

Trac:
Severity: N/A to Blocker
Sponsor: N/A to N/A
Status: assigned to new
Reviewer: N/A to N/A

Trac:
Severity: Blocker to Normal

Trac:
Cc: isis to isis, brade, mcs

Can we use Python's captcha library for generating audio captchas? Also, we can use the same library for generating image captchas because it doesn't require X to be installed and hopefully we can run it on BridgeDB.

Trac:
Username: unknown_artist

Trac:
Cc: isis, brade, mcs to isis, brade, mcs, contact@carolin-zoebelein.de

Trac:
Cc: isis, brade, mcs, contact@carolin-zoebelein.de to isis, brade, mcs

Trac:
Cc: isis, brade, mcs to isis, brade, mcs, contact@carolin-zoebelein.de

Replying to unknown_artist:

> Can we use Python's captcha library for generating audio captchas? Also, we can use the same library for generating image captchas because it doesn't require X to be installed and hopefully we can run it on BridgeDB.

If you write a ticket, please give more specific information. If I search the web for python + captcha, I already find several Python libraries, so nobody knows exactly what you are talking about.

Please write down, for example, the exact name of the library, possible functions, a sketch of the code you want to implement, etc.

I am planning to use https://pypi.python.org/pypi/captcha for generating captchas. As per the documentation, we can do something like this for generating audio captchas:

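```python
from captcha.audio import AudioCaptcha

# voicedir must contain one sub-directory of recordings per character
audio = AudioCaptcha(voicedir='/path/to/voices')
# Write a WAV file whose spoken answer is 'aT2bXvw7'
audio.write('aT2bXvw7', 'aT2bXvw7.wav')
```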

The above code snippet will generate an audio captcha whose correct answer is aT2bXvw7. The voice directory should contain single-character directories, for example:

* a/
* b/
* c/

These directories should contain 8-bit PCM .wav files. Each character directory may contain any number of .wav files, and one of them will be chosen at random when a captcha is generated.
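
A voice directory built this way could be sanity-checked with Python's standard `wave` module before it is handed to `AudioCaptcha`. This is a minimal sketch that only checks the properties described above (single-character directory names, at least one .wav file each, 8-bit samples); `check_voicedir` is a hypothetical helper, and the captcha library may impose further requirements (such as sample rate or channel count) that it does not test.

```python
import wave
from pathlib import Path


def check_voicedir(voicedir):
    """Return a list of problems found in a captcha voice directory."""
    problems = []
    for entry in sorted(Path(voicedir).iterdir()):
        if not entry.is_dir() or len(entry.name) != 1:
            continue  # only single-character directories hold voices
        wavs = sorted(entry.glob("*.wav"))
        if not wavs:
            problems.append(f"{entry.name}/: no .wav files")
        for wav_path in wavs:
            try:
                with wave.open(str(wav_path), "rb") as w:
                    width = w.getsampwidth()  # bytes per sample
                    if width != 1:  # 1 byte per sample means 8-bit PCM
                        problems.append(f"{entry.name}/{wav_path.name}: "
                                        f"{8 * width}-bit, expected 8-bit")
            except (wave.Error, EOFError) as exc:  # not a readable PCM WAV
                problems.append(f"{entry.name}/{wav_path.name}: {exc}")
    return problems
```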

Trac:
Username: unknown_artist

Replying to unknown_artist:

> I am planning to use https://pypi.python.org/pypi/captcha for generating captchas. As per the documentation, we can do something like this for generating audio captchas:
>
> ```python
> from captcha.audio import AudioCaptcha
> audio = AudioCaptcha(voicedir='/path/to/voices')
> audio.write('aT2bXvw7','aT2bXvw7.wav')
> ```
>
> The above code snippet will generate an audio captcha whose correct answer is aT2bXvw7.
> The voice directory should contain single character named directories, for example:
> * a/
> * b/
> * c/
>
> These directories should contain 8 bit PCM .wav files. Each character directory may contain as many .wav files and one of them will be randomly chosen for captcha generation

Hi unknown_artist!

Thanks for looking into this! It looks good. We'd need to make the recordings as part of this ticket, since their default voice only includes characters 0-9. From their README, it looks like they'd appreciate an upstream contribution of voice files as well.
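
Since new recordings are needed, a small check like the following could list which characters of the intended answer alphabet still lack recordings. `missing_voices` is a hypothetical helper that assumes the voicedir layout described earlier in the thread. If the alphabet mixes upper- and lower-case letters, note that directories such as a/ and A/ collide on case-insensitive file systems.

```python
from pathlib import Path


def missing_voices(voicedir, alphabet):
    """Return the characters of `alphabet` that have no recordings yet.

    A character counts as covered if voicedir/<char>/ exists and
    contains at least one .wav file.
    """
    root = Path(voicedir)
    return [c for c in alphabet
            if not (root / c).is_dir() or not any((root / c).glob("*.wav"))]
```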

We'll also need to update the interface at https://bridges.torproject.org/bridges to have some button people can click to hear audio, and probably have a hidden directive for screen readers before the header bar at the top of the page, e.g. something like:

```css
.screen-reader-text {
    clip: rect(1px, 1px, 1px, 1px);
    height: 1px;
    width: 1px;
    overflow: hidden;
    position: absolute !important;
}
```

Instructions for those using screen readers: please use access key 'a' to play an audio captcha, enter the characters you hear into the form which is accessible via access key 't', and then press enter. Please be aware that the audio captcha is in English.


The American Foundation for the Blind has [some helpful tips for making web forms easier to use](http://www.afb.org/info/programs-and-services/technology-evaluation/creating-accessible-websites/accessible-forms/1235) for people with braille terminals and screen readers.

We may also want to put a screen reader note on the page that contains the actual bridges, telling users the access key for the "Select All" button that copies the bridge lines. (That button also doesn't appear to have an access key right now.)

Oh, all of the strings above should also be translated; you can do that by making them constants in `bridgedb/strings.py`.
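
One common way to keep such constants translatable is to wrap them in a no-op `_()` marker so that gettext tooling can extract them, and to translate them only when a page is rendered for a particular locale. The sketch below assumes that pattern; the constant names and the `render` helper are hypothetical, and the real layout of `bridgedb/strings.py` may differ.

```python
import gettext


def _(text):
    """Mark a string for extraction without translating it at import time."""
    return text


# Hypothetical constant names for the new audio CAPTCHA strings.
AUDIO_CAPTCHA_BUTTON = _("Play audio CAPTCHA")
SCREEN_READER_NOTE = _(
    "Instructions for those using screen readers: please use access key 'a' "
    "to play an audio captcha, enter the characters you hear into the form "
    "which is accessible via access key 't', and then press enter. Please be "
    "aware that the audio captcha is in English.")


def render(text, translations=None):
    """Translate a marked constant with the request's gettext catalogue."""
    if translations is None:
        translations = gettext.NullTranslations()  # falls back to English
    return translations.gettext(text)
```

Deferring translation to render time lets one process serve users in different languages, since nothing is translated once at import.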

Let me know if you need any help!

Trac:
Keywords: isisExB, isisExC, isis2015Q3Q4 deleted, N/A added
Points: N/A to 5
Status: new to assigned
Owner: isis to N/A

Trac:
Sponsor: N/A to Sponsor19

Trac:
Keywords: N/A deleted, anti-censorship-roadmap-2019 added

Moving from Sponsor 19 to Sponsor 30.

Trac:
Sponsor: Sponsor19 to Sponsor30-can

Trac:
Keywords: anti-censorship-roadmap-2019 deleted, anti-censorship-roadmap added

Trac:
Parent: N/A to #31279 (closed)

Trac:
Keywords: N/A deleted, s30-o22a2 added

Trac:
Keywords: anti-censorship-roadmap deleted, anti-censorship-roadmap-2020Q1 added

Change tickets that are assigned to nobody to "new".

Trac:
Status: assigned to new

I wrote a sample web server [https://github.com/jugheadjones10/bridgedb-audio-captcha] that serves the original BridgeDB captcha page with audio captchas (using suggestions from the comments here). Could I receive some feedback about any naive code or problems that might arise if this is integrated into BridgeDB? Thank you!

Trac:
Username: juggy

Trac:
Owner: N/A to juggy
Status: new to assigned

Trac:
Username: juggy
Status: assigned to needs_revision

Trac:
Username: juggy
Status: needs_revision to needs_review

Replying to juggy:

> I wrote a sample web server [https://github.com/jugheadjones10/bridgedb-audio-captcha] that serves the original BridgeDB captcha page with audio captchas (using suggestions from the comments here). Could I receive some feedback about any naive code or problems that might arise if this is integrated into BridgeDB? Thank you!

Thanks for working on this! I gave it a shot and it worked for me. Here are some thoughts:

The size of a single audio CAPTCHA seems to be approximately 85 KB. It should be straightforward to add the audio CAPTCHA to bridges.torproject.org, but if possible, we should also make it available over moat. We could encode it in Base64 and send it in the HTTP response to a moat request. However, more than 85 KB extra per request sounds expensive for a CAPTCHA that only a small fraction of users would use, but we may be able to reduce the size.
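
For a rough sense of the cost: Base64 turns every 3 input bytes into 4 ASCII characters, so an 85 KB WAV grows to roughly 113 KB in the response body. The sketch below computes that and shows one way the audio could be embedded in a JSON response; the `"audio"` field name is hypothetical, since moat does not currently define one.

```python
import base64


def base64_size(n_bytes):
    """Length of the padded Base64 encoding of n_bytes of input."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


def moat_audio_field(wav_bytes):
    """Return a JSON-serialisable dict carrying the WAV as Base64 text.

    The "audio" key is a hypothetical field name, not part of moat today.
    """
    return {"audio": base64.b64encode(wav_bytes).decode("ascii")}


if __name__ == "__main__":
    wav = bytes(85_000)  # stand-in for an ~85 KB audio CAPTCHA
    print(base64_size(len(wav)))  # 113336
```
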
The library's default voice is English, which is a potential usability problem. It would be neat to have multiple languages, but this doesn't strike me as a critical issue: most people will recognise English numbers.
Your GitHub repository contains the following question:

> A concern: Given the simple input-output nature of the Python audio captcha library, it seems like it wouldn't take long to train a simple model to accurately crack the audio captcha.

That's true, but I wouldn't expect the audio CAPTCHA to be easier to break than the visual CAPTCHA, or am I missing something? As long as it doesn't make our distributor easier to attack, I see no problem in deploying it.

Out of curiosity, did you take a look at other libraries too? If so, why did you end up using https://github.com/lepture/captcha ?

Trac:
Status: needs_review to new

Trac:
Username: juggy
Status: new to assigned

No more Q1 for 2020.

Trac:
Keywords: anti-censorship-roadmap-2020Q1 deleted, anti-censorship-roadmap-2020 added

changed time estimate to 40h

mentioned in issue #24607 (moved)

mentioned in issue #31279 (closed)

moved to tpo/anti-censorship/bridgedb#10831 (closed)
