Separate BridgeDB's CAPTCHA into another service

added actualpoints::2 bridgedb-https captcha component::circumvention/bridgedb ex-sponsor-19 ooni-probe points::2 priority::medium severity::normal sponsor::30-can status::new tor-launcher type::enhancement labels

FWIW, Mike mentioned also wanting the API to present a JSON-RPC interface.

Trac:
Sponsor: N/A to N/A
Cc: isis, mikeperry, hellais to isis, mikeperry, hellais, brade, mcs
Reviewer: N/A to N/A
Severity: N/A to Normal

I made a CAPTCHA server here: https://github.com/isislovecruft/farfetchd

It has a JSON API:

GET /fetch will return JSON in the form:
{{{
{
  'image': null or base64-encoded jpeg image,
  'challenge': null or url-safe base64-encoded challenge string,
  'error': null or ascii-encoded string describing the error,
}
}}}

POST /check?data=[…] where the data url parameter is a JSON string in the following form:
{{{
{
  'challenge': base64-encoded challenge string (from the above response),
  'response': base64-encoded response (i.e. the CAPTCHA solution),
}
}}}

The farfetchd server will attempt to verify the challenge response, and replies with JSON in the following form:
{{{
{
  'result': bool,
  'error': null or base64-encoded string describing the error,
}
}}}
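
To make the round trip concrete, here is a rough Python sketch of how a client might handle the two messages described above. It is not taken from farfetchd: the function names are hypothetical, and it assumes standard (not URL-safe) base64 for the image, the response, and the /check error string. The challenge is passed back unchanged.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode


def parse_fetch_response(body):
    """Return (jpeg_bytes, challenge) from a GET /fetch reply, or raise."""
    reply = json.loads(body)
    if reply.get("error") is not None:
        raise ValueError(reply["error"])
    # Assumption: the image uses the standard base64 alphabet.
    return base64.b64decode(reply["image"]), reply["challenge"]


def build_check_query(challenge, solution):
    """Build the query string for POST /check?data=[...]."""
    payload = {
        "challenge": challenge,  # handed back exactly as received
        "response": base64.b64encode(solution.encode("utf-8")).decode("ascii"),
    }
    return urlencode({"data": json.dumps(payload)})


def parse_check_query(query):
    """Server-side view: recover the challenge and the plain-text solution."""
    payload = json.loads(parse_qs(query)["data"][0])
    solution = base64.b64decode(payload["response"]).decode("utf-8")
    return {"challenge": payload["challenge"], "response": solution}


def parse_check_response(body):
    """Return (result, error_text); error_text is None when there is no error."""
    reply = json.loads(body)
    error = reply.get("error")
    if error is not None:
        error = base64.b64decode(error).decode("utf-8")
    return bool(reply["result"]), error
```

The JSON here uses double-quoted keys, since the single-quoted form above is only shorthand for the message layout.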

Please let me know if this API seems like it'll work on the Tor Browser side, or if there's any way I could make it easier to process the data and/or hand it back and forth.

Trac:
Status: new to needs_review
Actualpoints: N/A to 2
Points: N/A to 2
Sponsor: N/A to SponsorM

Couple things I realised I should do:

There should be a 'version' field in every JSON thing so that we can add things later if we need to.
There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.
The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)
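
A minimal sketch of what those three changes could look like together, assuming a flat message layout. The version string, the type name, and the helper names below are placeholders, not anything farfetchd defines.

```python
import base64
import json

PROTOCOL_VERSION = "0.1.0"  # placeholder value, not an agreed version


def encode_message(msg_type, fields):
    """Wrap fields with 'version' and 'type', then URL-safe base64 the JSON."""
    message = {"version": PROTOCOL_VERSION, "type": msg_type}
    message.update(fields)
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_message(encoded):
    """Reverse encode_message and insist on the two bookkeeping fields."""
    raw = base64.urlsafe_b64decode(encoded.encode("ascii"))
    message = json.loads(raw.decode("utf-8"))
    for key in ("version", "type"):
        if key not in message:
            raise ValueError("missing required field: " + key)
    return message
```

The encoded string only contains characters from the URL-safe alphabet (plus `=` padding), so it can go either in a URL parameter or in the POST body.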

Also, in order to conform to the JSON API standard, I need to change the following things:

The content-type apparently needs to be application/vnd.api+json (not application/json).
"Servers MUST respond with a 415 Unsupported Media Type status code if a request specifies the header Content-Type: application/vnd.api+json with any media type parameters."
"Servers MUST respond with a 406 Not Acceptable status code if a request’s Accept header contains the JSON API media type and all instances of that media type are modified with media type parameters."
"A document MUST contain at least one of the following top-level members:
- data: the document’s “primary data”
- errors: an array of error objects
- meta: a meta object that contains non-standard meta-information."
"The members data and errors MUST NOT coexist in the same document."
"Primary data MUST be […] a single resource object […]"
"A resource object MUST contain at least the following top-level members:
- id
- type
Exception: The id member is not required when the resource object originates at the client and represents a new resource to be created on the server."
"In addition, a resource object MAY contain any of these top-level members:
- attributes: an attributes object representing some of the resource’s data."
"The value of the attributes key MUST be an object (an “attributes object”). Members of the attributes object (“attributes”) represent information about the resource object in which it’s defined. Attributes may contain any valid JSON value."
"A JSON API document MAY include information about its implementation under a top level jsonapi member. If present, the value of the jsonapi member MUST be an object (a “jsonapi object”). The jsonapi object MAY contain a version member whose value is a string indicating the highest JSON API version supported."
"A server MUST return 403 Forbidden in response to an unsupported request to create a resource with a client-generated ID." (for the POST part)
"Error objects provide additional information about problems encountered while performing an operation. Error objects MUST be returned as an array keyed by errors in the top level of a JSON API document. An error object MAY have the following members:
- id: a unique identifier for this particular occurrence of the problem.
- links: a links object containing the following members:
  - about: a link that leads to further details about this particular occurrence of the problem.
- status: the HTTP status code applicable to this problem, expressed as a string value.
- code: an application-specific error code, expressed as a string value.
- title: a short, human-readable summary of the problem that SHOULD NOT change from occurrence to occurrence of the problem, except for purposes of localization.
- detail: a human-readable explanation specific to this occurrence of the problem. Like title, this field’s value can be localized.
- source: an object containing references to the source of the error, optionally including any of the following members:
  - pointer: a JSON Pointer [RFC6901] to the associated entity in the request document [e.g. "/data" for a primary data object, or "/data/attributes/title" for a specific attribute].
  - parameter: a string indicating which URI query parameter caused the error.
- meta: a meta object containing non-standard meta-information about the error."
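
As a rough illustration of the rules quoted above, here is a Python sketch that builds a data document and an error document and checks the top-level constraints. The helper names are hypothetical, the "1.0" version string is an assumption, and only the Content-Type (415) rule is covered; the Accept-header (406) rule is left out for brevity.

```python
JSON_API_MEDIA_TYPE = "application/vnd.api+json"


def content_type_error(content_type):
    """Return 415 if the JSON API media type carries parameters, else None."""
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() == JSON_API_MEDIA_TYPE and params.strip():
        return 415
    return None


def data_document(resource_type, resource_id, attributes):
    """A document whose primary data is a single resource object."""
    resource = {"type": resource_type, "id": resource_id, "attributes": attributes}
    return {"data": resource, "jsonapi": {"version": "1.0"}}


def error_document(status, code, title, detail=None):
    """A document carrying one error object; status and code are strings."""
    error = {"status": str(status), "code": str(code), "title": title}
    if detail is not None:
        error["detail"] = detail
    return {"errors": [error], "jsonapi": {"version": "1.0"}}


def validate_document(doc, from_client=False):
    """Check the top-level rules quoted above; raise ValueError on violation."""
    if not any(key in doc for key in ("data", "errors", "meta")):
        raise ValueError("need at least one of data, errors, meta")
    if "data" in doc and "errors" in doc:
        raise ValueError("data and errors must not coexist")
    if "errors" in doc and not isinstance(doc["errors"], list):
        raise ValueError("errors must be an array")
    if "data" in doc:
        resource = doc["data"]
        if "type" not in resource:
            raise ValueError("resource object needs a type")
        # The id may be omitted only for client-originated new resources.
        if "id" not in resource and not from_client:
            raise ValueError("resource object needs an id")
        if not isinstance(resource.get("attributes", {}), dict):
            raise ValueError("attributes must be an object")
```

For example, a challenge might be sent as `data_document("captcha-challenge", "1", {"image": "...", "challenge": "..."})`, where the resource type name is made up for this example.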

Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

Trac:
Status: needs_review to needs_revision

Replying to isis:

Couple things I realised I should do:

There should be a 'version' field in every JSON thing so that we can add things later if we need to.

That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).

There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.

Seems like a good idea and having it may also aid in debugging the protocol.

The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)

I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.: https://bridges.torproject.org/bridges?transport=obfs4 In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).

Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about that form of the CAPTCHA (or is that in the challenge field?) One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.

Also, in order to conform to the JSON API standard, I need to change the following things: ...

That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.

Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.

One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.

Thank you very much for your work!

anon-connection-wizard, which is a Python clone of Tor Launcher, is also looking forward to the implementation of this BridgeDB API!

Trac:
Cc: isis, mikeperry, hellais, brade, mcs to isis, mikeperry, hellais, brade, mcs, iry

Replying to iry:

Thank you very much for your work!

anon-connection-wizard, which is a Python clone of Tor Launcher, is also looking forward to the implementation of this BridgeDB API!

You're welcome!

This API won't be publicly accessible, though. It'll be reachable through the API for #22871 (moved), and even then only through a special meek reflector as part of #16650 (moved).

Is anon-connection-wizard what Tails uses now? I'd be happy to support Tails as well (but I'd strongly prefer the connection to go through the meek reflector).

Replying to mcs:

Replying to isis:

Couple things I realised I should do:

There should be a 'version' field in every JSON thing so that we can add things later if we need to.

That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).

There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.

Seems like a good idea and having it may also aid in debugging the protocol.

The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)

I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.: https://bridges.torproject.org/bridges?transport=obfs4

Yes. They will be base64-encoded JPEG images, 400px x 125px. (Is there something that would work better?)
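
For what it's worth, a client could sanity-check and display such an image along these lines. This is only a sketch with made-up function names: it assumes standard base64, checks just the JPEG start-of-image marker (not the 400px x 125px dimensions), and uses a data: URI as one possible way to hand the image to a UI.

```python
import base64
import binascii

JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this marker


def decode_captcha_image(image_b64):
    """Decode the base64 image and check that it starts like a JPEG."""
    try:
        image = base64.b64decode(image_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is not valid base64") from exc
    if not image.startswith(JPEG_SOI):
        raise ValueError("image does not look like a JPEG")
    return image


def as_data_uri(image_b64):
    """Return a data: URI for the image after validating it."""
    decode_captcha_image(image_b64)
    return "data:image/jpeg;base64," + image_b64
```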

In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).

Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about that form of the CAPTCHA (or is that in the challenge field?) One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.

Hmmm. I am also not sure what to do here. I think probably, because TB's translation mechanics are quite different to BridgeDB's, it might be best to have TB create the string and localise it?

Also, in order to conform to the JSON API standard, I need to change the following things: ...

That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.

Yeah, it's slightly complex, but it has everything we need, and then we never have to argue about who is doing things correctly. (I'm also trying to think about the future, when some router or VPN company wants to use this API.) Also, it's less work for me to just use the JSON-API spec rather than write and maintain my own, more minimal one.

Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.

I created a farfetchd API spec, and parts of it are incorporated into the overall (draft) moat spec. Let me know if you run across any more issues.

One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.

Makes sense. I'll rely on unique codes for different problems, that way you won't have to parse error strings.
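
On the client side, that could look roughly like the sketch below: the client keeps its own localized strings keyed by the error object's code and falls back to the server's title only when it doesn't recognise the code. The codes, messages, and names here are invented for illustration and are not the ones the spec defines.

```python
# Hypothetical code-to-message tables; real codes would come from the spec.
MESSAGES = {
    "en": {
        "1": "The CAPTCHA solution was incorrect.",
        "2": "The CAPTCHA challenge has expired.",
    },
    "de": {
        "1": "Die CAPTCHA-Lösung war falsch.",
    },
}
DEFAULT_LOCALE = "en"


def localized_error(error, locale):
    """Pick a display string for a JSON API error object by its code."""
    code = error.get("code")
    for table in (MESSAGES.get(locale, {}), MESSAGES[DEFAULT_LOCALE]):
        if code in table:
            return table[code]
    # Unknown code: fall back to whatever the server sent.
    return error.get("title", "Unknown error")
```

Because JSON API expresses code as a string, the tables are keyed by strings rather than integers.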

Trac:
Sponsor: SponsorM to Sponsor19
Owner: isis to N/A
Status: needs_revision to assigned

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

Trac:
Keywords: bridgedb-https captcha tor-launcher ooni-probe deleted, tor-launcher, ex-sponsor-19, captcha, bridgedb-https, ooni-probe added

Moving from Sponsor 19 to Sponsor 30.

Trac:
Sponsor: Sponsor19 to Sponsor30-can

Change tickets that are assigned to nobody to "new".

Trac:
Status: assigned to new

changed time estimate to 16h

added 16h of time spent

mentioned in issue #16671 (moved)

mentioned in issue #22871 (moved)

moved to tpo/anti-censorship/bridgedb#15967 (closed)

mentioned in issue tpo/anti-censorship/bridgedb#22871 (closed)
