This was first requested a couple of summers ago, when my GSoC student was hacking on a Twitter bridge distributor, so that Twitter requests for bridges could use CAPTCHAs to reduce automated requests.
Last week, Mike Perry requested this as part of adding a mechanism to get new bridges directly from Tor Launcher.
Finally, Arturo also requested this today so that OONI probes running in censored countries can have an interface for getting bridges:
23:37 hellais | anyways it would be nice to have an API where I send a HTTP request and I get back some JSON with the captcha encoded in base64 and I can send back the solution to get bridges
23:38 isis | yep, that's exactly what we're going to do :)
23:39 isis | except we hadn't exactly decided on JSON, but yeah
23:39 isis | the captcha image is already base64, btw
23:42 isis | hellais: would these be bridges for bridge_reachability tests, or bridges just to get an ooniprobe capable of connecting to some (hopefully)-known-good-and-not-filtered version of the internet, for like submitting reports and stuff
23:43 isis | hellais: err, that was meant as a question
23:46 hellais | the second
23:46 hellais | I would like to have this: https://gist.github.com/hellais/995793f88fc42727fb92
23:46 hellais | run when ooniprobe is first installed to check if the user needs some bridges
23:47 hellais | I think I'll put it in the setup.py or something
23:47 hellais | since ooniprobe relies on tor for reporting without a working tor, we can't collect the reports
23:48 hellais | parsing HTML with standard python libraries is just a bit messy and I was wondering if there was a better way
00:10 hellais | anyways I think I will end up making something that starts a web server and makes the user solve the CAPTCHA on the local webserver. Having to open a JPG file and input it into a shell is so uncomfortable
00:13 hellais | but the JSON API would be nice nonetheless
GET /fetch will return JSON in the form:
{{{
{
'image': null or base64-encoded jpeg image,
'challenge': null or url-safe base64-encoded challenge string,
'error': null or ascii-encoded string describing the error,
}
}}}
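A minimal client-side sketch of handling the /fetch reply described above, assuming the body is well-formed JSON (with double-quoted keys) and that the image uses standard base64. The function name and the sample reply are hypothetical and only illustrate the shape of the data.

```python
import base64
import json


def parse_fetch_response(body):
    """Parse the JSON body of a GET /fetch reply.

    Returns (image_bytes, challenge). Raises ValueError if the server
    reported an error or left the image/challenge null.
    """
    reply = json.loads(body)
    if reply.get("error") is not None:
        raise ValueError(reply["error"])
    if reply.get("image") is None or reply.get("challenge") is None:
        raise ValueError("reply contained no CAPTCHA")
    # The image is base64-encoded JPEG data; the challenge string is kept
    # opaque and handed back unchanged when the solution is submitted.
    image = base64.b64decode(reply["image"])
    return image, reply["challenge"]


# Hypothetical reply, for illustration only (not real server output).
sample = json.dumps({
    "image": base64.b64encode(b"\xff\xd8\xff\xe0 fake jpeg").decode("ascii"),
    "challenge": "Y2hhbGxlbmdl",
    "error": None,
})
image, challenge = parse_fetch_response(sample)
```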
POST /check?data=[…] where the data url parameter is a JSON string in the following form:
{{{
{
'challenge': base64-encoded challenge string (from the above response),
'response': base64-encoded response (i.e. the CAPTCHA solution),
}
}}}
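As a sketch of how a client might assemble the /check request described above: the base URL and the function name are hypothetical, and standard (not URL-safe) base64 for the response field is an assumption, since the proposal only says "base64-encoded".

```python
import base64
import json
from urllib.parse import urlencode


def build_check_url(base_url, challenge, solution):
    """Build the URL for POST /check with the JSON in the 'data' parameter."""
    payload = {
        # The challenge string is returned exactly as received from /fetch.
        "challenge": challenge,
        # Assumption: the user's solution is UTF-8 text, base64-encoded.
        "response": base64.b64encode(solution.encode("utf-8")).decode("ascii"),
    }
    # urlencode percent-escapes the JSON so it is safe inside a query string.
    return base_url + "/check?" + urlencode({"data": json.dumps(payload)})


# Hypothetical values, for illustration only.
url = build_check_url("http://127.0.0.1:8080", "Y2hhbGxlbmdl", "Tvx74Pm")
```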
The farfetchd server will attempt to verify the challenge response, and replies with JSON in the following form:
{{{
{
'result': bool,
'error': null or base64-encoded string describing the error,
}
}}}
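A small sketch of reading that reply on the client, assuming the error string (when present) is base64-encoded UTF-8 text as described above. The function name and sample bodies are hypothetical.

```python
import base64
import json


def parse_check_response(body):
    """Return (result, error_text) from the JSON body of a /check reply."""
    reply = json.loads(body)
    error = reply.get("error")
    if error is not None:
        # Per the proposal, the error description arrives base64-encoded.
        error = base64.b64decode(error).decode("utf-8", errors="replace")
    return bool(reply.get("result")), error


# Hypothetical replies, for illustration only.
ok_body = json.dumps({"result": True, "error": None})
bad_body = json.dumps({
    "result": False,
    "error": base64.b64encode(b"wrong solution").decode("ascii"),
})
```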
Please let me know if this API seems like it'll work on the Tor Browser side, or if there's any way I could make it easier to process the data and/or hand it back and forth.
Trac:
Status: new to needs_review
Actualpoints: N/A to 2
Points: N/A to 2
Sponsor: N/A to SponsorM
There should be a 'version' field in every JSON thing so that we can add things later if we need to.
There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.
The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)
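A rough sketch of what those three points could look like together: every message carries 'version' and 'type' fields, and the serialized JSON is wrapped in URL-safe base64. The version string and the type name below are placeholders, not values anyone has agreed on.

```python
import base64
import json


def encode_message(msg_type, fields, version="0.1.0"):
    """Serialize a message with 'version' and 'type', as URL-safe base64."""
    message = dict(fields, version=version, type=msg_type)
    raw = json.dumps(message, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_message(encoded):
    """Reverse encode_message() and return the message as a dict."""
    return json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")))


# Hypothetical message type and fields, for illustration only.
encoded = encode_message("check-request", {"challenge": "Y2hhbGxlbmdl"})
decoded = decode_message(encoded)
```

The URL-safe alphabet replaces `+` and `/` with `-` and `_`, so the encoded string can go into a URL parameter or a POST body without further escaping of those characters.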
Also, in order to conform to the JSON API standard, I need to change the following things:
The content-type apparently needs to be application/vnd.api+json (not application/json).
"Servers MUST respond with a 415 Unsupported Media Type status code if a request specifies the header Content-Type: application/vnd.api+json with any media type parameters."
"Servers MUST respond with a 406 Not Acceptable status code if a request’s Accept header contains the JSON API media type and all instances of that media type are modified with media type parameters."
"A document MUST contain at least one of the following top-level members:
data: the document’s “primary data”
errors: an array of error objects
meta: a meta object that contains non-standard meta-information."
"The members data and errors MUST NOT coexist in the same document."
"Primary data MUST be […] a single resource object […]
"A resource object MUST contain at least the following top-level members:
id
type
Exception: The id member is not required when the resource object originates at the client and represents a new resource to be created on the server."
"In addition, a resource object MAY contain any of these top-level members:
attributes: an attributes object representing some of the resource’s data."
"The value of the attributes key MUST be an object (an “attributes object”). Members of the attributes object (“attributes”) represent information about the resource object in which it’s defined. Attributes may contain any valid JSON value."
"A JSON API document MAY include information about its implementation under a top level jsonapi member. If present, the value of the jsonapi member MUST be an object (a “jsonapi object”). The jsonapi object MAY contain a version member whose value is a string indicating the highest JSON API version supported."
"A server MUST return 403 Forbidden in response to an unsupported request to create a resource with a client-generated ID." (for the POST part)
"Error objects provide additional information about problems encountered while performing an operation. Error objects MUST be returned as an array keyed by errors in the top level of a JSON API document.
An error object MAY have the following members:
id: a unique identifier for this particular occurrence of the problem.
links: a links object containing the following members:
about: a link that leads to further details about this particular occurrence of the problem.
status: the HTTP status code applicable to this problem, expressed as a string value.
code: an application-specific error code, expressed as a string value.
title: a short, human-readable summary of the problem that SHOULD NOT change from occurrence to occurrence of the problem, except for purposes of localization.
detail: a human-readable explanation specific to this occurrence of the problem. Like title, this field’s value can be localized.
source: an object containing references to the source of the error, optionally including any of the following members:
- pointer: a JSON Pointer [RFC6901] to the associated entity in the request document [e.g. "/data" for a primary data object, or "/data/attributes/title" for a specific attribute].
- parameter: a string indicating which URI query parameter caused the error.
meta: a meta object containing non-standard meta-information about the error."
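To make the quoted requirements concrete, here is a minimal sketch of documents shaped according to them: one with primary data, one with an errors array, plus the Content-Type check that leads to a 415. The resource type, id, error code, and helper names are all made up for illustration; only the member names (data, errors, type, id, attributes, jsonapi, status, code, title, detail) come from the spec text quoted above.

```python
JSONAPI_MEDIA_TYPE = "application/vnd.api+json"


def make_data_document(resource_type, resource_id, attributes):
    """A document whose primary data is a single resource object."""
    return {
        "jsonapi": {"version": "1.0"},
        "data": {
            "type": resource_type,
            "id": resource_id,
            "attributes": attributes,
        },
    }


def make_error_document(status, code, title, detail=None):
    """A document with an 'errors' array; 'data' must not coexist with it."""
    error = {"status": str(status), "code": str(code), "title": title}
    if detail is not None:
        error["detail"] = detail
    return {"jsonapi": {"version": "1.0"}, "errors": [error]}


def content_type_status(content_type):
    """Return 415 if the JSON API media type carries parameters, else None."""
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() == JSONAPI_MEDIA_TYPE and params.strip():
        return 415
    return None


# Hypothetical values, for illustration only.
challenge_doc = make_data_document(
    "captcha-challenge", "1", {"image": "...", "challenge": "..."})
error_doc = make_error_document(
    400, "1001", "CAPTCHA solution incorrect", detail="Please try again.")
```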
Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.
There should be a 'version' field in every JSON thing so that we can add things later if we need to.
That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).
There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.
Seems like a good idea and having it may also aid in debugging the protocol.
The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)
I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.:
https://bridges.torproject.org/bridges?transport=obfs4
In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).
Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about the form of the CAPTCHA (or is that in the challenge field?). One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.
Also, in order to conform to the JSON API standard, I need to change the following things:
...
That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.
Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.
I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.
One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.
anon-connection-wizard, which is the Python clone of Tor Launcher, is looking forward to the implementation of this BridgeDB API, too!
You're welcome!
This API won't be publicly accessible, though; it'll be reachable through the API for #22871, and even then only through a special meek reflector as part of #16650.
Is anon-connection-wizard what Tails uses now? I'd be happy to support Tails as well (but I'd strongly prefer the connection to go through the meek reflector).
There should be a 'version' field in every JSON thing so that we can add things later if we need to.
That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).
There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.
Seems like a good idea and having it may also aid in debugging the protocol.
The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)
I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.:
https://bridges.torproject.org/bridges?transport=obfs4
Yes. They will be base64-encoded JPEG images, 400px x 125px. (Is there something that would work better?)
In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).
Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about the form of the CAPTCHA (or is that in the challenge field?). One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.
Hmmm. I am also not sure what to do here. I think probably, because TB's translation mechanics are quite different to BridgeDB's, it might be best to have TB create the string and localise it?
Also, in order to conform to the JSON API standard, I need to change the following things:
...
That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.
Yeah, it's slightly complex, but it has everything we need, and then we never have to argue about who is doing things correctly. (I'm also trying to think about the future, when some router or VPN company wants to use this API.) Also, it's less work for me to just use the JSON-API spec than to write and maintain my own, more minimal one.
Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.
I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.
I created a farfetchd API spec, and parts of it are incorporated into the overall (draft) moat spec. Let me know if you run across any more issues.
One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.
Makes sense. I'll rely on unique codes for different problems, that way you won't have to parse error strings.
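For illustration, a client could then localize on its side by looking up the code, falling back to the server-supplied title when it doesn't know the code. This is only a sketch: the code "1001", the message table, and the function name are invented here and are not part of any spec.

```python
# Hypothetical client-side message table keyed by locale, then error code.
MESSAGES = {
    "en": {"1001": "The CAPTCHA solution was incorrect."},
    "fr": {"1001": "Le CAPTCHA est incorrect."},
}


def localize_error(error, locale="en"):
    """Map a JSON API style error object to a localized message.

    Falls back to English, then to the error's own 'title' member.
    """
    code = error.get("code")
    table = MESSAGES.get(locale, MESSAGES["en"])
    return (table.get(code)
            or MESSAGES["en"].get(code)
            or error.get("title", "Unknown error"))


# Hypothetical error object, for illustration only.
error = {"status": "400", "code": "1001", "title": "CAPTCHA solution incorrect"}
```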