Opened 3 years ago

Last modified 13 months ago

#15967 needs_revision enhancement

Separate BridgeDB's CAPTCHA into another service

Reported by: isis
Owned by: isis
Priority: Medium
Milestone:
Component: Obfuscation/BridgeDB
Version:
Severity: Normal
Keywords: bridgedb-https captcha tor-launcher ooni-probe
Cc: isis, mikeperry, hellais, brade, mcs, iry
Actual Points: 2
Parent ID:
Points: 2
Reviewer:
Sponsor: SponsorM

Description

This was first requested a couple of summers ago, when my GSoC student was hacking on a Twitter bridge distributor, so that bridge requests made over Twitter could use the CAPTCHAs to reduce automated requests.

Last week, Mike Perry requested this as part of adding a mechanism to get new bridges directly from Tor Launcher.

Finally, Arturo also requested this today so that OONI probes running in censored countries can have an interface for getting bridges:

23:37          hellais  | anyways it would be nice to have an API where I send a HTTP request and I get back some JSON with the captcha encoded in base64 and I can send back the solution to get bridges
23:38             isis  | yep, that's exactly what we're going to do :)
23:39             isis  | except we hadn't exactly decided on JSON, but yeah
23:39             isis  | the captcha image is already base64, btw
23:42             isis  | hellais: would these be bridges for bridge_reachability tests, or bridges just to get an ooniprobe capable of connecting to some (hopefully)-known-good-and-not-filtered version of the internet, for like submitting reports and stuff
23:43             isis  | hellais: err, that was meant as a question
23:46          hellais  | the second
23:46          hellais  | I would like to have this: https://gist.github.com/hellais/995793f88fc42727fb92
23:46          hellais  | run when ooniprobe is first installed to check if the user needs some bridges
23:47          hellais  | I think I'll put it in the setup.py or something
23:47          hellais  | since ooniprobe relies on tor for reporting without a working tor, we can't collect the reports
23:48          hellais  | parsing HTML with standard python libraries is just a bit messy and I was wondering if there was a better way
00:10          hellais  | anyways I think I will end up making something that starts a web server and makes the user solve the CAPTCHA on the local webserver. Having to open a JPG file and input it into a shell is so uncomfortable
00:13          hellais  | but the JSON API would be nice nonetheless


Change History (9)

comment:1 Changed 3 years ago by isis

FWIW, Mike mentioned also wanting the API to present a JSON-RPC interface.

comment:2 Changed 16 months ago by mcs

Cc: brade mcs added
Severity: Normal

comment:3 Changed 15 months ago by isis

Actual Points: 2
Points: 2
Sponsor: SponsorM
Status: new → needs_review

I made a CAPTCHA server here: https://github.com/isislovecruft/farfetchd

It has a JSON API:

1) GET /fetch will return JSON in the form:

{
  "image": null or base64-encoded jpeg image,
  "challenge": null or url-safe base64-encoded challenge string,
  "error": null or ascii-encoded string describing the error
}
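A minimal Python sketch of a client consuming the GET /fetch reply, for example from an ooniprobe setup script. It assumes the body is valid JSON containing exactly the three fields above; `parse_fetch_response` and `FetchError` are hypothetical names, not part of farfetchd.

```python
import base64
import json


class FetchError(Exception):
    """Hypothetical exception for a /fetch reply that carries an error."""


def parse_fetch_response(body: str) -> tuple[bytes, str]:
    """Hypothetical helper: return (jpeg_bytes, challenge) from a /fetch reply."""
    reply = json.loads(body)
    if reply.get("error") is not None:
        raise FetchError(reply["error"])
    if reply.get("image") is None or reply.get("challenge") is None:
        raise FetchError("reply is missing the image or the challenge")
    # The image is standard base64; the challenge is kept as an opaque string
    # that must be sent back unchanged together with the solution.
    image = base64.b64decode(reply["image"], validate=True)
    return image, reply["challenge"]
```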

2) POST /check?data=[…] where the data url parameter is a JSON string in the following form:

{
  "challenge": base64-encoded challenge string (from the above response),
  "response": base64-encoded response (i.e. the CAPTCHA solution)
}
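The matching request can be built with the standard library. This sketch assumes, as described above, that the JSON travels in a `data` query parameter and that the `response` field is the user's typed answer in base64; the base URL and `build_check_url` are placeholders, not a real endpoint or API.

```python
import base64
import json
from urllib.parse import urlencode


def build_check_url(base_url: str, challenge: str, solution: str) -> str:
    """Hypothetical helper: build the POST /check URL for a CAPTCHA solution."""
    payload = {
        "challenge": challenge,  # opaque string copied from the /fetch reply
        # The response is described as base64-encoded, so encode the typed answer.
        "response": base64.b64encode(solution.encode("utf-8")).decode("ascii"),
    }
    # urlencode percent-encodes the JSON so it survives inside a query string.
    query = urlencode({"data": json.dumps(payload)})
    return f"{base_url.rstrip('/')}/check?{query}"
```

The resulting URL could then be sent with `urllib.request.urlopen(urllib.request.Request(url, method="POST"))`.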

The farfetchd server will attempt to verify the challenge response and reply with JSON in the following form:

{
  "result": bool,
  "error": null or base64-encoded string describing the error
}
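Interpreting the verification reply is mostly a type check. In this hypothetical `parse_check_response`, the `error` value is passed through undecoded, because the two replies above describe its encoding differently (ASCII in one, base64 in the other).

```python
import json


def parse_check_response(body: str) -> bool:
    """Hypothetical helper: True if farfetchd accepted the solution."""
    reply = json.loads(body)
    if reply.get("error") is not None:
        # Passed through undecoded: the /fetch and /check replies describe
        # the error's encoding differently (ASCII vs. base64).
        raise RuntimeError(f"farfetchd reported an error: {reply['error']}")
    result = reply.get("result")
    if not isinstance(result, bool):
        raise ValueError("'result' must be a JSON boolean")
    return result
```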

Please let me know if this API seems like it'll work on the Tor Browser side, or if there's any way I could make it easier to process the data and/or hand it back and forth.

comment:4 Changed 15 months ago by isis

Couple things I realised I should do:

  • There should be a 'version' field in every JSON thing so that we can add things later if we need to.
  • There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.
  • The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)
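The first and third points can be sketched together: every message gains `version` and `type` fields, and the serialized JSON is wrapped in URL-safe base64 before it goes into the `data` parameter. The version string `"0.1.0"` and the `type` value used below are hypothetical, as is stripping the `=` padding (a common choice so the token needs no percent-escaping in a query string).

```python
import base64
import json

PROTOCOL_VERSION = "0.1.0"  # hypothetical version string


def encode_message(msg_type: str, **fields) -> str:
    """Serialize a message with 'version' and 'type' fields as URL-safe base64."""
    message = {"version": PROTOCOL_VERSION, "type": msg_type, **fields}
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    # URL-safe base64 uses '-' and '_'; dropping '=' padding avoids escaping.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_message(token: str) -> dict:
    """Reverse encode_message, restoring the stripped padding first."""
    padded = token + "=" * (-len(token) % 4)
    message = json.loads(base64.urlsafe_b64decode(padded))
    if "version" not in message or "type" not in message:
        raise ValueError("message lacks a 'version' or 'type' field")
    return message
```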

Also, in order to conform to the JSON API standard, I need to change the following things:

  • The content-type apparently needs to be application/vnd.api+json (not application/json).
  • "Servers MUST respond with a 415 Unsupported Media Type status code if a request specifies the header Content-Type: application/vnd.api+json with any media type parameters."
  • "Servers MUST respond with a 406 Not Acceptable status code if a request’s Accept header contains the JSON API media type and all instances of that media type are modified with media type parameters."
  • "A document MUST contain at least one of the following top-level members:
    • data: the document’s “primary data”
    • errors: an array of error objects
    • meta: a meta object that contains non-standard meta-information."
  • "The members data and errors MUST NOT coexist in the same document."
  • "Primary data MUST be […] a single resource object […]"
  • "A resource object MUST contain at least the following top-level members:
    • id
    • type
    Exception: The id member is not required when the resource object originates at the client and represents a new resource to be created on the server."
  • "In addition, a resource object MAY contain any of these top-level members:
    • attributes: an attributes object representing some of the resource’s data."
  • "The value of the attributes key MUST be an object (an “attributes object”). Members of the attributes object (“attributes”) represent information about the resource object in which it’s defined. Attributes may contain any valid JSON value."
  • "A JSON API document MAY include information about its implementation under a top level jsonapi member. If present, the value of the jsonapi member MUST be an object (a “jsonapi object”). The jsonapi object MAY contain a version member whose value is a string indicating the highest JSON API version supported."
  • "A server MUST return 403 Forbidden in response to an unsupported request to create a resource with a client-generated ID." (for the POST part)
  • "Error objects provide additional information about problems encountered while performing an operation. Error objects MUST be returned as an array keyed by errors in the top level of a JSON API document. An error object MAY have the following members:
    • id: a unique identifier for this particular occurrence of the problem.
    • links: a links object containing the following members:
      • about: a link that leads to further details about this particular occurrence of the problem.
    • status: the HTTP status code applicable to this problem, expressed as a string value.
    • code: an application-specific error code, expressed as a string value.
    • title: a short, human-readable summary of the problem that SHOULD NOT change from occurrence to occurrence of the problem, except for purposes of localization.
    • detail: a human-readable explanation specific to this occurrence of the problem. Like title, this field’s value can be localized.
    • source: an object containing references to the source of the error, optionally including any of the following members:
      • pointer: a JSON Pointer [RFC6901] to the associated entity in the request document [e.g. "/data" for a primary data object, or "/data/attributes/title" for a specific attribute].
      • parameter: a string indicating which URI query parameter caused the error.
    • meta: a meta object containing non-standard meta-information about the error."
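The quoted media-type rules and top-level document rules are mechanical enough to sketch. This is a simplified Python reading of just the excerpts above, not a complete JSON API implementation: the parser treats every `;`-separated parameter as a media type parameter, advertising JSON API version "1.0" is an assumption, and the helper names are hypothetical.

```python
from typing import Optional

JSONAPI_MEDIA_TYPE = "application/vnd.api+json"


def _split(media_range: str) -> tuple[str, str]:
    """Split 'type/subtype; params' into (lowercased type, raw params)."""
    media, _, params = media_range.partition(";")
    return media.strip().lower(), params.strip()


def media_type_error(content_type: Optional[str], accept: Optional[str]) -> Optional[int]:
    """Return 415, 406, or None according to the two quoted MUST rules."""
    if content_type:
        media, params = _split(content_type)
        if media == JSONAPI_MEDIA_TYPE and params:
            return 415  # Content-Type carried media type parameters
    if accept:
        jsonapi_params = [params for media, params in map(_split, accept.split(","))
                          if media == JSONAPI_MEDIA_TYPE]
        if jsonapi_params and all(jsonapi_params):
            return 406  # every JSON API instance in Accept had parameters
    return None


def resource_document(rtype: str, rid: str, attributes: dict) -> dict:
    """Document whose primary data is a single resource object."""
    return {
        "jsonapi": {"version": "1.0"},  # assumed spec version
        "data": {"type": rtype, "id": rid, "attributes": attributes},
    }


def error_document(status: int, code: str, title: str,
                   detail: Optional[str] = None) -> dict:
    """Document carrying 'errors' only, since 'data' must not coexist with it."""
    error = {"status": str(status), "code": code, "title": title}
    if detail is not None:
        error["detail"] = detail
    return {"jsonapi": {"version": "1.0"}, "errors": [error]}
```

A server would call `media_type_error()` before routing and, when it returns a status, reply with an `error_document()` using that same status. Keeping `data` and `errors` in separate builders makes it hard to violate the rule that they must not coexist.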

Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

comment:5 Changed 15 months ago by isis

Status: needs_review → needs_revision

comment:6 in reply to: 4; Changed 15 months ago by mcs

Replying to isis:

> Couple things I realised I should do:

>   • There should be a 'version' field in every JSON thing so that we can add things later if we need to.

That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).

>   • There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.

Seems like a good idea and having it may also aid in debugging the protocol.

>   • The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)

I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.:

https://bridges.torproject.org/bridges?transport=obfs4

In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).

Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about the form of the CAPTCHA (or is that in the challenge field?). One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.

> Also, in order to conform to the JSON API standard, I need to change the following things:
> ...

That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.

> Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.

One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.

comment:7 Changed 14 months ago by iry

Cc: iry added

Thank you very much for your work!

anon-connection-wizard, which is the Python clone of Tor Launcher, is looking forward to the implementation of this BridgeDB API, too!

comment:8 in reply to: 7; Changed 14 months ago by isis

Replying to iry:

> Thank you very much for your work!

> anon-connection-wizard, which is the Python clone of Tor Launcher, is looking forward to the implementation of this BridgeDB API, too!


You're welcome!

This API won't be publicly accessible, though. It will be reachable only through the API for #22871, and even then only through a special meek reflector as part of #16650.

Is anon-connection-wizard what Tails uses now? I'd be happy to support Tails as well (but I'd strongly prefer the connection to go through the meek reflector).

comment:9 in reply to: 6; Changed 13 months ago by isis

Replying to mcs:

> Replying to isis:

>> Couple things I realised I should do:

>>   • There should be a 'version' field in every JSON thing so that we can add things later if we need to.

> That seems like a good idea even if we never end up changing it (with JSON you have a lot of flexibility to add fields as needed without necessarily bumping the version).

>>   • There should probably be a 'type' field in every JSON thing so that we know which part of the protocol it is.

> Seems like a good idea and having it may also aid in debugging the protocol.

>>   • The JSON in the data URL parameter should be en-/de- coded as URL-safe base64. (And it should probably just be in the body of the POST request?)

> I like the idea of putting the response in the POST request, but maybe it is small so it doesn't matter much? That raises another issue: Kathy and I don't know exactly what the response will look like. Will all of the CAPTCHAs be similar to the ones that are currently used by bridgedb? E.g.:

> https://bridges.torproject.org/bridges?transport=obfs4


Yes. They will be base64-encoded JPEG images, 400 × 125 px. (Is there something that would work better?)
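If a client wants to size the CAPTCHA widget from the image itself rather than hard-coding 400 × 125, it can read the dimensions from the JPEG start-of-frame (SOF) segment. A minimal sketch, assuming a well-formed JPEG whose segments before the SOF all carry a length field (true of typical baseline and progressive files, but this is not a full JPEG parser); `jpeg_dimensions` is a hypothetical helper.

```python
import struct

# SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC), which share the range.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data: bytes) -> tuple[int, int]:
    """Hypothetical helper: (width, height) from the first SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker at offset %d" % i)
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            i += 1
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            # SOF payload: 1-byte sample precision, then 2-byte height and width.
            if i + 9 > len(data):
                raise ValueError("truncated SOF segment")
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length  # the length field counts itself but not the marker
    raise ValueError("no SOF segment found")
```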

> In any case, Tor Launcher will need to know how to present the image and what kind of response to ask the user for (e.g., text).

> Also, maybe the server should return the "Enter the characters from the image above" prompt text so the server has more flexibility about the form of the CAPTCHA (or is that in the challenge field?). One challenge is localization of the prompt. The server could respect the Accept-Language header, but that is not necessarily correct for Tor Browser due to our English language spoofing feature. Hmmm.


Hmmm. I am also not sure what to do here. I think probably, because TB's translation mechanics are quite different to BridgeDB's, it might be best to have TB create the string and localise it?

>> Also, in order to conform to the JSON API standard, I need to change the following things:
>> ...

> That looks like a lot of complexity, but it might be worthwhile if we are going to have more JSON messages. I assume we will, e.g., Tor Launcher will request bridges from moat via a JSON request.


Yeah, it's slightly complex, but it has everything we need, and then we never have to argue about who is doing things correctly (I'm also thinking about a future where some router or VPN company wants to use this API). It's also less work for me to use the JSON API spec than to write and maintain my own, more minimal one.

>> Also, it just occurred to me that Tor Launcher should probably just talk to the moat server, which will talk to farfetchd. For the CAPTCHA stuff moat will just be passing things between Tor Launcher and farfetchd so the concerns about the API here are still relevant.

> I was going to ask about that... requesting a CAPTCHA and solving it is not useful unless the server that cares about the CAPTCHA is involved, right? Is there a document available that shows how the pieces will fit together? If not, we should create one.


I created a farfetchd API spec, and parts of it are incorporated into the overall (draft) moat spec. Let me know if you run across any more issues.

> One more comment on your proposed API: Kathy and I would prefer error codes (numbers) rather than error strings, or at least a code plus a string. We will want to localize the errors that are displayed by Tor Launcher. It looks like the JSON API format includes this concept because error responses have a code as well as title and detail strings.


Makes sense. I'll rely on unique codes for different problems, that way you won't have to parse error strings.
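On the client side, unique codes let Tor Launcher (or anon-connection-wizard) choose a localized message without parsing `title` or `detail`. A sketch assuming error replies use the JSON API `errors` array quoted in comment 4; the codes and message keys below are invented placeholders, since this ticket does not list any.

```python
import json

# Hypothetical code -> message-key table; real codes would come from the
# farfetchd/moat spec, which this ticket does not reproduce.
ERROR_MESSAGE_KEYS = {
    "1": "captcha.error.wrongAnswer",
    "2": "captcha.error.expired",
}
FALLBACK_KEY = "captcha.error.generic"


def error_message_keys(body: str) -> list[str]:
    """Map each error object's 'code' to a localizable message key."""
    document = json.loads(body)
    keys = []
    for error in document.get("errors", []):
        # JSON API codes are strings; unknown or missing codes fall back.
        keys.append(ERROR_MESSAGE_KEYS.get(error.get("code"), FALLBACK_KEY))
    return keys
```

The client then looks each key up in its own translation files, which sidesteps the Accept-Language spoofing problem raised above.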
