Issues with corporate censorship and mass surveillance

added component::applications/tor browser owner::cypherpunks points::1000 light years priority::very high severity::critical status::assigned type::defect labels

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

Trac:
Username: marek

Trac:
Cc: N/A to arthuredelstein

Replying to marek:

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

Could you please ask your employer or other coworkers to come and talk with us openly? Many members of our community, some which are also your (server side) users, are extremely frustrated. It is in the best interest of everyone to help find a solution for those users.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

What specifically is political versus technical? That CF is now a GAA? That CF does indeed gather metrics? That CF does run untrusted (by me, or other users) in our browsers? That your metrics count as a kind of surveillance that is seemingly linked with a PRISM provider?

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea.

What is the difference between one super cookie and ~1m cookies on a per site basis? The anonymity set appears to be strictly worse. Or do you guys not do any stats on the backend? Do you claim that you can't and don't link these things?

I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

For such a user - how will you protect any information you've collected from them? Will that information be of higher value or richer technical information if there is a cookie (super, regular, whatever) tied to that data?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

This feels like a trick question - behavioral analysis is in itself reducing the anonymity set by adding at least one bit of information. My guess is that it is a great deal more than a single bit - especially over time.

Trac:
Cc: arthuredelstein to N/A

Trac:
Cc: N/A to arthuredelstein

Replying to marek:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

This sounds very much like something that could be provided through the use of zero-knowledge proofs. It doesn't seem clear to me that being able to say "this is an instance of tor which has already answered a bunch of captcha's" is actually useful. I think the main problem with captchas at this point is that robots are just about as good at answering them as humans. Apparently robots are worse than humans at building up tracked browser histories. That seems like a harder property for a tor user to prove.

What sort of data would qualify as an 'i'm a human' bit?

What sort of data would qualify as an 'i'm a human' bit?

I don't think DDoS should be based on identifying humans.

Bots are legitimate consumers of data as well, and in the future they might even be more intelligent than most humans today, so we might as well design our systems to be friendly for them.

DDoS is a supply/demand type of economic issue and any solutions should treat it as such.

Trac:
Username: sordid

Ultimately, I wonder if the point is simply to identify people - across browser sessions, across proxies, across Tor exits - and the start is the "I'm a human bit" - I wonder where does that end?

In a sense, I feel like this CF issue is like a giant Wifi Captive Portal for the web. It shims in some kind of "authentication" in a way that breaks many existing protocols and applications.

If I was logged into Google (as they use a Google Captcha...), could they vouch for my account and auto solve it? Effectively creating an ID system for the entire web where CF is the MITM for all the users visiting users cached/terminated by them? I think - yes to both - and that is concerning.

cc-ing isis since this covers earlier work.

Replying to marek:

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

Yes. This is a problem that "Anonymous Credential" systems are designed to solve. A example of a system with most of the properties that are desired is presented in Au, M. H., Kapadia, A., Susilo, W., "BLACR: TTP-Free Blacklistable Anonymous Credentials with Reputation" (https://www.cs.indiana.edu/~kapadia/papers/blacr-ndss-draft.pdf). Note that this is still an active research area, and BLACR it of itself may not be practical/feasible to implement, and is listed only as an example since the paper gives a good overview of the problem and how this kind of primitive can be used to solve the problem.

Isis can go into more details on this sort of thing, since she was trying to implement a similar thing based on Mozilla Persona (aborted attempt due to Mozilla Persona being crap).

Trac:
Cc: arthuredelstein to arthuredelstein, isis

CloudFlare grew out of the narcs at Crimeflare. Do not assume good faith.

@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what can we can technically do to fix the problem.

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

(POST is hard) First, what actually the proxy should do on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from ceirtain regions. Many businesses care only about visitors from some geographical region, and in case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving captcha is strictly better than disallowing the traffic unconditionally.

(Not only spam, load as well) Third, there regularly are bot "attacks" that just spam website with continous flood of GET requests, for example to check if the offered product is released, the promotion started or price updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.

The underlying problem, is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IP's have bad IP reputation, because they ARE often used for unwanted activity.

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last minutes.

This sounds very much like something that could be provided through the use of zero-knowledge proofs

Yup. What do we do to implement one both on ddos protection side and on TBB side?

Trac:
Username: marek

Trac:

Replying to cypherpunks:

CloudFlare grew out of the narcs at Crimeflare. Do not assume good faith.

That's right.

Substantial work with government and law enforcement officials

 What sort of data would qualify as an 'i'm a human' bit?

Does it even matter? Most bots, all the crawlers, sites like archive.is, etc are all regularly allowed in on Cloudflare sites.

Replying to marek:

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

(POST is hard) First, what actually the proxy should do on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing. CloudFlare is in a position to inject JavaScript into sites. Why not hook requests that would result in a POST and challenge after say, clicking the submit button?

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last minutes. Is this something that CloudFlare has actually found effective? Are there metrics on how many challenged requests that successfully solved a CAPTCHA turned out to actually be malicious?

CloudFlare is in a position to inject JavaScript into sites

This alone should be reason enough for the security warning. People might be viewing sites which they believe to be in a different jurisdiction and suddenly giving control to a US entity.

To quantify the scope of the problem slightly, a few weeks ago I measured that 10% of the Alexa top 25k are behind Cloudflare.

It would be helpful if we had a nice, well written, easy to understand explanation of the problem that we could give to site owners. Of those that I have contacted, some get it and adjust things quickly, but some struggle to understand what the problem is.

Trac:
Username: wwaites

Replying to marek:

@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what can we can technically do to fix the problem.

I'm unclear on what I've said or done that is insulting you? Could you clarify? It certainly isn't my attempt or intent to insult you.

What is my opinion and what is technical reality? Could you enumerate that a bit? I've asked many questions and it is important that we discuss the wide range of topics here.

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

There are a number of problems with the current model - to be clear - and so while there are downsides to the read-only GET suggestion, I think it would reduce nearly all complaints by end users.

(POST is hard) First, what actually the proxy should do on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

Off the top of my head - to ensure I reply to everything you've written:

It seems reasonable in many cases to redirect them on pages where this is a relevant concern? POST fails, failure page asks for a captcha solution, etc.

(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from ceirtain regions. Many businesses care only about visitors from some geographical region, and in case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving captcha is strictly better than disallowing the traffic unconditionally.

Actually, a censorship page with specific information ala HTTP 451 would be a nearly in spec answer to this problem. Why not use that? You're performing geographic discrimination on behalf of your users - this censorship should be transparent. It should be clear that the site owner has decided to do this - and there is less of a need to solve a captcha by default.

Though in the case of Tor - you can't do this properly - which is a reason to specifically treat Tor users as special. Visitors may be in the region and Tor is properly hiding them. That is a point in the direction of having an interstitial page that allows a user to solve a captcha.

(Not only spam, load as well) Third, there regularly are bot "attacks" that just spam website with continous flood of GET requests, for example to check if the offered product is released, the promotion started or price updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.

Why not just serve them an older cached copy?

The underlying problem, is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IP's have bad IP reputation, because they ARE often used for unwanted activity.

Do you have any open data on this?

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last minutes.

This feels circular - one of the big problems is that users are unable to solve them after a dozen tries. We would not have as many complaining users if we could get this far, I think.

This sounds very much like something that could be provided through the use of zero-knowledge proofs

Yup. What do we do to implement one both on ddos protection side and on TBB side?

My first order proposition would be to solve a cached copy of the site in "read only" mode with no changes on the TBB side. We can get this from other third parties if CF doesn't want to serve it directly - that was part of my initial suggestion. Why not just serve that data directly?

Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?

That is, present a CAPTCHA only when:

the server owner has specifically requested that CAPTCHAs be used
the server is actively under DoS attack, and
the client's IP address is currently a source of the DoS.

I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.

On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?

Replying to arthuredelstein:

Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?

That is, present a CAPTCHA only when:

the server owner has specifically requested that CAPTCHAs be used

the server is actively under DoS attack, and

the client's IP address is currently a source of the DoS.

That seems interesting - I wish we had data to understand if these choices would help - it seems opaque how "threat scores" for IP addresses are computed. Is there any public information about it?

I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.

I agree.

On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?

I'm also interested in understanding the dataflow - could the FBI go to Google to get data on all CloudFlare users? Does CF protect it? If so - who protects users more?

Issues with corporate censorship and mass surveillance

Child items 0

Activity