There are companies - such as CloudFlare - which are effectively now Global Active Adversaries. Using CF as an example: they do not appear open to working together in open dialog, they actively make it nearly impossible to browse to certain websites, they collude with larger surveillance companies (like Google), their CAPTCHAs are awful, they block members of our community on social media rather than engaging with them, and frankly, they run untrusted code in millions of browsers on the web for questionable security gains.
It would be great if they allowed GET requests through - such requests should not, and generally do not, modify server-side content, and per the HTTP spec they are supposed to be safe and idempotent. They do not do this, and it breaks the web in so many ways it is incredible. Using wget with Tor on a website hosted by CF is a disaster; using Tor Browser with it is much the same.
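To make the suggestion concrete, here is a minimal sketch - entirely hypothetical, not CloudFlare's actual logic - of an edge rule that never challenges safe methods and only considers a challenge for state-changing requests from low-reputation addresses:

```python
# Hypothetical edge-side policy: never challenge safe methods (RFC 7231),
# and only consider a CAPTCHA for state-changing requests from addresses
# with a poor reputation. The score and threshold are made-up placeholders.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def should_challenge(method: str, ip_reputation: float, threshold: float = 0.8) -> bool:
    """Return True if the edge should serve a CAPTCHA interstitial."""
    if method.upper() in SAFE_METHODS:
        return False                       # read-only requests pass straight through
    return ip_reputation >= threshold      # challenge only suspicious writes

# A GET from a badly-scored Tor exit is still served...
assert should_challenge("GET", ip_reputation=0.95) is False
# ...while a POST from the same address gets the interstitial.
assert should_challenge("POST", ip_reputation=0.95) is True
```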
I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web. When tied with Google, it seems like a basic analytics problem to enumerate users and most sites visited in a given session.
One way, I think, would be to create a warning page upon detection of a CF edge or captcha challenge. This could be similar to an SSL/TLS warning dialog - with options for users to bypass it, to engage with their systems, to contact them or the site's owners, or to hit a cached, read-only version of the website on archive.org, archive.is or other caching systems. That would ensure that millions of users could engage with informed consent before they're tagged, tracked and potentially deanonymized. TBB can protect against some of this - of course - but when all the edge nodes are run by one organization that can see plaintext, IP addresses, identifiers and so on, the protection is reduced. It is an open research question how badly it is reduced, but intuitively I think there is a reduction in anonymity.
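As a rough illustration of that interstitial idea, the sketch below detects what looks like a CF challenge response and points to a read-only archive.org copy instead. The detection heuristic (a 403/503 status plus a cloudflare Server header) and the fallback URL form are assumptions for illustration, not a finished Tor Browser feature:

```python
# Rough sketch: recognise what looks like a CDN challenge page and offer the
# user a read-only cached copy instead. The heuristic below is an assumption,
# not a documented CloudFlare interface.
import requests

def looks_like_cf_challenge(resp: requests.Response) -> bool:
    server = resp.headers.get("Server", "").lower()
    return resp.status_code in (403, 503) and "cloudflare" in server

def cached_fallback(url: str) -> str:
    # The Wayback Machine redirects this form to its most recent snapshot.
    return f"https://web.archive.org/web/{url}"

url = "https://example.com/"
resp = requests.get(url, timeout=30)
if looks_like_cf_challenge(resp):
    print("This site sits behind a challenge page.")
    print("A read-only copy may be available at:", cached_fallback(url))
else:
    print("Fetched directly with status", resp.status_code)
```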
It would be great to find a solution that allows TBB users to use the web without changes on our end - where they solve one captcha if required, and are perhaps not even prompted on GET requests. In any case, I think we have to consider that there is a giant amount of data at CF, and we should ensure that it does not harm end users. I believe CF would share this goal if we explain that we're all interested in protecting users - both those hosting websites and those visiting them.
Some open questions:
What kind of per browser session tracking is actually happening?
What other options do we have on the TBB side?
What would a reasonable solution look like for a company like Cloudflare?
What is reasonable for a user to do? (~17 CAPTCHAs for one site == not reasonable)
Would "Warning this site is under surveillance by Cloudflare" be a reasonable warning or should we make it more general?
Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.
I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.
I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.
A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:
There are CDN/DDoS companies on the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, so a visitor from an abusive IP address will need to solve a captcha on each and every protected domain. Let's assume the CDN/DDoS company doesn't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?
In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity?
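For a sense of what such a construction could look like, here is a toy sketch based on textbook RSA blind signatures: the browser gets one token signed at captcha-solve time, and redeeming the unblinded token later is unlinkable to that issuance. This is purely illustrative - no double-spend tracking, no key rotation - and a real deployment would need a vetted anonymous-credential scheme:

```python
# Toy "I'm a human" token via textbook RSA blind signatures (illustration only).
# Requires Python 3.8+ (for pow(x, -1, n)) and the 'cryptography' package.
import hashlib
import secrets

from cryptography.hazmat.primitives.asymmetric import rsa

# --- CDN side: long-lived token-signing key --------------------------------
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pub = key.public_key().public_numbers()
n, e, d = pub.n, pub.e, key.private_numbers().d

def h(msg: bytes) -> int:
    # Hash the token into Z_n (a full-domain-hash stand-in, sketch only).
    return int.from_bytes(hashlib.sha512(msg).digest(), "big") % n

# --- Browser side: after solving one captcha --------------------------------
token = secrets.token_bytes(32)           # random nonce, kept secret at issuance
r = secrets.randbelow(n - 2) + 2          # blinding factor
blinded = (h(token) * pow(r, e, n)) % n   # the only thing the CDN sees

# --- CDN side: sign the blinded value iff the captcha was solved ------------
blind_sig = pow(blinded, d, n)

# --- Browser side: unblind; (token, sig) is unlinkable to the issuance ------
sig = (blind_sig * pow(r, -1, n)) % n
assert pow(sig, e, n) == h(token)         # any CDN edge can verify this later
print("token verifies:", pow(sig, e, n) == h(token))
```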
Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.
Could you please ask your employer or other coworkers to come and talk with us openly? Many members of our community, some of whom are also your (server-side) users, are extremely frustrated. It is in everyone's best interest to help find a solution for those users.
I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.
What specifically is political versus technical? That CF is now a GAA? That CF does indeed gather metrics? That CF does run code that is untrusted (by me, or other users) in our browsers? That your metrics count as a kind of surveillance that is seemingly linked with a PRISM provider?
I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.
A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea.
What is the difference between one super cookie and ~1m cookies on a per site basis? The anonymity set appears to be strictly worse. Or do you guys not do any stats on the backend? Do you claim that you can't and don't link these things?
I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:
There are CDN/DDoS companies on the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, so a visitor from an abusive IP address will need to solve a captcha on each and every protected domain. Let's assume the CDN/DDoS company doesn't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?
Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.
For such a user - how will you protect any information you've collected from them? Will that information be of higher value, or technically richer, if there is a cookie (super, regular, whatever) tied to that data?
In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity?
This feels like a trick question - behavioral analysis is in itself reducing the anonymity set by adding at least one bit of information. My guess is that it is a great deal more than a single bit - especially over time.
There are CDN/DDoS companies on the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, so a visitor from an abusive IP address will need to solve a captcha on each and every protected domain. Let's assume the CDN/DDoS company doesn't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?
In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity?
This sounds very much like something that could be provided through the use of zero-knowledge proofs. It doesn't seem clear to me that being able to say "this is an instance of tor which has already answered a bunch of captchas" is actually useful. I think the main problem with captchas at this point is that robots are just about as good at answering them as humans. Apparently robots are worse than humans at building up tracked browser histories. That seems like a harder property for a tor user to prove.
What sort of data would qualify as an 'i'm a human' bit?
What sort of data would qualify as an 'i'm a human' bit?
I don't think DDoS should be based on identifying humans.
Bots are legitimate consumers of data as well, and in the future they might even be more intelligent than most humans today, so we might as well design our systems to be friendly for them.
DDoS is a supply/demand type of economic issue and any solutions should treat it as such.
Ultimately, I wonder if the point is simply to identify people - across browser sessions, across proxies, across Tor exits - and the start is the "I'm a human" bit. I wonder where that ends?
In a sense, I feel like this CF issue is like a giant Wifi Captive Portal for the web. It shims in some kind of "authentication" in a way that breaks many existing protocols and applications.
If I was logged into Google (as they use a Google CAPTCHA...), could they vouch for my account and auto-solve it? Effectively creating an ID system for the entire web, where CF is the MITM for all of the users visiting sites cached/terminated by them? I think yes to both - and that is concerning.
Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.
I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.
I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.
A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:
There are CDN/DDoS companies on the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, so a visitor from an abusive IP address will need to solve a captcha on each and every protected domain. Let's assume the CDN/DDoS company doesn't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?
In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity?
Yes. This is a problem that "Anonymous Credential" systems are designed to solve. An example of a system with most of the desired properties is presented in Au, M. H., Kapadia, A., Susilo, W., "BLACR: TTP-Free Blacklistable Anonymous Credentials with Reputation" (https://www.cs.indiana.edu/~kapadia/papers/blacr-ndss-draft.pdf). Note that this is still an active research area, and BLACR in and of itself may not be practical/feasible to implement; it is listed only as an example since the paper gives a good overview of the problem and how this kind of primitive can be used to solve it.
Isis can go into more details on this sort of thing, since she was trying to implement a similar thing based on Mozilla Persona (aborted attempt due to Mozilla Persona being crap).
Trac: Cc: arthuredelstein to arthuredelstein, isis
@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what we can technically do to fix the problem.
Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.
There are a number of problems with this model.
(POST is hard) First, what exactly should the proxy do on a POST? Abort your POST, serve a captcha, and ask you to fill in the POST again? Or accept your 10 meg upload, serve a captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.
(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from certain regions. Many businesses care only about visitors from some geographical region, and in the case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving a captcha is strictly better than disallowing the traffic unconditionally.
(Not only spam, load as well) Third, there are regularly bot "attacks" that just spam a website with a continuous flood of GET requests, for example to check if the offered product has been released, the promotion has started or the price has been updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.
The underlying problem is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IPs have a bad IP reputation, because they ARE often used for unwanted activity.
Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.
There are a number of problems with this model.
(POST is hard) First, what exactly should the proxy do on a POST? Abort your POST, serve a captcha, and ask you to fill in the POST again? Or accept your 10 meg upload, serve a captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.
CloudFlare is in a position to inject JavaScript into sites. Why not hook requests that would result in a POST and challenge after, say, clicking the submit button?
What sort of data would qualify as an 'i'm a human' bit?
Let's start with something not worse than now: a captcha solved in the last few minutes.
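To make "a captcha solved in the last few minutes" concrete, here is a rough sketch - hypothetical, not CloudFlare's implementation - of a per-domain, time-limited clearance value. Scoping the MAC to the domain plus an expiry means one solve clears that domain for a while without the value doubling as a cross-domain supercookie:

```python
# Hypothetical per-domain "captcha solved recently" clearance token.
import hashlib
import hmac
import time

EDGE_SECRET = b"rotate-me-regularly"   # made-up edge-side key
TTL_SECONDS = 15 * 60                  # "solved in the last few minutes"

def issue_clearance(domain: str) -> str:
    """Mint a clearance token right after a successful captcha solve."""
    expires = int(time.time()) + TTL_SECONDS
    mac = hmac.new(EDGE_SECRET, f"{domain}|{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{expires}.{mac}"

def check_clearance(domain: str, token: str) -> bool:
    """Accept the token only for this domain and only until it expires."""
    try:
        expires_str, mac = token.split(".", 1)
        expires = int(expires_str)
    except ValueError:
        return False
    good = hmac.new(EDGE_SECRET, f"{domain}|{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, good) and time.time() < expires

# One solve clears example.org for a while, but says nothing about example.net.
tok = issue_clearance("example.org")
assert check_clearance("example.org", tok)
assert not check_clearance("example.net", tok)
```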
Is this something that CloudFlare has actually found effective? Are there metrics on how many challenged requests that successfully solved a CAPTCHA turned out to actually be malicious?
CloudFlare is in a position to inject JavaScript into sites
This alone should be reason enough for the security warning. People might be viewing sites which they believe to be in a different jurisdiction while unknowingly handing control to a US entity.
To quantify the scope of the problem slightly, a few weeks ago I measured that 10% of the Alexa top 25k are behind Cloudflare.
It would be helpful if we had a nice, well written, easy to understand explanation of the problem that we could give to site owners. Of those that I have contacted, some get it and adjust things quickly, but some struggle to understand what the problem is.
@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what we can technically do to fix the problem.
I'm unclear on what I've said or done that is insulting to you - could you clarify? It certainly isn't my attempt or intent to insult you.
What is my opinion and what is technical reality? Could you enumerate that a bit? I've asked many questions and it is important that we discuss the wide range of topics here.
Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.
There are a number of problems with this model.
There are a number of problems with the current model - to be clear - and so while there are downsides to the read-only GET suggestion, I think it would reduce nearly all complaints by end users.
(POST is hard) First, what exactly should the proxy do on a POST? Abort your POST, serve a captcha, and ask you to fill in the POST again? Or accept your 10 meg upload, serve a captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.
Off the top of my head - to ensure I reply to everything you've written:
It seems reasonable in many cases to redirect users on pages where this is a relevant concern: the POST fails, the failure page asks for a captcha solution, etc.
(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from certain regions. Many businesses care only about visitors from some geographical region, and in the case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving a captcha is strictly better than disallowing the traffic unconditionally.
Actually, a censorship page with specific information, à la HTTP 451, would be a nearly in-spec answer to this problem. Why not use that? You're performing geographic discrimination on behalf of your users - this censorship should be transparent. It should be clear that the site owner has decided to do this - and then there is less of a need to solve a captcha by default.
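For illustration, a minimal sketch of what such an in-spec block could look like - an HTTP 451 response carrying the RFC 7725 "blocked-by" Link header; the policy URL and region check here are made up:

```python
# Minimal sketch of an explicit, in-spec geo-block: HTTP 451 (RFC 7725) with a
# "blocked-by" Link header instead of an endless captcha loop. Illustration only.
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_REGIONS = {"XX"}  # hypothetical country codes chosen by the site owner

class GeoBlockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # In a real edge this would come from GeoIP on the client address.
        region = self.headers.get("X-Client-Region", "XX")
        if region in BLOCKED_REGIONS:
            body = b"Blocked at the site owner's request for visitors from this region.\n"
            self.send_response(451)
            self.send_header("Link", '<https://example.com/block-policy>; rel="blocked-by"')
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(200)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8451), GeoBlockHandler).serve_forever()
```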
Though in the case of Tor - you can't do this properly - which is a reason to specifically treat Tor users as special. Visitors may be in the region and Tor is properly hiding them. That is a point in the direction of having an interstitial page that allows a user to solve a captcha.
(Not only spam, load as well) Third, there are regularly bot "attacks" that just spam a website with a continuous flood of GET requests, for example to check if the offered product has been released, the promotion has started or the price has been updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.
Why not just serve them an older cached copy?
The underlying problem is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IPs have a bad IP reputation, because they ARE often used for unwanted activity.
What sort of data would qualify as an 'i'm a human' bit?
Let's start with something not worse than now: a captcha solved in the last few minutes.
This feels circular - one of the big problems is that users are unable to solve them after a dozen tries. We would not have as many complaining users if we could get this far, I think.
This sounds very much like something that could be provided through the use of zero-knowledge proofs
Yup. What do we do to implement one, both on the ddos protection side and on the TBB side?
My first-order proposal would be to serve a cached copy of the site in "read only" mode with no changes on the TBB side. We can get this from other third parties if CF doesn't want to serve it directly - that was part of my initial suggestion. Why not just serve that data directly?
Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?
That is, present a CAPTCHA only when:
the server owner has specifically requested that CAPTCHAs be used
the server is actively under DoS attack, and
the client's IP address is currently a source of the DoS.
I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.
On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?
Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?
That is, present a CAPTCHA only when:
the server owner has specifically requested that CAPTCHAs be used
the server is actively under DoS attack, and
the client's IP address is currently a source of the DoS.
That seems interesting - I wish we had data to understand if these choices would help - it seems opaque how "threat scores" for IP addresses are computed. Is there any public information about it?
I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.
I agree.
On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?
I'm also interested in understanding the dataflow - could the FBI go to Google to get data on all CloudFlare users? Does CF protect it? If so - who protects users more?