Opened 7 months ago

Last modified 2 weeks ago

#18361 new enhancement

Issues with corporate censorship and mass surveillance

Reported by: ioerror Owned by: tbb-team
Priority: High Milestone:
Component: Applications/Tor Browser Version:
Severity: Critical Keywords: security, privacy, anonymity
Cc: arthuredelstein, jeroen@…, torry, saint, tne Actual Points:
Parent ID: Points:
Reviewer: Sponsor: None

Description

There are companies - such as CloudFlare - which are effectively now Global Active Adversaries. Using CF as an example - they do not appear open to working together in open dialog, they actively make it nearly impossible to browse to certain websites, they collude with larger surveillance companies (like Google), their CAPTCHAs are awful, they block members of our community on social media rather than engaging with them and frankly, they run untrusted code in millions of browsers on the web for questionable security gains.

It would be great if they allowed GET requests - for example - such requests should not and generally do not modify server side content. They do not do this - this breaks the web in so many ways, it is incredible. Using wget with Tor on a website hosted by CF is... a disaster. Using Tor Browser with it - much the same. These requests should be idempotent according to spec, I believe.

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web. When tied with Google, it seems like a basic analytics problem to enumerate users and most sites visited in a given session.

One way - I think - would be to create a warning page upon detection of a CF edge or captcha challenge. This could be similar to an SSL/TLS warning dialog - with an option for users to bypass, engage with their systems or an option to *contact them* or the *site's owners* or to hit a cached version, read only version of the website that is on archive.org, archive.is or other caching systems. That would ensure that *millions* of users would be able to engage with informed consent before they're tagged, tracked and potentially deanonymized. TBB can protect against some of this - of course - but when all your edge nodes are run by one organization that can see plaintext, ip addresses, identifiers and so on - the protection is reduced. It is an open research question how badly it is reduced but intuitively, I think there is a reduction in anonymity.

It would be great to find a solution that allows TBB users to use the web without changes on our end - where they can solve one captcha, if required - perhaps not even prompting for GET requests, for example. Though in any case - I think we have to consider that there is a giant amount of data at CF - and we should ensure that it does not harm end users. I believe CF would share this goal if we explain that we're all interested in protecting users - both those hosting and those using the websites.

Some open questions:

  • What kind of per browser session tracking is actually happening?
  • What other options do we have on the TBB side?
  • What would a reasonable solution look like for a company like Cloudflare?
  • What is reasonable for a user to do? (~17 CAPTCHAs for one site == not reasonable)
  • Would "Warning this site is under surveillance by Cloudflare" be a reasonable warning or should we make it more general?

Child Tickets

Attachments (1)

CloudFlareNarc.gif (30.9 KB) - added by cypherpunks 7 months ago.

Download all attachments as: .zip

Change History (240)

comment:1 follow-ups: Changed 7 months ago by marek

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

Last edited 7 months ago by marek (previous) (diff)

comment:2 Changed 7 months ago by arthuredelstein

  • Cc arthuredelstein added

comment:3 in reply to: ↑ 1 Changed 7 months ago by ioerror

  • Cc arthuredelstein removed

Replying to marek:

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

Could you please ask your employer or other coworkers to come and talk with us openly? Many members of our community, some which are also your (server side) users, are extremely frustrated. It is in the best interest of everyone to help find a solution for those users.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

What specifically is political versus technical? That CF is now a GAA? That CF does indeed gather metrics? That CF does run untrusted (by me, or other users) in our browsers? That your metrics count as a kind of surveillance that is seemingly linked with a PRISM provider?

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea.

What is the difference between one super cookie and ~1m cookies on a per site basis? The anonymity set appears to be *strictly* worse. Or do you guys not do any stats on the backend? Do you claim that you can't and don't link these things?

I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

For such a user - how will you protect any information you've collected from them? Will that information be of higher value or richer technical information if there is a cookie (super, regular, whatever) tied to that data?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

This feels like a trick question - behavioral analysis is in itself reducing the anonymity set by adding at least one bit of information. My guess is that it is a great deal more than a single bit - especially over time.

comment:4 Changed 7 months ago by ioerror

  • Cc arthuredelstein added

comment:5 in reply to: ↑ 1 ; follow-up: Changed 7 months ago by willscott

Replying to marek:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

This sounds very much like something that could be provided through the use of zero-knowledge proofs. It doesn't seem clear to me that being able to say "this is an instance of tor which has already answered a bunch of captcha's" is actually useful. I think the main problem with captchas at this point is that robots are just about as good at answering them as humans. Apparently robots are worse than humans at building up tracked browser histories. That seems like a harder property for a tor user to prove.

What sort of data would qualify as an 'i'm a human' bit?

comment:6 Changed 7 months ago by sordid

What sort of data would qualify as an 'i'm a human' bit?

I don't think DDoS should be based on identifying humans.

Bots are legitimate consumers of data as well, and in the future they might even be more intelligent than most humans today, so we might as well design our systems to be friendly for them.

DDoS is a supply/demand type of economic issue and any solutions should treat it as such.

Last edited 7 months ago by sordid (previous) (diff)

comment:7 Changed 7 months ago by ioerror

Ultimately, I wonder if the point is simply to identify people - across browser sessions, across proxies, across Tor exits - and the start is the "I'm a human bit" - I wonder where does that end?

In a sense, I feel like this CF issue is like a giant Wifi Captive Portal for the web. It shims in some kind of "authentication" in a way that breaks many existing protocols and applications.

If I was logged into Google (as they use a Google Captcha...), could they vouch for my account and auto solve it? Effectively creating an ID system for the entire web where CF is the MITM for all the users visiting users cached/terminated by them? I think - yes to both - and that is concerning.

comment:8 in reply to: ↑ 1 ; follow-up: Changed 7 months ago by yawning

  • Cc isis added

cc-ing isis since this covers earlier work.

Replying to marek:

Disclaimer: I work for CloudFlare. Disclaimer: Comments here are opinions of myself, not my employer.

I will restrain myself and not comment on the political issues Jacob raised. I'll keep it technical.

I would like to find a solution with Cloudflare - but I'm unclear that the correct answer is to create a single cookie that is shared across all sessions - this effectively links all browsing for the web.

A thousand times yes. I raised this option a couple times (supercookie) and we agreed this is a bad idea. I believe there is a cryptographic solution to this. I'm not a crypto expert, so I'll allow others to explain this. Let's define a problem:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

Yes. This is a problem that "Anonymous Credential" systems are designed to solve. A example of a system with most of the properties that are desired is presented in Au, M. H., Kapadia, A., Susilo, W., "BLACR: TTP-Free Blacklistable Anonymous Credentials with Reputation" (https://www.cs.indiana.edu/~kapadia/papers/blacr-ndss-draft.pdf). Note that this is still an active research area, and BLACR it of itself may not be practical/feasible to implement, and is listed only as an example since the paper gives a good overview of the problem and how this kind of primitive can be used to solve the problem.

Isis can go into more details on this sort of thing, since she was trying to implement a similar thing based on Mozilla Persona (aborted attempt due to Mozilla Persona being crap).

comment:9 follow-up: Changed 7 months ago by cypherpunks

CloudFlare grew out of the narcs at Crimeflare. Do not assume good faith.

comment:10 follow-ups: Changed 7 months ago by marek

@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what can we can technically do to fix the problem.

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

(POST is hard) First, what actually the proxy should *do* on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from ceirtain regions. Many businesses care only about visitors from some geographical region, and in case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving captcha is strictly better than disallowing the traffic unconditionally.

(Not only spam, load as well) Third, there regularly are bot "attacks" that just spam website with continous flood of GET requests, for example to check if the offered product is released, the promotion started or price updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.

The underlying problem, is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IP's have bad IP reputation, because they _ARE_ often used for unwanted activity.

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last <XX> minutes.

This sounds very much like something that could be provided through the use of zero-knowledge proofs

Yup. What do we do to implement one both on ddos protection side and on TBB side?

Changed 7 months ago by cypherpunks

comment:11 in reply to: ↑ 9 Changed 7 months ago by cypherpunks

Replying to cypherpunks:

CloudFlare grew out of the narcs at Crimeflare. Do not assume good faith.

That's right.

Substantial work with government and law enforcement officials

https://trac.torproject.org/projects/tor/attachment/ticket/18361/CloudFlareNarc.gif

comment:12 Changed 7 months ago by cypherpunks

 What sort of data would qualify as an 'i'm a human' bit?

Does it even matter? Most bots, all the crawlers, sites like archive.is, etc are all regularly allowed in on Cloudflare sites.

comment:13 in reply to: ↑ 10 Changed 7 months ago by cypherpunks

Replying to marek:

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

(POST is hard) First, what actually the proxy should *do* on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

CloudFlare is in a position to inject JavaScript into sites. Why not hook requests that would result in a POST and challenge after say, clicking the submit button?

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last <XX> minutes.

Is this something that CloudFlare has actually found effective? Are there metrics on how many challenged requests that successfully solved a CAPTCHA turned out to actually be malicious?

comment:14 Changed 7 months ago by cypherpunks

CloudFlare is in a position to inject JavaScript into sites

This alone should be reason enough for the security warning. People might be viewing sites which they believe to be in a different jurisdiction and suddenly giving control to a US entity.

comment:15 Changed 7 months ago by wwaites

To quantify the scope of the problem slightly, a few weeks ago I measured that 10% of the Alexa top 25k are behind Cloudflare.

It would be helpful if we had a nice, well written, easy to understand explanation of the problem that we could give to site owners. Of those that I have contacted, some get it and adjust things quickly, but some struggle to understand what the problem is.

comment:16 in reply to: ↑ 10 Changed 7 months ago by ioerror

Replying to marek:

@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what can we can technically do to fix the problem.

I'm unclear on what I've said or done that is insulting you? Could you clarify? It certainly isn't my attempt or intent to insult you.

What is my opinion and what is technical reality? Could you enumerate that a bit? I've asked many questions and it is important that we discuss the wide range of topics here.

Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests.

There are a number of problems with this model.

There are a number of problems with the current model - to be clear - and so while there are downsides to the read-only GET suggestion, I think it would reduce nearly all complaints by end users.

(POST is hard) First, what actually the proxy should *do* on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

Off the top of my head - to ensure I reply to everything you've written:

It seems reasonable in many cases to redirect them on pages where this is a relevant concern? POST fails, failure page asks for a captcha solution, etc.

(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from ceirtain regions. Many businesses care only about visitors from some geographical region, and in case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving captcha is strictly better than disallowing the traffic unconditionally.

Actually, a censorship page with specific information ala HTTP 451 would be a nearly in spec answer to this problem. Why not use that? You're performing geographic discrimination on behalf of your users - this censorship should be transparent. It should be clear that the site owner has decided to do this - and there is less of a need to solve a captcha by default.

Though in the case of Tor - you can't do this properly - which is a reason to specifically treat Tor users as special. Visitors may be in the region and Tor is properly hiding them. That is a point in the direction of having an interstitial page that allows a user to solve a captcha.

(Not only spam, load as well) Third, there regularly are bot "attacks" that just spam website with continous flood of GET requests, for example to check if the offered product is released, the promotion started or price updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.

Why not just serve them an older cached copy?

The underlying problem, is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IP's have bad IP reputation, because they _ARE_ often used for unwanted activity.

Do you have any open data on this?

@willscott:

What sort of data would qualify as an 'i'm a human' bit?

Let's start with something not-worse than now: a captcha solved in last <XX> minutes.

This feels circular - one of the big problems is that users are unable to solve them after a dozen tries. We would not have as many complaining users if we could get this far, I think.

This sounds very much like something that could be provided through the use of zero-knowledge proofs

Yup. What do we do to implement one both on ddos protection side and on TBB side?

My first order proposition would be to solve a cached copy of the site in "read only" mode with no changes on the TBB side. We can get this from other third parties if CF doesn't want to serve it directly - that was part of my initial suggestion. Why not just serve that data directly?

comment:17 follow-up: Changed 7 months ago by arthuredelstein

Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?

That is, present a CAPTCHA only when:

  1. the server owner has specifically requested that CAPTCHAs be used
  2. the server is actively under DoS attack, and
  3. the client's IP address is currently a source of the DoS.

I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.

On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?

comment:18 in reply to: ↑ 17 Changed 7 months ago by ioerror

Replying to arthuredelstein:

Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?

That is, present a CAPTCHA only when:

  1. the server owner has specifically requested that CAPTCHAs be used
  2. the server is actively under DoS attack, and
  3. the client's IP address is currently a source of the DoS.

That seems interesting - I wish we had data to understand if these choices would help - it seems opaque how "threat scores" for IP addresses are computed. Is there any public information about it?

I think it's hugely overkill to show CAPTCHAs all the time to all Tor users for every CloudFlare site. It's also unreasonable to maintain a "reputation" for a Tor exit node.

I agree.

On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?

I'm also interested in understanding the dataflow - could the FBI go to Google to get data on all CloudFlare users? Does CF protect it? If so - who protects users more?

comment:19 Changed 7 months ago by marek

ioerror:

Why not just serve them an older cached copy?

While we do provide a feature that caches old versions of sites (called Always Online), it is not enabled by default. And even if it was you can imagine site owners disabling that. Furthermore it is totally possible for the url to not be in cache. Fundamentally Always Online solves a different problem - serving content in event of origin being unavailable. This is different from protecting origin - you want to serve challenge to bots, not content.

I'll add one more aspect here - in some large attacks we struggle to even serve captchas. The bots request them over and over again, which generates big traffic. Captcha page is optimised for size. We certainly don't want to serve larger sites to suspected-bad IP addresses in order to shield our servers as well.

Do you have any open data on this?

No, but the bad IP reputation for TOR exits is not generated by rolling a dice.

This feels circular - one of the big problems is that users are unable to solve them after a dozen tries
arthuredelstein: On top of this, Google's reCAPTCHA is buggy and frequently impossible to solve.

Maybe this is the problem. But here is a thing - reCaptcha gives different challenges to different IP addresses. Maybe the google IP reputation of TOR exits is _so_ bad that they really don't want this traffic.

Last edited 7 months ago by marek (previous) (diff)

comment:20 follow-up: Changed 7 months ago by marek

Ok, let me try to put the discussion on track again.

I would be very interested in getting the zero-knowledge proofs working. That is - require TBB user to prove he's human exactly once, and then reuse this data across the browsing session, without losing anonymity. This is not CloudFlare specific idea, there are many other providers using captcha. We could have a generic technology for proving "i'm-a-human".


Last edited 7 months ago by marek (previous) (diff)

comment:21 Changed 7 months ago by sordid

We could have a generic technology for proving "i'm-a-human".

What does attempting to prove "i'm-a-human" have to do with addressing DDoS attacks?

Bots are legitimate consumers of data (as stated above).

Just because something is not human, does not mean you should treat it specially. I thought you were trying to prevent DDoS attacks, not play the Turing Test.

Last edited 7 months ago by sordid (previous) (diff)

comment:22 in reply to: ↑ 20 Changed 7 months ago by arthuredelstein

Replying to marek:

Ok, let me try to put the discussion on track again.

I would be very interested in getting the zero-knowledge proofs working. That is - require TBB user to prove he's human exactly once, and then reuse this data across the browsing session, without losing anonymity. This is not CloudFlare specific idea, there are many other providers using captcha. We could have a generic technology for proving "i'm-a-human".

Building the infrastructure for a zero-knowledge proof system sounds like a fascinating but expensive and long-term project. And I wouldn't be confident that CloudFlare would even adopt such a thing once it became available, unless they made a significant investment in the work at the beginning.

Personally I am more interested in what near-term adjustments CloudFlare could make to reduce the CAPTCHA burden on Tor users, which seems to be unnecessarily high. Marek, do you have any thoughts about my suggestions for reducing CAPTCHA use in comment:17?

comment:23 follow-ups: Changed 7 months ago by jgrahamc

Hello. I'm CloudFlare's CTO.

There are companies - such as CloudFlare - which are effectively now Global Active Adversaries.

That's an inflammatory introduction. We are not adversarial to TOR as an entity, we are trying to deal with abuse that uses the TOR network. It's inevitable that a system providing anonymity gets abused (as well as used). I'm old enough to remember the trials and tribulations of the Penet remailer and spent a long time working in antispam.

Using CF as an example - they do not appear open to working together in open dialog,

Really? We've had multiple contacts with people working on TOR through events like Real World Crypto and have been trying to come up with a solution that will protect web sites from malicious use of TOR while protecting the anonymity of TOR users (such as myself). We rolled out special handling of the TOR network so that users should not see a CAPTCHA on a circuit change. We also changed the CAPTCHA to the new one since the old was serving very hard to handle text CAPTCHAs to TOR users. The crypto guys who work for me are interested in blinded tokens as a way to solve both the abuse problem and preserve anonymity.

Earlier @ioerror asked if there was open data on abuse from TOR exit nodes. In 2014 I wrote a small program called "torhoney" that pulls the list of exit nodes and matches it against data from Project Honeypot about abuse. That code is here: https://github.com/jgrahamc/torhoney. You can run it and see the mapping between an exit node and its Project Honeypot score to get a sense for abuse from the exit nodes.

I ran the program today and have data on 1,057 exit nodes showing that Project Honeypot marks 710 of them as a source of comment spam (67%) with 567 having a score of greater than 25 (in the Project Honeypot terminology meaning it delivered at least 100 spam messages) (54%). Over time these values have been trending upwards. I've been recording the Project Honeypot data for about 13 months that the percentage of exit nodes that were listed as a source of comment spam was about 45% a year ago and is now around 65%.

So, I'm interested in hearing about technical ways to resolve these problems. Are there ways to reduce the amount of abuse through TOR? Could TorBrowser implement a blinded token scheme that would preserve anonymity and allow a Turing Test?

comment:24 follow-up: Changed 7 months ago by wwaites

Sometimes the problem CF seems to be worried about is DDoS sometimes it is comment spam. Those are typically very different things and are protected against in very different ways. Indeed it is quite hard to use Tor to do many of the more common amplification-style DDoS techniques. Can we please try not to muddy the waters by having an ambiguous threat model?

comment:25 in reply to: ↑ 23 Changed 7 months ago by cypherpunks

Replying to jgrahamc:

Hello. I'm CloudFlare's CTO.

There are companies - such as CloudFlare - which are effectively now Global Active Adversaries.

That's an inflammatory introduction. We are not adversarial to TOR as an entity, we are trying to deal with abuse that uses the TOR network. It's inevitable that a system providing anonymity gets abused (as well as used). I'm old enough to remember the trials and tribulations of the Penet remailer and spent a long time working in antispam.

It's Tor not TOR.

comment:26 follow-up: Changed 7 months ago by marek

wwaites:

Sometimes the problem CF seems to be worried about is DDoS sometimes it is comment spam

Well spotted. Ddos protection services try to protect users from:

  • comment spam (or just spam)
  • people running wget in a loop (to protect resources)
  • search engine bots (if user wants to)
  • other, not search related, automated bots (if user wants to)
  • proper L7 attacks, resource exhaustion - network
  • proper L7 attacks, resource exhaustion - server CPU

Captchas work across the board and are one of the most important strategies. I'm not saying it's perfect, but it's what the industry uses.

Last edited 7 months ago by marek (previous) (diff)

comment:27 in reply to: ↑ 24 Changed 7 months ago by jgrahamc

Replying to wwaites:

Sometimes the problem CF seems to be worried about is DDoS sometimes it is comment spam. Those are typically very different things and are protected against in very different ways. Indeed it is quite hard to use Tor to do many of the more common amplification-style DDoS techniques. Can we please try not to muddy the waters by having an ambiguous threat model?

I was giving the example of comment spamming because the Project Honeypot is a third party. It gives you an idea of what's happening through Tor. Comment spam is something we deal with, along with DDoS attacks and hacking of web sites (SQL injection etc.). Different techniques are used for different attack types.

comment:28 in reply to: ↑ 23 Changed 7 months ago by cypherpunks

See responses inline.

Replying to jgrahamc:

Using CF as an example - they do not appear open to working together in open dialog,

Really? We've had multiple contacts with people working on TOR through events like Real World Crypto and have been trying to come up with a solution that will protect web sites from malicious use of TOR while protecting the anonymity of TOR users (such as myself).

Yes, really.

We rolled out special handling of the TOR network so that users should not see a CAPTCHA on a circuit change.

This has never worked, and I say that as someone who uses the Tor Browser Bundle every day and has for years.

We also changed the CAPTCHA to the new one since the old was serving very hard to handle text CAPTCHAs to TOR users.

You should know that the CAPTCHA still works about 1 in 20 times in my experience, and that didn't change at all after you switched to the "new one."

The crypto guys who work for me are interested in blinded tokens as a way to solve both the abuse problem and preserve anonymity.

That's a nice thought, but you're still completely censoring my use of your customer's websites 95% of the time all day every day and wasting my time during the 5% of times your system 'works'.

I ran the program today and have data on 1,057 exit nodes showing that Project Honeypot marks 710 of them as a source of comment spam (67%) with 567 having a score of greater than 25 (in the Project Honeypot terminology meaning it delivered at least 100 spam messages) (54%). Over time these values have been trending upwards. I've been recording the Project Honeypot data for about 13 months that the percentage of exit nodes that were listed as a source of comment spam was about 45% a year ago and is now around 65%.

This is not a relevant fact for the vast majority of users whose right to read your company infringes on.

So, I'm interested in hearing about technical ways to resolve these problems. Are there ways to reduce the amount of abuse through TOR? Could TorBrowser implement a blinded token scheme that would preserve anonymity and allow a Turing Test?

You clearly don't understand the clearly articulated problem as it was described, and if you expect people to solve your team's inability to implement a censorship system for you, I hope you find the help you need.

Last edited 7 months ago by cypherpunks (previous) (diff)

comment:29 in reply to: ↑ 5 ; follow-up: Changed 7 months ago by cypherpunks

Replying to willscott:

Replying to marek:

There are CDN/DDoS companies in the internet that provide spam protection for their customers. To do this they use captchas to prove that the visitor is a human. Some companies provide protection to many websites, therefore visitor from abusive IP address will need to solve captcha on each and all domains protected. Let's assume the CDN/DDoS don't want to be able to correlate users visiting multiple domains. Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

This sounds very much like something that could be provided through the use of zero-knowledge proofs. It doesn't seem clear to me that being able to say "this is an instance of tor which has already answered a bunch of captcha's" is actually useful. I think the main problem with captchas at this point is that robots are just about as good at answering them as humans. Apparently robots are worse than humans at building up tracked browser histories. That seems like a harder property for a tor user to prove.

What sort of data would qualify as an 'i'm a human' bit?

Let's be clear on one point: humans do not request web pages. User-Agents request web pages. When people talk about "prove you're a human", what they really mean is "prove that your User-Agent behaves the way we expect it to".

CloudFlare expect that "good" User-Agents should leave a permanent trail of history between all sites across the web. Humans who decide they don't want this property, and use a User-Agent such as Tor Browser fall outside of CloudFlare's conception of how User-Agents should behave (which conception includes neither privacy nor anonymity), and are punished by CloudFlare accordingly.

It might be true that there is some kind of elaborate ZKP protocol that would allow a user to prove to CloudFlare that their User-Agent behaves the way CloudFlare demands, without revealing all of the user's browsing history to CloudFlare and Google. Among other things, this would require CloudFlare to explicitly and precisely describe both their threat model and their definition of 'good behaviour', which as far as I know they have never done.

However, it is not the Tor Project's job to perform free labour for a censor. If CloudFlare is actually interested in solving the problem, then perhaps the work should be paid for by the $100MM company that created the problem, not done for free by the nonprofit and community trying to help the people who suffer from it.

comment:30 in reply to: ↑ 29 ; follow-up: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

CloudFlare expect that "good" User-Agents should leave a permanent trail of history between all sites across the web.

No, we do not.

We have a simple need: our customers pay us to protect their web sites from DoS, spam and intrusions using things like SQL injection. We need to provide that service for the money they pay us.

Another way to think about this is to imagine we're not talking about Tor but some other source of abuse. In the past we've worked to shut down open DNS resolvers, open NTP servers, and we work with networks to disable abuse coming from them. We can't do those things with Tor because of its nature. So we're in a tough spot, we see abuse coming from Tor that's hard to deal with because of anonymity.

A related approach might be for us to say "Let's whitelist all the Tor exit nodes". Play that forward a bit and you could see that any abuser worth their salt would migrate to Tor increasing the abuse problem through Tor.

Ultimately, I think we want the same thing: reduce abuse coming through Tor. Coming up with a good technical solution is hard, but worth working on. You may think that CloudFlare doesn't care about this problem, but in fact it's something that's occupying time (and therefore money) as we look for solutions.

Despite what's been said in this ticket there have been contacts between CloudFlare and Tor developers.

comment:31 in reply to: ↑ 8 Changed 7 months ago by isis

Replying to yawning:

cc-ing isis since this covers earlier work.

Replying to marek:

In other words: is it possible to provide a bit of data (i'm-a-human) tied to the browsing session while not violating anonymity.

Yes. This is a problem that "Anonymous Credential" systems are designed to solve. A example of a system with most of the properties that are desired is presented in Au, M. H., Kapadia, A., Susilo, W., "BLACR: TTP-Free Blacklistable Anonymous Credentials with Reputation" (https://www.cs.indiana.edu/~kapadia/papers/blacr-ndss-draft.pdf). Note that this is still an active research area, and BLACR it of itself may not be practical/feasible to implement, and is listed only as an example since the paper gives a good overview of the problem and how this kind of primitive can be used to solve the problem.

Isis can go into more details on this sort of thing, since she was trying to implement a similar thing based on Mozilla Persona (aborted attempt due to Mozilla Persona being crap).


Having not read the BLACR paper yet… one should generally be wary of anonymous credentials which advertise some form of revocation, since effectively what this means is having some backdoor whereby a trusted third party can do "anonymity revocation". The other form this usually takes is to keep a blacklist (skimming tells me that BLACR does this), or keep some other form of state, e.g. "all blinded signature tokens we've already seen used before," which additionally introduces the requirement that the credential issuing server be always online.

There are other anonymous credential schemes built on NIZK proofs which do not require keeping expensive (and continually growing) blacklists, one of my personal favourites being described in Belenkiy, Lysyanskaya, Camenisch, Sacham, Chase, and Kohlweiss' "Randomizable Proofs and Delegatable Anonymous Credentials". The delegation aspect could also provide a nice feature of being able to e.g. say "I'll trust any user who has met the authentication requirements of any of Cloudflare, Wikipedia, or Amazon" without necessarily knowing which of those three the user had already authenticated to.

comment:32 in reply to: ↑ 23 Changed 7 months ago by ioerror

Replying to jgrahamc:

Hello. I'm CloudFlare's CTO.

There are companies - such as CloudFlare - which are effectively now Global Active Adversaries.

That's an inflammatory introduction. We are not adversarial to TOR as an entity, we are trying to deal with abuse that uses the TOR network.

It is a statement of facts about capabilities. It is not inflammatory - Tor must take into account that Google, for example, can run arbitrary code from many thousands of websites visited in Tor Browser.

To say that CF is not adversarial is awkward - Tor users are prevented from browsing the web and are constantly blocked. I do not believe that CF has yet made this a specific act of malice, of course. To design such a system without considering how it will impact Tor users and then working with us is however seriously problematic as we see from user reports.

It's inevitable that a system providing anonymity gets abused (as well as used). I'm old enough to remember the trials and tribulations of the Penet remailer and spent a long time working in antispam.

Centralization ensures that your company is a high value target. The ability to run code in the browsers of millions of computers is highly attractive. The fact that CF and Google appear to both appear in those captcha prompts probably ensures CF isn't even in control of the entirety of the risk. Is it the case that for all the promises CF makes, Google is actually in control of the Captcha - and thus is by proxy given the ability to run code in the browsers of users visiting CF terminated sites?

Should we be reaching out to Google here?

comment:33 Changed 7 months ago by isis

  • Cc isis removed

comment:34 in reply to: ↑ 30 ; follow-ups: Changed 7 months ago by ioerror

Replying to jgrahamc:

Ultimately, I think we want the same thing: reduce abuse coming through Tor. Coming up with a good technical solution is hard, but worth working on. You may think that CloudFlare doesn't care about this problem, but in fact it's something that's occupying time (and therefore money) as we look for solutions.

Offering a read only version of these websites would be a very good mitigation that could be done effectively instantly - by enabling the above mentioned "Always Online" CDN option - where a CAPTCHA would be added. For any POST action, a javascript hook could be added to then prompt to solve a CAPTCHA as discussed above.

A related approach might be for us to say "Let's whitelist all the Tor exit nodes". Play that forward a bit and you could see that any abuser worth their salt would migrate to Tor increasing the abuse problem through Tor.

That would be a fine approach - it is true that this could be a problem but this would absolutely solve the "defaults" problem we see today.

Despite what's been said in this ticket there have been contacts between CloudFlare and Tor developers.

I am one of those developers and after more than a year, I'm sorry to say that we need to have substantially more serious discussions. Individual engineers who care is not enough. There are also other options - such as some of the things suggested above. I really like the idea of an interstitial that allows a user to see a third party read only CDN cache before remote code execution happens in the user's browser.

In any case - I think we all agree that there is a serious problem here and we should involve our communities and not just have backroom communications that do not result in differences for users. There are millions of impacted users who are being censored from reading websites because of a combination of issues - every single day.

I encourage you to use the Tor Browser for a week and report back to us about how well it works for you. If your experience is completely different from the rest of us, we'd very much like to learn about the different factors in your web surfing habits.

comment:35 in reply to: ↑ 23 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Earlier @ioerror asked if there was open data on abuse from TOR exit nodes. In 2014 I wrote a small program called "torhoney" that pulls the list of exit nodes and matches it against data from Project Honeypot about abuse. That code is here: https://github.com/jgrahamc/torhoney. You can run it and see the mapping between an exit node and its Project Honeypot score to get a sense for abuse from the exit nodes.

I ran the program today and have data on 1,057 exit nodes showing that Project Honeypot marks 710 of them as a source of comment spam (67%) with 567 having a score of greater than 25 (in the Project Honeypot terminology meaning it delivered at least 100 spam messages) (54%). Over time these values have been trending upwards. I've been recording the Project Honeypot data for about 13 months that the percentage of exit nodes that were listed as a source of comment spam was about 45% a year ago and is now around 65%.

This is useful though it is unclear - is this what CF uses on the backend? Is this data the reason that Google's captchas are so hard to solve?

Furthermore - what is the expected value for a network with millions of users per day?

So, I'm interested in hearing about technical ways to resolve these problems. Are there ways to reduce the amount of abuse through TOR? Could TorBrowser implement a blinded token scheme that would preserve anonymity and allow a Turing Test?

Offering a read only version of these websites that prompts for a captcha on POST would be a very basic and simple way to reduce the flood of upset users. Ensuring that a captcha is solved and not stuck in a 14 or 15 solution loop is another issue - that may be a bug unsolvable by CF but rather needs to be addressed by Google. Another option, as I mentioned above, might be to stop a user before ever reaching a website that is going to ask them to run javascript and connect them between two very large end points (CF and Google).

Does Google any end user connections for those captcha requests? If so - it seems like the total set of users for CF would be seen by both Google and CF, meaning that data on all Cloudflare users prompted for the captcha would be available to Google. Is that incorrect?

comment:36 in reply to: ↑ 34 ; follow-ups: Changed 7 months ago by jgrahamc

Replying to ioerror:

In any case - I think we all agree that there is a serious problem here and we should involve our communities and not just have backroom communications that do not result in differences for users. There are millions of impacted users who are being censored from reading websites because of a combination of issues - every single day.

I don't agree with your characterization of this as "censoring". That implies an active desire to prevent people from reaching certain types of content. Given all that we've done to uphold free speech in the face of a barrage of criticism I think your use of the word "censor" is unwarranted.

I encourage you to use the Tor Browser for a week and report back to us about how well it works for you. If your experience is completely different from the rest of us, we'd very much like to learn about the different factors in your web surfing habits.

I did this three weeks ago. In addition the entire company was forced for 30 days to see CAPTCHAs any time they visited a site using CloudFlare while in our offices. Doing so caused us to fix lots of problems with the way the CAPTCHA was implemented. I also personally worked on the code that deals with prevention of a CAPTCHA when the circuit changes and fixed a bug that was preventing it working correctly.

comment:37 in reply to: ↑ 35 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

This is useful though it is unclear - is this what CF uses on the backend? Is this data the reason that Google's captchas are so hard to solve?

It's a data source that we use for IP reputation. I was using it as illustrative as well because it's a third party. I don't know if there's any connection between Project Honeypot and Google's CAPTCHAs.

Offering a read only version of these websites that prompts for a captcha on POST would be a very basic and simple way to reduce the flood of upset users. Ensuring that a captcha is solved and not stuck in a 14 or 15 solution loop is another issue - that may be a bug unsolvable by CF but rather needs to be addressed by Google. Another option, as I mentioned above, might be to stop a user before ever reaching a website that is going to ask them to run javascript and connect them between two very large end points (CF and Google).

I'm not convinced about the R/O solution. Seems to me that Tor users would likely be more upset the moment they got stale information or couldn't POST to a forum or similar. I'd much rather solve the abuse problem and make this go away completely. Also, the CAPTCHA-loop thing is an issue that needs to be addressed by us and Google.

I still think the blinded tokens thing is going to be interesting to investigate because it would help anonymously prove that the User-Agent was controlled by a human and could be sent eliminating the need for any JavaScript.

Does Google any end user connections for those captcha requests?

Can you rewrite that? Couldn't parse it.

comment:38 in reply to: ↑ 34 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

Replying to jgrahamc:

A related approach might be for us to say "Let's whitelist all the Tor exit nodes". Play that forward a bit and you could see that any abuser worth their salt would migrate to Tor increasing the abuse problem through Tor.

That would be a fine approach - it is true that this could be a problem but this would absolutely solve the "defaults" problem we see today.

It's a very short term solution because if all the abuse moves to Tor the obvious next step is that our clients come along and demand that we give them the option to block visitors from Tor completely. If we go that way wholesale I think it will be negative for everyone.

comment:39 in reply to: ↑ 36 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

In any case - I think we all agree that there is a serious problem here and we should involve our communities and not just have backroom communications that do not result in differences for users. There are millions of impacted users who are being censored from reading websites because of a combination of issues - every single day.

I don't agree with your characterization of this as "censoring". That implies an active desire to prevent people from reaching certain types of content. Given all that we've done to uphold free speech in the face of a barrage of criticism I think your use of the word "censor" is unwarranted.

I don't agree with the characterization of this as mere "blocking" when CF prevents users from *reading* websites. I haven't even begun to describe the pain of having written lengthy comments only to hit a captcha loop and it censored my *speech* as well.

It is censorship from where many of our users stand. Some of our Chinese users refer to it as the Great Distributed Firewall that they hit after jumping over the other Great Firewall.

Forgive me for not knowing the other details about Cloudflare and Free Speech - I'm not at all trying to characterize those activities. The active blocking, captcha loop issues are seriously problematic and they have a *result* which is that websites are unreadable. I'm not claiming you're burning books or something silly. I'm correctly pointing out that the books are safely on the otherside of a locked door and we're being turned into captcha solving machines that often do not unlock the door, if you'll forgive the metaphor.

I encourage you to use the Tor Browser for a week and report back to us about how well it works for you. If your experience is completely different from the rest of us, we'd very much like to learn about the different factors in your web surfing habits.

I did this three weeks ago. In addition the entire company was forced for 30 days to see CAPTCHAs any time they visited a site using CloudFlare while in our offices. Doing so caused us to fix lots of problems with the way the CAPTCHA was implemented. I also personally worked on the code that deals with prevention of a CAPTCHA when the circuit changes and fixed a bug that was preventing it working correctly.

You used it for a week after all of these changes were deployed? And you didn't encounter any issues? You feel that it works perfectly and that there are no valid issues being voiced? Or...?

comment:40 in reply to: ↑ 39 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

You used it for a week after all of these changes were deployed? And you didn't encounter any issues? You feel that it works perfectly and that there are no valid issues being voiced? Or...?

I did not encounter the loops that people are talking about. If I had I would have had one of the engineers fix that problem. The biggest thing I encountered was that our "one CAPTCHA per site modulo circuit change" code wasn't working and I fixed it. I'd like to get this to a point where Tor users are not in pain and during our CAPTCHA testing we found some problems which were fixed.

It would be *very* helpful if someone were able to reproduce the CAPTCHA loop thing so we can address it. I will get an engineer to take look and see if we can reproduce internally.

comment:41 in reply to: ↑ 37 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

This is useful though it is unclear - is this what CF uses on the backend? Is this data the reason that Google's captchas are so hard to solve?

It's a data source that we use for IP reputation. I was using it as illustrative as well because it's a third party. I don't know if there's any connection between Project Honeypot and Google's CAPTCHAs.

How do we vet this information or these so-called "threat scores" other than trusting what someone says?

Offering a read only version of these websites that prompts for a captcha on POST would be a very basic and simple way to reduce the flood of upset users. Ensuring that a captcha is solved and not stuck in a 14 or 15 solution loop is another issue - that may be a bug unsolvable by CF but rather needs to be addressed by Google. Another option, as I mentioned above, might be to stop a user before ever reaching a website that is going to ask them to run javascript and connect them between two very large end points (CF and Google).

I'm not convinced about the R/O solution. Seems to me that Tor users would likely be more upset the moment they got stale information or couldn't POST to a forum or similar. I'd much rather solve the abuse problem and make this go away completely.

Are you convinced that it is strictly worse than the current situation? I'm convinced that it is strictly better to only toss up a captcha that loads a Google research when a user is about to interact with the website in a major way.

I do not believe that you can solve abuse on the internet anymore than a country "solve" healthcare or that the hacker community can "solve" surveillance. Abuse is relative and it is part of having free speech on the internet. There is no doubt a problem - but the solution is not to collectively punish millions of people (and their bots who are people too, man :-) ) based on ~1600 ip address "threat" scores.

Also, the CAPTCHA-loop thing is an issue that needs to be addressed by us and Google.

Does that mean that Google, in addition to CF, has data on everyone hitting those captchas?

I still think the blinded tokens thing is going to be interesting to investigate because it would help anonymously prove that the User-Agent was controlled by a human and could be sent eliminating the need for any JavaScript.

I'm not at all convinced that this can be done in the short term and it seems to assume that users only use graphical browsers. Attackers will be able to extract tokens and have farms of people solving things, when they need new tokens, so usually regular users pay the highest price.

Does Google any end user connections for those captcha requests?

Can you rewrite that? Couldn't parse it.

When a user is given a CF captcha - does Google see any request from them directly? Do they see the Tor Exit IP hitting them? Is it just CF or is it also Google? Do both companies get to run javascript in this user's browser?

comment:42 in reply to: ↑ 40 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

You used it for a week after all of these changes were deployed? And you didn't encounter any issues? You feel that it works perfectly and that there are no valid issues being voiced? Or...?

I did not encounter the loops that people are talking about. If I had I would have had one of the engineers fix that problem. The biggest thing I encountered was that our "one CAPTCHA per site modulo circuit change" code wasn't working and I fixed it. I'd like to get this to a point where Tor users are not in pain and during our CAPTCHA testing we found some problems which were fixed.

We'd all like that - I'd really like it if it was CAPTCHA free entirely until there is a POST request, for example. A read only version of the website, rather than a CAPTCHA prompt just to read would be better wouldn't it?

It would be *very* helpful if someone were able to reproduce the CAPTCHA loop thing so we can address it. I will get an engineer to take look and see if we can reproduce internally.

How many people are actively testing with Tor Browser on a daily basis for regressions? Does anyone use it full time?

comment:43 follow-up: Changed 7 months ago by cypherpunks

Why not just blanket disallow POST for TOR exit nodes, that takes care of the bulk of everyone's problems.

comment:44 in reply to: ↑ 43 ; follow-up: Changed 7 months ago by ioerror

Replying to cypherpunks:

Why not just blanket disallow POST for TOR exit nodes, that takes care of the bulk of everyone's problems.

That doesn't solve the issue in a proportional manner. It would be better to solve a captcha or use an anonymous token for certain kinds of interactive activity over blanket denial.

It also doesn't solve any of the other issues - such as the code running in people's browsers, the PII collected and so on. I'd rather a user have an option to hit an archive that is unrelated at that point - wouldn't you?

comment:45 Changed 7 months ago by jeffburdges

We're adding an "auto-pay" option to the auditor signing keys in GNU Taler to allow the creation of denomination signing keys for automatic payments without user confirmation. https://taler.net/

Automatic payments are a potential deanonymization vector if the attacker can issue as many denomination keys as they like. We'd therefore envision the Tor project being the auditor who limits the issuing of new denomination signing keys.

Ideally, CloudFlare would run a mint whose denomination keys the Tor project signs every few months. Anytime a TBB user solves a CloudFlare CAPTCHA they'd receive stash of token that TBB automatically uses to access pages.

We've actually had some limited discussions with CloudFlare about doing this. I'll speak about it some at the Tor dev meeting later this week. Along with several interesting variations.

comment:46 Changed 7 months ago by ioerror

I think the idea of using taler is an interesting open research question. It also seems orthogonal to many possible options that do not involve complicated cryptographic solutions with questionable anonymity properties. Using tokens, cookies, anonymous credential or ledger based solutions may be useful once a user tries to do some SQLI - I'm not at all convinced that it is reasonable to require what sounds like "an internet drivers license" or some Chaum scheme to read a web page.

comment:47 Changed 7 months ago by cypherpunks

CAPTCHAs are a fundamentally untenable solution to dealing with DDOS attacks. Algorithmic solutions will always catch up to evolving CAPTCHA methods. CloudFlare and other service providers should recognize that is the inevitable direction technology is going and abandon it now.

An alternate solution is a client proof-of-work protocol. This puts a greater burden on attackers attempting to establish many connections than on users who only need one connection. Then once a TLS session is established, the server can determine from behavior of that client whether it's an attacker and drop the connection. We should try to standardize that and get it into TLS implementations so service providers have an easy configuration choice.

https://tools.ietf.org/html/draft-nir-tls-puzzles-00

comment:48 in reply to: ↑ 38 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

Replying to jgrahamc:

A related approach might be for us to say "Let's whitelist all the Tor exit nodes". Play that forward a bit and you could see that any abuser worth their salt would migrate to Tor increasing the abuse problem through Tor.

That would be a fine approach - it is true that this could be a problem but this would absolutely solve the "defaults" problem we see today.

It's a very short term solution because if all the abuse moves to Tor the obvious next step is that our clients come along and demand that we give them the option to block visitors from Tor completely. If we go that way wholesale I think it will be negative for everyone.

Treating Tor as special seems to make sense as it is already treated specially as as ~1600 nodes shared by millions of users seems to just utterly ruin ip reputation schemes.

I also find it hard to believe that "all the abuse" will move to Tor. Even if a great deal of it moved to Tor, we have lots of users and lots of traffic that is not abusive traffic.

comment:49 in reply to: ↑ 23 Changed 7 months ago by lunar

Replying to jgrahamc:

Earlier @ioerror asked if there was open data on abuse from TOR exit nodes. In 2014 I wrote a small program called "torhoney" that pulls the list of exit nodes and matches it against data from Project Honeypot about abuse. That code is here: https://github.com/jgrahamc/torhoney. You can run it and see the mapping between an exit node and its Project Honeypot score to get a sense for abuse from the exit nodes.

I ran the program today and have data on 1,057 exit nodes showing that Project Honeypot marks 710 of them as a source of comment spam (67%) with 567 having a score of greater than 25 (in the Project Honeypot terminology meaning it delivered at least 100 spam messages) (54%). Over time these values have been trending upwards. I've been recording the Project Honeypot data for about 13 months that the percentage of exit nodes that were listed as a source of comment spam was about 45% a year ago and is now around 65%.

Could run the exact same test against all Comcast IP addresses aggregated as just once or another significant ISP?

In the context of Tor, large exit nodes have as many users behind them as a whole IPv4 /16 or IP addresses used for Carrier-grade NAT.

How are you handling CGNs so far?

One piece of understanding that I feel Marek and you might be missing is that with Tor Browser, every domain (the darkest part of the URL in Firefox / Tor Browser) will different Tor circuits and different cookies. I don't think this match the experiment you had with your team: Tor users will get CAPTCHAs for every single CloudFlare domain, and for each of these domains, multiple times a day.

Please ask your developers experiment using Tor Browser are their sole browser. I bet they will start to—at least—campaign StackOverflow to turn off the CAPTCHAs.

comment:50 in reply to: ↑ 26 Changed 7 months ago by comzeradd

Replying to marek:

wwaites:

Sometimes the problem CF seems to be worried about is DDoS sometimes it is comment spam

So what happens if me (as a site/server admin) don't need this (or part of this).

Specifically:

  • As a server admin, if my site is not under DDoS (or spam) attack, then its visitors should not get the captcha challenge.
  • As a server admin I should be able to choose if I want this kind of protection and potentially completely disable it.
  • As a server admin, I want more sane defaults (lower security level).

comment:51 in reply to: ↑ 44 ; follow-up: Changed 7 months ago by cypherpunks

Replying to ioerror:

That doesn't solve the issue in a proportional manner.

My primary issue is that I cannot access a site at all. I have zero intention of doing anything other than reading this marmalade recipe. I would gladly trade having nothing for something.

It would be better to solve a captcha or use an anonymous token for certain kinds of interactive activity over blanket denial.

It also doesn't solve any of the other issues - such as the code running in people's browsers, the PII collected and so on. I'd rather a user have an option to hit an archive that is unrelated at that point - wouldn't you?

Since I am of the tin-hat variety of TOR user, I do not have images or javascript enabled, so solving a captcha becomes nigh possible. Interactivity is nice, but I've conditioned myself over the years to treat most things through TOR as read-only, and that I can't even do that lately for the most benign things is frustrating. It would be nice to have options, but at this point, I'd really just like functional basics.

comment:52 in reply to: ↑ 51 Changed 7 months ago by ioerror

Replying to cypherpunks:

Replying to ioerror:

That doesn't solve the issue in a proportional manner.

My primary issue is that I cannot access a site at all. I have zero intention of doing anything other than reading this marmalade recipe. I would gladly trade having nothing for something.

That suggests that a read only version of these websites without a captcha or token of any kind would perfectly fit your use case.

It would be better to solve a captcha or use an anonymous token for certain kinds of interactive activity over blanket denial.

It also doesn't solve any of the other issues - such as the code running in people's browsers, the PII collected and so on. I'd rather a user have an option to hit an archive that is unrelated at that point - wouldn't you?

Since I am of the tin-hat variety of TOR user, I do not have images or javascript enabled, so solving a captcha becomes nigh possible. Interactivity is nice, but I've conditioned myself over the years to treat most things through TOR as read-only, and that I can't even do that lately for the most benign things is frustrating. It would be nice to have options, but at this point, I'd really just like functional basics.

Right - would you consider the read only version as a default, where to say, POST, you'd *then* have to solve a captcha as a reasonable default?

comment:53 in reply to: ↑ 23 Changed 7 months ago by ioerror

I'm hoping that jgrahamc will come back to answer some of the outstanding questions about CF/Google as well as other details.

comment:54 follow-up: Changed 7 months ago by massar

  • Cc jeroen@… added

Silly-side-track idea I am throwing out there:

Why does CloudFlare not run a .onion proxy for their sites?

That way, Tor gets rate limited through the Tor network and in addition at that CloudFlare-run .onion node.

There is no more possibility of a DoS from an exit, as the Tor client can go through the proxy, Tor exits that do not are not following protocol.

Thus, for short-term keep on serving the always broken captcha's along with the below extra details, then in the long term just a "Hi, you are coming from Tor, please use the proxy instead, if you see this you should have updated TBB by now...".

Thus instead of serving the captcha or in addition, serve a few extra headers:
`
<meta name="onion-proxy" url="socks5://<hash>.onion:1080">
`
or if a direct onion exists for the site (tell folks they can configure that, heck, charge people for that service if you want):
`
<meta name="onion-url" url="https://<hash>.onion">
`

TBB could have a built-in list of "well known proxies", eg the CloudFlare ones, the ones for Akamai and many other CDNs, for others it could pop up a "This site can be reached through Tor without leaving the Tor network, please consider using it".

TBB can also keep a cache of 'recently seen onion-*' so that it does not have to exit the Tor network to figure out where to go.
Normal HTTP cache times can be used if really wanted, or we can add a 'expires' tag to the meta URLs above.

For anonymity this can only be a win, as connections do not leave the Tor network anymore, also it reduces load on the exits (which IMHO should not exist in the first place, everything should be available in the Tor network directly...).

comment:55 Changed 7 months ago by jeffburdges

Just to clarify : Adding auto-pay support to Taler is basically the same solution being discussed internally at CloudFlare. We just have working blind singing code that runs in the browser already done. :)

These CAPTCHAs won't be so annoying if you solve one CAPTCHA for x page loads access everything, even across TBB sessions. As opposed to one CAPTCHA per domain per TBB session. It's just amortizing the CAPTCHAs really.

ioerror, I agree that tokens for merely viewing web pages is extreme. We should absolutely continue lobbying CloudFlare to apply their filters more precisely. We do still need a token based scheme for anything that triggers SQL though because asking Tor users to solve a CAPTCHA anytime they want to post anything is also extreme.

Also, one could imagine issuing tokens in other ways besides CAPTCHAs once we have an auto-pay blind singing based infrastructure deployed. I dislike most idea in this space, like a facebook app that gives you CloudFlare tokens. ;)

As an aside, there is an interesting anonymous white/black listing protocol implicit in Taler's refresh protocol : If you do not miss behave then you get your token refunded, meaning far fewer CAPTCHAs. I think refreshing tokens offers stronger anonymity than all the anonymous white/black listing protocols that I've seen in the literature (see Isis' comment, although I haven't read BLACR). It's even post-quantum. Now Taler's refresh protocol costs 3ish RSA signatures, while a simpler coin refresh costs only one, but Taler's refresh helps obstruct a market token distribution though. I can explain all this in person if you like, but probably any near term deployment would avoid refreshing entirely.

comment:56 in reply to: ↑ 54 ; follow-up: Changed 7 months ago by ioerror

Replying to massar:

Silly-side-track idea I am throwing out there:

Why does CloudFlare not run a .onion proxy for their sites?

Tor is an onion proxy? :-)

It could be that the captcha page, upon detecting Tor, could redirect to a CF controlled .onion that has a read only version of the website, for example.

comment:57 in reply to: ↑ 56 Changed 7 months ago by massar

Replying to ioerror:

Replying to massar:

Silly-side-track idea I am throwing out there:

Why does CloudFlare not run a .onion proxy for their sites?

Tor is an onion proxy? :-)

:)

I used that wording though to indicate that in the browser URL bar it will still say https://www.HostedByCDN.com (and thus HTTPS certificates keep on working) while TBB actually just redirects those through the indicated SOCKS proxy (ala what FoxyProxy does for Chrome).

If CF and other CDNs would implement something like that suddenly a lot of content would automatically start existing on the Tor network, which does not have any surveillance issues when going through an exit. (of course what the CDN network and then the final recipient do is still all voodoo, but it is better than going through an exit you can't fully trust; ignoring TLS there for a bit).

It could be that the captcha page, upon detecting Tor, could redirect to a CF controlled .onion that has a read only version of the website, for example.

IMHO forcing a 30{123} redirect is far from a good solution, that should be a browser and thus a user choice.

Maybe the user is mis-detected as being a Tor user (though exit lists are pretty much 'correct') or they do not want that mode of operation to reach the site. Also, why bother redirecting a Bot there, if a Bot was properly written it reads the meta tag and uses that (I mean, if you specifically program your bot to crawl over Tor then you can use the meta tag too).

Also, if that meta line is included everywhere, an aware browser could suggest to the user "hey, you can use Tor for this site" which is also a win...

comment:58 Changed 7 months ago by ioerror

(Just solved a five captcha set to read a web page, again.)

comment:59 follow-up: Changed 7 months ago by throwaway1

I just wanted to comment on this due to being a private beta tester of CloudFlare (in it's early years).

I opened a support ticket requesting that CloudFlare allow GET only requests (read-only proxy) for my sites viewers that used TOR. I believe this is perfectly feasible since they use nginx as a reverse proxy. Several other proxy sites do this already, to prevent spam.

I was told however that they were "unsure how that would work" and the ticket was promptly closed.
This does concern me as my users frequently get caught up in the captcha loop. If using an RSS feed reader through TOR, it is not even possible to receive RSS from the site.

I agree that this is something CloudFlare needs to fix on their end, but thusfar have been unwilling to do so.

I also found a workaround on bypassing the captcha over TOR which I have given to some of my users, but will not share it here for obvious reasons.

comment:60 Changed 7 months ago by mmarco

Hello everybody.

DISCLAIMER: I am by no means an expert in networks, computer science or any other technical aspect that might have to do with the subject here, so it is likely that what I am going to propose makes no sense at all (uf that is the case, just ignore it). I am just a Tor user that is particularly annoyed by CF capthas, since I do a big part of my browsing through orfox, and captchas are broken there. The only reason I have decided to share my thoughts here is because ioerror publicly uncouraged us to do so in Twitter.

My (probably naive) proposal is the following:

from my perspective, the problem here is that Tor, by dessign, makes it hard to distinguish bewteen the legitimate user from the abusive one (be it human or robot). CF's work is preciselly to distinguish between those two users, so we have a inconpatibility problem here. More preciselly, the problem is the lack of granularity: CF just sees one IP (the exit node) used by many users, both legitimate and abusive.

So my propsal goes in the direction of adding more granularity, that is, distinguish between those different users. It would be something like this:

  • When the website receives a request from a Tor exit node, it creates an ephemeral .onion service (or gets one from a pool of pre-created ones), and answers with a 301 message that redirects to the .onion service (maybe with a delay to give time for the corresponmding circuits to be stablished).
  • Those ephemeral .onion services are killed when there is no session running on them anymore. Or when abusive behaviour is detected through them.
  • The connections through those .onion services now can be treated separatedly, thus allowing to treat separatedly the legitimate users from the abusive ones.

I don't know if this solution is viable (maybe the overhead of creating the ephemeral .onion services is too much), but if it is, I think it would give an improvement over the current situation.

From the user viewpoint, there is a delay when accessing for the first time to the website, but that sounds better than the captcha hell. After that, there is a slower browsing experience, but that is the usuall price you have to pay for using Tor .onion services.

From the website viewpoint, you have an initial timeout for each connection, which might already discourage abusers. If some abuser wants to reuse the same .onion connection, you have a direct handle over it (you can then push the captha hell, or even directly kill the connection).

comment:61 follow-ups: Changed 7 months ago by nicatronTg

I'm a CloudFlare free customer, and I'd like to voice this to those at CloudFlare who are listening: Why can't we, as customers, at least tell CloudFlare to be permissive of all traffic through Tor exits? We can set security levels on our websites from "Under Attack" to "Essentially off," but what I'd really like is another option that says "permit all traffic from Tor exit nodes." You can tell me about how this is blocking spam, but in reality I've had a significant uptake in forum spam from real human sweat shop workers who pass the challenge pages with no problem. The only reason why I want to use CloudFlare at this point is for anycast and DNS, but the "off" security option doesn't even exist for those of us on free plans.

But back to the point: If all Tor exits are known, why isn't there even a control panel option for customers to say "Okay, I know Tor traffic is good, allow it unconditionally"?

comment:62 in reply to: ↑ 61 Changed 7 months ago by throwaway1

Replying to nicatronTg:

point: If all Tor exits are known, why isn't there even a control panel option for customers to say "Okay, I know Tor traffic is good, allow it unconditionally"?

I brought this up to them as well, it's an excellent idea that went unheard.

comment:63 in reply to: ↑ 61 ; follow-up: Changed 7 months ago by jgrahamc

Replying to nicatronTg:

But back to the point: If all Tor exits are known, why isn't there even a control panel option for customers to say "Okay, I know Tor traffic is good, allow it unconditionally"?

We will add this feature. Our customers will be able to 'whitelist' Tor so that Tor users visiting their web sites will not be challenged. This feature is coded (I couldn't talk about it earlier as had not got permission to do so) and will be released shortly.

comment:64 Changed 7 months ago by torry

  • Cc torry added

comment:65 in reply to: ↑ 59 Changed 7 months ago by cypherpunks

Replying to throwaway1:

This does concern me as my users frequently get caught up in the captcha loop. If using an RSS feed reader through TOR, it is not even possible to receive RSS from the site.

This is but one example of how cloudflare breaks the web, and of the insight behind 'bots are people too'. It's been said above: we humans don't make HTTP requests, our machines to do it for us.

Need more reason to dislike cloudflare? How about their across-the-board HTTPS man-in-the-middle? Look for the paper "When HTTPS meets CDN".

comment:66 follow-up: Changed 7 months ago by toruser2016

I am not affiliated with Tor itself, I am just a normal web user, who occasionally uses tor, and I am also a cloudflare user in the sense that I am a user / visitor of sites "protected" by cloudflare. I find the status quo frustrating and disappointing. I can't understand why CF have so much trouble with implementing working captchas.

==Overview==

  1. It is widely accepted that there is a problem here.
  2. Cloudflare have been trying for months if not years to solve it.
  3. So far CF's attempts to solve this problem have been a failure.

Why is it so hard for Cloudflare to solve this??

There are two tracks here, better not to confuse them.

The first track is the "intended" status quo, which involves giving up Captchas now and then to tor users, to force them to identify as human before they can browse a page. It is supposed to work but it doesn't. Apparently changes were made recently, but people still report the endless captcha cycle, unsolvable captchas, it doesn't work on the android tor browser Orfox etc etc. Personally, I think if this actually worked as intended (captchas were actually solvable etc), I would be a lot happier. I don't mind solving captchas every so often, I had to solve one to register for trac.torproject.org!.

The second track is more complex solutions to the general problem of identifying good actors and bad actors, zero knowledge proofs and all the rest of it. These are complex solutions to hard problems and I think these discussions should come later. If CF are not willing or able to solve the simple "serve up a captcha that works" problem, there is no hope for them to implement a hard solution to this. Forget the second track for now.

So my question is to Cloudflare, their CTO was on here earlier. Why exactly are you not able to just implement a Captcha system that works?? Seriously, is it that hard? As far as I know, you have recently moved over to serving up Google captchas, but it still doesn't work? Is CF's CTO really OK and comfortable with the fact that his team couldn't implement this after apparently trying for a few years? Seriously!! Captcha's as a concept have been around for a pretty long time now.

Never attribute to malice that which is adequately explained by stupidity

Personally I don't believe CF are deliberately making the internet hard to use through Tor due to some nefarious conspiracy with the lizard men, but we should accept that the status quo suits the NSA very nicely. It was made clear in the Snowden leaks that GCHQ, the NSA etc would like people to stop using Tor, so I am sure they are very happy to see CF make general web browsing difficult and frustrating for ordinary users. The longer the situation persists, the less adequate "stupidity" is as a reason for Cloudflare's inability to solve this. It's time for CF to step up and fix their captchas, which they have claimed they will do on a number of occasions in recent months.

comment:67 follow-up: Changed 7 months ago by jgrahamc

I see the 'CAPTCHA loop' problem. Reproduced it internally (as did a couple of other people). Going to try to figure out why. That's ugly.

comment:68 in reply to: ↑ 66 Changed 7 months ago by lhi

Replying to toruser2016:

Never attribute to malice that which is adequately explained by stupidity

Reading their comments here, I have come to understand that some of them really don't care.

I view the CAPTCHAs as a reminder that something is very amiss in the web. No amount of PR or whitewashing can suppress my strong reservations about some company being in a position to MITM large swathes of the web. This is a bad situation, and I appreciate being reminded of it nearly every time I browse the web.

This nonwithstanding, thanks jgrahamc for finally agreeing to provide the simple option for data sources, I mean "free customers", to whitelist Tor exits, and investigate long-standing problems with the CAPTCHAs. It would at least fix the collateral censorship effect of IP-based overblocking.

The ZK "human bit" proof discussion is superfluous. Increasing attack surface by weighing down client software with contorted and expensive functionality that serves no discernible purpose for the user - just artificial complications to reduce accessibility and serve the crazy whims of some self-important company - is a horrible idea.

I hope no Tor developers will provide free labor (as cypherpunks lucidly characterized the idea) to address someone else's perceived needs. No one should be tricked into wasting time on this. I could see use cases elsewhere. But not for being allowed to browse the effing web. It's a trap. The whole idea smacks of the EME affair. I side with @ioerror: where will this lead?

As people have already correctly stated, all requests are negotiated by bots, not humans. It is no one's bus.iness whether some person physically attends the process. Attempting to ascertain this in some way or other is surveillance-think. What if I want to retrieve some page via cron job, for example. Not "legitimate"?

I always find it a demeaning and insulting attitude towards humans that we are being asked by rooms full of servers (which handle enormous amounts of requests and should be able handle the few extra ones coming from Tor exits without breaking an electronic sweat, honestly) to solve puzzles. I am very angry about this attitude btw because my time is infinitely more valuable than your servers'.

Being treated as a CAPTCHA-solving bot makes people angry, understand? Especially the ones one attempts, in vain, to solve. (It's a mistake to even try. No web content is that important.)

Btw trac.torproject.org made me solve google CAPTCHAs, some of which didn't work. Way to go ...

Last edited 7 months ago by lhi (previous) (diff)

comment:69 Changed 7 months ago by bashrc

I'm a Tor user and CloudFlare's antisocial behavior is a problem. I rarely bother to even attempt the CAPTCHAs since they're often either illegible or just don't work. CloudFlare is breaking the internet one site at a time.

CloudFlare themselves may be beyond any reasoning, and so I'd be in favour of having some warning page as mentioned above. "You have reached a CloudFlare site, click here to complain or just move right along", or something similar. Maybe with a scary looking icon too. I would favour a giant rat with fangs brandishing dagger, but you can use your imagination.

comment:70 in reply to: ↑ 23 Changed 7 months ago by lhi

Replying to jgrahamc:

Hello. I'm CloudFlare's CTO.

Hello. I'm yet another Tor user, victim of global mass surveillance and distrustful of anyone I have no reason to trust, angry after being repeatedly and systematically (ab)used as a CAPTCHA-solving bot by your network.

There are companies - such as CloudFlare - which are effectively now Global Active Adversaries.

That's an inflammatory introduction. We are not adversarial to TOR as an entity, we are trying to deal with abuse that uses the TOR network. It's inevitable that a system providing anonymity gets abused (as well as used). I'm old enough to remember the trials and tribulations of the Penet remailer and spent a long time working in antispam.

That's rich. You fully understood the meaning. You misrepresent what is being said, twisting the words around. The intended meaning is obviously that Johnny Doe user of the Tor system (which after all is based on distributing trust), or Tor itself, have no reason to trust the central and tentacular entity CloudFlare, which makes it, effectively, an adversary in security terms.

Earlier @ioerror asked if there was open data on abuse from TOR exit nodes. In 2014 I wrote a small program called "torhoney" that pulls the list of exit nodes and matches it against data from Project Honeypot about abuse. That code is here: https://github.com/jgrahamc/torhoney. You can run it and see the mapping between an exit node and its Project Honeypot score to get a sense for abuse from the exit nodes.

This is "open data" insofar as one uncritically trusts "project honeypot" classification. sorry.

comment:71 in reply to: ↑ 10 ; follow-ups: Changed 7 months ago by lhi

Replying to marek:

@ioerror: you are doing this again. You are mixing your opinions with technical reality. Please stop insulting me. Please focus on what can we can technically do to fix the problem.

??? where did ioerror insult you? I looked hard and the only thing I could find is discussions about "human bits" and spurious distinctions between tech and politics, which might be construed as insulting our intelligence.

Technology is politics and politics is technology in times of mass surveillance and big data, and so is corporate policy that discourages anonymity and defines legitimate behavior. no one brought politics into this except that it was there from the beginning. sorry for being rude. I am angry about all the CAPTCHAs I have been served.

I also think this discussion is being derailed.

You claim to put it back on track by proposing some expensive idea on TBB side, which is overkill and would cause a shitload of new artificial problems while we already have real ones. I contend that responsibility for the debacle lies squarely with your company, not with Tor. to some extent, granted, abusive bots and the general makeup of the web are also to blame, but you mishandled it by serving impossible puzzles and causing lots of collateral blocking.

(POST is hard) First, what actually the proxy should *do* on the POST? Abort your POST, serve captcha, and ask you to fill the POST again? Or accept your 10meg upload, serve captcha and ask you to upload it again? Now think about proxy behaviour during an attack. Doing captcha validation on POST is not a trivial thing.

You're the MITM, you can see whether there is already an auth token of some kind right? disallow POST otherwise. think of something. you're the ones breaking things for people right now.

(blocking regions) Second, during an "attack" (call it ddos or something) the website owners often decide to block traffic from ceirtain regions. Many bus.nesses care only about visitors from some geographical region, and in case of a DDoS are happy to just DROP traffic from other regions. This is not something to like or dislike. This is a reality for many website owners. Serving captcha is strictly better than disallowing the traffic unconditionally.

How often does that happen? most of the time for a given site there's no DDOS, certainly.

(Not only spam, load as well) Third, there regularly are bot "attacks" that just spam website with continous flood of GET requests, for example to check if the offered product is released, the promotion started or price updated. This is a problem for some website owners and they wish to allow only traffic from vetted sessions.

What about slowing down recurrent requests? it's really not something that can be solved on the Tor side.

The underlying problem, is that for any ddos / spam protection system the source IP address is a very strong signal. Unfortunately many Tor exit IP's have bad IP reputation, because they _ARE_ often used for unwanted activity.

In other words, you're happy to overblock Tor because IP blocks are just so convenient? probably also because as far as cloudflare is concerned, we just don't matter. I don't understand why you (or jgrahamc) bother with this discussion anyway. what's in it for you?

I have already aired a comment about the "human bit", which I think is an appalling idea in this context.

Last edited 7 months ago by lhi (previous) (diff)

comment:72 follow-up: Changed 7 months ago by ttr99

Right so let's suppose, I am a non tech savvy internet user. I just opened a restaurant and I put up a web page advertising my restaurant which gives a phone number people can call to make a booking.

I've heard about all the bad hackers and spammers on the internet so I want to keep my site safe and secure. I google up on how to do that and I read something about cloudflare. Sounds good, so I decide to protect my site with Cloudflare.

What happens?

After 1 hour I get my first customer, they come from a clearnet IP, like the website and menu, check my address, and call up to make a booking.
After 1 more hour, I get my second customer, but they are browsing through Tor. Cloudflare gives them an impossible to solve captcha, they leave and go to Burger King instead.

Can you see the problem? I lost a customer because Cloudflare - for no legitimate reason - made my site unusable.

Cloudflare's threat model is wrong. It sounds good and proper for cloudflare to say, "we protect our users and their sites", but in reality that is not what they are doing. In the case described above, there is no question of protection, no reason to suppose harm will occur from the Tor user, no question of comment spam or any reason to believe a DDOS is happening. But because Cloudflare have implemented a broken solution to the wrong problem (Tor users vs malicious users), I lost a customer.

So it's easy to say, we protect our users, but in real terms, if you put it to any one of Cloudflare's customers, I don't believe any of them will see the above situation as something that requires "protection". Just coming from Tor in and of itself is not a problem. In the case of suspected comment spam captchas can be served up, in the case of a DDOS attack there are other solutions (and do DDOS attacks even come from Tor, seems doubtful?).

Now you can say the above example is contrived, or you only lost one customer which is not a lot, but it highlights that CF are using the wrong approach to solve the problem. Right now this type of problem is not very big because the set of Tor users compared to clearnet users is small, so the lost business is again small, but I am sure Tor use is only going to grow over time, and the problem is only going to grow. CF need to get ahead of the curve on this.

comment:73 in reply to: ↑ 72 Changed 7 months ago by lhi

Replying to ttr99:

Now you can say the above example is contrived, or you only lost one customer which is not a lot, but it highlights that CF are using the wrong approach to solve the problem. Right now this type of problem is not very big because the set of Tor users compared to clearnet users is small, so the lost bus.ness is again small, but I am sure Tor use is only going to grow over time, and the problem is only going to grow. CF need to get ahead of the curve on this.

Somehow I get the strong impression that CloudFlare's web sabotage is driving novice users away from Tor, thereby "solving" said problem by ensuring that the number of casual Tor users remains comfortably low (the way the NSA wants it too).

Last edited 7 months ago by lhi (previous) (diff)

comment:74 Changed 7 months ago by madD

Hello
I'm a voluntary user of Tor for a long time, and also a forced user of CloudFlare ever since they launched on us their business model.
Having read the complete thread just want to say following.
Just like cypherpunks, usually I want only to read stuff, so for me captchas really are another form of digital harrasement. Had to solve about a 10 of them to get registered at Trac. And because my post contains the word "business" i got:

Captcha Error
Submission rejected as potential spam

    Content contained these blacklisted patterns: '(?i)business'

Serving malfunctional javascript captchas for years can hardly be attributed to stupidity. I believe it even less when I read the company's CEO's LinkedIn.

So Mr. jgrahamc, are captcha's part of Government Technology? Why is javascript so necessary. Do you measure per click reaction time? Do you correlate it with previous data sets? With enough signal gathered, can you then establish unique profiles of people?

comment:75 Changed 7 months ago by misc-human

I'll add that anecdotally, I've redirected at least $100 but probably more of purchases to competitors of CloudFlare customers due to captchas.

In economic terms, CloudFlare's service is creating "negative externalities". This term describes the fact that CloudFlare profits from an action that negatively affects a 3rd party, in this case Tor user agents, as readily admitted by jgrahamc. (Among others - remote execution risks pointed out by ioerror, privacy degradation).

It's a poor security mechanism from the view of false positives, and as pointed out it's hard to believe spammers don't operate human captcha-solving farms in any case, leading to unavoidable, high false negatives.

Combined with the laughable notion to classify Tor IPs using a generic IP reputation implementation when *you have the exit IP list as a given*, the security engineering employed at CloudFlare is beyond reproach. It's a turd that should not be polished, IMO. I agree on the proportionality and carrier-grade NAT points above.

Worth mentioning the entire Tor network has very small egress bandwidth relatively, so the strain on CloudFlare, from Tor, will never be that high.

Yes, it is preferable as default to serve Always Online content, to Tor Exits for GET requests, where you would otherwise have served a captcha. Stop polishing the turd.

comment:76 Changed 7 months ago by cypherpunks

Hi there, Cloudflare! Another Torrorist here who, on behalf of your customers, you have blocked and CAPTCHA-bullied into acrimony.

Let me tell you that I for one no longer care for you to "fix" your censorship: many months ago I decided I had enough of your shit,I'm not filling any more CAPTCHAs, you and your customers can go ____ yourselves! :)

As another commenter said, no web site is worth tolerating your bullying. Which is not constrained to the stupid CAPTCHA loop, by the way; as was also mentioned, automated web requests are perfectly legitimate use cases, you have no right to require a human to be sitting in front of the screen at all times.

As I've been doing for the past several months, each time I see Cloudflare's advertisement/taunting interstitial, I'll just make a subconscious wish for you (the company) to die ASAP and move along to some other more respectful and welcoming site.

Oh yeah, hey Cloudflare, want to see how fine a publicity act your censorship is for you? Look at the comments section of every Tor Browser release for past bunch of months (year?), but especially since last December 10-ish when you apparently went "lol ____ these nobodies let's just block the entire Tor network, yay!".

Still, I have to admit, I doubt your customers care much for the traffic you block. As Schneier has said, the WWW βusiness model is surveillance. Your customers, Cloudflare, by and large, make their living by tracking/profiling/surveilling their visitors and then selling them to the highest bidder, usually some advertisement company. So how much would these customers care for the anonymous eyeballs of a relatively small group (in relation to the rest of the "net") of privacy-active users of a technology that attempts to destroy their βusiness model? Isn't this also your βusiness model, Cloudflar? Isn't this the very thing you do with the traffic from all those sites you MITM? I wonder who do you sell to though, hmm...

Also, jgrahamc, despite you twice stating that Project Honey Pot is a third party, several sites mention that it was created by Cloudflare's own Matthew Prince. See for example the screenvcap posted earlier by cypherpunks. That's not much of a thrid party to me.

Anyway, hope you go out of βusiness soon. Bye!

comment:77 in reply to: ↑ description ; follow-up: Changed 7 months ago by cypherpunks

Replying to ioerror:

they collude with larger surveillance companies (like Google)

Should we assume you do too?

appelbaum.net. 3600 IN MX 10 ASPMX3.GOOGLEMAIL.COM.
appelbaum.net. 3600 IN MX 1 ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 5 ALT1.ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 5 ALT2.ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 10 ASPMX2.GOOGLEMAIL.COM.

comment:78 in reply to: ↑ 77 Changed 7 months ago by ioerror

Replying to cypherpunks:

Replying to ioerror:

they collude with larger surveillance companies (like Google)

Should we assume you do too?

appelbaum.net. 3600 IN MX 10 ASPMX3.GOOGLEMAIL.COM.
appelbaum.net. 3600 IN MX 1 ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 5 ALT1.ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 5 ALT2.ASPMX.L.GOOGLE.COM.
appelbaum.net. 3600 IN MX 10 ASPMX2.GOOGLEMAIL.COM.

Yes, you should assume that the internet is unsafe unless you take care to protect yourself. If you email me, you should assume that the FBI and other parties will read your email - they do not respect the US Constitution and they have been politically persecuting people involved with the Tor Project, WikiLeaks and other groups. Obviously, yes, that includes me. If you email me - you should take extra precautions - one of which is to ask me for another email address and to use end to end encryption.

I'm involved with fighting the DoJ in a protracted legal struggle. Google has been suing them on my behalf because I chose to park my domain there. I made that choice for exactly that reason - millions of dollars of legal support from some of the best lawyers on earth with an aligned interest. You'll note, I don't offer you service for that domain name and you'll also note that I talk about this strategy in public. I've even confronted the FBI about it in public and there is video evidence of that discussion. As a result of my choice, I've received an unknown amount of free legal defense in (effectively secret) courts fighting under seal. Every time we win or lose, Google has been working to unseal it and tell us how the world actually works. I don't have any privacy with email anyway, so I have chosen to make that worthwhile in a way that benefits everyone in the short, medium and longer term.

I understand the problems in this space very well and I am fighting it on every front available to me. CF is another terrain of struggle and it impacts people more than that specific domain name which is largely limited to my own personal privacy for that given address.

My concern about Google is not that people should not be free to use their services - it is that CF *colludes* with Google when a user has not at all consented. How many server operators know that the CAPTCHA is hosted by Google, when they use CF for "protection" services? All of them? None of them? Did anyone get a choice? Tor users certainly did not get a choice when they are automatically flagged based on an IP reputation system and then redirected to Google.

So sure - you can say that I'm colluding as long as you also consider that CloudFlare is as well. You can't say that Tor is colluding from any of this discussion. My *personal* collusion is part of a larger strategy to improve things for everyone and it only harms me, when I fail. So which collision matters to Tor users and in what way? I'd guess CF's collusion with Google is a much larger problem and if I'm wrong in my personal choices, I anticipate no fallout for you. If CF merely continues as it stands, we will continue to see a GAA for nearly the entire internet with data shared between CF and Google about people who deeply care about their privacy and anonymity.

comment:79 in reply to: ↑ 67 Changed 7 months ago by ioerror

Replying to jgrahamc:

I see the 'CAPTCHA loop' problem. Reproduced it internally (as did a couple of other people). Going to try to figure out why. That's ugly.

Fantastic to hear that you are experiencing the same issues as the rest of us. How do we ensure that it not only gets fixed but that it also never is left to our end users alone to detect these kinds of issues?

comment:80 in reply to: ↑ 63 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to nicatronTg:

But back to the point: If all Tor exits are known, why isn't there even a control panel option for customers to say "Okay, I know Tor traffic is good, allow it unconditionally"?

We will add this feature. Our customers will be able to 'whitelist' Tor so that Tor users visiting their web sites will not be challenged. This feature is coded (I couldn't talk about it earlier as had not got permission to do so) and will be released shortly.

Will that be the new default until a site decides to actively block Tor?

comment:81 Changed 7 months ago by anoncatsperson

Hi there all.

Average user here with moderate security knowledge. I registered (through 1 captcha and 7 on relogging in :/). Came in because I follow ioerror on twitter and have long been annoyed by CF in my day-to-day Tor usage. Of course its double-plus ungood for more normal users; I know this because when I try to get people to use Tor people really get turned off by doing these constantly since alot of popular sites utilize CF. I know I often just don't want to deal with doing them and holding my breath that I get through. Anecdotally, i've participated in a research project about Tor users awhile ago and that was my one complaint. Anyways, I will keep using Tor because I know better, but more average people might not in this regard. I hope someone comes up with a good solution to the Lament Configuration that is CF Captcha, but i'm not holding my breath. Thanks for reading and big ups to everyone fighting for the user.

comment:82 follow-ups: Changed 7 months ago by jgrahamc

To summarize:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.
  1. We'll roll out the ability for any CloudFlare web site to whitelist Tor so that Tor users will not see CAPTCHAs within days.
  1. We've reproduced the "CAPTCHA loop" problem and have an engineer looking into what's happening.
  1. We are in contact with Google to see if they can help us with number 2.
  1. I've asked our head of Infosec to look into an alternative CAPTCHA provider. We had already done this in the past and concluded that switching to the latest reCAPTCHA was going to be 'better'. It looks like it has not made things better.

comment:83 in reply to: ↑ 82 Changed 7 months ago by lunar

Replying to jgrahamc:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

This is unlikely to help. As mentioned earlier, Tor Browser will use a different circuit and a different cookie for each domain name. Users will continue to be required to solve thirty different CAPTCHAs a day for each blog, news sites, and other service providers they visit.

To make an uneasy parallel, this is like street harassment. Men harassing women can happily think it's nothing because they are only doing it once, while women have to endure tens (or more) harassers every single day. It adds up fast.

Please see by yourself: use Tor Browser as your sole browser. The beautify of trying to create an anonymity set is that the software is likely to make the experience pretty similar to all users.

comment:84 follow-up: Changed 7 months ago by kbaegis

@jgrahamc First of all, thanks for coding in a new feature. From your position that's probably all that you can do safely without receiving input directly from your customers.

Second, thank you for your direct participation in this discussion.

Finally, I'd invite you to revisit the key point here, which is that your product line makes Tor unusable by many users who still want to browse the web anonymously. I understand that your company has a goal. In this specific context, the business goals are causing a legitimate harm to web users and this is something that I suggest you revisit more broadly within your organization. Surely CloudFlare has technical expertise that extends beyond "Let's fix that with captcha" and there are probably (from an engineering perspective) better ways to solve both the problems of DDoS and spam than authenticating every single session.

I'll wrap up with a question. How are you intending on rolling out this new feature? Is it going to be opt-in, opt-out, will there be an email sent to your customers about using it? I think that this is something that the community is greatly interested in.

comment:85 in reply to: ↑ 36 Changed 7 months ago by cypherpunks

Replying to jgrahamc:

I did this three weeks ago. In addition the entire company was forced for 30 days to see CAPTCHAs any time they visited a site using CloudFlare while in our offices.

There is no way they were getting the same captchas that Tor users got served with Javascript turned off. Without Javascript, the v1 catchas were unsolvable 98% of the time. The new v2 are a lot better, but still often require several attempts.

Please run some tests with IP addresses with a bad reputation (like large Tor exit nodes) and Javascript disabled.

comment:86 Changed 7 months ago by massar

Another option that CloudFlare can attempt:

That JWT signs that CDNCaptha.com proved that a captcha was signed and acceptable for use for hostedbyCDN.com

...
.

This way, captchas are all handled in one place and thus no need to keep on resolving captchas.

  • It does mean that the CDN must steal /CDNCaptacha/* url from every site, as otherwise there is no way to pass the cookies across sites (cookies don't do cross site boundaries fortunately)


  • It could attack anonimity if the same JWT is served globally and thus enable tracking based on that. Hence, it would be great if a different JWT is generated per site. Thus maybe encode the domainname in the JWT (this also ensures that the cookie was provided for that domain.
  • The CDNCaptcha.com can see all requests and domains that user is accessing. But that is the same thing that Google sees everything and can correlate requests (eg when logging in or by setting tracking cookies, or letting sites include www.google.com/ javascript etc etc) or just keeping a huge database.

The usage of JWT means that there is no state on the side of the CDN. There is state (cookies) in the client, but there is no real way around it.
As the JWTs are different,

If we standardise the cookie name, we could let TBB do a verification that the cookie's JWT content are different for each site visited, thus making sure that (except for CDNCaptcha.com who could store everything) is issuing per-domain/host cookies.

Another thing that TBB could do is warn that this special CDNCaptcha.com is used and if the user wants to solve a new captcha or re-use the previous approval.

Note that the above requires Cookies to be enabled, but it does not require any form of javascript except for the CDNCaptcha.com site, thus allowing a user to decide "I want easier captchas that require javascript, lets allow it for this specific site" (Ublock Origin+++++ unfortunately not in TBB yet...).

If we then add the Tor .onion access I mentioned in a previous comment, anonimity would be pretty well served ;)

BONUS: Let the CDNCaptcha.com service be run by Tor Project or other independent third party, so that only they know the full list. Let CDNs sigh up to that service and provide the keys that can be used to sign the JWTs, that way, the CDN can verify that it was attested that this was correct, but can't correlate the events.

Last edited 7 months ago by massar (previous) (diff)

comment:87 in reply to: ↑ 84 ; follow-ups: Changed 7 months ago by jgrahamc

Replying to kbaegis:

Finally, I'd invite you to revisit the key point here, which is that your product line makes Tor unusable by many users who still want to browse the web anonymously. I understand that your company has a goal. In this specific context, the business goals are causing a legitimate harm to web users and this is something that I suggest you revisit more broadly within your organization. Surely CloudFlare has technical expertise that extends beyond "Let's fix that with captcha" and there are probably (from an engineering perspective) better ways to solve both the problems of DDoS and spam than authenticating every single session.

I agree with this. I've kicked off an internal discussion of the best way to deal with the abuse coming from Tor (and elsewhere) that doesn't involve CAPTCHAs. We'll continue with the other things listed above as I want to have some immediate impact on this while in parallel looking for better solutions.

I'll wrap up with a question. How are you intending on rolling out this new feature? Is it going to be opt-in, opt-out, will there be an email sent to your customers about using it? I think that this is something that the community is greatly interested in.

Almost everything we announce goes on our blog so I imagine we'll do it that way. It gets emailed to people who subscribe to the blog. I don't know if it'll be emailed to all customers (mostly because we don't tend to send them a lot of email and it's the marketing group that decides). The current plan is for this to be opt-in.

comment:88 in reply to: ↑ 71 Changed 7 months ago by cypherpunks

Replying to lhi:

In other words, you're happy to overblock Tor because IP blocks are just so convenient? probably also because as far as cloudflare is concerned, we just don't matter. I don't understand why you (or jgrahamc) bother with this discussion anyway. what's in it for you?

As I understand it, they are worried that the Tor project does something about it. In order to win time before this actually happens, they keep claiming they are Tor supporters, working on it, we're in contact with Tor devs, things will improve really soon, bla bla bla ... I think it's clear now that it's not the case.

If they said fuck tor and blocked everything, they know there would be some quick reaction. With this strategy of keeping the service minimal, while engaging in discussions to give the impression they care, the Tor users are effectively blocked for a longer period of time.

I think it's time the Tor project does something to solve this without CF. Giving the user an option to be redirected to the archive.org or startpage.com proxy when facing a CF page sounds good as a first step.

comment:89 in reply to: ↑ 71 ; follow-ups: Changed 7 months ago by jgrahamc

Replying to lhi:

I don't understand why you (or jgrahamc) bother with this discussion anyway. what's in it for you?

Three reasons:

  1. Economic. A group of users (who use our customers web sites) are having trouble accessing those web sites. In this case it's Tor users, if it were "people in Brazil" or "people on BlackBerry devices" you'd likely see me get involved. That's my job (partly).
  1. Technical. Solving the spam, DoS, hacking problem for Tor is hard because of anonymity. That makes it technically interesting. If we can protect our clients from abuse through Tor while letting legitimate users browse unhindered it's a technical win.
  1. Ethical. CloudFlare has a service called Project Galileo (https://www.cloudflare.com/galileo/) where we offer free protection to at-risk public interest websites referred to us by partners like ACLU, EFF, etc. We've deflected massive DDoS attacks keeping people online whose speech is threatened.

comment:90 follow-up: Changed 7 months ago by HairyPotter

Just passing by to say... The hacktivist types with some influence may care about Tor, and want that treated as "special", but I have often had all these same problems when using a simple VPN service. (Which I use mostly for some little extra security when on random Wifi networks and also to keep my web history out of the ISP logs.)

Sure, recently I haven't got so many. But who knows what tomorrow will bring? And when it does happen... It's so frustrating. And getting it fixed is like talking to a brick wall. A faceless corp with no recourse and general suggestions that it's the website owner's fault/choice for not choosing appropriate settings (when they most likely just accepted the default).

The only way I got results is from trolling jgraham or some other employee on hacker news. But what's the regular or less motivated person to do? Nothing, they just give up.

To be stuck behind shitty captcha after shitty unsolvable squiggle captcha when you just want to read some article that literally nobody is going to DDoS anyway. (HERE'S A THING HOW ABOUT ENABLING THE RESTRICTIONS ONLY WHEN SOME ACTUAL SPIKE OF TRAFFIC APPEARS).

EEEEEEEEEEEEEK.

  1. An IP address simply does not represent a single person.
  2. It's not just Tor users who get affected by this.
  3. CloudFlare is literally the greatest evil known to mankind.

Final point:
hmm Tor folks, I got a whole lot of unsolvable captchas when trying to sign up for this Trac using Chromium incognito.

comment:91 in reply to: ↑ 82 Changed 7 months ago by ioerror

Replying to jgrahamc:

To summarize:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

Doesn't this mean that you've now got cross circuit tracking for Tor Browser users, effectively? I assume that is by issuing a cookie that isn't tied to a given IP address - though again without any transparency, I feel like it is unclear what was actually done in any technical sense.

  1. We'll roll out the ability for any CloudFlare web site to whitelist Tor so that Tor users will not see CAPTCHAs within days.

It seems reasonable to thank you for this option, though I admit, I'm actually quite displeased with it personally. You've chosen to frame this as a positive thing when in fact, you're allowing a few people to jump through hoops while keeping the vast majority of the web censored by default. It would be possible to serve up an Always Online version with no captcha as the default behavior as a very reasonable middle ground. The default will not change and so, there is no change to the status quo.

This as a default means that by default CF will continue their censorship of Tor users who wish to read websites.

I urge you to reconsider this while your points 2 and 3 are outstanding.

  1. We've reproduced the "CAPTCHA loop" problem and have an engineer looking into what's happening.

Is there a timeline for this? Will they report back on this bug?

  1. We are in contact with Google to see if they can help us with number 2.

Does this indeed mean that Google, because of actions by CF, has data on every person prompted for a CAPTCHA?

  1. I've asked our head of Infosec to look into an alternative CAPTCHA provider. We had already done this in the past and concluded that switching to the latest reCAPTCHA was going to be 'better'. It looks like it has not made things better.

Any American third party presents similar problems as Google. On the one hand, they are a PRISM provider. On the other, they probably have the best security team in the world. Why aren't you guys just hosting your own CAPTCHA solution or proxying it to Google in such a way that Google gets nothing directly from your users?

I hope that I'm reading you wrong but it also seems like you're concluding your engagement here. I'd like to encourage you to keep engaging here - there are many outstanding questions for CloudFlare that you (or others at CF) haven't answered which help us to understand the shape of the current and future situation.

The above four points as well as a near total dismissal of all other questions, could be summed up as confirming a critical multi-month long bug with a vague promise that you guys will look into it. I really hope that this isn't the case - especially considering the other questions and the other options discussed here.

comment:92 in reply to: ↑ 87 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to kbaegis:

Finally, I'd invite you to revisit the key point here, which is that your product line makes Tor unusable by many users who still want to browse the web anonymously. I understand that your company has a goal. In this specific context, the business goals are causing a legitimate harm to web users and this is something that I suggest you revisit more broadly within your organization. Surely CloudFlare has technical expertise that extends beyond "Let's fix that with captcha" and there are probably (from an engineering perspective) better ways to solve both the problems of DDoS and spam than authenticating every single session.

I agree with this. I've kicked off an internal discussion of the best way to deal with the abuse coming from Tor (and elsewhere) that doesn't involve CAPTCHAs. We'll continue with the other things listed above as I want to have some immediate impact on this while in parallel looking for better solutions.

It would be nice if this wasn't a closed discussion with answers thrown over the wall. How can we include other people in these discussions?

I'll wrap up with a question. How are you intending on rolling out this new feature? Is it going to be opt-in, opt-out, will there be an email sent to your customers about using it? I think that this is something that the community is greatly interested in.

Almost everything we announce goes on our blog so I imagine we'll do it that way. It gets emailed to people who subscribe to the blog. I don't know if it'll be emailed to all customers (mostly because we don't tend to send them a lot of email and it's the marketing group that decides). The current plan is for this to be opt-in.

It would be nice if CloudFlare engaged directly and worked together in an open manner. Talking on this bug is a great start and I hope that we can continue this process or improve it, perhaps by switching to another open, easy to use interface, if needed.

comment:93 in reply to: ↑ 89 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to lhi:

I don't understand why you (or jgrahamc) bother with this discussion anyway. what's in it for you?

Three reasons:

  1. Economic. A group of users (who use our customers web sites) are having trouble accessing those web sites. In this case it's Tor users, if it were "people in Brazil" or "people on BlackBerry devices" you'd likely see me get involved. That's my job (partly).

It is *also* people in Brazil, though it is unlikely to be people in BlackBerry devices. :-)

  1. Technical. Solving the spam, DoS, hacking problem for Tor is hard because of anonymity. That makes it technically interesting. If we can protect our clients from abuse through Tor while letting legitimate users browse unhindered it's a technical win.

What kind of DoS can you guys possibly see through Tor? The network in total capacity has to be less than a tiny fraction of the capacity at *one* of your PoPs.

Could you please give us actual data here? I've seen some basic CF API data - what is exposed seems to be quite minimal. As far as I can tell - the main data is score data that is from project honeynet. That has a lot of history that is extremely problematic in my view.

  1. Ethical. CloudFlare has a service called Project Galileo (https://www.cloudflare.com/galileo/) where we offer free protection to at-risk public interest websites referred to us by partners like ACLU, EFF, etc. We've deflected massive DDoS attacks keeping people online whose speech is threatened.

There is a tradeoff here which is unsaid along with some other stuff that is said often. You guys are clearly doing good by keeping those folks online and I think it is important to help with that problem. The unsaid trade off is that you're also performing content inspection, over blocking Tor users and have effectively full surveillance of those sites. Exploit data can be intercepted and gathered, studied and then used. Those at risk parties are not just a matter of ethics, they are a source of surveillance capital for CloudFlare which is useful for generating so-called "threat" scores as well as other data. I assume that 0days found in that process are submitted to CERT, the same CERT that exploited Tor Hidden Service users, I might add.

In short - those at risk services are paying for this protection with their user/attacker data which is extracted with surveillance by CloudFlare. It may be ethical in motivation but unless I completely misunderstand the monitoring by CloudFlare of its own network, it appears to be sustained with surveillance more than pure good will.

comment:94 follow-up: Changed 7 months ago by seanrose

Why is CF even blocking Tor on sites that don't historically receive abusive traffic from Tor IPs in the first place? The "whitelist tor IPs" thing should be the default on all sites and only turned off when significant abusive traffic patterns are detected from Tor IPs.

comment:95 in reply to: ↑ 90 Changed 7 months ago by ioerror

Replying to HairyPotter:

Final point:
hmm Tor folks, I got a whole lot of unsolvable captchas when trying to sign up for this Trac using Chromium incognito.

The irony here is not lost on anyone.

comment:96 in reply to: ↑ 94 Changed 7 months ago by ioerror

Replying to seanrose:

Why is CF even blocking Tor on sites that don't historically receive abusive traffic from Tor IPs in the first place? The "whitelist tor IPs" thing should be the default on all sites and only turned off when significant abusive traffic patterns are detected from Tor IPs.

If the main data is really project honeynet - there are some exits with a "threat score" of 0 and some with a none zero score. It appears that this data isn't tied to specific sites, it is just a single dimension of data based on the ip address. That suggests a very unsophisticated analysis, so I must be missing some critical detail of how and when CF's censorship trigger is pulled.

comment:97 Changed 7 months ago by ioerror

It is worth noting that this issue does not only impact Tor and Tor Browser users. It also impacts VPN users and "carrier grade" NAT users too. Effectively these three (NAT, VPN, Tor) classes of users are often without other options.

NAT users are sometimes the victims of captured regulatory situations - this is commonly the case when the only or primary upstream is a national telecom. VPN users are often people escaping from such a NAT situation, among many other similar but different, contexts. Tor is much the same with an added detail of being Free of cost, distributed, and decentralized in nature.

All of these users need to have relief from this awfully frustrating censorship situation. Very few of them will have a voice because of the disparate nature of those other networks and providers. While it is important to treat Tor specially - Tor is part of a group of classes that have related and valid concerns. As we approach a world with less and less IPv4, I think the IP based approach to analytics will fail more and more. The same is also true for IPv6, I suspect - new IP addresses are effectively free - a per IP reputation score may or may not even be a concern in the future. I suppose that this may make the problem worse: will entire blocks get *one* score? Or will it make it better because...?

comment:98 follow-up: Changed 7 months ago by cypherpunks

Back to the read-only-for-TOR-exit-nodes idea, what if torbutton were modified to emit a HTTP header depending on user preference indicating that the user wanted to actually be able to POST to whatever site that was X-Served-By: CloudFlare or whatever and then be served a CAPTCHA?

comment:99 in reply to: ↑ 98 Changed 7 months ago by ioerror

Replying to cypherpunks:

Back to the read-only-for-TOR-exit-nodes idea, what if torbutton were modified to emit a HTTP header depending on user preference indicating that the user wanted to actually be able to POST to whatever site that was X-Served-By: CloudFlare or whatever and then be served a CAPTCHA?

That seems like a non-starter - why not just allow CF to serve it up and hook it? They're a MITM after all, they can actually do that without any end user software modifications at all.

comment:100 in reply to: ↑ 82 ; follow-up: Changed 7 months ago by madD

Replying to jgrahamc:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

Does this fixed CAPTCHA record users' reaction time, order of clicks, or mouse movements?

comment:101 in reply to: ↑ 82 ; follow-up: Changed 7 months ago by garrettr

Replying to jgrahamc:

To summarize:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

For what it's worth, at least of right now, this is still an issue. To demonstrate, I recorded a video of myself using Tor Browser to access https://cloudflare.com, which is naturally behind Cloudflare.

Video: https://youtu.be/HIDhYHCwUEs

This video demonstrates two things:

  1. The CAPTCHAs vary in their difficulty to solve, but can be quite onerous (see the first CAPTCHA that I have to solve, which takes over a minute to do).
  2. The "authenticated as a human user" state sometimes persists over circuit changes, but not always. As you can see in the video, I change my Tor circuit for cloudflare.com, and I am able to access it without re-doing a captcha. However, upon changing the circuit again, I am asked to do another CAPTCHA. After solving that CAPTCHA, and changing the Tor circuit yet again, I am forced asked to do yet another CAPTCHA.

comment:102 Changed 7 months ago by phw

I am helping out a group of researchers that developed a scheme that allows a service provider (say, CloudFlare) to sign a set of tokens for clients (say, Tor Browser) after the client proved that it is human (say, by solving a CAPTCHA). These tokens can subsequently be spent when revisiting the service provider without having to prove again that you're human. The tokens are created by the client, and signed by the service provider using blind signatures. Therefore, the service provider is unable to link tokens across sites. There are still many details to flesh out but this could be a medium to long term solution, albeit at the cost of having some kind of protocol between client and service provider.

comment:103 Changed 7 months ago by kbaegis

Finally, I'd invite you to revisit the key point here, which is that your product line makes Tor unusable by many users who still want to browse the web anonymously. I understand that your company has a goal. In this specific context, the busi ness goals are causing a legitimate harm to web users and this is something that I suggest you revisit more broadly within your organization. Surely CloudFlare has technical expertise that extends beyond "Let's fix that with captcha" and there are probably (from an engineering perspective) better ways to solve both the problems of DDoS and spam than authenticating every single session.

I agree with this. I've kicked off an internal discussion of the best way to deal with the abuse coming from Tor (and elsewhere) that doesn't involve CAPTCHAs. We'll continue with the other things listed above as I want to have some immediate impact on this while in parallel looking for better solutions.

I agree with Jacob here. The Tor community can likely give you unique expertise if they're given a forum to do so. Currently, they had to open a ticket to get your attention- hence the above discussion. I'd also seriously look into how you are addressing DDoS from the network layer (specifically your edge router/firewall/load balancing configurations), how you scale your client infrastructure elastically, and specifically how you define your threat model. Two subpoints: your own engineer has admitted that captcha is a terrible isn't a salable way to address this problem, stating "we struggle to even serve captchas" [edit: 'while under attack']. So I'd challenge that this is an effective solution for DDoS. Second, I'm with several others here seriously questioning the SNR and throughput constraints around blanket allowance of Tor infrastructure. It's like using a hatchet to remove a fly from your friends forehead. Small problem, oblique solution.  Please remember that exit nodes are communal, so pretend for example that every time you wanted to blacklist a /32 ipv4 address, instead you were blacklisting an entire /24 public network. 

I'll wrap up with a question. How are you intending on rolling out this new feature? Is it going to be opt-in, opt-out, will there be an email sent to your customers about using it? I think that this is something that the community is greatly interested in.

Almost everything we announce goes on our blog so I imagine we'll do it that way. It gets emailed to people who subscribe to the blog. I don't know if it'll be emailed to all customers (mostly because we don't tend to send them a lot of email and it's the mark eting group that decides). The current plan is for this to be opt-in.

I think that this marginalizes the issue. Offering a feature that most customers would have to voluntarily opt into and likely don't know about (because they'd have to be looking for it to find it) is a waste of everyone's time- particularly a CTO. If your goal is to find a solution, this patently isn't it if it's going to be unannounced and opt-in.  [Added:]  I would expect that only website owners contacted specifically by Tor users would even be aware that this feature was available and could be enabled. 

Last edited 7 months ago by kbaegis (previous) (diff)

comment:104 follow-up: Changed 7 months ago by ford

I think it's great that CloudFlare is participating in this discussion and working to address the most immediate pain points. Especially given the amount of vitriol getting thrown their way.

But the larger issue is not remotely specific to CloudFlare. Remember way back when Wikipedia allowed anonymous edits without logins, even by Tor users? Or even farther back, when USENET was the thing but then died a heat death from uncontrollable spam and abuse, forcing everyone to scurry away to private mailing lists and walled-garden discussion websites? Many websites and other online services would like to support privacy and anonymity, but most can't afford to spend all their time and financial resources dealing with anonymous abuse.

In the longer term I think a deployable anonymous credential system of some kind is essential. Blacklistable long-term credentials are definitely worth exploring further, but incur a lot of complexity and I don't think anyone knows yet how to make them highly usable. The approach phw mentions sounds promising and I'd like to hear more about it.

Giving users a bucket of anonymous tokens for solving a CAPTCHA may be a reasonable and arguably tractable starting-point (or stopgap) measure. But there are many other ways anonymous credentials could be produced and other useful "foundations of trust" for them, and I definitely sympathize with Tor users who feel like they're being treated as CAPTCHA-solving machines. There needs to be a clear roadmap from CAPTCHA-based credentials to something-else-based credentials.

Some other particular possibilities:

  • Anonymous credentials attesting that, e.g., "I am a user with a Twitter account who has been around at least 1 hear and has at least 100 followers". In other words, build on the investment that the big social media companies make all the time to detect and shutdown abusive or automated accounts. This basis is not remotely perfect obviously, but pragmatically, social media identities that have "survived a while" and have friends/followers are much more expensive on the black market than fresh Sybil accounts created by paid CAPTCHA-solvers. Thus, users who can produce better anonymous evidence that they're "real" might get a bigger pile or faster rate of anonymous tokens than they do just by solving a CAPTCHA. My group started exploring this approach in our Crypto-Book project (http://dedis.cs.yale.edu/dissent/papers/cryptobook-abs) but there are certainly gaps to be filled to make it practical.
  • Anonymous credentials that can attest with even higher certainty that they represent "one and only one real person", e.g., credentials derived from pseudonyms distributed at physical pseudonym parties (see http://bford.info/pub/net/sybil.pdf and http://bford.github.io/2015/10/07/names.html). No one would be "required" to participate in such a system, but those that do might be able to get an even bigger pile or faster flow of tokens on the basis of demonstrating with higher certainty that they're one and only one real person. Further, this seems like ultimately the only kind of basis that might provide a legitimate "democratic foundation": e.g., a basis that would allow Tor to hold online polls or votes and be reasonably certain that each real human got one and only one vote.
  • Anonymous credentials based on reputation scores that users exhibiting "good/civil behavior" can build up over time. Basically, use a "carrot" approach rather than the "stick" approach that blacklistable credentials tend to represent. We're also starting to explore ideas in this space; see our upcoming NSDI paper on AnonRep (http://dedis.cs.yale.edu/dissent/papers/anonrep-abs).

At any rate, the problem is definitely not at all simple; we need to start with baby steps (e.g., the CF+Google looping bug, then maybe a simple CAPTCHA-based credential scheme). But in the longer term we need an architecture flexible enough to deal with abuse while allowing well-behaved users to demonstrate as such in multiple different ways based on multiple different trust foundations.

P.S. To underscore the problem, I had to rewrite parts of this post twice already, because of trac.torproject.org deciding it looks like spam and rejecting it - and making me solve CAPTCHAs to prove otherwise. Pot, meet kettle.

Last edited 7 months ago by ford (previous) (diff)

comment:105 follow-ups: Changed 7 months ago by sjmurdoch

We have done a survey of this problem and the results are published in the paper "Do You See What I See? Differential Treatment of Anonymous Users" (see also the accompanying blog post). Our results show that Cloudflare blocks around 10–50% of Tor nodes and is one of the major reasons that Tor users are unable to access websites from the Alexa top-1000 list (though it was worse before June 2015).

comment:106 in reply to: ↑ 105 Changed 7 months ago by ford

Replying to sjmurdoch:

We have done a survey of this problem and the results are published in the paper "Do You See What I See? Differential Treatment of Anonymous Users"

Indeed, great work, and I'm looking forward to the talk this afternoon here at NDSS. :)

comment:107 in reply to: ↑ 105 ; follow-up: Changed 7 months ago by cypherpunks

Replying to sjmurdoch:

We have done a survey of this problem and the results are published in the paper "Do You See What I See? Differential Treatment of Anonymous Users" (see also the accompanying blog post). Our results show that Cloudflare blocks around 10–50% of Tor nodes and is one of the major reasons that Tor users are unable to access websites from the Alexa top-1000 list (though it was worse before June 2015).

Obviously you are not a Tor user, otherwise you would have immediately realised the manifest falsehood of your assertion.

Your data is already outdated, your conclusions, at least in this topic, stale. Maybe 10-50% was a reasonable estimate when you did your data acquisition, not anymore.

Since around Dec. 2015 Cloudflare almost certainly blocks all exit nodes. If on some exceedingly rare occasion we aren't blocked is because (a) the Tor network, and its consensus, is dynamic and cloudflare obviously doesn't do live probing, it probably keeps a cached blacklist that only updates every few weeks or so; (b) something that some cypherpunks and people who actually have bothered to set up nodes know.

comment:108 in reply to: ↑ 107 ; follow-ups: Changed 7 months ago by sjmurdoch

Replying to cypherpunks:

Obviously you are not a Tor user, otherwise you would have immediately realised the manifest falsehood of your assertion.

Actually I am a Tor user.

Since around Dec. 2015 Cloudflare almost certainly blocks all exit nodes. If on some exceedingly rare occasion we aren't blocked is because (a) the Tor network, and its consensus, is dynamic and cloud flare obviously doesn't do live probing, it probably keeps a cached blacklist that only updates every few weeks or so; (b) something that some cypherpunks and people who actually have bothered to set up nodes know.

You might feel the blocking rate is higher than it actually is because the blocking probability is proportional to the probability of the node being selected. So there are plenty of Tor nodes that are unblocked, but the chances of you selecting them are very small.

comment:109 in reply to: ↑ 108 ; follow-ups: Changed 7 months ago by cypherpunks

Replying to sjmurdoch:

You might feel the blocking rate is higher than it actually is because the blocking probability is proportional to the probability of the node being selected. So there are plenty of Tor nodes that are unblocked, but the chances of you selecting them are very small.

I'm not convinced.

I haven't read your paper, just gave it a quick skim, but I'm assuming you infer this from the way Tor does path selection (not really randomly), and not from any blocking criteria by Cloudflare. If that's the case then that probability wouldn't have changed last December. However blocking rates did. Very abruptly.

Have you acquired new data since Dec. 2015? If you haven't I suggest you do it and amend your publications accordingly.

jgrahamc: Could you confirm what/whether you changed?

comment:110 follow-up: Changed 7 months ago by cypherpunks

@jgrahamc

While we're at it take a look at how broken capthcas in Orfox for Android are. Without javascript its shows nothing and with it the check boxes don't register anything.

comment:111 in reply to: ↑ 87 Changed 7 months ago by lhi

Replying to jgrahamc:

Replying to kbaegis:

The current plan is for this to be opt-in.

Please! Everyone knows ordinary people don't touch defaults in the absence of a very convincing, direct motivation. I appreciate your proposal but it's too timid. A lot like sprinkling droplets of water on the hot stone. This one needs bucketfuls. Douse it, drench it, boldly quench it. Don't play for time. That's the only way to make the complaints stop.

comment:112 in reply to: ↑ 104 Changed 7 months ago by lhi

Replying to ford:

I think it's great that CloudFlare is participating in this discussion and working to address the most immediate pain points. Especially given the amount of vitriol getting thrown their way.

I agree. Though technically a global active adversary, they're neither unapproachable (they're facing the complaints they've caused) nor as fiendishly inimical to us as certain state-level actors (who may however already have decided to co-opt their tentacles, who knows).

But the larger issue is not remotely specific to CloudFlare. Remember way back when Wikipedia allowed anonymous edits without logins, even by Tor users? Or even farther back, when USENET was the thing but then died a heat death from uncontrollable spam and abuse, forcing everyone to scurry away to private mailing lists and walled-garden discussion websites? Many websites and other online services would like to support privacy and anonymity, but most can't afford to spend all their time and financial resources dealing with anonymous abuse.

Wikipedia is a nutcase. It merits another ticket. I'm not even asking for anonymous contribution or being allowed to correct small mistakes anymore (where research on anonymous trust tokens could come handy), no, I'm not allowed to use my established username, let alone a new one, at all unless forgoing Tor. That doesn't even make sense.

Thanks for sharing your research! I'm going to read it because it is an extremely interesting subject with fine applications to every single one of the other domains you mentioned (polls; Wikipedia - theoretically but not in bureaucratic practice; etc.). I just don't think it's the solution to the problem at hand, which in my opinion is: In the absence of ongoing large-scale attacks, Cloudflare should just serve the damn page, better static than not at all, and not give us bullshit about how this is not possible.

For me, the rest of the original ticket boils down to

1) We all know that the web and the internet it is built on, are fundamentally broken at an architectural level. As long as: DNS is around, servers are insecure, proper end-to-end crypto isn't the norm hence MITM goes unnoticed, anonymity is an edge case, routing lacks built-in resiliency to disruption, we're always going to have actors building a bus.ness model around cobbling together superficial, overapproximating mitigations.

It's nice of them to build workarounds, it would be nicer still to see them relegating Threat Scores and IP-based blocking to the dustbin of history where it belongs, but we can't expect them to retract their tentacles which will continue to suck in as much data as they can get.

2) They will be able to suck considerably less data out of anonymous users when not allowed to execute Javascript. Hence whatever workaround they choose, it must work exactly the same without Javascript.

3) Warning the user UI-wise? There are already add-ons which allow fine-grained control over connections to Google and so on.

Last edited 7 months ago by lhi (previous) (diff)

comment:113 in reply to: ↑ 89 Changed 7 months ago by lhi

Replying to jgrahamc:

  1. Ethical. CloudFlare has a service called Project Galileo (https://www.cloudflare.com/galileo/)

Say, couldn't you perhaps get the CAPTCHA disabled so I can actually have a look at it?

Last edited 7 months ago by lhi (previous) (diff)

comment:114 in reply to: ↑ 101 ; follow-up: Changed 7 months ago by jgrahamc

Replying to garrettr:

Replying to jgrahamc:
For what it's worth, at least of right now, this is still an issue. To demonstrate, I recorded a video of myself using Tor Browser to access https://cloudflare.com, which is naturally behind Cloudflare.

Le sigh.

You are correct. I've reproduced myself here using TorBrowser 5.5.2 (JavaScript enabled; not that that matters as we don't use JavaScript to decide on the CAPTCHA serving). Raising this internally to figure out why.

comment:115 in reply to: ↑ 110 Changed 7 months ago by jgrahamc

Replying to cypherpunks:

@jgrahamc

While we're at it take a look at how broken capthcas in Orfox for Android are. Without javascript its shows nothing and with it the check boxes don't register anything.

Will do. I've told the appropriate person.

comment:116 in reply to: ↑ 100 ; follow-up: Changed 7 months ago by jgrahamc

Replying to madD:

Replying to jgrahamc:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

Does this fixed CAPTCHA record users' reaction time, order of clicks, or mouse movements?

The fix I am talking about does not involve JavaScript or any of those things at all.

comment:117 in reply to: ↑ 109 ; follow-up: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

jgrahamc: Could you confirm what/whether you changed?

We didn't make code changes around this, but looking at the trend in abuse coming from Tor exit nodes there has been a steady increase in the percentage of exits through which we see abuse since December. Here's a chart for the last 90 days showing trend from the Project Honeypot data.

http://i.imgur.com/iwT7pA0.png

comment:118 in reply to: ↑ 109 ; follow-up: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

jgrahamc: Could you confirm what/whether you changed?

We did not make changes in that time period, but there has been an increase in abuse from Tor exit nodes over time and that may account for what you saw. Here's a chart of the last 90 days: http://i.imgur.com/iwT7pA0.png

comment:119 follow-up: Changed 7 months ago by jgrahamc

Sorry for the double post there. Got stuck in a loop of CAPTCHAs on this site and was unable to submit.

comment:120 in reply to: ↑ 108 Changed 7 months ago by cypherpunks

Replying to sjmurdoch:

You might feel the blocking rate is higher than it actually is because the blocking probability is proportional to the probability of the node being selected. So there are plenty of Tor nodes that are unblocked, but the chances of you selecting them are very small.

Your test data is outdated. Testing against change.org (fronted by CloudFlare), right now 92% of the exit nodes are blocked (7.5% time out/completely block connections, the rest serves captchas). Most of the non-blocked exit nodes are tiny, so 99.9% of the Tor traffic will be blocked taking the exit node selection algorithm into account.

comment:121 in reply to: ↑ 117 ; follow-up: Changed 7 months ago by lunar

Replying to jgrahamc:

Replying to cypherpunks:

jgrahamc: Could you confirm what/whether you changed?

We didn't make code changes around this, but looking at the trend in abuse coming from Tor exit nodes there has been a steady increase in the percentage of exits through which we see abuse since December.

Could you run similar comparisons with Internet access providers using Carrier-grade NAT? Could you tell use what qualifies as abuse?

If abuse is “attacking” your honeypots then your system is simply going to fail hard in the next year. Even older Internet access providers are being forced to do Carrier-grade NAT as IPv4 are now a scarce resource. You can't punish millions of users as soon as one bad actor hits your honeypots.

Tor is just being slightly ahead of what the IPv4 Internet is going to look like pretty soon.

comment:122 in reply to: ↑ 121 Changed 7 months ago by jgrahamc

Replying to lunar:

Could you run similar comparisons with Internet access providers using Carrier-grade NAT? Could you tell use what qualifies as abuse?

Abuse: comment spamming, harvesting email addresses, attacking web applications (e.g. SQL injection), HTTP DoS (exploiting slow web servers/applications to knock them offline). I'm not interested in L3/L4 DoS and Tor as that's non-existent (unless then exit node is separately part of a botnet).

Tor is just being slightly ahead of what the IPv4 Internet is going to look like pretty soon.

I agree with the sentiment. As I said in an earlier comment I've kicked off an internal project to get us off the use of CAPTCHA for the types of abuse seen above. Related news article with comments from my boss (the CEO): http://www.theregister.co.uk/2016/02/24/cloudflare_may_stop_captcha_tor_users/

comment:123 in reply to: ↑ 118 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to cypherpunks:

jgrahamc: Could you confirm what/whether you changed?

We did not make changes in that time period, but there has been an increase in abuse from Tor exit nodes over time and that may account for what you saw. Here's a chart of the last 90 days: http://i.imgur.com/iwT7pA0.png

Could you give us some actual absolute numbers here? This chart is without context or even absolute numbers where we may compare with some other sources of information.

comment:124 in reply to: ↑ 123 ; follow-ups: Changed 7 months ago by jgrahamc

Replying to ioerror:

Could you give us some actual absolute numbers here? This chart is without context or even absolute numbers where we may compare with some other sources of information.

That's the percentage of Tor exit nodes. I can give you the absolute number of Tor exit nodes at each measurement point corresponding to the % number there if it helps, although I'm not sure what you really learn from that other than "the number of Tor exit nodes varies every day".

comment:125 in reply to: ↑ 124 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

Could you give us some actual absolute numbers here? This chart is without context or even absolute numbers where we may compare with some other sources of information.

That's the percentage of Tor exit nodes. I can give you the absolute number of Tor exit nodes at each measurement point corresponding to the % number there if it helps, although I'm not sure what you really learn from that other than "the number of Tor exit nodes varies every day".

Wait, didn't you say above that the graph was an increase in abuse?

It would be nice, if you could show us data that compares carrier grade NAT for a similar quantity of users, and then to back up the data beyond "project honeynet" offered "threat scores" for us to understand it in detail.

I tried to read the Register article but was presented with a broken captcha stuck in an endless loop. Argh.

comment:126 follow-up: Changed 7 months ago by ioerror

Another comment about the broken CloudFlare captchas is that they're always in English for me. Is that always the case? For those who don't speak English, they're even more confused when they are censored with a looping and thus broken captcha security solution...?

comment:127 Changed 7 months ago by ioerror

There are a number outstanding issues directed at CloudFlare in the above thread - it would be really wonderful if someone from CloudFlare could address these questions.

comment:128 in reply to: ↑ 125 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

It would be nice, if you could show us data that compares carrier grade NAT for a similar quantity of users, and then to back up the data beyond "project honeynet" offered "threat scores" for us to understand it in detail.

I don't have that data because it's not something we monitor.

I only have data on Tor because there's a public list of exit nodes and because I wrote code to pull information on Tor (see github above) because the Tor community was upset at CAPTCHAs. I also don't know how large the Tor user base is so even if I did have the data I couldn't make the comparison.

comment:129 Changed 7 months ago by naif

My point on that is:

a) Tor should enable Tor Relay Operators to reduce portscan and web attacks (see #18142 and #17975)

b) Cloudflare must implement a Proof of Work and Dynamic attack threshold detection system for traffic coming from Tor (see https://lists.torproject.org/pipermail/tor-talk/2016-January/040011.html), relying on Captcha only as a last resort

Last edited 7 months ago by naif (previous) (diff)

comment:130 in reply to: ↑ 114 Changed 7 months ago by jgrahamc

Replying to jgrahamc:

Replying to garrettr:

Replying to jgrahamc:
For what it's worth, at least of right now, this is still an issue. To demonstrate, I recorded a video of myself using Tor Browser to access https://cloudflare.com, which is naturally behind Cloudflare.

Le sigh.

You are correct. I've reproduced myself here using TorBrowser 5.5.2 (JavaScript enabled; not that that matters as we don't use JavaScript to decide on the CAPTCHA serving). Raising this internally to figure out why.

Replying to myself, but... this looks like it's a caching bug where the fact that an IP is currently a Tor exit node is not being cached correctly. Fix is in and should be pushed to production today.

comment:131 in reply to: ↑ 126 Changed 7 months ago by mmarco

Replying to ioerror:

Another comment about the broken CloudFlare captchas is that they're always in English for me. Is that always the case? For those who don't speak English, they're even more confused when they are censored with a looping and thus broken captcha security solution...?

I would jsay it deppends on the language that TBB is configured to request the content. By changing it to spanish, I get the "One more step..." web page in english, but the captcha question is in spanish.

So yes, users that don't speak english would get some extra confussion, but at least they can read the captcha challenge.

comment:132 Changed 7 months ago by mmarco

Addendum: of course, that comes at the prize of leaking some extra bit of information: the fact that they have their browser configured to request a specific language.

comment:133 in reply to: ↑ 128 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

It would be nice, if you could show us data that compares carrier grade NAT for a similar quantity of users, and then to back up the data beyond "project honeynet" offered "threat scores" for us to understand it in detail.

I don't have that data because it's not something we monitor.

You don't have other threat scores for other IP addresses? You could look at the country wide proxy list that Wikipedia keeps for X-Forwarded-For style proxies - for example. Though I'm surprised - you don't have *any* data on carrier grade NAT IP ranges?

Here is a list of ip addresses of major proxies - such as say, all users of Opera Mini which is probably more users than Tor Browser hitting CloudFlare:

https://meta.wikimedia.org/w/extensions/TrustedXFF/trusted-hosts.txt

One can read about those proxies here:

https://meta.wikimedia.org/wiki/XFF_project

I only have data on Tor because there's a public list of exit nodes and because I wrote code to pull information on Tor (see github above) because the Tor community was upset at CAPTCHAs. I also don't know how large the Tor user base is so even if I did have the data I couldn't make the comparison.

We publish a great deal of data in a privacy preserving manner: https://metrics.torproject.org

Could you please make some comparisons of the abuse in question? Is CloudFlare really just using Project Honeynet data here?

comment:134 in reply to: ↑ 124 ; follow-up: Changed 7 months ago by lunar

Replying to jgrahamc:

Replying to ioerror:

Could you give us some actual absolute numbers here? This chart is without context or even absolute numbers where we may compare with some other sources of information.

That's the percentage of Tor exit nodes.

If I understood correctly, this chart show individual Tor exit nodes who have been spotted doing at least one of the “abuse” you describe earlier. If that's correct, then it's just totally meaningless data. Tor users can control which exit node they use. So this could just be a single Tor user doing a single attempt at SQL injection repeatedly over different exit nodes.

Please stop framing this as “Tor traffic is 90% abuse”. This is not what these numbers are telling us.

Last edited 7 months ago by lunar (previous) (diff)

comment:135 in reply to: ↑ 133 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

You don't have other threat scores for other IP addresses? You could look at the country wide proxy list that Wikipedia keeps for X-Forwarded-For style proxies - for example. Though I'm surprised - you don't have *any* data on carrier grade NAT IP ranges?

Nope. We do not have special treatment for groups of IP addresses other than Tor and only for Tor because of widespread complaints from Tor users. We have scores and data for individual IP addresses.

We publish a great deal of data in a privacy preserving manner: https://metrics.torproject.org

Thanks

Could you please make some comparisons of the abuse in question? Is CloudFlare really just using Project Honeynet data here?

We are not just using Project Honeypot but it's an input. I'm using it here because all of us can look at their data and draw conclusions about any IP address or group of IP addresses.

comment:136 in reply to: ↑ 134 ; follow-up: Changed 7 months ago by jgrahamc

Replying to lunar:

If I understood correctly, this chart show individual Tor exit nodes who have been spotted doing at least one of the “abuse” you describe earlier. If that's correct, then it's just totally meaningless data. Tor users can control which exit node they use. So this could just be a single Tor user doing a single attempt at SQL injection repeatedly over different exit nodes.

No, this data is only for Tor exit nodes that are comment spamming. It's not "one person did a bad thing once on an exit node".

Please stop framing this as “Tor traffic is 90% abuse”. This is not what these numbers are telling us.

I never said that. Don't put words in my mouth.

comment:137 in reply to: ↑ 135 ; follow-up: Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

You don't have other threat scores for other IP addresses? You could look at the country wide proxy list that Wikipedia keeps for X-Forwarded-For style proxies - for example. Though I'm surprised - you don't have *any* data on carrier grade NAT IP ranges?

Nope. We do not have special treatment for groups of IP addresses other than Tor and only for Tor because of widespread complaints from Tor users. We have scores and data for individual IP addresses.

OK - but for example - VPN services - or the Wikipedia X-Forwarded-For IP ranges - what data do you have on those? Do you see a threat score that is higher than really large Tor exit nodes used by millions of people (machines, bots, actual people, etc) daily?

We publish a great deal of data in a privacy preserving manner: https://metrics.torproject.org

Thanks

Sure, you're welcome - you can probably do neat metrics data comparisons too. Especially during censorship events, I expect you'll see interesting data.

Could you please make some comparisons of the abuse in question? Is CloudFlare really just using Project Honeynet data here?

We are not just using Project Honeypot but it's an input. I'm using it here because all of us can look at their data and draw conclusions about any IP address or group of IP addresses.

It is hard to address abuse if we cannot understand what that word means. I think Project Honeypot data is more of an art than a just process or even a fully explained science. If that is the only bit of data we'll see, I'm quite unhappy as it is effectively "trust us" as an answer.

comment:138 in reply to: ↑ 136 Changed 7 months ago by lunar

Replying to jgrahamc:

Replying to lunar:

If I understood correctly, this chart show individual Tor exit nodes who have been spotted doing at least one of the “abuse” you describe earlier. If that's correct, then it's just totally meaningless data. Tor users can control which exit node they use. So this could just be a single Tor user doing a single attempt at SQL injection repeatedly over different exit nodes.

No, this data is only for Tor exit nodes that are comment spamming. It's not "one person did a bad thing once on an exit node".

Same thing. It could be just one person using all Tor exit nodes in turns to attempt comment spamming. It is not a good metric to reflect about Tor users and Tor traffic.

Please stop framing this as “Tor traffic is 90% abuse”. This is not what these numbers are telling us.

I never said that. Don't put words in my mouth.

According to The Register article, your boss did: According to Prince, third-party figures have suggested than more than 90 per cent of Tor traffic […] is traffic that is actively trying to hurt the websites it is visiting. Please read my comment as addressed to all readers of your graph.

comment:139 in reply to: ↑ 137 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

Replying to jgrahamc:

We are not just using Project Honeypot but it's an input. I'm using it here because all of us can look at their data and draw conclusions about any IP address or group of IP addresses.

It is hard to address abuse if we cannot understand what that word means.

I've said a few times what abuse means. It means things like SQL injection, comment spamming, harvesting email addresses and HTTP DoS that exploits slowness on a web server to knock it over.

I think Project Honeypot data is more of an art than a just process or even a fully explained science.

They publish all their data and you can look up any IP to see what they've seen from that IP. I'm not sure how to make progress on this then. Do you have an alternative source of information that would help measure abuse coming through Tor?

comment:140 in reply to: ↑ 116 Changed 7 months ago by madD

Replying to jgrahamc:

Replying to madD:

Replying to jgrahamc:

  1. We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.

Does this fixed CAPTCHA record users' reaction time, order of clicks, or mouse movements?

The fix I am talking about does not involve JavaScript or any of those things at all.

But I was not asking about the particular bug fix, you realize. The subject of my repeated question is the ability of CF CAPTCHA to capture response of users such as mouse movements, reaction time, order of checkbox selection. Is this kind of information transferred out of users device by submitting a CF CAPTCHA?

comment:141 follow-up: Changed 7 months ago by torhp

I looked into the project honey pot data and I don't find it to be very supportive of the "Tor is a source of abuse" hypothesis. Certainly not in the sense that it can be used to justify blocking Tor users.

So I looked at the list of XFF proxies someone linked to above and coincidentally I found Singapore's number one ISP near the top of the list which piqued my curiosity.

I used to live in Singapore and at that time I was using Tor pretty much daily. I can tell you that as a residential clearnet internet user, I don't remember once coming across the cloudflare captcha problem. As a Tor users of course I did get locked out of websites by cloudflare though, so comparing honeypot numbers for Tor versus Singapore ISP's NAT hardware is interesting to me. Let's get down to it.

First of all, the ISP alluded to above is Singtel, but I was actually a customer of Starhub (Singapore's number 2 ISP), but I found them in the honeypot data too and checked their scores. Their two listed IPs have threat scores of 40 and 26.

Two IP addresses isn't a huge amount though, so I checked out a couple more - I found an IP listed as being the outbound proxy for Vietnam's state owned ISP. They only have one IP listed so it may be a single carrier grade NAT device for the whole country - Vietnam I believe has a national firewall so that seems possible. Their score was 57. I checked one more IP which was one belonging to an ISP in Thailand. Its score was 30.

I then pseudo randomly selected (scroll, point and click) four Tor fast exit nodes from torstatus.blutmagie.de Their scores were 50, 42, 40 & 41.

To summarise:

Starhub 1(Singapore): 40
Starhub 2(Singapore): 26
Vietnam: 57
Thailand: 30

Tor Fast Exit 1: 50
Tor Fast Exit 2: 42
Tor Fast Exit 3: 40
Tor Fast Exit 4: 41

Limited samples not withstanding, the results are pretty interesting. Vietnam which apparently has one public IP address for the whole country has a worse threat score than the Tor exits. Is anyone under the impression that Cloudflare breaks the internet for the whole of Vietnam in the same way they do for Tor users? It is news to me if so. The other inference is that public shared IP addresses are prone to having high threat scores in general, which seems obvious.

I would like to get greater clarity from Cloudflare on how they interpret these threat numbers, and they have done a good job of engaging so far so hopefully we might get something. We have heard that Tor is not singled out specifically, but rather that it is treated as a source of abuse as per these threat scores. So how? If a whole country is behind a carrier grade NAT with a higher threat score than typical Tor exit nodes, is that country being treated as a threat / abuse source similar to Tor? Do they get unsolvable Captchas with a similar frequency as Tor users? What else feeds into this heuristic?

comment:142 follow-up: Changed 7 months ago by SatoshiNakamoto

1) There's some metrics above about the % of tor nodes that abuse is coming from, but unless I missed it, there's no details of % of *all traffic* that is abuse. Lunar made the point that we should be comparing these numbers to the abuse numbers for carrier-grade NAT but we should also compare to the baseline of a typical IP. We should expect some kind of relationship like tor &gt; carrier-nat &gt; otherwise, but is this actually what we see? Even granting the technically infeasible goal of stopping this abuse, it's gotta be in context. What is p(abuse | tor) / p(abuse | non-tor) ?

2) The register article is also behind cloudfare, so if you're expecting us to read it and get any information from it, you may be sorely disappointed. I'm batting like 3 for 250+ today for loading pages through cloudfare.

3) Not sure if this is the right ticket or not(there's a lot going wrong here for one ticket), but in my particular case until today there seems to be 2 situations that you can get into depending who you allow javascript for.

i) Google and website :


palemoon ( 26.0.1 ) presents user with :


[ ] I'm not a robot reCAPTCHA

selecting this:
results in 100% cpu for awhile and then
"Cannot contact reCAPTCHA. Check your connection and try again."

100% of the time.

ii) Neither google *nor* website.

palemoon presents user with:

http://0bin.net/paste/uCUCp72l6EYqvVNA#6E+2Hi3izY37X+nQ-lNLzqZ5Jx9HukKLkjl/hrj3+Py

as you can see it's all garbled to hell. This wasn't the case ever before today. Until today we could get away with disabling javascript, and select the particular pictures, at least on my setup. The check boxes might be associated with boxes in order, but sometimes only 6 boxes show up, and as of yet I haven't been able to solve one since they started looking that way, though one page just happened to load without the CAPTCHA page just recently(?).

so really if you're fixing issues as they come up on your side, there are two broken use cases right there.

It would be nice if there was somewhere on *cloudflare*'s side to report specific tor-related client issues that wasn't behind the great cloudwall. That would be one way for Cloudfare to work together with the tor network, beyond having one thread on tor's side, wouldn't it?

comment:143 in reply to: ↑ 139 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

Replying to jgrahamc:

We are not just using Project Honeypot but it's an input. I'm using it here because all of us can look at their data and draw conclusions about any IP address or group of IP addresses.

It is hard to address abuse if we cannot understand what that word means.

I've said a few times what abuse means. It means things like SQL injection, comment spamming, harvesting email addresses and HTTP DoS that exploits slowness on a web server to knock it over.

What is the p value as asked above?

I think Project Honeypot data is more of an art than a just process or even a fully explained science.

They publish all their data and you can look up any IP to see what they've seen from that IP. I'm not sure how to make progress on this then. Do you have an alternative source of information that would help measure abuse coming through Tor?

Please reply to the analysis of the XFF dataset: Does CloudFlare censor the entire country of Vietnam as hard as it does to many Tor exit nodes?

comment:144 Changed 7 months ago by paxxa2

Here is a summary of some unaddressed points CloudFlare could come back to, if they are wondering how to continue to engage with this ticket:

  1. What kind of per browser session tracking is actually happening?
  2. What would a reasonable solution look like for a company like Cloudflare?
  3. What is reasonable for a user to do? (~17 CAPTCHAs for one site == not reasonable)
  4. Would "Warning this site is under surveillance by Cloudflare" be a reasonable warning or should we make it more general?
  5. What is the difference between one super cookie and ~1m cookies on a per site basis? The anonymity set appears to be *strictly* worse. Or do you guys not do any stats on the backend? Do you claim that you can't and don't link these things?
  6. Cloudflare asks: “Is it possible to prove that a visitor is indeed human, once, but not allow the CDN/DDoS company to deanonymize / correlate the traffic across many domains?” Answer and follow-up question: Here is a non-cryptographic, non-cookie based solution: Never prompt for a CAPTCHA on GET requests. For such a user - how will you protect any information you've collected from them? Will that information be of higher value or richer technical information if there is a cookie (super, regular, whatever) tied to that data?
  7. Let's be clear on one point: humans do not request web pages. User-Agents request web pages ... It might be true that there is some kind of elaborate ZKP protocol that would allow a user to prove to CloudFlare that their User-Agent behaves the way CloudFlare demands, without revealing all of the user's browsing history to CloudFlare and Google. Among other things, this would require CloudFlare to explicitly and precisely describe both their threat model and their definition of 'good behaviour', which as far as I know they have never done.
  8. How many people are actively testing with Tor Browser on a daily basis for regressions? Does anyone use it full-time?
  9. If I was logged into Google (as they use a Google Captcha...), could they vouch for my account and auto solve it? Effectively creating an ID system for the entire web where Cloudflare is the MITM for all the users visiting users cached/terminated by them?
  10. Regarding “What sort of data would qualify as an 'i'm a human' bit? Let's start with something not-worse than now: a captcha solved in last <XX> minutes.” – Is this something that CloudFlare has actually found effective? Are there metrics on how many challenged requests that successfully solved a CAPTCHA turned out to actually be malicious?
  11. I'd really like it if it was CAPTCHA free entirely until there is a POST request, for example. A read only version of the website, rather than a CAPTCHA prompt just to read would be better wouldn't it?
  12. CloudFlare is in a position to inject JavaScript into sites. Why not hook requests that would result in a POST and challenge after say, clicking the submit button? It seems reasonable in many cases to redirect them on pages where this is a relevant concern? POST fails, failure page asks for a captcha solution, etc.
  13. Actually, a censorship page with specific information ala HTTP 451 would be a nearly in spec answer to this problem. Why not use that?
  14. Why not just serve them an older cached copy?
  15. Do you have any open data on (“unfortunately many Tor exit IP's have bad IP reputation, because they _ARE_ often used for unwanted activity”)?
  16. CF asks: “What do we do to implement zero-knowledge proofs both on ddos protection side and on TBB side?” My first order proposition would be to solve a cached copy of the site in "read only" mode with no changes on the TBB side. We can get this from other third parties if CF doesn't want to serve it directly - that was part of my initial suggestion. Why not just serve that data directly?
  17. What about slowing down recurrent requests? it's really not something that can be solved on the Tor side.
  18. What kind of DoS can you guys possibly see through Tor? The network in total capacity has to be less than a tiny fraction of the capacity at *one* of your PoPs. Could you please give us actual data here? I've seen some basic CF API data - what is exposed seems to be quite minimal. As far as I can tell - the main data is score data that is from project honeynet. That has a lot of history that is extremely problematic in my view.
  19. Those at-risk parties are not just a matter of ethics, they are a source of surveillance capital for CloudFlare which is useful for generating so-called "threat" scores as well as other data. I assume that 0days found in that process are submitted to CERT, the same CERT that exploited Tor Hidden Service users, I might add.
  20. In short - those at risk services are paying for this protection with their user/attacker data which is extracted with surveillance by CloudFlare. It may be ethical in motivation but unless I completely misunderstand the monitoring by CloudFlare of its own network, it appears to be sustained with surveillance more than pure good will.
  21. Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely? That is, present a CAPTCHA only when:the server owner has specifically requested that CAPTCHAs be usedthe server is actively under DoS attack, and the client's IP address is currently a source of the DoS.
  22. Has CloudFlare considered other CAPTCHAs, or discussed reCAPTCHA's problems with Google?
  23. could the FBI go to Google to get data on all CloudFlare users? Does CF protect it? If so - who protects users more?
  24. Building the infrastructure for a zero-knowledge proof system sounds like a fascinating but expensive and long-term project. And I wouldn't be confident that CloudFlare would even adopt such a thing once it became available, unless they made a significant investment in the work at the beginning.
  25. Marek, do you have any thoughts about my suggestions for reducing CAPTCHA use in comment:17?
  26. What does attempting to prove "i'm-a-human" have to do with addressing DDoS attacks?
  27. Centralization ensures that your company is a high value target. The ability to run code in the browsers of millions of computers is highly attractive. The fact that CF and Google appear to both appear in those captcha prompts probably ensures CF isn't even in control of the entirety of the risk. Is it the case that for all the promises CF makes, Google is actually in control of the Captcha - and thus is by proxy given the ability to run code in the browsers of users visiting CF terminated sites?
  28. Should we be reaching out to Google here?
  29. Is (Project HoneyNet) data the reason that Google's captchas are so hard to solve? (stated answer “I don't know if there's any connection between Project Honeypot and Google's CAPTCHAs” is not an answer).
  30. How do we vet this information or these so-called "threat scores" other than trusting what someone says?
  31. Are you convinced that (offering up a read only page) is strictly worse than the current situation? I'm convinced that it is strictly better to only toss up a captcha that loads a Google research when a user is about to interact with the website in a major way.
  32. Does that mean that Google, in addition to CF, has data on everyone hitting those captchas?
  33. When a user is given a CF captcha - does Google see any request from them directly? Do they see the Tor Exit IP hitting them? Is it just CF or is it also Google? Do both companies get to run javascript in this user's browser?
  34. Could run the exact same test against all Comcast IP addresses aggregated as just once or another significant ISP?
  35. How are you handling CGNs so far?
  36. So what happens if me (as a site/server admin) don't need this (or part of this).
    Specifically:
    As a server admin, if my site is not under DDoS (or spam) attack, then its visitors should not get the captcha challenge.
    As a server admin I should be able to choose if I want this kind of protection and potentially completely disable it.
    As a server admin, I want more sane defaults (lower security level).
  37. CAPTCHAs are a fundamentally untenable solution to dealing with DDOS attacks. Algorithmic solutions will always catch up to evolving CAPTCHA methods. CloudFlare and other service providers should recognize that is the inevitable direction technology is going and abandon it now. An alternate solution is a client proof-of-work protocol.
  38. Why does CloudFlare not run a .onion proxy for their sites?
  39. If all Tor exits are known, why isn't there even a control panel option for customers to say "Okay, I know Tor traffic is good, allow it unconditionally"? → Answer: “We will add this feature. Our customers will be able to 'whitelist' Tor so that Tor users visiting their web sites will not be challenged. This feature is coded and will be released shortly.” Follow up: Will that be the new default until a site decides to actively block Tor?
  40. So my question is to Cloudflare, their CTO was on here earlier. Why exactly are you not able to just implement a Captcha system that works?? Seriously, is it that hard? As far as I know, you have recently moved over to serving up Google captchas, but it still doesn't work? Is CF's CTO really OK and comfortable with the fact that his team couldn't implement this after apparently trying for a few years? Seriously!! Captcha's as a concept have been around for a pretty long time now.
  41. I always find it a demeaning and insulting attitude towards humans that we are being asked by rooms full of servers (which handle enormous amounts of requests and should be able handle the few extra ones coming from Tor exits without breaking an electronic sweat, honestly) to solve puzzles. I am very angry about this attitude btw because my time is infinitely more valuable than your servers'. Being treated as a CAPTCHA-solving bot makes people angry, understand?
  42. Fantastic to hear that you are experiencing the same issues (CAPTCHA loops) as the rest of us. How do we ensure that it not only gets fixed but that it also never is left to our end users alone to detect these kinds of issues?
  43. So Mr. jgrahamc, are captcha's part of Government Technology? Why is javascript so necessary. Do you measure per click reaction time? Do you correlate it with previous data sets? With enough signal gathered, can you then establish unique profiles of people?
  44. So how much would these customers care for the anonymous eyeballs of a relatively small group (in relation to the rest of the "net") of privacy-active users of a technology that attempts to destroy their βusiness model? Isn't this also your βusiness model, Cloudflar? Isn't this the very thing you do with the traffic from all those sites you MITM? I wonder who do you sell to though, hmm...
  45. My concern about Google is not that people should not be free to use their services - it is that CF *colludes* with Google when a user has not at all consented. How many server operators know that the CAPTCHA is hosted by Google, when they use CF for "protection" services? All of them? None of them? Did anyone get a choice? Tor users certainly did not get a choice when they are automatically flagged based on an IP reputation system and then redirected to Google.
  46. CF says: “We fixed the bug that caused a new CAPTCHA to be served for a site when the circuit changes.” Doesn't this mean that you've now got cross circuit tracking for Tor Browser users, effectively? I assume that is by issuing a cookie that isn't tied to a given IP address - though again without any transparency, I feel like it is unclear what was actually done in any technical sense.
  47. CF says: “We've reproduced the "CAPTCHA loop" problem and have an engineer looking into what's happening.” Is there a timeline for this? Will they report back on this bug?
  48. Does this indeed mean that Google, because of actions by CF, has data on every person prompted for a CAPTCHA?
  49. Any American third party presents similar problems as Google. On the one hand, they are a PRISM provider. On the other, they probably have the best security team in the world. Why aren't you guys just hosting your own CAPTCHA solution or proxying it to Google in such a way that Google gets nothing directly from your users?
  50. Does this fixed CAPTCHA record users' reaction time, order of clicks, or mouse movements? CF Answers: “The fix I am talking about does not involve JavaScript or any of those things at all.” Follow up: The subject of my repeated question is the ability of CF CAPTCHA to capture response of users such as mouse movements, reaction time, order of checkbox selection. Is this kind of information transferred out of users device by submitting a CF CAPTCHA?
  51. It would be nice if this wasn't a closed discussion (at Cloudflare) with answers thrown over the wall. How can we include other people in these discussions?
  52. Why is CF even blocking Tor on sites that don't historically receive abusive traffic from Tor IPs in the first place? The "whitelist tor IPs" thing should be the default on all sites and only turned off when significant abusive traffic patterns are detected from Tor Ips.
  53. I'd also seriously look into how you are addressing DDoS from the network layer (specifically your edge router/firewall/load balancing configurations), how you scale your client infrastructure elastically, and specifically how you define your threat model. Two subpoints: your own engineer has admitted that captcha is a terrible isn't a salable way to address this problem, stating "we struggle to even serve captchas" [edit: while under attack]. So I'd challenge that this is an effective solution for DDoS.
  54. I'm with several others here seriously questioning the SNR and throughput constraints around blanket allowance of Tor infrastructure. It's like using a hatchet to remove a fly from your friends forehead. Small problem, oblique solution. Please remember that exit nodes are communal, so pretend for example that every time you wanted to blacklist a /32 ipv4 address, instead you were blacklisting an entire /24 public network.
  55. Could you please make some comparisons of the abuse in question? Is CloudFlare really just using Project Honeynet data here?
  56. What is the p value as asked above?
  57. Please reply to the analysis of the XFF dataset: Does CloudFlare censor the entire country of Vietnam as hard as it does to many Tor exit nodes?
  58. Another comment about the broken CloudFlare captchas is that they're always in English for me. Is that always the case? For those who don't speak English, they're even more confused when they are censored with a looping and thus broken captcha security solution...?

comment:145 in reply to: ↑ 141 Changed 7 months ago by lunar

Replying to torhp:

To summarise:

Starhub 1(Singapore): 40
Starhub 2(Singapore): 26
Vietnam: 57
Thailand: 30

Tor Fast Exit 1: 50
Tor Fast Exit 2: 42
Tor Fast Exit 3: 40
Tor Fast Exit 4: 41

Limited samples not withstanding, the results are pretty interesting.

Indeed. Thanks a lot for your research!

comment:147 Changed 7 months ago by madD

BIOMETRICS ALERT
An eye-opening article by a data mining researcher, Igor Savinkin, http://scraping.pro/no-captcha-recaptcha-challenge/
says:
"For this new type of CAPTCHA the main evidence will be browser behaviour, rather than check box value.
mouse movement, its slightness and straightness
page scrolls
time intervals between browser events
keystrokes
click location history tied to user fingerprint

All these criteria, are stored in the browser’s cookie. These criteria are processed by Google’s server"

He also states that the communication between the user and Google server is encrypted.

It should be emphasized, that there is a DARPA technology to identify people by mouse movements and typing http://www.itnews.com.au/news/users-ided-through-typing-mouse-movements-365221. In 2013 this technology was "being extended to capture mouse movements and touch inputs from mobile devices".

CloudFlare highly probably is an accomplice in a mass biometrics collection and deanonymization service. There is no wonder that we got GLOMARed by their CTO on most occasions before he went silent completely. This program surely violates privacy laws, at least in Europe, because the users get no warning that their bodily movements are recorded and sent for analysis in the USA.
I wonder under which legal frame conducts CloudFlare this intelligence operation in the EU, is it under never-to-be-defunct Safe Harbor agreement? Or does CF have a special agreement, does anybody know?

CAPTCHA must be understood as CAPTURE&GOTCHA you bloody data-slaves!

comment:148 follow-ups: Changed 7 months ago by cypherpunks

Is CloudFlare trying to protect against anything besides these 4 categories of unwanted traffic?

  1. Comment Spam
  2. DDOS
  3. Vulnerability scanning
  4. Crawling

Of course different customers probably have different priorities, but it seems to me like captchas on GET requests are mostly useful for stopping crawling. (Comment spam is POST, DDOS over Tor would be stupid as botnets are cheap and more effective, and for vuln scanning it should be pretty easy to have high confidence that typical requests are legit using simple heuristics.)

So whats up?

Can any CloudFlare person explain if there are more than these 4 categories of unwanted traffic?

And what portion of CloudFlare customers are anti-web-crawling?

comment:149 in reply to: ↑ 142 Changed 7 months ago by jgrahamc

Replying to SatoshiNakamoto:

It would be nice if there was somewhere on *cloudflare*'s side to report specific tor-related client issues that wasn't behind the great cloudwall. That would be one way for Cloudfare to work together with the tor network, beyond having one thread on tor's side, wouldn't it?

There's CloudFlare's regular support, but, to be honest, there are lots of people (including me) reading this thread and suggestions made in it (and the complaints) and working internally to address them. I'm happy for folks to add specific problems they've seen (e.g. on the "Banana Wumpus 4.2 Mobile Browser the CAPTCHA wasn't rendered") so I can get them into our bug system. I've received a few reports like that already.

comment:150 in reply to: ↑ 82 ; follow-up: Changed 7 months ago by cypherpunks

Replying to jgrahamc:

  1. We'll roll out the ability for any CloudFlare web site to whitelist Tor so that Tor users will not see CAPTCHAs within days.

For the free tier too? Someone told me recently that they have fewer controls, which would make sense, but if that means that they're stuck with the default tor-blocking policy then that is obviously a big problem.

Also, another repeated question which I haven't seen CloudFlare answer yet so I'll restate it here: are you or are you not selling or otherwise sharing the very valuable analytics data you're in an ideal position to collect? Specific clickstream data, perhaps? Or maybe something derived from it, like "people who go to this website also go to this other website"?

comment:151 in reply to: ↑ 148 Changed 7 months ago by cypherpunks

Replying to cypherpunks:

Can any CloudFlare person explain if there are more than these 4 categories of unwanted traffic?

Following up on this question, what specific type of unwanted traffic is CloudFlare attempting to protect *itself* against when it just served me a captcha in response to a GET / request to https://cloudflare.com/ ?

comment:152 in reply to: ↑ 150 Changed 7 months ago by jgrahamc

Replying to cypherpunks:

Replying to jgrahamc:

  1. We'll roll out the ability for any CloudFlare web site to whitelist Tor so that Tor users will not see CAPTCHAs within days.

For the free tier too? Someone told me recently that they have fewer controls, which would make sense, but if that means that they're stuck with the default tor-blocking policy then that is obviously a big problem.

It is true that free customers have fewer controls, but they will be able to whitelist Tor.

Also, another repeated question which I haven't seen CloudFlare answer yet so I'll restate it here: are you or are you not selling or otherwise sharing the very valuable analytics data you're in an ideal position to collect? Specific clickstream data, perhaps? Or maybe something derived from it, like "people who go to this website also go to this other website"?

We are not.

We've written about our logging in the past (https://blog.cloudflare.com/what-cloudflare-logs/) and I gave a talk about that (http://www.thedotpost.com/2015/06/john-graham-cumming-i-got-10-trillion-problems-but-logging-aint-one). Also here's our transparency report: https://www.cloudflare.com/transparency/

Tor users may notice changes starting today as I altered the speed with which we download the Tor exit node list to make sure that it's more up to date and we are rolling out fix for a caching bug that was identified (see earlier) and affected identifying an IP as a Tor exit node.

comment:153 follow-up: Changed 7 months ago by ioerror

Thanks for continuing to engage jgrahamc. Many of us, myself included, would really like to see the list or the general issued raised in comment:144 addressed. I'm especially keen to see how Tor exits compare with the XFF proxy IP addresses; are any of those IP addresses being treated specially?

comment:154 in reply to: ↑ 153 ; follow-up: Changed 7 months ago by jgrahamc

Replying to ioerror:

Thanks for continuing to engage jgrahamc. Many of us, myself included, would really like to see the list or the general issued raised in comment:144 addressed. I'm especially keen to see how Tor exits compare with the XFF proxy IP addresses; are any of those IP addresses being treated specially?

Plan to keep engaging, but I'm concentrating most of my effort on solving the problem (i.e. keeping our customers safe while making life better for Tor users). Working on statistics.

comment:155 Changed 7 months ago by freetheinternet

Last edited 7 months ago by freetheinternet (previous) (diff)

comment:156 Changed 7 months ago by SatoshiNakamoto

The *immediate* problem, at least in my particular case( palemoon ( 26.0.1 )), with javascript *disabled* appears to have been resolved, at least momentarily. *With* javascript enabled,

"the page at http://www.google.com says:
Cannot contact reCAPTCHA. Check your connection and try again."

After spinning the CPU up for awhile. This means I can get away with disabling javascript, but others who don't know to do that will be effectively blocked at that poinnt.

comment:157 in reply to: ↑ 154 Changed 7 months ago by ioerror

Replying to jgrahamc:

Replying to ioerror:

Thanks for continuing to engage jgrahamc. Many of us, myself included, would really like to see the list or the general issued raised in comment:144 addressed. I'm especially keen to see how Tor exits compare with the XFF proxy IP addresses; are any of those IP addresses being treated specially?

Plan to keep engaging, but I'm concentrating most of my effort on solving the problem (i.e. keeping our customers safe while making life better for Tor users). Working on statistics.

Thank you very much for the update.

comment:158 Changed 7 months ago by jgrahamc

The ability to whitelist Tor exit nodes has been rolled out but not announced yet (the marketing/support folks need to write up the documentation etc.) but I've enabled it on one of my domains and not on another so that people can test.

http://plan28.org: Tor exit nodes are whitelisted

http://jgc.org: Tor exit nodes are not whitelisted (CloudFlare default handling).

So, you should see no CAPTCHAs on plan28.org but CAPTCHAs on jgc.org. You should only get a CAPTCHA the first time you visit jgc.org unless you use a new Tor identity. Appreciate bug reports; I've been testing using the TorBrowser and repeatedly switching to a new circuit and it seems OK but this is very beta right now.

If you do see a CAPTCHA when switching circuits it would be handy to know the IP address of the exit node and the UTC time so I can see if it's caused by a bug or us not having an up to date list of exit nodes.

comment:159 Changed 7 months ago by jgrahamc

Also, I was looking at how often we update our exit node list and wrote a little program to visualize the coming and going of nodes. The code is here: https://github.com/jgrahamc/torexit and I've been running a cron job that does this every 15 minutes:

curl -s -o $HOME/torhoneydata/exitlist_`date --utc +%s` https://check.torproject.org/exit-addresses

Here are 24 hours of that data (columns are exit nodes, rows are 15 minute increments). White means that the exit node did not appear in the exit node list when that 15 minute cron job ran. So you can see the coming and going of nodes.

https://i.imgur.com/qQH09pz.png

comment:160 Changed 7 months ago by madD

Webmasters who use Google's reCaptcha are reporting since 24th February unannounced changes of server response messages, and disruptions in service. Needless to say, Google does not react on their complaints in support forum. groups.google.com/forum/?_escaped_fragment_=topic/recaptcha/5zlo4DpqhWI#!topic/recaptcha/5zlo4DpqhWI
CloudFlare with their millions of users do not complain into that forum at all, isn't it strange?

Neither FAQ for reCaptcha / Google Privacy / Google Terms contain information about biometrics capture by reCaptcha v2. If this is the case, Google knows that it does so illegaly. Implicitly, CloudFlare could publicly deny knowing it, to shrug off accusations. However, they don't deny it. Why? Non disclosure agreement of some kind?

While CloudFlare are willing to discuss the reCaptcha trigger conditions, they won't talk about the actual reCaptcha v2 code. Non the less there must be some exchange with Google, because of the unannounced reCaptcha v2 changes ever since this ticket got attention.

comment:161 in reply to: ↑ 41 Changed 7 months ago by cypherpunks

Replying to ioerror:

Replying to jgrahamc:

I'm not convinced about the R/O solution. Seems to me that Tor users would likely be more upset the moment they got stale information or couldn't POST to a forum or similar. I'd much rather solve the abuse problem and make this go away completely.

Are you convinced that it is strictly worse than the current situation? I'm convinced that it is strictly better to only toss up a captcha that loads a Google research when a user is about to interact with the website in a major way.

+1000

comment:162 in reply to: ↑ 119 ; follow-up: Changed 7 months ago by cypherpunks

Replying to jgrahamc:

Sorry for the double post there. Got stuck in a loop of CAPTCHAs on this site and was unable to submit.

Is this a joke? There are no captchas here.

comment:163 follow-up: Changed 7 months ago by cypherpunks

Thank you for the new possibility to whitelist Tor, jgrahamc.

An argument I have often seen raised, acknowledged, but then silently dropped over the last year was the one of the read only option, though. The arguments made for delivering the contents via onion services were sound as well. If Facebook can do it why shouldn't you?

comment:164 in reply to: ↑ 162 Changed 7 months ago by cypherpunks

Replying to cypherpunks:

Replying to jgrahamc:

Sorry for the double post there. Got stuck in a loop of CAPTCHAs on this site and was unable to submit.

Is this a joke? There are no captchas here.

Regrettably there are and they don't work very well. The 'trac' software uses google captchas in some situations (for example when one of its cranky filters trips on a word it doesn't like and logically concludes your comment is 'spam'). There are a number of tickets about this.

comment:165 in reply to: ↑ 148 Changed 7 months ago by cypherpunks

Replying to cypherpunks:

Is CloudFlare trying to protect against anything besides these 4 categories of unwanted traffic?

@jgrahamc can you enlighten us?

comment:166 in reply to: ↑ 163 ; follow-ups: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

Thank you for the new possibility to whitelist Tor, jgrahamc.

An argument I have often seen raised, acknowledged, but then silently dropped over the last year was the one of the read only option, though. The arguments made for delivering the contents via onion services were sound as well. If Facebook can do it why shouldn't you?

On the R/O mode I'm mostly opposed to working on it because I've got X engineering resources and I'd rather spend them on a solution that allows legitimate Tor users 'normal' access to the web and not some special mode. I think Tor users are better off and CloudFlare a stronger company if I do that.

We've debating internally offering .onion addresses to our customers and/or running exit nodes just for our customer base. Currently there's no work happening on this but neither are out of the question (they've just tended to get prioritized far down the list).

comment:167 in reply to: ↑ 166 Changed 7 months ago by madD

Cloudflare now blocks 60-80% of the exits
research by Sadia Afroz, University of Berkeley, https://twitter.com/sheetal57/media
http://eecs.berkeley.edu/~sa499/

https://pbs.twimg.com/media/CcGst-OUkAA0qxc.jpg:large

Last edited 7 months ago by madD (previous) (diff)

comment:168 Changed 7 months ago by cypherpunks

The most poisonous thing about those graphical captchas is having to interpret a constant barrage of ontological corner cases from the viewpoint of some silly machine learning algorithm, like is the post holding a sign part of that sign

And anyway at least 20% seem to be Completely Automated Public Turing tests to tell Computers and Humans who don't see thin street sign slices or that other street sign in the background Apart

Last edited 7 months ago by cypherpunks (previous) (diff)

comment:169 Changed 7 months ago by strcat

Hi jgrahamc,

As a CloudFlare user, I've noticed that there are many users encountering the captchas without the events appearing in the traffic log. I don't think sites have any idea how aggressive the feature is by default because it seems that only 1/100 events ends up in the traffic log. I think you should consider shipping it as Essentially Off by default and allowing sites to opt-in to more aggressive checking if they really have anti-spam or anti-DoS problems. Most sites do not have those problems. They don't allow anonymous comments and already have captchas for registration. They may encounter spikes in spam or a DoS attack, in which case it makes sense to crank up that setting. If your customers realized how many users they were losing, they would be upset.

comment:170 Changed 7 months ago by strcat

Some irony: Trac decided my comment was probably spam and had me fill out a recaptcha.

comment:171 follow-up: Changed 7 months ago by aperture

As a website owner, I had to take the somewhat difficult decision to block Tor on some services in order to minimize disruption. Creating a special read-only or restricted mode for Tor users were not feasible as I have engineering time constraints. I suspect this is fairly common.

Fundamentally, site owners typically rely on identifiers like IP, email, CAPTCHA, etc to weakly identify users. Each of these resources have a small cost, and hence blocking abuse is a possibility as there is a cost to abuse. Tor removes these identification vectors, making individual blocking unfeasible.

This is not a question of humanness or the turning test. It's about introducing a progressive cost to privileged actions (whether that's creating an account, posting on a forum, etc) that has zero monetary cost for the user.

To resolve this problem, there needs to be an easy way (both from a site owner, and a user's perspective) of applying a cost to privileged actions, when conventional identification methods do not work. One option is bitcoin micropayments, which is already being done on many bitcoin-related sites with good success. Bitcoin isn't accessible to the vast majority of people though.

Another more promising option is proof of work. Unfortunately PoW heavily tilts in the favor of botnets, spammers running a Xeon, etc. Decentralized and possibly zero knowledge identity is what appears to the most promising solution.

In the interim, I think resolving CAPTCHA loop issues on Tor is a good fix. 1 CAPTCHA per site is too much, but it's better than nothing.

As for the read only concept, I just don't think it'd work. Many modern web sites submit data with AJAX post requests or websockets; you can't intercept that and return a CAPTCHA. <form> for POST is getting rarer and rarer, and whatever cloudflare does needs to work for almost every site; not just "some sites" or even "a majority of sites".

@jgrahamc: I'm glad to see the whitelist tor option. This has certainly made me consider re-subscribing to Cloudflare Business for one of my sites.

comment:172 Changed 7 months ago by aperture

FYI, I encountered a 8 recaptcha loop while signing up here! I entered the wrong captcha initially, and I think the "redirect on success page" got set to /projects/tor/captcha. When I successfully completed the CAPTCHA, I still got redirected to the captcha page.

While I'm talking about reCAPTCHA, it is pretty much an open secret that No CAPTCHA works by collecting as much information correlated to 'bot or not' as possible, and feeding it to a neural network. Google has a nice source for 'bot or not' by looking at deactivations of the Google account registration flow.

Such a system cannot be easily repurposed to identify users -- the response is a 'bot or not' chance. I doubt they'd want to anyway when most people are signed into their Google accounts.

Last edited 7 months ago by aperture (previous) (diff)

comment:173 follow-up: Changed 7 months ago by polyclef

Cloudflare is using recaptcha. Recaptcha has been broken for years.

http://bitland.net/captcha.pdf describes how to defeat captchas in general and a prior version of recaptcha in particular.

Many challenges are overly simple and can be broken with existing tools that require no further modification.

Example:

http://bitland.net/recaptcha-001.jpg

can be solved with tesseract

% tesseract recaptcha-001.jpg stdout -psm 4
outputs
1307

Other challenges may require simple processing such as erode/dilate, etc

comment:174 follow-up: Changed 7 months ago by cypherpunks

@jgrahamc

Has anyone at CF looked at the captcha bugs when browsing with Orfox yet? Still broken :(

comment:175 Changed 7 months ago by madD

@jgrahamc
WHO say that there are estimated 285 million people visually impaired worldwide: 39 million are blind and 246 million have low vision. If they want to enjoy the privacy of VPN or Tor, they are forced to take an Audio challenge (javascript only), in order to decrypt crappy telephone recordings of numbers (English only).

Furthermore, people with motorics handicap, like Stephen Hawking, can't provide human-like mouse movements for reCAPTCHA v2, but maybe don't know that they should elaborate with javascript to get static challenge.

What will CloudFlare advise to all of them?

#ttp://www.who.int/mediacentre/factsheets/fs282/en/

Last edited 7 months ago by madD (previous) (diff)

comment:176 follow-up: Changed 7 months ago by aperture

@polyclef: It's well known that the "simple reCAPTCHAs" like image classification and street number OCR relies on being logged into a Google Account and accessing from an IP with high reputation.

reCAPTCHA isn't broken; it has been a behavioural based system for ages. The purpose isn't really the actual input, but rather your behaviour entering the obvious input. Try $("input").val("1307"); and see how far you get (hint: not very much).

@madD: I think this is an issue you need to take up with Google, unfortunately. As many faults as reCAPTCHA has, there's no alternative. Rolling your own CAPTCHA takes years of effort and can have a high risk initially.

Last edited 7 months ago by aperture (previous) (diff)

comment:177 in reply to: ↑ 176 Changed 7 months ago by cypherpunks

Replying to aperture:

@polyclef: It's well known that the "simple reCAPTCHAs" like image classification and street number OCR relies on being logged into a Google Account and accessing from an IP with high reputation.

No, that's wrong. Why comment if you're clueless?

As many faults as reCAPTCHA has, there's no alternative. Rolling your own CAPTCHA takes years of effort and can have a high risk initially.

WTF? Both of those statements are outrageously false. What gives?

comment:178 in reply to: ↑ 174 Changed 7 months ago by jgrahamc

Replying to cypherpunks:

@jgrahamc

Has anyone at CF looked at the captcha bugs when browsing with Orfox yet? Still broken :(

Not sure. I need to check in with the SF office on that.

comment:179 in reply to: ↑ 171 Changed 7 months ago by jgrahamc

Replying to aperture:

@jgrahamc: I'm glad to see the whitelist tor option. This has certainly made me consider re-subscribing to Cloudflare Business for one of my sites.

Glad to hear it.

comment:180 in reply to: ↑ 166 ; follow-up: Changed 7 months ago by cypherpunks

Replying to jgrahamc:

Replying to cypherpunks:

Thank you for the new possibility to whitelist Tor, jgrahamc.

An argument I have often seen raised, acknowledged, but then silently dropped over the last year was the one of the read only option, though. The arguments made for delivering the contents via onion services were sound as well. If Facebook can do it why shouldn't you?

On the R/O mode I'm mostly opposed to working on it because I've got X engineering resources and I'd rather spend them on a solution that allows legitimate Tor users 'normal' access to the web and not some special mode. I think Tor users are better off and CloudFlare a stronger company if I do that.

Of course we all want the perfect solution in happy rainbow land, but let's face it, allowing read-only of the cache will take about 5% of the resources that the "proper" solution would take and make 95% of users happy. I would consider this a good first step in the right direction. It would also take a lot of pressure from the recaptcha issues, and might rather increase the elasticity in your resource planning instead of occupying more resources.

We've debating internally offering .onion addresses to our customers and/or running exit nodes just for our customer base. Currently there's no work happening on this but neither are out of the question (they've just tended to get prioritized far down the list).

That sounds promising. Maybe a collaboration with Tor developers is an option here? They also have priority lists, but I guess the Cloudflare issues are rather higher on their list than the Tor issues on yours.

comment:181 follow-up: Changed 7 months ago by SatoshiNakamoto

madD : whatever script you just ran to make that pretty graph, either
a) would you be interested in running it on https://pad.okfn.org/p/cloudflare-tor periodically
b) or providing the world with the source code so we can?
thanks.

comment:182 in reply to: ↑ 180 ; follow-up: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

Of course we all want the perfect solution in happy rainbow land, but let's face it, allowing read-only of the cache will take about 5% of the resources that the "proper" solution would take and make 95% of users happy. I would consider this a good first step in the right direction. It would also take a lot of pressure from the recaptcha issues, and might rather increase the elasticity in your resource planning instead of occupying more resources.

I'm not sure how you come up with the 5% number but I think you underestimate how complicated deciding what R/O is in the web. Plenty of attacks come through GET requests. Doing the R/O mode seems like a nasty hack.

comment:183 Changed 7 months ago by cypherpunks

How nice it is for CloudFlare to work with us, thank you very much. It's nice to have a global active adversary that listens to us. One has to wonder for how long this will last, perhaps a few years, perhaps a decade? How long will it be until CloudFlare is eaten by the bigger fish, or bought out by a company like Google? Money trading hands decides our fate here, we'd be fools to think we can fix this.

The facts are that online we have authorities like Wikipedia, Gutenberg, CloudFlare, Google that decide what we can read or write based on what information we give. How did we even get to this point? We need to drastically change this structure, and we need to do it before things get worse. Not by asking, but by doing with or without permission.

To throw an idea out, let's mirror popular sites (Wikipedia, Gutenberg, news sites behind CloudFlare) using scraping tools and integrate our own GET cache using Tor Browser using IPFS or something similar.

comment:184 in reply to: ↑ 182 ; follow-up: Changed 7 months ago by cypherpunks

Replying to jgrahamc:

I'm not sure how you come up with the 5% number but I think you underestimate how complicated deciding what R/O is in the web. Plenty of attacks come through GET requests. Doing the R/O mode seems like a nasty hack.

To me R/O would be delivering the cache that you have. The request would never see the actual website. This would also discourage adversaries that repeatedly pull websites to have an automated advantage at idk ticket sales as the cache does not have to be the most recent.

I mean seriously. How hard can it be to deliver the cache instead of a captcha? I can't imagine that this takes one of your junior software engineers more than two hours to implement and then a day to deploy. But please give us better estimates, so we have an idea of what we are actually demanding here.

comment:185 in reply to: ↑ 184 ; follow-up: Changed 7 months ago by jgrahamc

Replying to cypherpunks:

Replying to jgrahamc:

I'm not sure how you come up with the 5% number but I think you underestimate how complicated deciding what R/O is in the web. Plenty of attacks come through GET requests. Doing the R/O mode seems like a nasty hack.

To me R/O would be delivering the cache that you have. The request would never see the actual website. This would also discourage adversaries that repeatedly pull websites to have an automated advantage at idk ticket sales as the cache does not have to be the most recent.

There are a lot of assumptions here. For example, this assumes that we have all the pages in cache and all the assets. It assumes that web pages can be displayed without any POSTs happening (so nothing dynamic at all).

In addition it ignores what happens if a Tor user comes to CloudFlare and we don't have the item in cache, or the item is outdated.

This idea just kicks the ball down the line. The right solution is to allow Tor users who are not behaving in a malicious manner 'normal' access to the web.

comment:186 in reply to: ↑ 185 Changed 7 months ago by wwaites

Replying to jgrahamc:

This idea just kicks the ball down the line. The right solution is to allow
Tor users who are not behaving in a malicious manner 'normal' access to the
web.

Well no not really. Cloudflare is fundamentally the *wrong* solution for
security, because it is about putting a bubble around broken and vulnerable
web sites. This is easier than fixing the actual problems so it is attractive.

This is quite apart from what might be the *right* use of Cloudflare or other
CDNs (modulo surveillance) which is efficiently delivering data from as close
to the edge as possible, and coincidentally being able to sink volumetric
DDoS attacks.

The problems with Tor arise, near as I can tell, almost exclusive from the
former. It's just bad architecture.

comment:187 in reply to: ↑ 181 Changed 7 months ago by madD

Replying to SatoshiNakamoto:

madD : whatever script you just ran to make that pretty graph, either
a) would you be interested in running it on https://pad.okfn.org/p/cloudflare-tor periodically
b) or providing the world with the source code so we can?
thanks.

I'm not the author, never claimed so. Below the image was a link to the source, @sheetal57, now it's above.
It's a good idea, that script should run periodically. However, I'd not disclose the code in order to avoid fingerpriting by CloudFlare. If the target list & sequence & timing are fixed in the code, it'd be easy for CF to make false positives, i.e. pass no captchas to script bot + a few tor users who happen to access target domain in the same timeslot as the script. It'd need proper randomization first. If you write to @sheetal57, tell her please.

comment:188 Changed 7 months ago by cypherpunks

I have noticed several times that one of the many Google resources required for recaptcha will get blocked by a Google captcha. There is no indication of this on the Cloudflare captcha page, I have to try all the urls to see which one is getting blocked. In the short term Cloudflare could improve reliability by either detecting this or by getting Google not to throttle their captcha resources.

comment:189 follow-ups: Changed 7 months ago by jgrahamc

The feature to whitelist the Tor network has been shipped and is documented here: https://support.cloudflare.com/hc/en-us/articles/203306930

comment:190 follow-up: Changed 7 months ago by yawning

Proof of concept, if you actually use this for anything other than "testing said proof of concept" you get what you deserve. The README.md has dire warnings about reduction in anonymity, and I will point and laugh at people that have bad things happen to them.

https://git.schwanenlied.me/yawning/cfc

comment:191 in reply to: ↑ 189 Changed 7 months ago by cypherpunks

Replying to jgrahamc:

The feature to whitelist the Tor network has been shipped and is documented here: https://support.cloudflare.com/hc/en-us/articles/203306930

Captcha'd trying to access this page. Try enabling the whitelist?
Edit: I got past after changing circuits three times so maybe that counts as an improvement?

Last edited 7 months ago by cypherpunks (previous) (diff)

comment:192 in reply to: ↑ 190 ; follow-up: Changed 7 months ago by cypherpunks

Replying to yawning:

Proof of concept, if you actually use this for anything other than "testing said proof of concept" you get what you deserve. The README.md has dire warnings about reduction in anonymity, and I will point and laugh at people that have bad things happen to them.

https://git.schwanenlied.me/yawning/cfc

archive.li (which is what archive.is serves all the images from) is using cloudflare now :(

comment:193 in reply to: ↑ 189 Changed 7 months ago by cypherpunks

Replying to jgrahamc:

The feature to whitelist the Tor network has been shipped and is documented here: https://support.cloudflare.com/hc/en-us/articles/203306930

Thanks! Will you enable this feature for support.cloudflare.com, cloudflare.com, etc?

comment:194 follow-up: Changed 7 months ago by samlanning

I've been thinking over this problem for a number of days now, and think I may have come to a solution that is somewhat of a Compromise.

(I've written this up in more detail as a blog post over at https://samlanning.com/blog/the_tor_cloudflare_problem/ that I'd love critique on).

But here's the important bit:

This idea requires work from both the Tor developers (specifically those who work on TBB), and the CloudFlare developers.

The User Experience

For non Tor users, or Tor users using an older TBB, the experience is unchanged. Older Tor users will still have to use a Captcha, which will grant them full access to a website as is currently done now. For users using the latest TBB, upon landing on a website protected by CloudFlare, they will see something like this:

https://i.imgur.com/zWhSuTg.png

Note: the wording in this screenshot is by no means final.

Now the user can choose to either ignore the warning, dismiss it, or click "Prove You're Human". Ignoring the warning will allow the user to continue using the site in a Read Only mode; here I think the most appropriate Implementation would be to use Cached-Only pages (not sending any requests on to the server). For any cache misses it can display the Captcha.

Now when a user submits a form, the page will remain in a "loading" state while a new tab is opened and focused for the user to complete a Captcha. (We could optionally have the same warning displayed on this page, but without the button or dismiss icon). Once the user has completed the captcha, the tab will close and the existing (paused) tab will continue (actually make the request).

A similar thing would happen for any AJAX or WebSocket requests, the request would be paused until a Captcha is completed in a separate tab or window.

This would allows for, I think, the minimum amount of friction for performing any particular task on a website, requiring a Captcha only when necessary, and indicating to a user that they are viewing a reduced-functionality version of a website.

A Technical Implementation

On the TBB side, the browser would need to indicate that it supports this "prove human" functionality by way of either User-Agent, or by specifying a particular header. For example, along with the request, it could send X-Human-Proof: Available.

The CloudFlare server, upon receiving a request, if:

  • The threat level has been determined as "CAPTCHA"
  • The user agent supports the "Human Proof" feature (i.e. has the appropriate X-Human-Proof header).
  • There is no cookie set for the Captcha (no existing proof-of-human).
  • The request is a GET.
  • The requested URL is cached.

Then return the cached contents, along with a header like X-Human-Proof-Required: <some URL to visit for Captcha>. In any other situation, behave as normal. (Note: the URL will need to be for the same domain as the request, so site-relative probably will make most sense, i.e. starting with /)

The TBB, upon seeing a response with the header X-Human-Proof-Required, will mark any domains that return this as "requiring human proof" (for the given session), and for any pages whose URL contains a domain in this list, display the bar shown in the screenshot (unless it's already been dismissed).

Now when any non-GET request is made to a domain marked as "requiring human proof" (whether AJAX, WebSocket or otherwise), pause the request, and open a new tab to the URL required (given in the X-Human-Proof-Required header). Wait for a response from the given domain that does not contain the X-Human-Proof-Required header, then continue the paused request (actually send the request to the server).

Future Improvements

This would give us a good foundation for building on iterative UX improvements, and improving mechanisms for how user agents prove to servers that they are being operated by humans. From here we could:

  • Submit an RFC for these headers, and try and make an official spec for the behaviour.
  • Make these changes in the client (handling of headers, pausing requests, opening challenge in new tab etc...) upstream, and across other browsers.
  • Iteratively improve the UI, such as displaying a blocking-dialog on any pages that are waiting on a captcha (or other challenge) to be completed.
  • Encourage websites that don't use CloudFlare, but block tor exit nodes to instead behave in this manner.

Potential Issues

The biggest issue I see with this solution is that it would require some non-trivial engineering effort from the Tor developers. For CloudFlare, I feel that this engineering effort would be comparatively less difficult. But I honesty feel it would pay off.

Another thing I did think of is that this mechanism may encourage website operators to more eagerly block Tor traffic and require "proof-of-humanness" to use a website to its full capacity, but I'm unsure about that.

After having given this idea some thought for a couple of days, other than the above points, I am yet to come up with any significant issues. Please let me know if you can think of any and I'll update this post.

I look forward to seeing if this idea can get us any further to finding a complete solution.

comment:195 in reply to: ↑ 192 Changed 7 months ago by yawning

Replying to cypherpunks:

Replying to yawning:

Proof of concept, if you actually use this for anything other than "testing said proof of concept" you get what you deserve. The README.md has dire warnings about reduction in anonymity, and I will point and laugh at people that have bad things happen to them.

https://git.schwanenlied.me/yawning/cfc

archive.li (which is what archive.is serves all the images from) is using cloudflare now :(

Proof of concept is a proof of concept. Switching the archive service used is a one line change (Suggestions accepted, though archive.is seems to have the least suck privacy/takedown policies).

Maybe someone should contact them to see if they are willing to whitelist Tor access.

I felt inspired and wrote the code for "scrape the DOM to see if it actually is a captcha page, and inject a unblock me now button" (https://imgur.com/MW71d3g). Not in git, I want to get some more things done before I push.

The button works even with NoScript set to paranoid values.

When I have time I'll finalize the UI. I'm leaning towards mirroring the NoScript UX with "Allow CloudFlare globally (Dangerous)" (Switches between the fast reject behavior and the new DOMscraping/injection based unblocking), and some menu items that allow manipulating the internal non-persistent white/blacklist.

(nb: I still would rather prefer clever crypto and I promised someone feedback about such.)

comment:196 in reply to: ↑ 194 ; follow-up: Changed 7 months ago by jgrahamc

Replying to samlanning:

(I've written this up in more detail as a blog post over at https://samlanning.com/blog/the_tor_cloudflare_problem/ that I'd love critique on).

I don't think this improves the situation. It doesn't help for non-human User-Agent (such as legitimate bots, apps and anything calling an API). The right solution is for us to start applying our attack detection technologies to Tor traffic and not make the first layer of defence the CAPTCHA.

comment:197 Changed 7 months ago by cypherpunks

@jgrahamc

Have you checked if Orfox support is happening? The user experience is still terrible with 0% success.

comment:198 in reply to: ↑ 196 Changed 7 months ago by cypherpunks

Well something needs to be done for the humans. Non-human user agents can presumably tell that a missing image has been individually captcha'd but nothing makes this apparent to humans browsing a site. If you know you're serving an image at least serve an error image instead of serving the captcha page in place of an image.

comment:199 Changed 6 months ago by jeffburdges

  • Sponsor set to None

A partial fix similar to Yawning's CFC extension might be an extension to provide a mailto: link to contact the website's operator based upon whois information. A mailto: link goes through the user's own email program, so it's likely to be read and allow discussion.

Also, mailto: links can provide initial text that explains the problem. This could mention that CloudFlare's upmarket competitors like Akamai see no problem with Tor users.

comment:200 Changed 6 months ago by paradox

It is getting worse.

Got a Google captcha. Clicked on audio challenge. Instead of a series of numbers the message was:
"Your computer or network maybe sending automated queries.
To protect our users, we cannot process your request right now."
(Here the full message: http://wikisend.com/download/403898/dos_captcha_audio.mp3)

Subsequently right after the audio message clicked on "Get a visual challenge".
Solved the warped image correctly. Got the content.

This happened repeatedly over the course of several hours. Each time with different content, requested from different exit node located in different countries.
Each time the same pattern, audio message denies, visual challenge carries on.

I think we should determine the responsibilities in the Google-Cloudflare mishmash for indiscriminately denying a minority group of privacy conscious users access to web content.

comment:201 follow-up: Changed 6 months ago by frustrated

Since there are Cloudflare people on this thread, I have a number of questions.

  1. Why is it that the images on the captcha load individually, slowly?
  1. Why is it that the captchas usually fail, and you have to do it over and over again?

I'm serious. I went to look up a Bible passage, REPEATEDLY solved captchas for OVER A HALF HOUR, and gave up and grabbed the bible on my shelf. I was so angry I wanted to break something.

comment:202 in reply to: ↑ 173 Changed 6 months ago by saint

  • Cc saint added

Replying to polyclef:

Cloudflare is using recaptcha. Recaptcha has been broken for years.

http://bitland.net/captcha.pdf describes how to defeat captchas in general and a prior version of recaptcha in particular.

Many challenges are overly simple and can be broken with existing tools that require no further modification.

polyclef mentions that it's possible to derive the answer for the old-style street number captchas using tesseract [1]. Interestingly, there is a version of tesseract in javascript [1]. This is probably not especially useful for the current "select all boxes that contain one pixel of street sign" Recaptcha system, but if there were a way to trigger the old behavior, these techniques could be used together.

[1] http://tesseract.projectnaptha.com/

Last edited 6 months ago by saint (previous) (diff)

comment:203 in reply to: ↑ 87 Changed 6 months ago by cypherpunks

Replying to jgrahamc:

I'll wrap up with a question. How are you intending on rolling out this new feature? Is it going to be opt-in, opt-out, will there be an email sent to your customers about using it? I think that this is something that the community is greatly interested in.

Almost everything we announce goes on our blog so I imagine we'll do it that way. It gets emailed to people who subscribe to the blog. I don't know if it'll be emailed to all customers (mostly because we don't tend to send them a lot of email and it's the marketin group that decides). The current plan is for this to be opt-in.

1) Whitelisitng was NOT announced on blog.cloudflare.com, can you fix that?
2) nor did customers get any email
3) The chosen firewall identifier is confusingly "T1", entering "Tor" leads to error without hint:

http://3j3j5hyf44hgggod.onion/bbo8ui6i.png

comment:204 follow-up: Changed 6 months ago by cypherpunks

I don't know if its coincidental or if Cloudflare is taking its douchebagery to new levels but now accessing some pages even with archive.is is still blocked.

https://www.aei.org/publication/gen-michael-hayden-on-apple-the-fbi-and-data-encryption/

https://archive.is/7u5P8

This whole "enagagement with Tor" is looking like a damage control tactic instead of saying they block us outright and having customers leave. Fuck you Cloudflare.

comment:205 in reply to: ↑ 204 ; follow-up: Changed 6 months ago by cypherpunks

Replying to cypherpunks:

I don't know if its coincidental or if Cloudflare is taking its douchebagery to new levels but now accessing some pages even with archive.is is still blocked.

https://www.aei.org/publication/gen-michael-hayden-on-apple-the-fbi-and-data-encryption/

https://archive.is/7u5P8

This whole "enagagement with Tor" is looking like a damage control tactic instead of saying they block us outright and having customers leave. Fuck you Cloudflare.

Since CF manages whitelisting as a hidden feature, we're gonna have to contact their customers directly. And inform them politely about whitelisting, or better, onionizing their service. I see no other option at this point. @jgrahamc stopped posting 3 weeks ago, the number of daily captchas does not sink, at least in my experience, and I don't believe in miraculous tesseracts, they only work until next generation of gropeware is released. CF and Google conduct a digital form of TSA's groping.

The problem is how to deliver thousands of CF customers our informing email without getting that email globaly blacklisted within one microsecond :)

madD

comment:206 in reply to: ↑ 205 ; follow-up: Changed 6 months ago by jgrahamc

Replying to cypherpunks:

Since CF manages whitelisting as a hidden feature

Not a hidden feature. CEO plans to blog about Tor and this will be included.

we're gonna have to contact their customers directly. And inform them politely about whitelisting, or better, onionizing their service.

CEO wants us to issue .onions automatically for sites on CloudFlare to make things easier all round: https://twitter.com/eastdakota/status/710357574579650560

@jgrahamc stopped posting 3 weeks ago, the number of daily captchas does not sink, at least in my experience, and I don't believe in miraculous tesseracts, they only work until next generation of gropeware is released. CF and Google conduct a digital form of TSA's groping.

I'm here, not posting unless I have something useful to say.

comment:207 in reply to: ↑ 206 Changed 6 months ago by cypherpunks

Replying to jgrahamc:

I'm here, not posting unless I have something useful to say.

Agree. For my part, I liked best when you educated the Tor community that the exit nodes fluctuate. https://trac.torproject.org/projects/tor/ticket/18361#comment:159
Must have been a novelty to many.

madD

comment:208 Changed 6 months ago by jgrahamc

As promised the CEO has written about Tor including information on how to whitelist Tor exit nodes on CloudFlare: https://blog.cloudflare.com/the-trouble-with-tor/

comment:209 Changed 6 months ago by tne

  • Cc tne added

comment:210 in reply to: ↑ 201 Changed 6 months ago by jgrahamc

Replying to frustrated:

Since there are Cloudflare people on this thread, I have a number of questions.

  1. Why is it that the images on the captcha load individually, slowly?
  1. Why is it that the captchas usually fail, and you have to do it over and over again?

I'm serious. I went to look up a Bible passage, REPEATEDLY solved captchas for OVER A HALF HOUR, and gave up and grabbed the bible on my shelf. I was so angry I wanted to break something.

I would be interested to know if this situation is now resolved.

comment:211 follow-up: Changed 6 months ago by jgrahamc

Are Tor users seeing easier to pass CAPTCHAs now?

comment:212 in reply to: ↑ 211 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

Are Tor users seeing easier to pass CAPTCHAs now?

I believe for the first time ever I got through one in a single pass a few minutes ago, where it used to take me two passes at a minimum before (~past year, probably since forever). Small datapoint, but it sounds related.

Side question: Would you mind pointing me to some clarifications regarding the difficulties around shielding only the sites that are under attack and leaving the rest open? I've went through this thread, two HN threads and CF's blog -- given the large amount of discussion I feel like I might have missed it. If not, would you elaborate a little bit on that here or elsewhere? I saw the question pop up quite a few times and I'm interested myself.

Thank you.

comment:213 in reply to: ↑ 212 ; follow-up: Changed 6 months ago by jgrahamc

Replying to tne:

Replying to jgrahamc:

Are Tor users seeing easier to pass CAPTCHAs now?

I believe for the first time ever I got through one in a single pass a few minutes ago, where it used to take me two passes at a minimum before (~past year, probably since forever). Small datapoint, but it sounds related.

Good to hear. Hoping that others will have the same experience.

Side question: Would you mind pointing me to some clarifications regarding the difficulties around shielding only the sites that are under attack and leaving the rest open?

Do you mean could we only show a CAPTCHA to sites that we already know are under attack?

comment:214 in reply to: ↑ 213 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

Do you mean could we only show a CAPTCHA to sites that we already know are under attack?

Yes. I can only assume this was on the table at some point, but I feel I don't have a full understanding of the problem.

comment:215 in reply to: ↑ 214 ; follow-up: Changed 6 months ago by jgrahamc

Replying to tne:

Replying to jgrahamc:

Do you mean could we only show a CAPTCHA to sites that we already know are under attack?

Yes. I can only assume this was on the table at some point, but I feel I don't have a full understanding of the problem.

CloudFlare already does that for sites that are under DDoS (under some circumstances) but it doesn't really make sense here. The Tor network isn't a source of DDoS for us, it's a source of all sorts of other abuse (see above).

comment:216 in reply to: ↑ 215 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

CloudFlare already does that for sites that are under DDoS (under some circumstances) but it doesn't really make sense here. The Tor network isn't a source of DDoS for us, it's a source of all sorts of other abuse (see above).

Indeed, that's actually the source of my question. I imagine the classification of requests that are participating in a DDoS is somehow different from that of requests participating in other kinds of abuse. If you could shed some light on how it is so, I would be very grateful.

(I have a few guesses, but obviously I'd love to avoid speculating if I can avoid it.)

comment:217 in reply to: ↑ 216 ; follow-up: Changed 6 months ago by jgrahamc

Replying to tne:

Replying to jgrahamc:
Indeed, that's actually the source of my question. I imagine the classification of requests that are participating in a DDoS is somehow different from that of requests participating in other kinds of abuse. If you could shed some light on how it is so, I would be very grateful.

Yes. We have all sorts of different systems for dealing with different types of abuse because they are quite different. The IP reputation part, which is the source of the CAPTCHAs that Tor users are seeing, is a small part.

comment:218 in reply to: ↑ 217 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

Yes. We have all sorts of different systems for dealing with different types of abuse because they are quite different. The IP reputation part, which is the source of the CAPTCHAs that Tor users are seeing, is a small part.

Sure, I think we all understand that; the decision to block using a CAPTCHA is based on the reputation of the origin IP only. Can you, in addition, take into account the status of the destination site? (Similar to what you do in DDoS situations when you classify sites as "Under attack" in order to, as I understand it, deploy different countermeasures.)

Of course, as you say, we're not talking about DDoS situations -- the "Under attack" terminology might not be appropriate. Say "Observing abuse" instead if that helps.

So: if the site is "actively observing abuse" and the IP has bad reputation, block using a CAPTCHA as usual. If the site is not "actively observing abuse" or the IP reputation is good, let the request go through.

My question (hopefully clarified now) is: How hard would it be to establish (and remove) this "observing abuse" status (if it makes sense at all)?

The obvious assumption here is that a non-trivial amount of sites are not being actively abused and so it doesn't make sense to put the walls up around them, since it unfortunately prevents many legitimate users from reaching them painlessly as well (or at all, depending on their patience). Barring evidence to the contrary, I believe this assumption to be true. Intuitively, it wouldn't help the most popular sites, which are undoubtedly under *constant* abuse, but it would alleviate a big chunk of the pain expressed in this whole debate.

comment:219 in reply to: ↑ 218 ; follow-up: Changed 6 months ago by jgrahamc

Replying to tne:

Replying to jgrahamc:
Sure, I think we all understand that; the decision to block using a CAPTCHA is based on the reputation of the origin IP only. Can you, in addition, take into account the status of the destination site? (Similar to what you do in DDoS situations when you classify sites as "Under attack" in order to, as I understand it, deploy different countermeasures.)

We will throw CAPTCHAs in other situations not just for IP reputation. CAPTCHA is one of a number of countermeasures we have and is used in different ways.

So: if the site is "actively observing abuse" and the IP has bad reputation, block using a CAPTCHA as usual. If the site is not "actively observing abuse" or the IP reputation is good, let the request go through.

My question (hopefully clarified now) is: How hard would it be to establish (and remove) this "observing abuse" status (if it makes sense at all)?

I'm not sure that totally makes sense. It's better to think at an individual request level and ask "Does this request indicate abuse?" and then decide what to do. Of course, we can take into account other things as well, but we wouldn't want to wait around and measure abuse and then say "OK, now we'll start blocking it" because it might be too late (i.e. the customer may have been hacked/attacked in some way). I think both Tor users and our customers will be happy with a solution like that.

comment:220 in reply to: ↑ 219 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

I'm not sure that totally makes sense. It's better to think at an individual request level and ask "Does this request indicate abuse?" and then decide what to do. Of course, we can take into account other things as well, but we wouldn't want to wait around and measure abuse and then say "OK, now we'll start blocking it" because it might be too late (i.e. the customer may have been hacked/attacked in some way). I think both Tor users and our customers will be happy with a solution like that.

The delay issue was my guess. I don't think the answer is as clear-cut however; it's a trade-off. Many sites will be fine with a few misses before your countermeasures kick in if that means they can still handle them easily without losing their own users/visitors whenever another random site at the other side of the planet is attacked from a shared IP. This is especially the case with spam abuse for example, which is not as dramatic as a breach and yet is probably the number one reason you assign bad rep scores (any published data on this?).

It's not like you can catch everything all the time even now anyway, it's defense in depth and it's all in the numbers. Only you will know if it really makes sense (you have the data) and I appreciate your replies and the time you take to consider this suggestion. It is not for me to say of course, but I like to believe the suggestion is worth your time (from my admittedly limited perspective, I see potential to calm many people down this way -- it is not mine although it is an obvious one, many people are asking here and elsewhere).

I agree wholeheartedly with your mention of focusing on individual requests instead (who wouldn't?). The problem is, it's just a promise at this point. If you could really do it efficiently and reliably, this entire discussion would be moot -- you could drop IP rep altogether. However, you don't, so evidently you can't (yet) do it efficiently and reliably, and timing matters. Whatever long-term plans CF might have regarding a strictly request-level approach, any short-term compromises will help. Also, can we honestly believe strictly-request-level solutions will someday be completely satisfactory? Data correlation is extremely powerful and the temptation (or even pressure from your customers, direct or indirect) will always remain to leverage it (as evidenced by your apparently very successful IP reputation system). Attempting to reduce CF's reliance on it is a noble goal that I support, I'm just afraid it is a mirage that will only perpetuate the status quo (which, in my view and that of many others, is hardly tenable). Hopefully I don't come across as a defeatist, I'm just trying to be realistic (hence the more nuanced suggestion).

Last edited 6 months ago by tne (previous) (diff)

comment:221 in reply to: ↑ 220 ; follow-up: Changed 6 months ago by jgrahamc

Replying to tne:

I agree wholeheartedly with your mention of focusing on individual requests instead (who wouldn't?). The problem is, it's just a promise at this point. If you could really do it efficiently and reliably, this entire discussion would be moot -- you could drop IP rep altogether. However, you don't, so evidently you can't (yet) do it efficiently and reliably, and timing matters.

We already do examine individual requests to look for abuse. That's part of the layers of defense we give web sites.

Whatever long-term plans CF might have regarding a strictly request-level approach, any short-term compromises will help.

I'm working short and medium on this not long. Short term we've introduced the ability for sites to whitelist Tor, we changed our clearance cookie so that it applies across circuit changes, and we've recently made changes to the CAPTCHAs which should stop people getting stuck in loops of CAPTCHAs. I'm also working on a slightly less short term project to apply other technologies (non-CAPTCHA) to Tor. The important thing there is that I need to measure their effectiveness in this situation and will do so.

Attempting to reduce CF's reliance on it is a noble goal that I support, I'm just afraid it is a mirage that will only perpetuate the status quo (which, in my view and that of many others, is hardly tenable). Hopefully I don't come across as a defeatist, I'm just trying to be realistic (hence the more nuanced suggestion).

I'm not spending my time here as some sort of mirage or PR exercise.

comment:222 in reply to: ↑ 221 ; follow-up: Changed 6 months ago by tne

Replying to jgrahamc:

Replying to tne:

I agree wholeheartedly with your mention of focusing on individual requests instead (who wouldn't?). The problem is, it's just a promise at this point. If you could really do it efficiently and reliably, this entire discussion would be moot -- you could drop IP rep altogether. However, you don't, so evidently you can't (yet) do it efficiently and reliably, and timing matters.

We already do examine individual requests to look for abuse. That's part of the layers of defense we give web sites.

Exactly; it's "part of" your solution. In and of itself, it isn't sufficient. This means you'll continue to rely on IP rep. Nobody likes that, not even you I reckon, but it's the best you have right now. Dealing with that reality, I think there are ways to reduce the pain in specific areas (e.g. sites that are not being "actively abused") and that are worth exploring. Would you comment on that?

Whatever long-term plans CF might have regarding a strictly request-level approach, any short-term compromises will help.

I'm working short and medium on this not long. Short term we've introduced the ability for sites to whitelist Tor, we changed our clearance cookie so that it applies across circuit changes, and we've recently made changes to the CAPTCHAs which should stop people getting stuck in loops of CAPTCHAs. I'm also working on a slightly less short term project to apply other technologies (non-CAPTCHA) to Tor. The important thing there is that I need to measure their effectiveness in this situation and will do so.

I know, I've been following the discussion. I probably should have thanked you and your team for that beforehand. As I said, I even benefit from some of those changes, and that's great.

I'm looking forward to those non-CAPTCHA approaches. It's good to hear they're planned for the "short to medium term", since for many people those are the ones that matter most.

(Note that this is orthogonal to the point I was making; but that's OK.)

Attempting to reduce CF's reliance on it is a noble goal that I support, I'm just afraid it is a mirage that will only perpetuate the status quo (which, in my view and that of many others, is hardly tenable). Hopefully I don't come across as a defeatist, I'm just trying to be realistic (hence the more nuanced suggestion).

I'm not spending my time here as some sort of mirage or PR exercise.

Given the whole thread above I understand the tone, but I'd like not to be caught in the crossfire. I'm referring to a technical mirage (I think it's fair to say at this point that dropping IP reputation is not a goal you can set a date for right now, and maybe you'll never be able to). I have yet to see anything that would suggest CF is trying to mislead anyone deliberately, and I'm not trying to imply it myself.

Assumption: By "It's better to think at an individual request level and ask "Does this request indicate abuse?" and then decide what to do. Of course, we can take into account other things as well, but [...]" you didn't really mean that you were aiming to do that exclusively, as that would prevent you from using an IP reputation system (which uses data besides the isolated request, i.e. reputation scores gathered via other customer sites). I interpreted it like that however, and we might have talked past each other. If that's correct, what I said will make more sense.

comment:223 in reply to: ↑ 222 ; follow-ups: Changed 6 months ago by jgrahamc

Replying to tne:

Exactly; it's "part of" your solution. In and of itself, it isn't sufficient. This means you'll continue to rely on IP rep. Nobody likes that, not even you I reckon, but it's the best you have right now.

Every web property of significant size uses some sort of IP-based reputation. It's one way web sites deal with abuse (sometimes it's super-manual: web site admins look at logs and restrict certain IPs). No plan to ditch IP reputation, but CloudFlare likes to make continuous improvements and I think this is an area where we can do that.

Dealing with that reality, I think there are ways to reduce the pain in specific areas (e.g. sites that are not being "actively abused") and that are worth exploring. Would you comment on that?

I need to think about it. I don't have a ready answer to whether that would work. Will do some internal investigation.

I know, I've been following the discussion. I probably should have thanked you and your team for that beforehand. As I said, I even benefit from some of those changes, and that's great.

Good, I'm glad to hear the changes we are making are helping.

Assumption: By "It's better to think at an individual request level and ask "Does this request indicate abuse?" and then decide what to do. Of course, we can take into account other things as well, but [...]" you didn't really mean that you were aiming to do that exclusively, as that would prevent you from using an IP reputation system (which uses data besides the isolated request, i.e. reputation scores gathered via other customer sites). I interpreted it like that however, and we might have talked past each other. If that's correct, what I said will make more sense.

It's more a question of how you mix this stuff. Suppose you have a bad IP reputation for some IP, plus you look at the request and it looks it might be SQLi then you impede that, but if you see a request and it's clean then you downplay the IP reputation and let the request through. Equally you have a good IP and it's certain it's a known exploit, you block that.

comment:224 in reply to: ↑ 223 Changed 6 months ago by tne

Replying to jgrahamc:

I need to think about it. I don't have a ready answer to whether that would work. Will do some internal investigation.

Greatly appreciated.

It's more a question of how you mix this stuff. Suppose you have a bad IP reputation for some IP, plus you look at the request and it looks it might be SQLi then you impede that, but if you see a request and it's clean then you downplay the IP reputation and let the request through. Equally you have a good IP and it's certain it's a known exploit, you block that.

That is more nuanced. We're now on the same wavelength; thank you for your patience.

comment:225 in reply to: ↑ 223 ; follow-up: Changed 6 months ago by lunar

Replying to jgrahamc:

Every web property of significant size uses some sort of IP-based reputation. It's one way web sites deal with abuse (sometimes it's super-manual: web site admins look at logs and restrict certain IPs). No plan to ditch IP reputation, but CloudFlare likes to make continuous improvements and I think this is an area where we can do that.

I think this is very short-sighted. The future is Carrier-grade NAT and IPv6. Both makes IP reputation highly impractical. Or maybe you have a fancy solution. I'd be curious to hear about it!

comment:226 in reply to: ↑ 225 Changed 6 months ago by jgrahamc

Replying to lunar:

Replying to jgrahamc:

Every web property of significant size uses some sort of IP-based reputation. It's one way web sites deal with abuse (sometimes it's super-manual: web site admins look at logs and restrict certain IPs). No plan to ditch IP reputation, but CloudFlare likes to make continuous improvements and I think this is an area where we can do that.

I think this is very short-sighted. The future is Carrier-grade NAT and IPv6. Both makes IP reputation highly impractical. Or maybe you have a fancy solution. I'd be curious to hear about it!

The future is IPv6 not CGN. Really major ISPs are making a massive push to IPv6 as are mobile carriers.

We do have a fancy solution that we are working on, but I don't want to make a specific promise yet.

comment:227 follow-up: Changed 6 months ago by toruser250

why are you guys even using googles captcha? not only do I have to solve them constantly on all of the sites i visit, they're not stopping bots either. only us.

www.gizmodo.co.uk/2016/04/bots-can-now-fool-human-verifying-captchas/

www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf

turns out simply having a 9 day old google.com cookie gives you the free pass to the checkbox mode, no test needed, without any tracking or work on their end. maybe we should just collect google.com cookies and inject them into our Tor requests to bypass the checkbox and automate the whole process.

Last edited 6 months ago by toruser250 (previous) (diff)

comment:228 in reply to: ↑ 227 Changed 6 months ago by phw

Replying to toruser250:

why are you guys even using googles captcha? not only do I have to solve them constantly on all of the sites i visit, they're not stopping bots either. only us.

This deserves emphasis. I know, it's an unpleasant question because the business model of many CloudFlare customers relies on being able to distinguish bots from people. Many years ago, that was a reasonable assumption. Today, it no longer is. Here's the more comprehensive version of the paper toruser250 referenced:
http://www.cs.columbia.edu/~polakis/papers/sivakorn_eurosp16.pdf

These folks were able to automatically solve ~71% of reCAPTCHA challenges. I'm quite sure that's better than what many humans can do, me included. CAPTCHAs are obsolete. Unfortunately, it's not much fun to explain that to companies whose business model did not adapt, so I can somewhat relate to CloudFlare's issues.

comment:229 follow-up: Changed 5 months ago by tarmick

signed up because i saw this today: motherboard.vice.com/en_uk/read/the-cloudflare-and-tor-stalemate-is-harming-users

just signing up on torproject required me to solve 10+ google recaptchas. it's insane. i had to solve google captcha to talk about google captcha on a tor site. why can't we at least be safe from the nonsense of it here?

it's one thing for cloudflare to care about solving tor (which i think they do, by way of engaging here) but it's another for google to do something about their captcha. i just don't understand why cloudflare doesn't use something else or something special for tor users. It's pretty clear google either do not care about Tor users or they can't because their security solution doesn't scale well.

is the google recaptcha team engaging on this discussion? is their "threat level" linked to cloudflares? or do we get shit listed on both and have to deal with both constant captcha (cloudflare) and shitty impossible captcha (google).

comment:230 in reply to: ↑ 229 Changed 5 months ago by jgrahamc

Replying to tarmick:

it's one thing for cloudflare to care about solving tor (which i think they do, by way of engaging here) but it's another for google to do something about their captcha. i just don't understand why cloudflare doesn't use something else or something special for tor users. It's pretty clear google either do not care about Tor users or they can't because their security solution doesn't scale well.

We're actively looking into two things: fewer CAPTCHAs and a better CAPTCHA solution. The latter is pretty hard because we need something that scales to our size, works internationally, actually does what it says on the tin. We have made changes (with the help on Google) to reduce the complexity of CAPTCHAs being seen by Tor users and continue to work on this.

comment:231 Changed 5 months ago by cypherpunks

Please make CAPTCHAs solvable without a pointing device like mice, trackpads or touch panels.

Dumb tech people tend to think that everyone uses software like they do. ;)

comment:232 Changed 4 months ago by paradox

Even without Tor. "Cloudflare is ruining the internet for me".
http://www.slashgeek.net/2016/05/17/cloudflare-is-ruining-the-internet-for-me/

comment:233 Changed 4 months ago by lissacoffey

When tied with calling sky, it seems like a basic analytics problem to enumerate users and most sites visited in a given session.

comment:234 Changed 4 months ago by cypherpunks

FWIW the CAPTCHAs given when JavaScript is disabled are IMO easier to solve than with Javascript enabled.

Without JavaScript i get the type of CAPTCHA that simply asks me to tick the boxes that match the given criterion. I get to review the submission before actually submitting it and therefore get these right on the first try most of the time. With JavaScript i get the type of CAPTCHA that replaces the images with new ones until none match the given criterion. These take much longer to solve because the images are replaced multiple times and it seems one accidental misclick invalidates the entire CAPTCHA.

I would appreciate it if the JavaScript version would function more like the non-JavaScript version in terms of ease of solving and the option to review submissions.

Lastly, the non-JavaScript version requires users to submit a long code to authenticate them solving the CAPTCHA successfully. Since CloudFlare is a MITM, wouldn't CloudFlare be able to handle this submission automatically? This would remove one more step that users have to take.

comment:235 Changed 4 months ago by to1

When I attempt to post HERE on trac.torproject.org, I see the following error:

Submission rejected as potential spam

Content contained these blacklisted patterns: 'h t t p :', '(?i) b u s i n e s s'

Below the error message, a CAPTCHA is displayed. If I solve the CAPTCHA, the page simply refreshes. I see the same error message, but now there is no CAPTCHA, no submit button, or anything else. I recreated my post and solved the CAPTCHA correctly a dozen times, but it would not accept a correct solution. I see that another user encountered the same problem here and assumed it was a Cloudflare issue. They clearly did not understand that Tor Project does not use Cloudflare. However, the CAPTCHA system is obviously broken in Trac -- including the audio version:

https://www.gstatic.com/recaptcha/api/audio/dos_captcha_audio.mp3

Google says that Tor Project has configured the CAPTCHA system improperly:

"Please take a look at the reCAPTCHA widget in this link. If it shows a normal captcha (letters or numbers), then your computer and network are safe, but the site you came from could have a configuration problem. Please contact them directly so they can get it fixed."

https://www.google.com/recaptcha/securityhelp

update: a method to bypass the CAPTCHA has been found

Last edited 4 months ago by to1 (previous) (diff)

comment:236 Changed 4 months ago by to1

http://www.crimeflare.com/gifs/daddy5.jpg
To answer the original post, I do believe a surveillance warning is justified, for the following reasons:

1) Cloudflare is based in a country with secret courts, secret police and secret prisons that are above the law -- and this secret government has characterized Cloudflare's data as extremely valuable

http://www.crimeflare.com/honeypot.html

2) The CEO says "Cloudflares strength lies in the DATA it collects -- not in its CODE."

https://motherboard.vice.com/en_uk/read/us-security-firm-defends-partnership-with-censorship-happy-chinese-giant-baidu

3) The U.S. federal government is a Cloudflare customer
4) Some Cloudflare customers are paying over 1 million dollars per year for an undisclosed service
5) The gestapo routinely visits Cloudflare to collect information about its users & customers
6) Cloudflare has no intention to shut down as Lavabit did in order to protect the user from unlawful surveillance
7) Cloudflare has never stated that a government agency did not install wiretapping equipment or software on the same premises as a Cloudflare server

http://www.forbes.com/sites/kashmirhill/2014/07/30/cloudflare-protection

8) Cloudflare has never indicated that the architecture of its content distribution network is resistant to warrantless mass surveillance

http://www.crimeflare.com/cfssl.html
http://www.crimeflare.com/cfgrowth.html

9) Cloudflare has given the Chinese government unprecedented censorship capability

http://motherboard.vice.com/read/cloudflare-baidu-partnership-yunjiasu-china
https://motherboard.vice.com/en_uk/read/us-security-firm-defends-partnership-with-censorship-happy-chinese-giant-baidu

10) Cloudflare is responsible to big investors, not to the public

http://techcrunch.com/2015/09/22/cloudflare-locks-down-110m-from-fidelity-microsoft-google-baidu-and-qualcomm/
https://angel.co/cloudflare
https://www.crunchbase.com/organization/cloudflare/investors


Cloudflare is not merely undermining the security of the Tor network --they are also violating EU law by failing to clearly disclose the privacy risks of Cloudflare cookies. But there is an even larger problem here: Cloudflare is breaking countless web sites, desktop apps, Smart TV's & other devices, provoking millions to anger. This encapsulates the issue and I think it bears repeating:

Maybe CloudFlare could be persuaded to use CAPTCHAs more precisely?
https://trac.torproject.org/projects/tor/ticket/18361#comment:17

A later response indicates that Cloudflare might be ready to address this:

"The right solution is for us to start applying our attack detection technologies to Tor traffic and not make the first layer of defence the CAPTCHA."

https://trac.torproject.org/projects/tor/ticket/18361#comment:196

On the surface that was an encouraging statement, but as time passes, we see that no effort is being made to implement a working solution. The Akamai study revealed that 99% of malicious traffic does not originate from Tor, and the conversion rate of traffic into sales is the same on Tor as everywhere else. But Cloudflare's CEO just continues to scapegoat Tor, defending the ineffective technique of IP blacklisting to sell a bogus security service.

I still think we ought to engage them in discussion, but why are Cloudflare employees so upset by a few "inflammatory" comments when the company's actions offend millions of people around the globe on an hourly basis by breaking devices which they worked hard to purchase? Cloudflare's approach simply labels an IP address as hostile: it makes no attempt to characterize traffic. How could they NOT expect to upset the public with this ham-handed approach? I would suggest they study how artificial intelligence and machine learning is being applied in hybrid intrusion detection systems.

Under the current regime, internet apps & devices which request a public resource via HTTP often receive junk data from Cloudflare servers which does not conform to protocol standards. For example, some antivirus apps will no longer update, but neither the user or the developer is aware that the app was broken by Cloudflare's traffic tampering. The world wide web was not designed to accommodate conditional access methods at the protocol level, and this cannot be changed just to accommodate Cloudflare's dysfunctional business model. Instead of doing the work that is needed to deploy a proper IDS, Cloudflare's CEO wants the internet to be redesigned just for him. And he says Tor is being unreasonable!?

For people in places like China or Iran, it is unquestionably a form of censorship if you can no longer use Tor to access foreign news on your TV -- and that is now a common occurrence when over 2 million web sites are hosted by Cloudflare. Most web masters dont even realise how many visitors or customers they are losing, since Cloudflare is concealing the statistics:

https://trac.torproject.org/projects/tor/ticket/18361#comment:169

Furthermore, the blind token scheme will not work for desktop apps & Smart TV's. Cloudflare's MITM traffic mangling demands a workaround that is just as unconventional as their approach to "security". If they will not obey the law or listen to reason, we must proceed with efforts to mitigate the Cloudflare problem through technological countermeasures:

1) Could we create a Tor network service that automatically caches Cloudflare cookies in the same way that we cache DNS resolution requests, so internet apps & media players always receive the requested resource (instead of receiving junk data which the app cannot process)?

2) Most Tor users would rather give up the ability to post or purchase from a Cloudflare customer than be forced to solve dozens (or hundreds) of CAPTCHAS a day. So fetching the site from another source in read-only mode is an attractive option:

When Cloudflare traffic tampering is detected by the Tor client, we could fetch the requested resource from one of the public web cache services. I would also propose that we automate the process of locating a cached resource across multiple cache providers. Could we convince the major cache providers to create a standard API which makes it easier to implement this feature?

3) There are several projects under development which aim to create a distributed hosting network that cannot be DDOSed (and cannot be owned or controlled by any government or corporation.) In the long term, this technology has the potential to replace Cloudflare. We should support and collaborate with those projects to every possible extent. A few useful references are listed below.

https://p2pfoundation.net/Category:P2P_Infrastructure
http://ipfs.io/
http://p2peducation.pbworks.com
http://www.swirl-project.org/

Last edited 4 months ago by to1 (previous) (diff)

comment:237 Changed 3 months ago by toruser250

found more discussion regarding cloudflare captcha here today.

news.ycombinator.com/item?id=12001964

they even mention what was mentioned above, using this exploit to make a plugin for tor to just bypasses this nightmare. most of the time recaptcha times out for me now.

even though they mention above they're working on this. nothing has changed.

comment:238 Changed 5 weeks ago by cypherpunks

jgrahamc: The CAPTCHA-loop is back. Why is that? Before giving up I was served 13 lakes, rivers, and of course ¡street signs! No matter which exit node was used.

Last edited 5 weeks ago by cypherpunks (previous) (diff)

comment:239 Changed 5 weeks ago by jgrahamc

Hmm. That shouldn't happen. I have raised this internally to find out why. Did you have JavaScript enabled or disabled?

comment:240 Changed 2 weeks ago by cypherpunks

The loop affected JavaScript disabled browsing. On the next day it was back to previous state, 1 or 2 solutions. Thanks jgrahamc

Note: See TracTickets for help on using tickets.