Opened 4 years ago

Closed 4 years ago

#16995 closed enhancement (worksforme)

Splitting the pool of bridges by separating people depending on typing cadence

Reported by: elypter
Owned by: isis
Priority: Medium
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity:
Keywords: bridge-dist, bridgedb-https, ml
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

with OCR getting better and better, captchas soon won't be able to provide enough protection against bots fetching bridges anymore. but even if it was safe enough, a censor could still hire a cheap worker to type in captchas all day long.

if you let a neural network group people by typing cadence and only supply a group with a subset of the bridges then a single person/bot will never be able to pull the whole database.

Change History (6)

comment:1 Changed 4 years ago by yawning

Each IP address that requests bridges from bridgedb only gets a subset of the total bridges already. So a single person/bot isn't able to pull the whole database unless the person also has access to a lot of IP addresses.

I don't particularly see the point here, since there are more effective ways to gain lots of bridges quickly, but I'll leave it up to isis whether this should be closed or not.
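To illustrate the point about each IP address only seeing a subset: below is a minimal sketch of keyed, per-IP bridge selection. It is not BridgeDB's actual code; the function name `bridges_for_ip`, the HMAC key, and the default of three bridges per request are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import List


def bridges_for_ip(ip: str, bridges: List[str], key: bytes, count: int = 3) -> List[str]:
    """Return a stable subset of the pool for one client IP (hypothetical helper)."""
    if not bridges:
        return []
    # Keyed hash, so clients cannot predict which slice of the pool an IP maps to.
    digest = hmac.new(key, ip.encode(), hashlib.sha256).digest()
    ordered = sorted(bridges)
    start = int.from_bytes(digest[:8], "big") % len(ordered)
    # Take `count` consecutive entries, wrapping around the end of the list.
    return [ordered[(start + i) % len(ordered)] for i in range(min(count, len(ordered)))]
```

Because the mapping is deterministic, asking again from the same address returns the same few bridges, so enumerating the pool requires many distinct addresses.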

comment:2 Changed 4 years ago by cypherpunks

Are you serious? Why not ask people to provide a credit card number?

Persistent identifiers that can link back to a user or partition groups of users are very bad news.

Bundle a few features like you suggest and you might as well give every Tor user a nametag.

comment:3 Changed 4 years ago by elypter

i know how this sounds, but it would only be usable for tracking if this fingerprinting happens in a second place as well. it's also in the hands of the bridgedb page, which, if you don't trust it, could do much worse things.
the only way an external attacker could take advantage of this is if he has most of the bridges and does a sophisticated reverse engineering attack on the neural network. and then he can only find out which group of bridges a user probably uses. btw there is no way to protect from typing cadence fingerprinting anyway if the client uses javascript on other websites.

that being said, i know that there is a trade-off between deanonymisation and bridge protection. i tried to find something that is difficult to fake, evenly distributed and cannot easily be found out by an attacker who is watching the users on the network or with cooperating websites.

and the way it is now is far worse for anonymity btw. if an attacker is able to inject packets with fake ip addresses into the internet he would be able to find out which bridges are being sent to each ip address. so if he controls the node after the bridge of a connected user there is only a small set of ip addresses the user could probably have.

Last edited 4 years ago by elypter

comment:4 Changed 4 years ago by yawning

An adversary would need to both be able to inject (easy) and receive (hard) packets with fake IP addresses, since all the bridge distribution mechanisms are TCP based.

I still think this is unneeded, since there are far easier ways to mass-enumerate bridges that this does not defeat.

comment:5 Changed 4 years ago by elypter

if there is an easier way to get bridges then this should be tackled first of course (if possible). if it's black market email addresses then maybe a captcha verification within the mail could help. if it's the connection between a bridge and a public node which identifies it then a second layer of bridges could help. if it's something else and it has not been written down anywhere in public yet you don't have to disclose it here. i just want to emphasize that it should not be given up on that easily. today it might not be such a big problem yet but this could change all of a sudden. and since many of the bridges probably have static ips it will be too late then. if bridges are easy to grab because there are not that many of them this might change as well. who knows what role tor plays in 5 years?

sorry btw if the easy way would have been easy to find. i looked quite a bit around but not in every corner.

Last edited 4 years ago by elypter

comment:6 in reply to:  description Changed 4 years ago by isis

Keywords: bridge-dist bridgedb-https ml added
Resolution: worksforme
Status: new → closed

Replying to elypter:

with OCR getting better and better, captchas soon won't be able to provide enough protection against bots fetching bridges anymore. but even if it was safe enough, a censor could still hire a cheap worker to type in captchas all day long.


CAPTCHAs (and many other Proof-of-Work systems) already provide little-to-no protection against enumeration. We do not intend to continue their usage in the long term for new Bridge distribution systems which we develop.

The current plan for moving forward is to create a new Bridge Distributor (#7520) which uses a variant of the rBridge scheme in order to anonymously record "good behaviour points" for Bridge users whose Bridges do not routinely become blocked. These "good behaviour points" may later be "spent" by a well-behaved user in order to obtain new Bridges or to invite friends into the system. Once this system is in place, and a suitable user-friendly mechanism exists within Tor Browser to interact with it, my plan is to allocate an increasing majority of new Bridges to that system. (The HTTPS and Email Distributors will be left in place, but will eventually contain only a minor portion of the total Bridges.)
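The earn-and-spend idea behind "good behaviour points" can be sketched as a plain ledger. This is only a toy illustration and not rBridge itself, which relies on cryptography so that the accounting does not identify users. The class name `UserAccount`, the one-point-per-day rule, and both costs are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field

BRIDGE_COST = 5   # hypothetical price of one new bridge, in points
INVITE_COST = 10  # hypothetical price of one invitation, in points


@dataclass
class UserAccount:
    points: int = 0
    bridges: list = field(default_factory=list)

    def credit_unblocked_day(self, bridge: str) -> None:
        """Award one point for a day on which an assigned bridge stayed reachable."""
        if bridge in self.bridges:
            self.points += 1

    def spend(self, cost: int) -> bool:
        """Deduct `cost` points if the balance allows it; report success."""
        if self.points < cost:
            return False
        self.points -= cost
        return True
```

A user whose bridges keep getting blocked never accumulates enough points to request more, which is what limits enumeration in this model.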

Due to the overwhelming number of development hours required to implement this new Distributor, I will not have time to develop major improvements to the HTTPS and Email Distributors. Further, I would argue that doing so would be a waste of time, since, as mentioned above, these Distributors will not contain very many Bridges. However, I would gladly encourage you to contribute patches for less time-consuming anti-enumeration improvements to either the HTTPS or Email Distributors.

if you let a neural network group people by typing cadence and only supply a group with a subset of the bridges then a single person/bot will never be able to pull the whole database.


As mentioned by Yawning above, we already have simpler measures in place which provide precisely the same protection properties (in addition to grouping users by IP address subnet, we also rotate hashrings at regular intervals). Also, Tor Browser truncates timestamps, including those which could be used by a webapp to fingerprint user typing cadence.
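A rough sketch of how subnet grouping and periodic rotation could combine into one hashring position follows. The prefix lengths (/16 for IPv4, /32 for IPv6), the one-week rotation period, and the names `client_area` and `ring_position` are assumptions for illustration, not values taken from the BridgeDB source.

```python
import hashlib
import hmac
import ipaddress

ROTATION_SECONDS = 7 * 24 * 3600  # assumed rotation period, illustrative only


def client_area(ip: str) -> str:
    """Collapse an address to its enclosing subnet (assumed /16 or /32)."""
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 32
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)


def ring_position(ip: str, now: float, key: bytes, ring_size: int) -> int:
    """Map (subnet, rotation period) to a slot on a ring of `ring_size` slots."""
    period = int(now // ROTATION_SECONDS)
    msg = f"{client_area(ip)}|{period}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % ring_size
```

All addresses in one subnet land on the same slot within a period, so renting many addresses from a single provider block does not reveal more bridges.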

Further, neural networks are likely overkill for this particular application. Using an SVM or even k-NN would be a more manageable approach. If you wish to play with doing so in Python, I'd encourage you to check out the various classification algorithms provided by the scikit-learn project.
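For anyone who wants to experiment, here is a minimal k-NN sketch using only the standard library, so it runs without extra packages. In practice scikit-learn's `KNeighborsClassifier` or `SVC` would be the more sensible choice. The two features (mean and standard deviation of gaps between key presses) and the `TRAIN` data are made up for illustration.

```python
import math
from collections import Counter

# Synthetic, made-up training data: (mean gap ms, std dev ms) -> group label.
TRAIN = [
    ((110.0, 15.0), "group-a"), ((120.0, 20.0), "group-a"), ((105.0, 12.0), "group-a"),
    ((260.0, 60.0), "group-b"), ((240.0, 55.0), "group-b"), ((275.0, 70.0), "group-b"),
]


def cadence_features(key_times_ms):
    """Mean and standard deviation of the gaps between consecutive key presses."""
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return (mean, math.sqrt(var))


def knn_predict(train, sample, k=3):
    """Return the majority label among the k training points nearest to `sample`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling `knn_predict(TRAIN, cadence_features(times))` on a list of key-press timestamps assigns the typist to one of the groups, each of which would then be served its own subset of bridges.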

Closing for now, since I've no plans to implement anything like this, but please feel free to reopen if you'd like to contribute patches.
