If we have a large number of fallback directory mirrors, we could make sure each operator only sees a small proportion of traffic by only including one fallback per operator.
We could identify operators by:
ContactInfo
MyFamily
However, this is easy to break, because it only works for operators who correctly configure MyFamily, or use identical ContactInfo fields. (That said, these operators can still have their machines compromised.)
As a workaround, in #17158 (moved) I modified the script to output the number of fallbacks per ContactInfo. This will note in each fallback entry if an operator could see a large amount of client traffic.
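A minimal sketch of what counting fallbacks per ContactInfo could look like. This is not the code from #17158: the function names, the `contact` key, and the `max_share` threshold are all hypothetical, and matching is on identical ContactInfo strings only.

```python
from collections import Counter


def fallbacks_per_contact(fallbacks):
    """Count fallbacks for each ContactInfo string.

    `fallbacks` is a list of dicts with a hypothetical 'contact' key;
    the real script's data structures may differ.
    """
    counts = Counter()
    for fallback in fallbacks:
        contact = fallback.get('contact')
        # Relays with no ContactInfo can't be grouped this way, so skip them
        if contact:
            counts[contact] += 1
    return counts


def flag_large_operators(fallbacks, max_share=0.1):
    """Return the contacts whose share of the list exceeds max_share."""
    total = len(fallbacks)
    if total == 0:
        return {}
    counts = fallbacks_per_contact(fallbacks)
    return {c: n for c, n in counts.items() if n / total > max_share}
```

Anything returned by `flag_large_operators` would be a candidate for the per-entry note described above.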
We could also just refuse to whitelist too many fallback directories from the same operator.
(Or, when we switch to opt-out, blacklist lower-weight fallbacks from operators who would otherwise have too many fallbacks.)
I like this. Even if it's trivial to game, we should at least not allow it by default.
In #15775 (moved) I saw that Exit relays are also on the list. Has this changed?
I am having some mixed thoughts about including Exit relays in the list of fallback directories. My thoughts are:
Exit relays are more scarce (even though the network's exit capacity might not be overloaded, there are far more middle relays than exit relays).
Exit relays might need the extra traffic required for a fallback directory to serve exit circuits.
Exit relays are more likely to be taken down (ISP gets mad because of abuse/blacklisting, seizures, etc.), which means (possibly) small performance penalties for previous Tor releases that included them.
What will the list look like if we only consider middle relays with a certain minimum weight, such that we will be able to hardcode them without any weight? I guess we still have plenty to choose from.
Properly configured fallbacks should not be excluded from the whitelist. Operators who do not lie in these fields and do not leave them empty are most likely the ones we can consider trustworthy.
I think AS, address range, and country are more important values here. This could be a great opportunity to add a lot of anonymity and security with little effort: fallbacks in rare countries could get extra weight. The same goes for unique address ranges or lesser-known ASes. Something like a diversity weight could be applied here. An algorithm to rate rarity could possibly be used in other cases too.
But even operators who describe their relays properly should not be assigned more than a cap of x% of the fallback weight. When fallbacks with the same family / ContactInfo add up to more than x%, I would tend to blacklist the higher-weight fallbacks among them. This would reduce the number of cases where multiple fallbacks from such a group have to be blacklisted to get under the x% cap, which would increase diversity again.
I like this. Even if it's trivial to game, we should at least not allow it by default.
In #15775 (moved) I saw that Exit relays are also on the list. Has this changed?
I am having some mixed thoughts about including Exit relays in the list of fallback directories. My thoughts are:
Exit relays are more scarce (even though the network's exit capacity might not be overloaded, there are far more middle relays than exit relays).
I allowed Exit relays because some operators have under-utilised exits available.
The extra load is negligible for a large relay (20KB/s = 50GB/month)
Exit relays might need the extra traffic required for a fallback directory to serve exit circuits.
I asked operators of under-utilised exits to opt-in.
Exit relays are more likely to be taken down (ISP gets mad because of abuse/blacklisting, seizures, etc.), which means (possibly) small performance penalties for previous Tor releases that included them.
Perhaps, but #4483 (moved) ensures that Tor clients try 4 fallbacks and 1 authority in the first 10 seconds of bootstrap.
What will the list look like if we only consider middle relays with a certain minimum weight, such that we will be able to hardcode them without any weight? I guess we still have plenty to choose from.
I'll have a look at how this changes the final list.
I will also add "Exit/non-Exit" to my list of diversity criteria to consider.
Properly configured fallbacks should not be excluded from the whitelist. Operators who do not lie in these fields and do not leave them empty are most likely the ones we can consider trustworthy.
Regardless of whether an operator is trustworthy or not, relays operated by a single operator are more likely to change address or go down at the same time. And I don't want to paint a target on any one operator by putting a large number of their relays in the list.
I think AS, address range, and country are more important values here. This could be a great opportunity to add a lot of anonymity and security with little effort: fallbacks in rare countries could get extra weight. The same goes for unique address ranges or lesser-known ASes. Something like a diversity weight could be applied here. An algorithm to rate rarity could possibly be used in other cases too.
I have access to IPv4 and IPv6 addresses, but I don't have access to country or AS in the python script I'm using. I don't have time to develop that feature before 0.2.8-rc, please feel free to open a ticket for it in "0.2.???". (I'm not sure if it will make it into 0.2.9, because we've already triaged many tickets out.)
Also, a system like this is easy to game: "Want to get a relay on the fallback list? Put it in a rare country/AS."
I prefer not to have a diversity weighting system, because they are complex, and mash together many criteria into a single number. It's hard to reason correctly about complex systems like this.
I propose instead that we limit the number of relays in the list that share certain attributes, like operator and IP address. This guards against many fallbacks going down at the same time, which I believe to be a significant threat to reliability. It also ensures that no one operator sees too much client traffic, which I believe to be a significant threat to privacy. (Fortunately, mitigating one of these threats tends to mitigate the other.)
But even operators who describe their relays properly should not be assigned more than a cap of x% of the fallback weight.
I agree, see above.
When fallbacks with the same family / ContactInfo add up to more than x%, I would tend to blacklist the higher-weight fallbacks among them. This would reduce the number of cases where multiple fallbacks from such a group have to be blacklisted to get under the x% cap, which would increase diversity again.
We are weighting each fallback equally for client selection, because it's simpler and easier to reason about (#17905 (moved)). It also allows me to remove some complex code that down-weighted high weight fallbacks so they didn't see too much client traffic (and so we don't depend on them being up too much).
I propose we analyse the fallback list in 0.2.8 for relay diversity, and then do the same analysis when we make up the list for 0.2.9 in September 2016.
Dividing the fallback weight between relays on the same IP in the list could be wise:
2 instances running on 1 IP - 50% fallback weight each
4 instances running on 1 IP - 25% fallback weight each
This shouldn't be too hard to implement, and could even encourage good behaviour.
...Sounds like there are going to be some potentially useful knobs. It's also great to hear that it will be possible to analyse the list after generating it.
Having had experience with modifying weights, I'm reluctant to do it, because the results are not very transparent, and they have a complex relationship to the inputs and parameter choices. This makes it hard to reason about security, privacy, and reliability.
I tried modifying exit weights, but that doesn't seem to be necessary for diversity in the current list of fallbacks. The option is still in the script if we decide we want to do it.
I also tried modifying consensus weights so that the resulting client selection weights weren't too high. That was really complex code. Eventually we abolished client selection weights entirely (#17905 (moved)), and I removed the complex code.
See my next comment for a summary of the fallback selection process. If there's a particular diversity criterion you're concerned about, we can discuss it on tor-dev and try to reach a consensus about reasonable values.
Restricting fallbacks to one per operator is done in c157a31ee8bd84587e6e61b674b33c792154d74a in my branch fallbacks-201604-v9 in https://github.com/teor2345/tor.git
The majority of that branch is being reviewed in #17158 (moved).
One Per Operator
Here's a breakdown of the selection process:
100 fallbacks were selected
all 100 fallbacks passed the 15s consensus download check
73 more are available from distinct operators
104 were eliminated due to the "one per operator" restriction (family, contact, IP)
other fallbacks were commented-out in the whitelist because operators told me they were on the same machine, even though they had different IP addresses
if we need to in future, we can modify the restriction to one per IP, but 2 per family/contact
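To make the "one per operator" restriction concrete, here is a rough sketch of a greedy filter that keeps at most one relay per family, ContactInfo, and IP address. It is not the code from the commit above. The dict keys, the function name, and the choice to prefer higher-bandwidth relays are all assumptions for illustration. Family handling is simplified too: a declaration by either relay is treated as a match.

```python
def select_one_per_operator(candidates, max_count=100):
    """Keep at most one candidate per family, ContactInfo, and IP.

    Each candidate is a dict with hypothetical keys: 'fingerprint',
    'family' (set of fingerprints), 'contact', 'ip', and 'bandwidth'.
    """
    selected = []
    selected_fps = set()   # fingerprints already chosen
    blocked_fps = set()    # fingerprints named in a chosen relay's family
    seen_contacts = set()
    seen_ips = set()

    # Assumption: prefer higher-bandwidth relays when operators collide
    ordered = sorted(candidates, key=lambda r: r['bandwidth'], reverse=True)
    for relay in ordered:
        if len(selected) >= max_count:
            break
        fp = relay['fingerprint']
        family = set(relay.get('family') or ())
        contact = relay.get('contact')

        if relay['ip'] in seen_ips:
            continue
        if contact and contact in seen_contacts:
            continue
        # Either side declaring the other counts as the same family
        if fp in blocked_fps or family & selected_fps:
            continue

        selected.append(relay)
        selected_fps.add(fp)
        blocked_fps |= family
        seen_ips.add(relay['ip'])
        if contact:
            seen_contacts.add(contact)
    return selected
```

Relaxing this to "one per IP, but 2 per family/contact" would mean replacing the two sets for family and contact with counters and comparing against a limit.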
Bandwidth
The range of bandwidths selected is 5.4 - 70.4 MB/s. The minimum allowed is 3 MB/s, which is 100 times the expected extra load of 20-30 KB/s. No weights were modified, but we do make the script harder to game by limiting the bandwidth of each relay to its "measured" bandwidth. (We approximate measured bandwidth based on a conversion that uses the median consensus weight to bandwidth ratio among fallbacks.)
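One possible reading of the "measured" bandwidth cap, as a sketch only. The exact conversion in the script may differ, and the key names (`consensus_weight`, `advertised_bw`) and the function name are made up here. The idea: take the median ratio of consensus weight to advertised bandwidth across the candidates, convert each relay's consensus weight back to a bandwidth using that ratio, and never credit a relay with more than that.

```python
from statistics import median


def capped_bandwidths(relays):
    """Cap each relay's advertised bandwidth at an approximate measured value.

    Each relay is a dict with hypothetical keys 'fingerprint',
    'consensus_weight', and 'advertised_bw' (same units for all relays).
    """
    ratios = [r['consensus_weight'] / r['advertised_bw']
              for r in relays if r['advertised_bw'] > 0]
    if not ratios:
        return {}
    median_ratio = median(ratios)
    if median_ratio <= 0:
        return {}

    capped = {}
    for r in relays:
        # Convert consensus weight back into an approximate bandwidth
        approx_measured = r['consensus_weight'] / median_ratio
        # A relay can't gain by advertising more than was measured
        capped[r['fingerprint']] = min(r['advertised_bw'], approx_measured)
    return capped
```

With a cap like this, inflating the advertised bandwidth alone does not move a relay up the list, because the consensus weight still bounds it.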
Network Diversity
The script output the following analysis of the network diversity of the fallback list:
26/100 = 26% of fallbacks are on IPv6
There are 5/100 = 5% fallbacks in the IPv4 /16 containing 178.62.199.226
36/100 = 36% of fallbacks are in an IPv4 /16 with other fallbacks
There are 2/100 = 2% fallbacks in the IPv4 /24 containing 91.219.237.244
2/100 = 2% of fallbacks are in an IPv4 /24 with other fallbacks
There are 5/26 = 19% fallbacks in the IPv6 /32 containing [2001:41d0:e:f67::114]
16/26 = 62% of fallbacks are in an IPv6 /32 with other fallbacks
There are 2/26 = 8% fallbacks in the IPv6 /48 containing [2001:41d0:a:74a::1]
2/26 = 8% of fallbacks are in an IPv6 /48 with other fallbacks
40/100 = 40% of fallbacks are on IPv4 ORPort 443
40/100 = 40% of fallbacks are on IPv4 ORPort 9001
20/100 = 20% of fallbacks are on other IPv4 ORPorts
9/26 = 35% of IPv6 fallbacks are on IPv6 ORPort 443
6/26 = 23% of IPv6 fallbacks are on IPv6 ORPort 9001
11/26 = 42% of IPv6 fallbacks are on other IPv6 ORPorts
35/100 = 35% of fallbacks are on DirPort 80
44/100 = 44% of fallbacks are on DirPort 9030
21/100 = 21% of fallbacks are on other DirPorts
19/100 = 19% of fallbacks have the Exit flag
While I might like to tweak some of these figures slightly, none of them are critical.
(I'd be much more concerned if some of them were over 50%.)
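For anyone who wants to reproduce figures like the /16 and /24 lines, here is a small sketch using Python's standard `ipaddress` module. The function name and input format are invented for illustration and are not taken from the script. It groups addresses by network prefix and reports how many share a network with at least one other fallback.

```python
import ipaddress
from collections import Counter


def shared_network_stats(addresses, prefix_len):
    """Return (shared, largest, total) for the given prefix length.

    shared:  number of addresses in a network with at least one other address
    largest: size of the most populated network
    total:   number of addresses examined
    IPv6 addresses must be given without square brackets.
    """
    networks = [
        # strict=False masks off the host bits instead of raising an error
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    ]
    counts = Counter(networks)
    shared = sum(n for n in counts.values() if n > 1)
    largest = max(counts.values(), default=0)
    return shared, largest, len(addresses)
```

Calling it with `prefix_len=16` on the IPv4 addresses and `prefix_len=32` on the IPv6 addresses gives the numerators and denominators for the corresponding percentages.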
Summary
I'm happy with the diversity of the fallback list.
As I said before, each restriction costs bandwidth, and I'm happy with the existing bandwidths, and the results of applying a one-per-operator rule. (I'm also concerned that some restrictions may reduce more important diversity criteria.)
I think that the other network diversity criteria the script analyses are acceptable.
I'd be interested to see how the fallbacks are spread between ASs or countries, if anyone else wants to do that analysis. Remember that the countries derived from VPS IPs are sometimes inaccurate.
One operator withdrew a fallback that was on the hard-coded fallback list. I have pushed a fixup after rebuilding the list. 11 fallbacks on the list changed (another turns up in the diff, but is just a reordering). It's good to see the list is stable, even though I changed 14 whitelist/blacklist entries.
Here is the updated analysis:
One Per Operator
100 fallbacks were selected
all 100 fallbacks passed the 15s consensus download check
78 more are available from distinct operators
102 were eliminated due to the "one per operator" restriction (family, contact, IP)
Bandwidth
The range of bandwidths selected is 6.0 - 67.2 MB/s.
Network Diversity
Here is the updated network diversity analysis:
27/100 = 27% of fallbacks are on IPv6
There are 4/100 = 4% fallbacks in the IPv4 /16 containing 37.187.7.74
32/100 = 32% of fallbacks are in an IPv4 /16 with other fallbacks
There are 2/100 = 2% fallbacks in the IPv4 /24 containing 91.219.237.244
2/100 = 2% of fallbacks are in an IPv4 /24 with other fallbacks
There are 5/27 = 19% fallbacks in the IPv6 /32 containing [2001:41d0:e:f67::114]
16/27 = 59% of fallbacks are in an IPv6 /32 with other fallbacks
There are 3/27 = 11% fallbacks in the IPv6 /48 containing [2001:41d0:a:74a::1]
5/27 = 19% of fallbacks are in an IPv6 /48 with other fallbacks
There are 2/27 = 7% fallbacks in the IPv6 /64 containing [2a03:b0c0:3:d0::208:5001]
2/27 = 7% of fallbacks are in an IPv6 /64 with other fallbacks
40/100 = 40% of fallbacks are on IPv4 ORPort 443
42/100 = 42% of fallbacks are on IPv4 ORPort 9001
18/100 = 18% of fallbacks are on other IPv4 ORPorts
10/27 = 37% of IPv6 fallbacks are on IPv6 ORPort 443
7/27 = 26% of IPv6 fallbacks are on IPv6 ORPort 9001
10/27 = 37% of IPv6 fallbacks are on other IPv6 ORPorts
38/100 = 38% of fallbacks are on DirPort 80
44/100 = 44% of fallbacks are on DirPort 9030
18/100 = 18% of fallbacks are on other DirPorts
22/100 = 22% of fallbacks have the Exit flag
This isn't a significant change from the previous list.