Opened 20 months ago

Last modified 14 months ago

#24499 new defect

Bandwidth determination is flawed

Reported by: Hassprediger Owned by: tom
Priority: Medium Milestone:
Component: Core Tor/Torflow Version:
Severity: Normal Keywords: tor-bwauth
Cc: juga Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I'm running a non-exit node in Australia. Please read "Australia" as a high-latency area with few other nodes nearby. I set RelayBandwidthRate to 1100, which is around 20% of my connection.

Just like everyone who starts up a new node, I was wondering why the bandwidth was not being used. Unfortunately, most of the information that can be found on the web about this topic is very outdated and does not apply to my situation.

The measured bandwidth never exceeded 200 kb/s on atlas. So I decided to use my node as a bridge and send junk traffic: just downloading Ubuntu, doing unnecessary uploads and running the CLI speedcheck script through the network. Now the nodes that I am connected to acknowledge how much data can be sent from/to my node, and the bandwidth estimation on atlas suddenly goes up dramatically, beyond 500 (still not 1100 though).

I thought I had fixed it and turned my dummy-traffic script off again. Now the estimation is down again, my node is mostly unused and I'll probably turn it off soon, as it is just a waste of electricity.

Apparently the bandwidth is measured in that useless 50-kilobyte way that the tor client uses for the original setup. Well, sending 50 kilobytes to a node and measuring the time is mostly a test of latency. So the test is currently faulty: it should actually send a few megabytes, or anything that results in a few seconds of measurement. Additionally, it should also send 1 byte to measure the latency and deduct that from the other measurement.
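
The argument above can be illustrated with a deliberately simplified model in which a transfer costs a fixed number of round trips plus the payload size divided by the true bandwidth. This is only a sketch, not how the bandwidth scanners actually compute their results: the function names, the two-round-trip setup cost, the 300 ms round-trip time and the 1 MB/s link are all assumptions chosen for illustration, and TCP slow start is ignored.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, rtt_s, setup_round_trips=2):
    """Seconds to move size_bytes in a toy model: fixed setup latency plus payload time.

    setup_round_trips is an assumed constant, not a measured value.
    """
    return setup_round_trips * rtt_s + size_bytes / bandwidth_bytes_per_s


def apparent_bandwidth(size_bytes, bandwidth_bytes_per_s, rtt_s):
    """Bandwidth inferred by simply dividing the payload size by the elapsed time."""
    return size_bytes / transfer_time(size_bytes, bandwidth_bytes_per_s, rtt_s)


def latency_corrected_bandwidth(size_bytes, bandwidth_bytes_per_s, rtt_s):
    """Subtract the time of a 1-byte probe from the large transfer, as proposed above."""
    large = transfer_time(size_bytes, bandwidth_bytes_per_s, rtt_s)
    probe = transfer_time(1, bandwidth_bytes_per_s, rtt_s)
    return (size_bytes - 1) / (large - probe)


if __name__ == "__main__":
    LINK = 1_000_000   # assumed true bandwidth, bytes per second
    RTT = 0.3          # assumed round-trip time, seconds
    for size in (50_000, 5_000_000):
        print(size,
              round(apparent_bandwidth(size, LINK, RTT)),
              round(latency_corrected_bandwidth(size, LINK, RTT)))
```

Under these assumptions the 50 KB transfer reports well under a tenth of the real link speed, the 5 MB transfer recovers roughly 90% of it, and subtracting the 1-byte probe recovers essentially all of it.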

Currently, nodes in Australia are largely disadvantaged, even if they have a high-bandwidth fibre connection, only because the 'authorities' or the majority of the network are in Europe.

Please don't get me wrong. I understand that an extremely high latency is also bad. A 1 Gbit/s connection is maybe not particularly useful if it has a latency of 30 seconds for each TCP packet. So latency should not go unnoticed. But you could at least be so kind as to announce on your website that nodes in Australia are not welcome and will be disadvantaged by the algorithm. That way, people who live in areas like Australia (high bandwidth, high latency, high electricity costs) can at least be aware that it is useless to run a node and that they don't need to bother with it.

Problems:

  1. The bandwidth estimation can be varied widely by sending unnecessary junk data over the tor network.
  2. The bandwidth estimation WILL be influenced, because the network uses it as a measure to determine if a node is "good" or should be used a lot.
  3. The bandwidth estimation measures latency and not bandwidth.
  4. Nodes that don't have many other nodes nearby will be marked as useless, will go unused and will therefore be turned off soon.
  5. The network will converge (or has already converged) on one location only, namely the highly connected areas in central Europe. The high speed nodes in North America are anyway on the ignore list of most people, right?

Child Tickets

Ticket  Status  Owner  Summary                                                                        Component
#21990  new            Use a sensible default set of bandwidth servers                                Core Tor/Torflow
#22741  closed         Make a tool that sends bandwidth to relays stuck with low measurements         Core Tor/Torflow
#22744  new            Remove the smaller bandwidth authority files                                   Core Tor/Torflow
#24674  new     tom    Bandwidth authorities should use geographically distributed bandwidth servers  Core Tor/Torflow

Attachments (1)

au_relays.xlsx (16.9 KB) - added by starlight 20 months ago.
Australia relays, bw ratings


Change History (11)

comment:1 Changed 20 months ago by teor

Component: Core Tor → Core Tor/Torflow
Keywords: tor-bwauth added
Owner: set to tom
Version: Tor: 0.3.1.8

Hi, thanks for reporting this issue.
I agree it can be frustrating - I have also run relays in Australia.

This is a known issue with the current bandwidth authority measurement system:
https://trac.torproject.org/projects/tor/wiki/doc/BandwidthAuthorityMeasurements

We are working on fixing it: we currently have bandwidth servers in Chile and Hong Kong, and we have experimented with using a CDN as a bandwidth server.

I'll use this ticket as a master ticket for the different things we can do to fix this issue, because it explains it well.

comment:2 Changed 20 months ago by starlight

Suggest you try removing the relay bandwidth limit. Even in regions favored by the BWAuths, non-exit nodes rarely see more than 30% link utilization by Tor. While the relay may still be rated lower than it should, it may acquire enough consensus weight to attract notable traffic. One megabit is not much. The path selection algorithm seems non-linear w/r/t low-end relays, where once one breaks above a certain level traffic goes up rapidly.

comment:3 in reply to:  2 ; Changed 20 months ago by teor

Replying to starlight:

> Suggest you try removing the relay bandwidth limit. Even in regions favored by the BWAuths, non-exit nodes rarely see more than 30% link utilization by Tor. While the relay may still be rated lower than it should, it may acquire enough consensus weight to attract notable traffic. One megabit is not much.

30% is actually considered overloaded in most networks. 10% is good for low packet loss and low latency.

The last time I checked, the relay bandwidth limit was not active on the network. Please feel free to double-check, and if there is some limit, open another ticket to address that issue.

> The path selection algorithm seems non-linear w/r/t low-end relays, where once one breaks above a certain level traffic goes up rapidly.

This is probably due to the Guard and Fast flags. If you can find better settings, please open another ticket to tune those flags.

comment:4 in reply to:  3 Changed 20 months ago by starlight

Replying to teor:

> Replying to starlight:

> > Suggest you try removing the relay bandwidth limit. Even in regions favored by the BWAuths, non-exit nodes rarely see more than 30% link utilization by Tor. While the relay may still be rated lower than it should, it may acquire enough consensus weight to attract notable traffic. One megabit is not much.

> 30% is actually considered overloaded in most networks. 10% is good for low packet loss and low latency.

My relay--with no RelayBandwidthRate--is currently loaded at 25% according to Blutmagie. Nowhere near link saturation. Ping-Plotter says packet loss is a couple of packets every couple of hours and I never notice any QOS issues in my browsing as a result of the relay.

> The last time I checked, the relay bandwidth limit was not active on the network. Please feel free to double-check, and if there is some limit, open another ticket to address that issue.

I am not the reporter. In the second sentence he states: "I set RelayBandwidthRate to 1100, which is around 20% of my connection." That would be the limit he explicitly set on just his relay. I stand by my opinion that this is a very low limit, regardless of the continent one operates from.

> > The path selection algorithm seems non-linear w/r/t low-end relays, where once one breaks above a certain level traffic goes up rapidly.

> This is probably due to the Guard and Fast flags. If you can find better settings, please open another ticket to tune those flags.

With RelayBandwidthRate set to 1100 the reporter will never see the Guard flag.

My observation about utilization is based on years of experience running relays on both slower and faster connections. In particular, if you look at my namesake relay's history, you will notice much of that experience was with a painfully slow connection.

comment:5 Changed 20 months ago by starlight

I pulled all relays in Australia and most likely identified the relay based on RelayBandwidthRate=1100.

Two observations

1) Some other relays with a similar RelayBandwidthRate value are rated 10x better, possibly because they set a much higher RelayBandwidthBurst.

2) No doubt Australia is discriminated against by the Tor bandwidth system. But I see unfortunately the relay resides in the "TPG Telecom Limited" network. No relay in that Autonomous System (AS) garners a decent rating. Appears to me this is a situation where the bandwidth authorities are performing to some degree as intended and granting a low score to relays with poor effective reachability. This extract from Blutmagie illustrates {flags, name, ASN, cntry, avg-bw, cons-wght, days-up, ver, IP}:

__fs_ xwn2                  7545 AU    80    148   3 0.10 L 61.69.170.220   9001  None  C-61-69-170-220.syd.connect.net.au
__fs_ Unnamed               7545 AU    32     56   9 0.9  L 59.102.75.64    9001  9030  59-102-75-64.tpgi.com.au
___s_ BrutalesArschloch     7545 AU    21     23  15 1.8  L 60.240.245.76   1000  1001  60-240-245-76.static.tpgi.com.au
__f__ magnetic              7545 AU     8     83   8 0.10 L 220.245.192.5   9002  None  220-245-192-5.tpgi.com.au
e____ 73rmin470rx           7545 AU     7      3   2 5.14 L 14.202.230.49   9001  None  14-202-230-49.static.tpgi.com.au
___s_ Unnamed               7545 AU     3     30  31 5.14 L 220.245.39.46   9001  9030  220-245-39-46.tpgi.com.au
e__s_ WeekliLeaks           7545 AU     3      1  45 4.23 W 110.174.43.136  444   None  110-174-43-136.static.tpgi.com.au
_____ Punani4life           7545 AU     1      1   1 4.27 L 210.185.117.161 443   9030  210-185-117-161.tpgi.com.au
_____ UbuntuCore201         7545 AU     0      6   0 1.9  L 203.221.63.34   44431 None  203-221-063-34.tpgi.com.au
___s_ ididnteditheconfig    7545 AU     0      3  14 9.11 L 110.175.89.172  9538  None  110-175-89-172.static.tpgi.com.au

Choopa looks better:

__fs_ itsarelay            20473 AU   101    384  17 8.8  L 108.61.96.230   9001  None  108.61.96.230.vultr.com
__fs_ Someone              20473 AU    94    310  18 0.10 L 45.32.245.73    9001  None  45.32.245.73.vultr.com
__fs_ HeirloomReaper       20473 AU    91    219  36 1.7  L 45.63.24.164    9001  9030  heirloom.for-no-reason.net
e_fs_ birdyExit            20473 AU    78     94   5 9.13 L 45.76.115.159   443   80    45.76.115.159.vultr.com
__fs_ huntersthompson      20473 AU    78    225   0 1.9  L 45.63.25.179    9001  None  45.63.25.179.vultr.com
__fs_ pol                  20473 AU    58    304 114 5.14 L 45.76.119.205   443   80    45.76.119.205.vultr.com
__fs_ ZcashTor0au          20473 AU    51    131   6 1.8  L 45.32.246.15    20    21    45.32.246.15
__fs_ lochland             20473 AU    43    155  25 9.11 L 45.63.26.48     9001  None  45.63.26.48.vultr.com
__fs_ IcelandicTorProject  20473 AU    41    270   3 9.11 L 45.76.112.223   9001  9030  sydney.govt.is
__fs_ MidgarSector5        20473 AU    41    128 170 5.14 L 45.32.240.31    443   80    45.32.240.31.vultr.com
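
On the first observation: Tor's RelayBandwidthRate and RelayBandwidthBurst options act like the refill rate and capacity of a token bucket, so a larger burst lets a short transfer, such as a measurement download, finish sooner even when the sustained rate is unchanged. The following is a minimal sketch of that arithmetic, not Tor's implementation; the function name and all numbers are hypothetical, and it assumes the bucket starts full and the link itself is not the bottleneck.

```python
def seconds_to_send(n_bytes, rate, burst):
    """Time for a token bucket to release n_bytes.

    rate  -- sustained refill rate in bytes per second
    burst -- bucket capacity in bytes (assumed full at the start)
    """
    if n_bytes <= burst:
        return 0.0                    # the whole transfer fits in the initial burst
    return (n_bytes - burst) / rate   # the remainder drains at the sustained rate


if __name__ == "__main__":
    RATE = 1_100_000        # hypothetical sustained rate, bytes per second
    TRANSFER = 2_000_000    # hypothetical measurement download, bytes
    for burst in (1_100_000, 4_400_000):
        print(burst, seconds_to_send(TRANSFER, RATE, burst))
```

In this model the relay with the larger burst is not delayed by the limiter at all for the short download, which is one way two relays with the same rate could end up being measured differently.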

I would try removing the bandwidth maximum and tuning IP before giving up. Try this in sysctl.conf:

# Increase socket limits.
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_wmem = 4096  250000  4194304
net.ipv4.tcp_rmem = 4096  375000  4194304

Run sysctl -p after editing the file.
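
The reason larger socket buffers matter on a long path is the bandwidth-delay product: a single TCP connection cannot move more than one window of data per round trip. A rough sketch of that arithmetic follows; the function names, the 100 Mbit/s link speed, the 320 ms round-trip time and the 64 KiB comparison window are assumptions for illustration only.

```python
def bdp_bytes(link_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bits_per_s / 8 * rtt_s


def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on one TCP connection's throughput for a given window size."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    RTT = 0.32                                          # assumed round-trip time, seconds
    print(bdp_bytes(100e6, RTT))                        # window needed for an assumed 100 Mbit/s link
    print(max_throughput_bytes_per_s(65536, RTT))       # assumed 64 KiB window
    print(max_throughput_bytes_per_s(4194304, RTT))     # the 4 MiB maximum set above
```

With these numbers a 64 KiB window caps a connection at about 200 KB/s, while the 4 MiB maximum allows about 13 MB/s.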

Also IMO it's not unreasonable to pull some data through a relay as you describe to increase the self-measured bandwidth to a realistic representation of available capacity. Something like a 5-or-10 minute burst of activity twice a day should be sufficient to register with the self-measure logic. Combined with the other changes this could tip the relay into a higher scanner "band" where it may receive better treatment.

Changed 20 months ago by starlight

Attachment: au_relays.xlsx added

Australia relays, bw ratings

comment:6 Changed 20 months ago by starlight

The relay does not have the "Fast" flag at present. Bandwidth Authorities will not even attempt to measure relays without Fast, so putting a 'torsocks wget' (or 'curl --socks5-hostname 127.0.0.1:9150') of some decent size file in the crontab may go a long way here.
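
A scheduled job along those lines could look like the sketch below, which wraps the curl invocation mentioned above in a small Python script that cron can call. The URL is supplied by the operator, the function names and the timeout are hypothetical, and the SOCKS address is copied from this comment; 9150 is the port Tor Browser uses, whereas a standalone tor daemon normally listens on 9050.

```python
import subprocess
import sys


def build_curl_command(url, socks_addr="127.0.0.1:9150", max_seconds=600):
    """Return the argv for a curl download routed through Tor's SOCKS port."""
    return [
        "curl",
        "--silent",
        "--socks5-hostname", socks_addr,   # resolve hostnames through Tor as well
        "--max-time", str(max_seconds),    # give up after this many seconds
        "--output", "/dev/null",           # discard the payload; only the traffic matters
        url,
    ]


def pull_through_tor(url):
    """Run the download and return curl's exit status (0 on success)."""
    return subprocess.run(build_curl_command(url), check=False).returncode


if __name__ == "__main__":
    # Usage: python3 pull.py <URL of a reasonably large file>
    if len(sys.argv) == 2:
        sys.exit(pull_through_tor(sys.argv[1]))
```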

comment:7 in reply to:  6 Changed 20 months ago by teor

Replying to starlight:

> The relay does not have the "Fast" flag at present. Bandwidth Authorities will not even attempt to measure relays without Fast

Bandwidth authorities attempt to measure all Running relays. But clients won't use a relay for most purposes unless it has the Fast flag.

> so putting a 'torsocks wget' (or 'curl --socks5-hostname 127.0.0.1:9150') of some decent size file in the crontab may go a long way here.

This works because it changes the bandwidth partition on bandwidth authorities, so if the relay can carry more traffic, they will find out faster.

comment:8 Changed 20 months ago by starlight

Prospects for improvement for nodes hosted in TPG do not look good. Hurricane Electric's looking-glass (https://lg.he.net/) shows fairly terrible latency from Australia to Singapore, to Hong Kong and to Vancouver Canada. No obvious way exists to improve the bandwidth scoring of the relay in question aside perhaps from locating bandwidth scanners and servers directly in Australia.

core1.fra1.he.net> traceroute 60.240.245.76 source-ip 216.218.252.174 numeric
  SSH@core1.fra1.he.net>traceroute 60.240.245.76 source-ip 216.218.252.174 numeric

Tracing the route to IP node (60.240.245.76) from 1 to 30 hops

  1    30 ms   18 ms   49 ms 72.52.92.13
  2    80 ms   97 ms  100 ms 184.105.81.77
  3   178 ms  167 ms  153 ms 184.105.81.213
  4   321 ms  323 ms  376 ms 64.62.194.114
  5   322 ms  328 ms  350 ms 203.221.3.65
  6   375 ms  377 ms  374 ms 203.219.57.214
  7    *       *       *     ?
  8    *       *       *     ?
  9    *       *       *     ?
 10    *       *       *     ?
IP: Errno(8) Trace Route Failed, no response from target node.

# Entry cached for another 23 seconds.

core1.sin1.he.net> traceroute 60.240.245.76 source-ip 27.50.33.9 numeric
  SSH@core1.sin1.he.net>traceroute 60.240.245.76 source-ip 27.50.33.9 numeric

Tracing the route to IP node (60.240.245.76) from 1 to 30 hops

  1    73 ms   74 ms   74 ms 184.105.64.254
  2   174 ms  175 ms  174 ms 184.105.213.117
  3   184 ms  191 ms  201 ms 184.105.223.217
  4   271 ms  275 ms  274 ms 64.62.194.114
  5   283 ms  267 ms  273 ms 203.221.3.65
  6   325 ms  349 ms  334 ms 203.219.57.150
  7    *       *       *     ?
  8    *       *       *     ?
  9    *       *       *     ?
 10    *       *       *     ?
IP: Errno(8) Trace Route Failed, no response from target node.

# Entry cached for another 24 seconds.

core1.yvr1.he.net> traceroute 60.240.245.76 source-ip 216.218.252.160 numeric
  SSH@core1.yvr1.he.net>traceroute 60.240.245.76 source-ip 216.218.252.160 numeric

Tracing the route to IP node (60.240.245.76) from 1 to 30 hops

  1     3 ms    3 ms   14 ms 184.105.64.109
  2    22 ms   27 ms   33 ms 184.105.223.217
  3   241 ms  247 ms  249 ms 64.62.194.114
  4   249 ms  250 ms  252 ms 203.221.3.66
  5   299 ms  298 ms  334 ms 203.219.57.150
  6    *       *       *     ?
  7    *       *       *     ?
  8    *       *       *     ?
  9    *       *       *     ?
IP: Errno(8) Trace Route Failed, no response from target node.

# Entry cached for another 29 seconds.

core1.hkg1.he.net> traceroute 60.240.245.76 source-ip 27.50.33.1 numeric
  SSH@core1.hkg1.he.net>traceroute 60.240.245.76 source-ip 27.50.33.1 numeric

Tracing the route to IP node (60.240.245.76) from 1 to 30 hops

  1    60 ms   64 ms   47 ms 184.105.64.130
  2   156 ms  149 ms  150 ms 184.105.213.117
  3   168 ms  160 ms  170 ms 184.105.223.217
  4   247 ms  237 ms  243 ms 64.62.194.114
  5   243 ms  249 ms  250 ms 203.221.3.1
  6   299 ms  376 ms  298 ms 203.219.57.150
  7    *       *       *     ?
  8    *       *       *     ?
  9    *       *       *     ?
 10    *       *       *     ?
IP: Errno(8) Trace Route Failed, no response from target node.

# Entry cached for another 25 seconds.
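
To compare the four traces at a glance, the round-trip time of the last responding hop can be extracted from each. The parser below is a small sketch written against the output format shown above; the function name and regular expression are illustrative and have not been checked against other looking-glass formats.

```python
import re

# Matches lines such as "  6   375 ms  377 ms  374 ms 203.219.57.214".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d+) ms\s+(\d+) ms\s+(\d+) ms\s+(\S+)\s*$")


def last_hop_rtt(trace_text):
    """Return (hop_number, address, mean_rtt_ms) for the last hop that answered, or None."""
    last = None
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if match:  # hops that timed out ("*  *  *  ?") do not match and are skipped
            hop, a, b, c, addr = match.groups()
            last = (int(hop), addr, (int(a) + int(b) + int(c)) / 3)
    return last


if __name__ == "__main__":
    # Excerpt from the Frankfurt trace above.
    frankfurt = """
  5   322 ms  328 ms  350 ms 203.221.3.65
  6   375 ms  377 ms  374 ms 203.219.57.214
  7    *       *       *     ?
"""
    print(last_hop_rtt(frankfurt))
```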

comment:9 Changed 20 months ago by starlight

The bizarre local-region latency detailed in the previous comment results from damage on December 3rd to the SEA-ME-WE 3 cable (https://en.wikipedia.org/wiki/SEA-ME-WE_3); see https://www.itnews.com.au/news/aussie-internet-suffers-as-subsea-cable-cut-again-479052.

Until the submarine Internet cable linking Australia to Singapore is restored to service, bandwidth measurement of Australian Tor relays will be substantially degraded.

comment:10 Changed 14 months ago by juga

Cc: juga added