Opened 3 years ago

Closed 2 years ago

#18688 closed defect (wontfix)

Relays should disable DirPort if RelayBandwidthRate is less than 50 KB/s

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-03-unspecified-201612
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

While I was checking fallback directory mirrors for #17158, I encountered some relays that took more than a minute to serve a consensus.

Most took 150 seconds, which could be caused by a RelayBandwidthRate of 10 kilobytes a second.

I suggest that we disable the DirPort on relays with a RelayBandwidthRate less than 50 kilobytes a second (30s to serve a consensus).
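
The arithmetic behind those figures, assuming a consensus of roughly 1.5 megabytes (the size implied by the 150-second downloads above):

# back-of-the-envelope check for the thresholds in this ticket
consensus_bytes = 1.5e6  # approximate consensus size implied above (assumption)
print(consensus_bytes / 10e3)  # 10 KB/s -> ~150 seconds to serve one consensus
print(consensus_bytes / 50e3)  # 50 KB/s -> ~30 seconds, the proposed cutoff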

This is an incomplete list, starting with those with the highest consensus weight:
217.198.117.122:80
212.47.250.44:80
158.69.112.86:80
50.7.178.34:80
191.101.251.172:80
51.254.249.177:80
188.165.232.40:80
104.236.38.231:8080
89.163.225.184:9030
185.31.230.69:9030
81.7.14.227:9030
62.210.238.33:9030
164.132.56.137:9030
212.107.149.145:9030
94.23.165.33:9031

I'm using this Python function from scripts/maint/updateFallbackDirs.py (not yet merged to master) to find them:

import datetime
import logging

from stem.descriptor.remote import DescriptorDownloader

def fallback_consensus_dl_speed(dirip, dirport, nickname, max_time):
  download_failed = False
  downloader = DescriptorDownloader()
  start = datetime.datetime.utcnow()
  # some directory mirrors respond to requests in ways that hang python
  # sockets, which is why we log this line before the request starts
  logging.info('Initiating consensus download from %s (%s:%d).', nickname,
               dirip, dirport)
  # there appears to be about 1 second of overhead when comparing stem's
  # internal trace time and the elapsed time calculated here
  TIMEOUT_SLOP = 1.0
  try:
    downloader.get_consensus(endpoints = [(dirip, dirport)],
                             timeout = (max_time + TIMEOUT_SLOP)).run()
  except Exception as stem_error:
    logging.info('Unable to retrieve a consensus from %s: %s', nickname,
                 stem_error)
    download_failed = True
  elapsed = (datetime.datetime.utcnow() - start).total_seconds()
  if elapsed > max_time:
    status = 'too slow'
    level = logging.WARNING
    download_failed = True
  else:
    status = 'ok'
    level = logging.DEBUG
  logging.log(level, 'Consensus download: %0.1fs %s from %s (%s:%d), '
              'max download time %0.1fs.', elapsed, status, nickname,
              dirip, dirport, max_time)
  return download_failed
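
For example, a hypothetical invocation against one of the mirrors listed above (the 30-second budget matches the proposed cutoff; the nickname is illustrative):

import logging
logging.basicConfig(level=logging.DEBUG)
# returns True if the download failed or took longer than 30 seconds
failed = fallback_consensus_dl_speed('62.210.238.33', 9030, 'ORGNorthEast1', 30.0)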

Child Tickets

Change History (9)

comment:1 Changed 3 years ago by tscpd

We now recommend running a relay only if the contributor has at least 250 kilobytes/s in each direction: https://www.torproject.org/docs/tor-relay-debian.html.en

Should operators with less bandwidth prefer running bridges instead?

comment:2 Changed 3 years ago by arma

Most of the highest-weighted ones on your list aren't up, and haven't been up for days?

But the first one I found, 50.7.178.34, says

bandwidth 1073741824 1073741824 8115187

Are there relays that have high (or at least, non trivial) consensus weight but tiny relaybandwidthrate? I would be surprised.

comment:3 in reply to:  2 ; Changed 3 years ago by teor

Replying to arma:

Most of the highest-weighted ones on your list aren't up, and haven't been up for days?

Maybe stem or my script is reporting success, when it's actually timed out?

But the first one I found, 50.7.178.34, says

bandwidth 1073741824 1073741824 8115187

Are there relays that have high (or at least, non trivial) consensus weight but tiny relaybandwidthrate? I would be surprised.

Yes, that's what was worrying me. It seems like really strange behaviour. I was hoping someone would have time to follow it up.

comment:4 in reply to:  3 ; Changed 3 years ago by arma

Replying to teor:

Maybe stem or my script is reporting success, when it's actually timed out?

Maybe. How are you choosing which ones to try to connect to?

Are there relays that have high (or at least, non trivial) consensus weight but tiny relaybandwidthrate? I would be surprised.

Yes, that's what was worrying me. It seems like really strange behaviour. I was hoping someone would have time to follow it up.

No, I meant "because I have not seen any yet, including the ones I spot-checked from your list". Those relays had enormous relaybandwidthrate or did not set it at all.

comment:5 in reply to:  4 Changed 3 years ago by teor

Replying to arma:

Replying to teor:

Maybe stem or my script is reporting success, when it's actually timed out?

Maybe. How are you choosing which ones to try to connect to?

Based on their uptime history in Onionoo.
I've fixed that to use the Running flag in the current consensus (a sketch of that filter is at the end of this comment).
That still leaves:

89.163.225.184:9030 - mertadx - No ContactInfo - 92139E58A2FD1235A9AC02B1E1E87174FB80301B
185.31.230.69:9030 - AskForEraser - cc'd - E58CECC2ED31B33867D30707CDB239C85B38FFF2
81.7.14.227:9030 - Torwell01 - cc'd - BCA197C43A44B7B9D14509637F96A45B13C233D0
62.210.238.33:9030 - ORGNorthEast1 - cc'd - FDF845FC159C0020E2BDDA120C30C5C5038F74B4
212.107.149.145:9030 - flimsy - No ContactInfo - A4B99A72464F955F7EFFB5DD968B53DD450C7FB4

Are there relays that have high (or at least, non trivial) consensus weight but tiny relaybandwidthrate? I would be surprised.

Yes, that's what was worrying me. It seems like really strange behaviour. I was hoping someone would have time to follow it up.

No, I meant "because I have not seen any yet, including the ones I spot-checked from your list". Those relays had enormous relaybandwidthrate or did not set it at all.

I've emailed some of the operators and cc'd tor-relays to try and work this out.
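
A minimal sketch of that Running-flag filter, assuming stem: it downloads the current consensus and keeps only the relays the consensus marks as Running.

import stem.descriptor.remote

# fetch the current consensus and keep relays with the Running flag
consensus = stem.descriptor.remote.get_consensus().run()
running = [router for router in consensus if 'Running' in router.flags]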

comment:6 Changed 3 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:7 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:8 Changed 2 years ago by arma

Oh, I see what's going on here. You're actually fetching the consensus, seeing how long it takes, and you're sad that some relays are slow.

This slowness is unlikely to be because of RelayBandwidthRate. And by unlikely, I mean it isn't. Specifically, I spot checked all of the relays you described, and for zero of them it was because of RelayBandwidthRate. I know this because relays put their bandwidthrates in their descriptors.
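
A sketch of that spot check, assuming stem (the fingerprint is one from the list in comment 5; the three values correspond to the descriptor's bandwidth line: rate, burst, and observed, in bytes/s):

import stem.descriptor.remote

# fetch one relay's server descriptor and read its advertised rates
query = stem.descriptor.remote.get_server_descriptors(
    fingerprints=['A4B99A72464F955F7EFFB5DD968B53DD450C7FB4'])
for desc in query.run():
    print(desc.average_bandwidth, desc.burst_bandwidth, desc.observed_bandwidth)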

So what you really meant here was "relays should disable their dirport if they're overloaded". But overloaded is relative, and time based, and not something that relays have a chance to notice right now.

That said, relays do aim to return a 503 in response to a consensus fetch, when they know their rate limits are going to cause them to run out of bytes in the token bucket.
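
A toy model of that decision, not Tor's actual code (the names and the consensus size are assumptions, purely to illustrate the token-bucket reasoning):

# toy illustration: refuse with a 503 if the write token bucket plus its
# refill over the request window can't cover a consensus download
CONSENSUS_BYTES = 1500000  # rough consensus size (assumption)

def should_send_503(tokens_available, refill_rate_bytes_per_sec, window_secs=60):
    budget = tokens_available + refill_rate_bytes_per_sec * window_secs
    return budget < CONSENSUS_BYTES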

But probably the really slow relays are the ones who have set their bandwidthrates too *high*, meaning they are experiencing all sorts of messy congestion at the network layer because they're trying to talk too much.

I'm tempted to close this ticket with reason "you misunderstood what relaybandwidthrate is". Plausible?

comment:9 Changed 2 years ago by teor

Resolution: wontfix
Status: new → closed

I don't think there is anything to fix here: we already have a lower limit on relay bandwidth.
