Opened 2 months ago

Last modified 3 days ago

#28563 new defect

Work out how sbws can report excluded relays in the bandwidth file

Reported by: teor
Owned by:
Priority: Medium
Milestone: sbws: 1.1.x-final
Component: Core Tor/sbws
Version:
Severity: Normal
Keywords: tor-bwauth, sbws-1.0-must-moved-20181128
Cc: pastly, juga
Actual Points:
Parent ID: #28547
Points:
Reviewer:
Sponsor:

Description

If we report excluded relays in the bandwidth file, then they will be publicly archived and available for anyone to analyse.

We just need to work out a syntax that makes tor ignore excluded relays.

Change History (8)

comment:1 Changed 8 weeks ago by juga

I see some drawbacks to doing this:

  • once we figure out why relays are being excluded, we might not want to keep the same format.
  • we need to wait until longclaw updates to the code that publishes the files
  • it would add around 1000 extra relays with some extra data, though this might not be a problem.
  • the delay implied by having to create the spec first

I was thinking of something temporary instead, either:

  1. produce a different file, with the excluded relays and useful data*
  2. implement another script to dump the data to a DB. It sounds kind of crazy, but it might not be much work, and it would make queries easier
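
A minimal sketch of option 2, assuming the excluded relays are already available as (fingerprint, reason) pairs. The table name, column names, and reason strings are hypothetical; this is not how sbws stores its results.

```python
import sqlite3


def dump_excluded(rows, db_path=":memory:"):
    """Store (fingerprint, reason) pairs and return the open connection.

    The `excluded` table and its columns are made up for this sketch.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS excluded (fingerprint TEXT, reason TEXT)"
    )
    conn.executemany("INSERT INTO excluded VALUES (?, ?)", rows)
    conn.commit()
    return conn


def count_by_reason(conn):
    """Return a dict mapping each reason to the number of relays it excluded."""
    query = "SELECT reason, COUNT(*) FROM excluded GROUP BY reason"
    return dict(conn.execute(query).fetchall())
```

With the data in a table, "how many relays does each reason exclude" becomes a single query rather than a pass over the raw results files.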

Since I'm currently the only one with access to the original data, I can publish the results of that.

*Currently, relays can be excluded because:

  • circuits time out; this is already in the raw results file
  • when scaling, sbws doesn't find 2 measurements that are at least 1 day apart and no more than 5 days old
  • something else we don't know yet
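
The scaling condition in the list above can be sketched as a small check. This assumes it means "at least two measurements from the last 5 days, with the oldest and newest of them at least 1 day apart"; the function name, constants, and reason strings are hypothetical and not taken from the sbws code.

```python
from datetime import datetime, timedelta

# Thresholds as described in the list above; the names are hypothetical.
MIN_SPAN = timedelta(days=1)
MAX_AGE = timedelta(days=5)


def exclusion_reason(timestamps, now):
    """Return a reason string if the relay would be excluded, else None.

    `timestamps` are the datetimes of the relay's successful measurements.
    """
    # Keep only measurements that are recent enough.
    recent = sorted(t for t in timestamps if now - t <= MAX_AGE)
    if len(recent) < 2:
        return "too_few_recent_measurements"
    # The oldest and newest recent measurements must be far enough apart.
    if recent[-1] - recent[0] < MIN_SPAN:
        return "measurements_too_close"
    return None
```

Returning a reason instead of a bare boolean keeps the information that would later need to be reported.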

What if I try one of the other two approaches first? Otherwise I'm fine with this.

comment:2 in reply to:  1 Changed 8 weeks ago by teor

Replying to juga:

I see some drawbacks to doing this:

  • once we figure out why relays are being excluded, we might not want to keep the same format.

I'm not sure what you mean here.
Do you think we'll change the bandwidth file format?
I expect that we'll add extra keys to the relay lines that count the relays excluded at each stage. Then we will add more keys for the stages that exclude a lot of relays.
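
As a rough sketch of per-stage counts, assuming each excluded relay has already been tagged with the stage that excluded it. The `excluded_<stage>` key names are invented for illustration and are not part of any bandwidth file spec.

```python
from collections import Counter


def exclusion_count_keys(stages):
    """Format per-stage exclusion counts as space-separated key=value pairs.

    `stages` holds one stage name per excluded relay. The key naming is
    hypothetical.
    """
    counts = Counter(stages)
    return " ".join(
        "excluded_{}={}".format(stage, n) for stage, n in sorted(counts.items())
    )
```

Sorting the stages keeps the output stable between runs, which makes archived files easier to compare.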

  • we need to wait until longclaw updates to the code that publishes the files

Or you or micah can sync the file to a public web server, like people.torproject.org.
Most of the other directory authority operators sync their bandwidth files somewhere public.

  • it would add around 1000 extra relays with some extra data, though this might not be a problem.

I don't think it's a problem.

  • the delay implied by having to create the spec first

I'm not sure what you mean here.
Are you concerned that the spec will take too much time?
It is ok to try a few things in the code, then update the spec.
And I can work on the spec next week.

I was thinking of something temporary instead, either:

  1. produce a different file, with the excluded relays and useful data*
  2. implement another script to dump the data to a DB. It sounds kind of crazy, but it might not be much work, and it would make queries easier

Since I'm currently the only one with access to the original data, I can publish the results of that.

*Currently, relays can be excluded because:

  • circuits time out; this is already in the raw results file
  • when scaling, sbws doesn't find 2 measurements that are at least 1 day apart and no more than 5 days old
  • something else we don't know yet

Let's make a pad and list all the reasons from the code:
https://pad.riseup.net/p/sbws-exclude-reasons-keep

I think there are a lot more, see the children of #28547.

What if I try one of the other two approaches first? Otherwise I'm fine with this.

Relay operators, authority operators, and developers need to be able to find out why a relay isn't being measured.

If we put that information in the bandwidth file, authorities serve the bandwidth file, and metrics archives it, then the information is public and available.

Your other options are good for you to do a quick analysis. But they do not help everyone else.

comment:3 Changed 8 weeks ago by teor

To complete this ticket, we should work out how to make all supported versions of tor ignore a bandwidth file line. We should put the excluded relays at the end of the file, so they are not put in the header in the consensus.

Here are the things we should test:

  • something that stops tor parsing the file
  • no bw= key
  • a bw=0 key
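
To produce test inputs for the last two candidates, something like the sketch below might do. It assumes relay lines of the form `node_id=$<fingerprint> bw=<n>`, and the function and variant names are hypothetical. The first candidate is left out because the ticket does not yet say what form it would take. The sketch only generates lines; how each supported tor version reacts to them still has to be tested.

```python
def relay_line(fingerprint, bw):
    """Format a normal, measured relay line."""
    return "node_id=${} bw={}".format(fingerprint, bw)


def excluded_line(fingerprint, variant):
    """Format an excluded relay line in one of the candidate syntaxes."""
    if variant == "no_bw":
        # Candidate: leave the bw= key out entirely.
        return "node_id=${}".format(fingerprint)
    if variant == "bw_zero":
        # Candidate: keep the bw= key but set it to 0.
        return "node_id=${} bw=0".format(fingerprint)
    raise ValueError("unknown variant: {}".format(variant))


def relay_lines(measured, excluded, variant):
    """Return measured relay lines first, then excluded relays at the end."""
    lines = [relay_line(fp, bw) for fp, bw in measured]
    lines += [excluded_line(fp, variant) for fp in excluded]
    return lines
```

Keeping the excluded relays in a separate list until the last step makes it straightforward to put them at the end of the file.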

comment:4 Changed 7 weeks ago by teor

Keywords: sbws-1.0-must-moved-20181128 added
Milestone: sbws 1.0 (MVP must) → sbws 1.0.4

Moving all sbws 1.0 must planning and feature tickets to 1.0.4.

comment:5 Changed 7 weeks ago by teor

Milestone: sbws 1.0.4 → sbws 1.1

Milestone renamed

comment:6 Changed 7 weeks ago by teor

Milestone: sbws 1.1 → sbws: 1.1.x

Milestone renamed

comment:7 Changed 7 weeks ago by teor

Milestone: sbws: 1.1.x → sbws: 1.1.x-final

Milestone renamed

comment:8 Changed 3 days ago by teor

Long-term, we might want to put excluded relays in their own section, after a section terminator =====.
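
A reader for such a layout could be sketched as follows, assuming the file is split wherever a line consists only of `=====`, and that the header is already followed by one such terminator so the excluded relays end up in a third section. The function name and the sample data are hypothetical.

```python
TERMINATOR = "====="


def split_sections(text):
    """Split bandwidth file text into lists of lines, one list per section."""
    sections = [[]]
    for line in text.splitlines():
        if line == TERMINATOR:
            # A terminator line closes the current section and opens a new one.
            sections.append([])
        else:
            sections[-1].append(line)
    return sections


# Made-up sample: header, measured relays, then excluded relays.
sample = "1544000000\n=====\nnode_id=$AAAA bw=10\n=====\nnode_id=$BBBB\n"
header, measured, excluded = split_sections(sample)
```

Whether existing tor versions stop reading at the second terminator, or reject the file, is exactly what would need to be checked first.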
