Opened 6 months ago

Last modified 5 months ago

#33947 needs_review defect

Compare sbws and Torflow

Reported by: juga
Owned by:
Priority: Medium
Milestone: sbws: 1.1.x-final
Component: Core Tor/sbws
Version: sbws: 1.1.0
Severity: Normal
Keywords: sbws-roadmap, GeorgKoppen202006
Cc: juga, gk
Actual Points:
Parent ID: #33121
Points:
Reviewer: gk
Sponsor:

Description

gk and I were talking about what to review in #30375, and we thought it would be useful to create a ticket to check whether the bugfixes we have been working on (https://trac.torproject.org/projects/tor/query?keywords=~sbws-roadmap&status=closed) to deploy sbws in all bwauths are working, i.e. whether they make sbws behave very closely to Torflow.

I think we should document what to check, where/how to check it and which ticket(s) were intended to fix it.

I also think we should add this as documentation in sbws itself, both because these are important questions that have blocked deploying sbws in all bwauths and to avoid regressions in the future.

Some of the main things to check that should be further explained are:

  • whether sbws "failures" are "low" (#30719)
  • whether the number of relays to vote on reported by sbws is "similar" to the number of relays reported by Torflow (#30727, #30735)
  • whether sbws relay descriptors are updated (#30733)
  • whether sbws router statuses (relay info from the consensus) are updated (#30733)
  • whether the sbws consensus bandwidth total sum is similar to Torflow's (#33871, #33009, #33350)
  • whether changes in a relay's consensus bandwidth have a similar effect as in Torflow (#33871)
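The two "similar" checks in the list (number of relays voted on and total consensus bandwidth) could be automated roughly as follows. This is a minimal sketch: the input layout (bwauth name mapped to {relay fingerprint: voted bandwidth}), the function names and the 10% tolerance are hypothetical choices for illustration, not part of sbws or bwauthealth.

```python
from statistics import median


def relative_diff(value, reference):
    """Difference of `value` from `reference`, as a fraction of `reference`."""
    return abs(value - reference) / reference


def compare_with_torflow(votes, sbws_auth, torflow_auths, tolerance=0.1):
    """Compare one sbws bwauth with the median of the Torflow bwauths.

    `votes` maps a bwauth name to {relay fingerprint: voted bandwidth}.
    Two checks are made: the number of relays voted on and the total sum of
    bandwidth. The 10% `tolerance` is an arbitrary, hypothetical choice.
    """
    # Reference values: medians over the Torflow bwauths.
    ref_count = median(len(votes[a]) for a in torflow_auths)
    ref_total = median(sum(votes[a].values()) for a in torflow_auths)

    sbws_count = len(votes[sbws_auth])
    sbws_total = sum(votes[sbws_auth].values())

    return {
        "relay_count_similar": relative_diff(sbws_count, ref_count) <= tolerance,
        "total_bw_similar": relative_diff(sbws_total, ref_total) <= tolerance,
    }
```

Using the median rather than the mean as the reference keeps a single outlying Torflow bwauth from dominating the comparison.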

Child Tickets

Ticket | Status | Owner | Summary | Component
#33198 | new | | Check changes related to descriptors in a bandwidth file created by a bwauth before next release | Core Tor/sbws
#33350 | new | | Is sbws weighting some relays too high? | Core Tor/sbws

Attachments (2)

bw_comparative_extra.ods (960.7 KB) - added by juga 5 months ago.
20200523_bw_comparative_10.csv (837.6 KB) - added by juga 5 months ago.


Change History (7)

comment:1 Changed 6 months ago by gk

Keywords: GeorgKoppen202005 added

comment:2 Changed 5 months ago by juga

Something we have not mentioned here is checking how individual bandwidth values differ from one authority to another.
I think this is useful for #33871, so I'm including it here.
For that, I've implemented a new function in https://gitlab.torproject.org/juga/bwauthealth.
I obtained the bandwidth values of all bwauths for the consensus of 25.05.2020. Then, in LibreOffice, I calculated the median of the Torflow bwauths, calculated the percentage difference of longclaw and maatuska from that median (right before maatuska switched to the latest sbws version), and finally counted the number of relays for which the percentage difference is greater than 50.
This is 626 relays for longclaw.
Then I counted how many of the relays with a percentage difference greater than 50 have a longclaw bandwidth greater than 1.
This is 151 for longclaw. The cases in which longclaw (but not Torflow) has a bandwidth of 1 are probably due to longclaw not having the descriptors and consensus bandwidth for those relays.
I attach the LibreOffice file.
It would also be interesting to check whether those bandwidths are actually 1 because of missing descriptors/consensus.
I could have included these calculations in the code too, but for now it was faster this way.
I leave the maatuska case, and the case in which the sbws bandwidths are greater than Torflow's, for #33350.
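The spreadsheet steps in this comment (median of the Torflow bwauths, percentage difference, count above 50, count above 1) could also be scripted. A minimal sketch, assuming the same hypothetical layout of votes per bwauth and defining the percentage difference relative to the Torflow median; that definition is an assumption and may not be exactly the formula used in the LibreOffice file.

```python
from statistics import median


def pct_diff(value, reference):
    """Percentage difference of `value` from `reference` (assumed definition)."""
    return abs(value - reference) / reference * 100


def count_divergent(votes, sbws_auth, torflow_auths, threshold=50):
    """Count relays whose sbws bandwidth diverges from the Torflow median.

    Returns (divergent, divergent_above_1): the number of relays whose
    percentage difference is greater than `threshold`, and how many of those
    have an sbws bandwidth greater than 1.
    """
    # Only compare relays that every bwauth voted on.
    common = set(votes[sbws_auth])
    for a in torflow_auths:
        common &= set(votes[a])

    divergent = divergent_above_1 = 0
    for fp in common:
        ref = median(votes[a][fp] for a in torflow_auths)
        if ref <= 0:
            continue  # a zero reference would make the percentage undefined
        if pct_diff(votes[sbws_auth][fp], ref) > threshold:
            divergent += 1
            if votes[sbws_auth][fp] > 1:
                divergent_above_1 += 1
    return divergent, divergent_above_1
```

Assuming voted bandwidths are positive integers, the difference between the two returned counts is the number of divergent relays whose sbws value is exactly 1, i.e. the cases suspected to come from missing descriptors and consensus bandwidth.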

Changed 5 months ago by juga

Attachment: bw_comparative_extra.ods added

comment:3 Changed 5 months ago by juga

Following the previous reasoning and data, I implemented another function to find over/under-weighted relays.
I created a modified version of the previous function.
I attach a new CSV file and comment in this ticket to avoid repeating data and comments.
This is what i found:

Overweighted relays (#33350)


  • maatuska was still overweighting 346 relays, while sbws was overweighting only 3
  • the 3 sbws cases are due to a missing descriptor average bandwidth
  • all the maatuska cases are missing the descriptor observed bandwidth, except for one relay that has a higher consensus and observed bandwidth than in longclaw, probably because the stored descriptor was old

Reasons why I think this happened:

  • maatuska's version was not updating descriptors correctly (#30733, #33570), which made it skip scaling for many relays that were still included in the bandwidth file to vote on (this would be solved by a commit in #33832)
  • rounding before limiting to the maximum was probably a problem too (this would also be solved by a commit in #33832)
  • these errors accumulated until we had more Torflow bwauths, because the final bandwidth depends on the previous consensus bandwidth
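Why errors accumulate when the final bandwidth depends on the previous consensus bandwidth can be illustrated with a simplified model. It assumes, hypothetically, that each new value is the relay's measured ratio times the minimum of its descriptor average, descriptor observed and previous consensus bandwidths, and that a missing descriptor value means scaling is skipped. The real sbws/Torflow scaling has more steps than this, and `scaled_bw`/`simulate` are illustrative names only.

```python
def scaled_bw(ratio, desc_bw_avg, desc_bw_obs, consensus_bw):
    """Simplified, hypothetical Torflow-like scaling for one relay.

    The new value is `ratio` (the relay's measured speed relative to the
    network) times the minimum of its descriptor average, descriptor observed
    and previous consensus bandwidths. Returns None when a descriptor value
    is missing, standing in for the case where scaling is skipped.
    """
    if desc_bw_avg is None or desc_bw_obs is None:
        return None
    base = min(desc_bw_avg, desc_bw_obs, consensus_bw)
    return max(1, round(ratio * base))  # never vote below 1


def simulate(ratio, desc_bw, consensus_bw, rounds):
    """Feed each result back in as the next consensus bandwidth.

    Models a single bwauth whose vote becomes the consensus bandwidth;
    with several bwauths the consensus takes a median, which dampens this.
    """
    history = [consensus_bw]
    for _ in range(rounds):
        history.append(scaled_bw(ratio, desc_bw, desc_bw, history[-1]))
    return history
```

For example, `simulate(0.9, 1000, 1000, 3)` gives `[1000, 900, 810, 729]`: a relay consistently measured at 90% of the network average loses about 10% each round, because the lowered consensus value becomes the next base.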

Underweighted relays (#33775)


https://trac.torproject.org/projects/tor/ticket/33775

  • maatuska was still underweighting 70 relays, while longclaw was underweighting none
  • it seems that the consensus bandwidth maatuska has for those relays is lower than the one sbws has

Reasons:

  • maybe because of using exits with low capacity (#33009)
  • again, not updating descriptors often/correctly and keeping previous low values
  • and letting those low values accumulate
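The over/under-weighted classification in this comment could be sketched as below, reusing the 50 percent threshold from the previous comment. The signed-difference definition, the function name and the vote layout are assumptions for illustration; this is not the code in bwauthealth.

```python
from statistics import median


def classify(votes, auth, torflow_auths, threshold=50):
    """Split relays into over- and under-weighted relative to Torflow.

    Assumed definition: a relay is overweighted if `auth`'s value is more than
    `threshold` percent above the median of the Torflow bwauths, and
    underweighted if it is more than `threshold` percent below it.
    """
    over, under = [], []
    for fp, value in votes[auth].items():
        # Torflow values for this relay, from the bwauths that voted on it.
        refs = [votes[a][fp] for a in torflow_auths if fp in votes[a]]
        if not refs:
            continue  # relay not measured by any Torflow bwauth
        ref = median(refs)
        if ref <= 0:
            continue  # a zero reference would make the percentage undefined
        change = (value - ref) / ref * 100
        if change > threshold:
            over.append(fp)
        elif change < -threshold:
            under.append(fp)
    return sorted(over), sorted(under)
```

Keeping the sign of the difference is what separates the two groups; the unsigned percentage difference from the previous comment only says that a relay diverges, not in which direction.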

Regarding the total consensus weight, this graph shows that all bwauths are now close: https://metrics.torproject.org/totalcw.png?start=2019-12-18

There is no code to review, but someone should check whether my reasoning makes sense ;)

From the list of things to check in this ticket, only #33198 is still missing.

Changed 5 months ago by juga

comment:4 Changed 5 months ago by gk

Keywords: GeorgKoppen202006 added; GeorgKoppen202005 removed

Moving my tickets.

comment:5 Changed 5 months ago by gk

Reviewer: gk
Status: changed from new to needs_review