Totals of consensus weights shift erratically due to some aspect of vote median behavior in the consensus. E.g. (Exit,Exit+Guard) moved 12.5% in 12 hours on 09-Jul-18, 12:00 to 23:59 UTC, while votes stayed steady. Twenty percent in 56 hours with votes shifting. The behavior results in significant adjustment to the selection probability of relays with unchanged consensus weights. Please add this to the graph.
Where did you get those numbers? The current graph does not consider any relay flags at all and simply plots the total consensus weight of all relays in a vote. Do you have graphs or raw data on those events in July?
Wrote a few scripts that collect consensus data and data for my relay to understand Torflow, then imported the results into the attached spreadsheet. Possibly works only with Excel... OO Calc chokes and dies; I have not attempted Google Docs.
The relevant consensus information is certainly present in OnionOO. Constructing the scripts and spreadsheet was my avenue to noticing the behavior and thinking of the enhancement.
I'm afraid I cannot open that file. A CSV file might work better, as would a PNG or PDF.
Can you also elaborate what separate weighted lines you'd want to see in the graph? For example, I don't really understand your (Exit,Exit+Guard) notation. Which relays does this include? And does it weight the measured bandwidth by anything depending on flag combination?
I'm also copying teor on this ticket, in case they have a better intuition what modifications you have in mind for the graph they suggested in #25459 (moved).
I thought more about weighting the values (as in Relay Search), but it makes no difference for the purpose, which is to see whether the totals of medians continue jumping around with SBWS as they presently do with Torflow. Simply graphing the total consensus weight for each selection class (Exit, Guard, Middle) is sufficient.
(Exit,Exit+Guard) is the total of Exit-only and Exit+Guard flagged relays, as this is the set used for choosing exits.
(Guard,notExit) is the set used when selecting guards.
(Guard,unflagged) is the set used for selecting middle relays; it is the only one where current consensus weight values make a difference.
In a perfect solution each selection category would be constructed from the full set of relevant weights, but since most weights are presently either 1 or 0, it was not worth the trouble for the scripts I wrote. Perhaps Relay Search (formerly Atlas) logic can be reused here.
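For reference, here is a minimal sketch of how the three selection classes above could be derived from a relay's consensus flags. The classification mirrors the grouping used in my scripts, not necessarily tor's full path-selection logic:

```python
def selection_class(flags):
    """Map a relay's consensus flag set to one of the three selection
    classes discussed above. Exit and Exit+Guard relays form the exit
    pool; Guard-only relays the guard pool; everything else is a
    middle candidate."""
    if "Exit" in flags:
        return "Exit"      # covers both Exit-only and Exit+Guard
    if "Guard" in flags:
        return "Guard"
    return "Middle"

print(selection_class({"Exit", "Guard", "Running"}))  # -> Exit
print(selection_class({"Guard", "Stable"}))           # -> Guard
print(selection_class({"Running", "Valid"}))          # -> Middle
```

Summing the consensus weights within each class then gives the three per-class totals to graph.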
I'm not sure I understand the problem, or its likely cause. I am cc'ing Mike, because he has more experience with bandwidth weighting.
I'm going to ask some questions to work out what is happening. I find big blocks of text confusing, so it would help me if you'd answer after each question.
Totals of consensus weights shift erratically due to some aspect of vote median behavior in the consensus. E.g. (Exit,Exit+Guard) moved 12.5% in 12 hours on 09-Jul-18, 12:00 to 23:59 UTC, while votes stayed steady.
The consensus is created deterministically from the votes. If the votes are identical, the consensus will be identical. In particular, the consensus weights are the low-median of the votes for each relay: they can't change unless the votes change.
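To illustrate the low-median (a sketch, not the actual directory authority code): with an even number of votes, the lower of the two middle elements is taken, with no averaging.

```python
def low_median(votes):
    """Low-median of bwauth Bandwidth= votes: the middle element for an
    odd count, the LOWER of the two middle elements for an even count."""
    ordered = sorted(votes)
    return ordered[(len(ordered) - 1) // 2]

# With four votes, the lower middle value wins:
print(low_median([90, 100, 110, 200]))  # -> 100
# With three votes, the true middle value:
print(low_median([90, 100, 110]))       # -> 100
```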
What is changing in the votes to change the consensus weights?
Are some authorities not voting?
Are the Bandwidth= figures in the votes actually different?
Or, are you talking about overall relay selection probability, which depends on the total consensus weight?
Do other relays start Running or stop Running?
Do some relays start or stop being Guard or Exit?
Twenty percent in 56 hours with votes shifting. The behavior results in significant adjustment to the selection probability of relays with unchanged consensus weights.
The goal of the bandwidth weighting system is to provide a set of weights that give clients equal performance, regardless of the particular relays they choose.
Maybe the load on the relay changes erratically, so its selection probability should also change?
Maybe other available relays change their performance, so this relay should get used more (or less)?
Do these erratic changes affect client performance?
Would clients perform better or worse without these erratic changes?
I thought more about weighting the values (as in Relay Search), but it makes no difference for the purpose, which is to see whether the totals of medians continue jumping around with SBWS as they presently do with Torflow. Simply graphing the total consensus weight for each selection class (Exit, Guard, Middle) is sufficient.
I agree we should monitor the behaviour of each class of relays.
(Exit,Exit+Guard) is the total of Exit-only and Exit+Guard flagged relays, as this is the set used for choosing exits.
No, this is the set that is currently used for choosing exits. If tor gets more exits in future, then Exit+Guard may be used as Guard.
So we shouldn't hard-code the assumption that Exit+Guard is only used as an Exit.
I'm not sure I understand the problem, or its likely cause. I am cc'ing Mike, because he has more experience with bandwidth weighting.
I'm going to ask some questions to work out what is happening. I find big blocks of text confusing, so it would help me if you'd answer after each question.
Totals of consensus weights shift erratically due to some aspect of vote median behavior in the consensus. E.g. (Exit,Exit+Guard) moved 12.5% in 12 hours on 09-Jul-18, 12:00 to 23:59 UTC, while votes stayed steady.
The consensus is created deterministically from the votes. If the votes are identical, the consensus will be identical. In particular, the consensus weights are the low-median of the votes for each relay: they can't change unless the votes change.
What is changing in the votes to change the consensus weights?
The problem I see is that, in aggregate, the median vote values selected by the consensus will, in a short span, shift around such that the total consensus weight moves significantly. This would not matter if individual votes were updated as quickly as these shifts in totals, but in practice individual relays often go unrefreshed for two or even three days. Individual relays see their consensus selection probability change by 5% or even 10% (because the denominator changes) while the absolute median for the relay (the numerator) does not move at all.
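The denominator effect can be shown with toy numbers (hypothetical values, not real consensus data):

```python
def selection_probability(weight, total):
    """A relay's selection probability within its class: its own
    consensus weight divided by the class total."""
    return weight / total

# A relay whose own consensus weight never changes...
w = 5000
# ...while the class total jumps 12.5% (as in the July example):
total_before = 400_000
total_after = 450_000  # hypothetical denominator jump

p_before = selection_probability(w, total_before)
p_after = selection_probability(w, total_after)
# The relay's probability drops even though its weight is untouched:
print(f"{p_before:.4%} -> {p_after:.4%}")
```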
In a word: anachronism
Are some authorities not voting?
Voting continues, but not consistently across the entire set of relays. SBWS likely does not suffer from this behavior.
Are the Bandwidth= figures in the votes actually different?
Per the above, some change and some do not.
An easy way to think about this is the case where one of the bwauths drops out for a few hours or a day or two. The consensus total experiences a huge jump in one hour, but many relay median votes do not move at all. This is the extreme case, but it happens all the time even without a bwauth withdrawing or joining.
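A toy example of the dropout case (hypothetical vote values): with an even number of votes the low-median takes the lower middle element, so one authority disappearing can move many relays' medians at once, and with them the class total, even though no individual vote changed.

```python
def low_median(votes):
    """Low-median: lower middle element for even counts, middle for odd."""
    ordered = sorted(votes)
    return ordered[(len(ordered) - 1) // 2]

# Four bwauths vote on two relays (values are illustrative only):
relay_votes = {
    "relayA": [80, 100, 120, 140],
    "relayB": [50, 60, 70, 80],
}

total_before = sum(low_median(v) for v in relay_votes.values())
# The authority that voted lowest on each relay drops out
# (lists are sorted, so v[1:] removes the lowest vote):
total_after = sum(low_median(v[1:]) for v in relay_votes.values())
print(total_before, "->", total_after)  # 160 -> 190
```

Here the class total jumps about 19% in a single consensus interval, while every surviving vote is identical to the hour before.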
Or, are you talking about overall relay selection probability, which depends on the total consensus weight?
This is all about the totals moving while individual votes are not refreshing as quickly. Each relay class operates independently as a practical matter, and exits have the worst time of it.
Do other relays start Running or stop Running?
Relays are generally stable. It seems to me that occasionally a big operator will take down or start up a block of a dozen or so high-bandwidth nodes and this can trigger a shift, but it's not the principal cause. The "rc" columns and percentages in the CSV can be used to look for these.
Do some relays start or stop being Guard or Exit?
Possibly, but again these events are not a big problem, AFAICT.
Twenty percent in 56 hours with votes shifting. The behavior results in significant adjustment to the selection probability of relays with unchanged consensus weights.
The goal of the bandwidth weighting system is to provide a set of weights that give clients equal performance, regardless of the particular relays they choose.
Maybe the load on the relay changes erratically, so its selection probability should also change?
Again, in this situation I'm focused on consensus totals. Something about the way Torflow votes from different authorities interact results in the medians shifting wholesale while the individual vote sets appear mostly stable. I did not try to analyze the exact nature of it, figuring it would be worth the trouble only if the new system experiences this.
Maybe other available relays change their performance, so this relay should get used more (or less)?
Do these erratic changes affect client performance?
Clients use selection probability, so yes, for sure. If a node's probability changes only because the denominator moved, the probability clients act on is still different.
Would clients perform better or worse without these erratic changes?
I believe this contributes to misrating, especially for faster relays where the offset ratios are high, +1 and above (i.e. 2x the average), and could be a factor in relays overloading and seizing up, as often happens. I notice this when using SSH frequently--a good session will abruptly become terrible or just freeze.
I thought more about weighting the values (as in Relay Search), but it makes no difference for the purpose, which is to see whether the totals of medians continue jumping around with SBWS as they presently do with Torflow. Simply graphing the total consensus weight for each selection class (Exit, Guard, Middle) is sufficient.
I agree we should monitor the behavior of each class of relays.
(Exit,Exit+Guard) is the total of Exit-only and Exit+Guard flagged relays, as this is the set used for choosing exits.
No, this is the set that is currently used for choosing exits. If tor gets more exits in future, then Exit+Guard may be used as Guard.
Yes, the weights... I haven't fully wrapped my mind around how it all works.
So we shouldn't hard-code the assumption that Exit+Guard is only used as an Exit.
Must eat my words: clicked the XLSX by accident and--surprise--OO Calc opens it fine, though color is absent in the graphs and the X-axis labels are messed up. I have an Excel file in another ticket with some newfangled "table objects" and OO burns to the ground on that one.
I just closed the last remaining child ticket. I didn't go through all comments above, but I'll assume that 14 months inactivity means that nothing else remains to be done here. If I'm wrong, please re-open and give a summary what work remains. Thanks! Closing.
Trac: Resolution: N/A to implemented; Status: new to closed