Opened 4 months ago

Last modified 4 months ago

#26675 new defect

Incorrect ancillory data presented on Torflow guard relay "update" vote lines

Reported by: starlight Owned by: tom
Priority: Medium Milestone:
Component: Core Tor/Torflow Version:
Severity: Minor Keywords:
Cc: mikeperry, teor Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I realize little appetite exists for correcting Torflow issues at this juncture, but I came across an inexplicable behavior that is either my inability to unravel the code or a bug serious enough to merit fixing even at this stage of the new scanner development.

Investigating the scenario where the current measurement of a Guard relay is disregarded and instead the most recent voted measurement revised, I found this:

first archive appearance of new measurement vote

https://bwauth.ritter.vg/bwauth/bwscan.20180629-0645

1530271972
node_id=$4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1F2 bw=11500 nick=Binnacle measured_at=1530271632 updated_at=1530271632 pid_error=0.34411159803 pid_error_sum=0.34411159803 pid_bw=11467418 pid_delta=-0.0875088625329 circ_fail=0.0 scanner=/scanner.5/scan-data/bws-51.6:52.4-done-2018-06-29-06:27:12

@type server-descriptor 1.0
router Binnacle 108.53.208.157 443 0 80
published 2018-06-29 00:29:07
fingerprint 4F0D B7E6 87FC 7C0A E55C 8F24 3DA8 B0EB 27FB F1F2
bandwidth 1073741824 1073741824 8531597

first subsequent no-measurement vote update

https://bwauth.ritter.vg/bwauth/bwscan.20180701-0545

1530441163
node_id=$4F0DB7E687FC7C0AE55C8F243DA8B0EB27FBF1F2 bw=20100 nick=Binnacle measured_at=1530271632 updated_at=1530439657 pid_error=0.34411159803 pid_error_sum=0.34411159803 pid_bw=11467418 pid_delta=-0.0875088625329 circ_fail=0.0 scanner=/scanner.4/scan-data/bws-38.6:39.4-done-2018-07-01-05:07:37

@type server-descriptor 1.0
router Binnacle 108.53.208.157 443 0 80
published 2018-06-30 12:30:08
fingerprint 4F0D B7E6 87FC 7C0A E55C 8F24 3DA8 B0EB 27FB F1F2
bandwidth 1073741824 1073741824 9458512

reading aggregate.py it appears this case his handled by lines 678-688 with kp=1 ki=0 and kd=0:

  # Don't use feedback here, but we might as well use our
  # new measurement against the previous vote.

  n.copy_vote(prev_votes.vote_map[n.idhex])

  if cs_junk.use_desc_bw:
    n.new_bw = n.get_pid_bw(prev_votes.vote_map[n.idhex],
                        cs_junk.K_p,
                        cs_junk.K_i,
                        cs_junk.K_d,
                        0.0, False)


  def get_pid_bw(self, prev_vote, kp, ki, kd, kidecay, update=True):
    if not update:
      return self.use_bw \
                  + kp*self.use_bw*self.pid_error \
                  + ki*self.use_bw*self.pid_error_sum \
                  + kd*self.use_bw*self.pid_delta

line 591
  n.use_bw = n.desc_bw

line 556
  prev_votes = VoteSet(argv[-1])
line 205
  try:
    self.pid_error = float(re.search("[\s]*pid_error=([\S]+)[\s]*", line).group(1))

  def copy_vote(self, vote):
    self.new_bw = vote.bw*1000 # Not set yet
    self.pid_bw = vote.pid_bw  # Not set yet
    self.pid_error_sum = vote.pid_error_sum # Not set yet
    self.pid_delta = vote.pid_delta # Not set yet

My reading indicates new_bw is calculated by applying the last vote pid_error to the current descriptor bandwidth. Comment appears misleading, but the logic seems sensible enough.

Problem is that 9458512 + 9458512 * 0.34411159803 is 12713,296 rather than the published value of 20100. If my understanding of the code is correct, the actual behavior here does not reflect the intended behavior and the result is egregiously incorrect. I suspect a bug, but the code is complex and in the absence of debugging a live Torflow instance my analysis could be wrong.

Perhaps Mike Perry could chime in here?

Child Tickets

Change History (4)

comment:1 Changed 4 months ago by teor

Cc: mikeperry added

Even if we fixed this issue, it's unlikely we could test and deploy torflow to a majority of bwauths before sbws is deployed.

comment:2 Changed 4 months ago by teor

Cc: teor added; teor@… removed

Shorten useful CCs

comment:3 Changed 4 months ago by starlight

Severity: NormalMinor
Summary: possible server bug in Torflow guard "update" vote calculationIncorrect ancillory data presented on Torflow guard relay "update" vote lines

So this is not a serious problem.

I overlooked that copy_vote does not overwrite pid_error and that lines 702-704

# Reset the remaining vote data..
n.measured_at = prev_votes.vote_map[n.idhex].measured_at
n.pid_error = prev_votes.vote_map[n.idhex].pid_error

do overwrite pid_error and measured_at after lines 684-688 calculate a complete new vote from current scan information. Had reviewed with an assumption, based on observed vote records, that votes in this situation are revisions to earlier votes rather than fresh ones.

When use_pid_tgt is 1 vote revision appears to be what occurs, but current operation with use_pid_tgt is 0 produces full vote calculations while explicitly removing all calculation fields aside from new_bw, reverting to earlier values up to two weeks old.

The behavior has no negative effect on the critical bw= (new_bw) value, but the other fields show incorrect data. Behavior is a defect though of minor consequence.

This issue could be closed WONTFIX in light of plans to retire Torflow, but it might make sense to repair the glitch and deploy to one BWauth for improved usefulness of vote files in support of validation and comparison work for the new system.

Likely can be repaired by adding an exclusion to the "is this a guard" test when use_pid_tgt is not in effect.

comment:4 in reply to:  3 Changed 4 months ago by starlight

Replying to starlight:

. . . but it might make sense to repair the glitch and deploy to one BWauth for improved usefulness of vote files in support of validation and comparison work for the new system.

I should add that if aggregate.py is touched for the purpose of supporting comparison work, even more value would accrue from copying scanner input values to vote lines--a minor adjustment on line 884. By this I mean

strm_bw
filt_bw
ns_bw
desc_bw

Would also be helpful to know which of filt_bw or strm_bw was selected in each vote.

Note: See TracTickets for help on using tickets.