Opened 7 years ago

Closed 7 years ago

Last modified 5 years ago

#2704 closed defect (fixed)

BandwidthObserved higher than BandwidthBurst

Reported by: keb Owned by: Sebastian
Priority: High Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: Tor: 0.2.2.23-alpha
Severity: Keywords: 0.2.2.23-alpha bandwidth observed tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

got this line looking at my relay:
Bandwidth (Max/Burst/Observed - In Bps): 340992 / 1022976 / 30676070
1h20 after an upgrading from 0.2.2.22-alpha to 0.2.2.23-alpha

Child Tickets

Attachments (1)

state-20110310 (16.7 KB) - added by keb 7 years ago.
state file from tor data dir

Download all attachments as: .zip

Change History (30)

Changed 7 years ago by keb

Attachment: state-20110310 added

state file from tor data dir

comment:1 Changed 7 years ago by arma

Sure enough, BWHistoryWriteMaxima has values like 30689280. Are we writing them wrong, or using them wrong?

comment:2 Changed 7 years ago by nickm

Milestone: Tor: 0.2.2.x-final
Version: Tor: 0.2.2.23-alpha

Roger suspects that the new remember-your-observed-bandwidth stuff in rephist.c in 0.2.2.23-alpha may be implicated.

comment:3 Changed 7 years ago by nickm

The functions to scrutinize are rep_hist_update_bwhist_state_section, and rep_hist_load_bwhist_state_section , both in rephist.c .

comment:4 Changed 7 years ago by keb

i should mention the 1st report line above is from
http://torstatus.all.de/router_detail.php?FP=D3ED911183DF31FDB744A22F94BE6A856AC77D12
but when i got some output from blutmagie's status page a few hours later, it looks ok
http://torstatus.blutmagie.de/router_detail.php?FP=d3ed911183df31fdb744a22f94be6a856ac77d12

so it is a case of one website misinterpreting the units in which that field is stored in the database?

comment:5 Changed 7 years ago by arma

    maxval = b->maxima[i] / NUM_SECS_ROLLING_MEASURE;

implies that maxval is a per-second amount, whereas b->maxima is a per-10-second amount. So the Maxima we write to the state file is a per-second amount.

Then when we read it, we

       v = tor_parse_uint64(cp, 10, 0, UINT64_MAX, &ok, NULL);
        if (have_maxima) {
          maxstr = smartlist_get(s_maxima, cp_sl_idx);
          mv = tor_parse_uint64(maxstr, 10, 0, UINT64_MAX, &ok_m, NULL);
          mv *= NUM_SECS_ROLLING_MEASURE;
        } else {
          /* No maxima known; guess average rate to be conservative. */
          mv = v / s_interval;
        }

and then

          b->max_total = mv;

where max_total is also a per-10-second amount. So

          mv = v / s_interval;

ought to be

          mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;

But that's not our bug, since it would produce smaller numbers not larger numbers. But it is a bug.

comment:6 Changed 7 years ago by arma

Also, why is the argument for rep_hist_load_bwhist_state_section() called s_begins yet the state line is called "BWHistoryWriteEnds"? That sounds like a recipe for more confusion.

comment:7 Changed 7 years ago by arma

This could be it:

          add_obs(b, start, v);

We are reading in a per-900-second value, and then telling our rephist system that at second 'start', we saw all of those bytes get written. I suspect this causes the per-10-second calculation to be 1/10 of the per-900-second value.

comment:8 Changed 7 years ago by arma

moria1 has funny-looking values here too:

BWHistoryReadValues 536190976,409089024,532183040,497468416,...
BWHistoryReadMaxima 53618688,40908800,53218304,49745920,...

Notice the pattern?

comment:9 Changed 7 years ago by keb

(The following redundant note is added to catch user searches.)
Vidalia "Tor Bandwidth Usage" chart is affected by this issue. The vertical scale is much larger than the actual transfer numbers, so the indicator hardly leaves the floor.

comment:10 Changed 7 years ago by arma

Priority: normalmajor

comment:11 Changed 7 years ago by arma

Status: newneeds_review

One way to resolve this one for 0.2.2 is to revert the patch that caused it.

Then we could reschedule that patch for 0.2.3, but not merge it in yet because of this bug.

comment:12 Changed 7 years ago by mikeperry

Hrmm, I hope we're not using int32_t's anywhere related to this. We're approaching maxint observed bandwidth on several relays. I guess we'll find out how robust all associated tor and non-tor code is soon :).

comment:13 Changed 7 years ago by Sebastian

I think part of the analysis here is wrong. Specifically, the suggestion to do

          mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;

is good, but doesn't influence the other bug this ticket is about, because the above fix is only relevant when we don't know the maximum. If we know the maximum, we already do

          mv *= NUM_SECS_ROLLING_MEASURE;

comment:14 Changed 7 years ago by Sebastian

Owner: set to Sebastian
Status: needs_reviewassigned

Oh, also I think we should not just revert the patch. I have a fix that seems to work in local testing, cleaning it up to add it here.

comment:15 Changed 7 years ago by Sebastian

Status: assignedneeds_review

bug2704 in my repository

comment:16 Changed 7 years ago by nickm

We ran into a bug on IRC:

18:56 < nickm> hang on. That loop, where we do "while (start < now)". Is that

right? If this interval is not the most recent one, then won't
we add the observation far too many times?

and a simplification:

18:57 < nickm> also you can simplify the overflow logic a lot
18:57 < nickm> instead of overflow, have, say, "actual_interval_len". Set it

to either s_interval or to now-start.

18:57 < nickm> then compute cur = v / actual_interval_len

comment:17 Changed 7 years ago by Sebastian

pushed an update. tested in private network again. Thanks!

comment:18 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Thanks! Rebased (for the fixup) & merged.

comment:19 Changed 7 years ago by nickm

Resolution: fixed
Status: closedreopened

comment:20 Changed 7 years ago by nickm

Apparently arma says this isn't fixed for him on moria1, and that he's talked about it with sebastian. Somebody please post here?

comment:21 Changed 7 years ago by Sebastian

moria1's current descriptor says bandwidth 512000 10485760 1306223, with the burst well within above the max observed.

Looking at the past days, moria reported very high values shortly after starting up, and the values then dropped. This didn't appear to change significantly with the patch vs without it.

comment:22 Changed 7 years ago by Sebastian

I can't really reproduce the issue with fluxe3, however: I just restarted it (on 0.2.2.24-alpha), and in its descriptor it said bandwidth 921600 10240000 83463884 - so we don't have an inflated value here. hrm.

comment:23 Changed 7 years ago by Sebastian

Eh, disregard the comment above. I guess my maths sucks a little more than usual today

comment:24 Changed 7 years ago by Sebastian

Here's another bug in rep_hist_update_bwhist_state_section(), I think:

  tor_asprintf(&cp, U64_FORMAT, U64_PRINTF_ARG(b->max_total & ~0x3ff));
  smartlist_add(*s_maxima, cp);
}}
It seems we need to divide b->max_total by NUM_SECS_ROLLING_MEASURE, no?

Also, this following change from 105b94b7 seems to be strange to me, but I can't put my finger on why so far.
{{{
   /* Clean up maxima and observed */
-  /* Do we really want to zero this for the purpose of max capacity? */
   for (i=0; i<NUM_SECS_ROLLING_MEASURE; ++i) {
     b->obs[i] = 0;
   }
   b->total_obs = 0;
-  for (i=0; i<NUM_TOTALS; ++i) {
-    b->maxima[i] = 0;
-  }
-  b->max_total = 0;
}}}

comment:25 Changed 7 years ago by Sebastian

Status: reopenedneeds_review

ok, so I think we should definitely fix that bug soon. This does lead to inflated Maxima history in the state file. bug2704 for the patch.

comment:26 Changed 7 years ago by Sebastian

thanks for the merge, and sorry for the silly compile error.

comment:27 Changed 7 years ago by arma

Resolution: fixed
Status: needs_reviewclosed

I'm calling this one fixed. This time for sure! Thanks.

comment:28 Changed 5 years ago by nickm

Keywords: tor-relay added

comment:29 Changed 5 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.