BandwidthObserved higher than BandwidthBurst

changed milestone to %Tor: 0.2.2.x-final

added 0.2.2.23-alpha bandwidth component::core tor/tor milestone::Tor: 0.2.2.x-final observed owner::Sebastian priority::high reporter::keb resolution::fixed status::closed tor-relay type::defect version::tor 0.2.2.23-alpha labels

Trac:
Username: keb

state-20110310

state file from tor data dir

Trac:
Username: keb

Sure enough, BWHistoryWriteMaxima has values like 30689280. Are we writing them wrong, or using them wrong?

Roger suspects that the new remember-your-observed-bandwidth stuff in rephist.c in 0.2.2.23-alpha may be implicated.

Trac:
Version: N/A to Tor: 0.2.2.23-alpha
Milestone: N/A to Tor: 0.2.2.x-final

The functions to scrutinize are rep_hist_update_bwhist_state_section, and rep_hist_load_bwhist_state_section , both in rephist.c .

i should mention the 1st report line above is from http://torstatus.all.de/router_detail.php?FP=D3ED911183DF31FDB744A22F94BE6A856AC77D12 but when i got some output from blutmagie's status page a few hours later, it looks ok http://torstatus.blutmagie.de/router_detail.php?FP=d3ed911183df31fdb744a22f94be6a856ac77d12

so it is a case of one website misinterpreting the units in which that field is stored in the database?

Trac:
Username: keb

    maxval = b->maxima[i] / NUM_SECS_ROLLING_MEASURE;

implies that maxval is a per-second amount, whereas b->maxima is a per-10-second amount. So the Maxima we write to the state file is a per-second amount.

Then when we read it, we

       v = tor_parse_uint64(cp, 10, 0, UINT64_MAX, &ok, NULL);
        if (have_maxima) {
          maxstr = smartlist_get(s_maxima, cp_sl_idx);
          mv = tor_parse_uint64(maxstr, 10, 0, UINT64_MAX, &ok_m, NULL);
          mv *= NUM_SECS_ROLLING_MEASURE;
        } else {
          /* No maxima known; guess average rate to be conservative. */
          mv = v / s_interval;
        }

and then

          b->max_total = mv;

where max_total is also a per-10-second amount. So

          mv = v / s_interval;

ought to be

          mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;

But that's not our bug, since it would produce smaller numbers not larger numbers. But it is a bug.

Also, why is the argument for rep_hist_load_bwhist_state_section() called s_begins yet the state line is called "BWHistoryWriteEnds"? That sounds like a recipe for more confusion.

This could be it:

          add_obs(b, start, v);

We are reading in a per-900-second value, and then telling our rephist system that at second 'start', we saw all of those bytes get written. I suspect this causes the per-10-second calculation to be 1/10 of the per-900-second value.

moria1 has funny-looking values here too:

BWHistoryReadValues 536190976,409089024,532183040,497468416,...
BWHistoryReadMaxima 53618688,40908800,53218304,49745920,...

Notice the pattern?

(The following redundant note is added to catch user searches.) Vidalia "Tor Bandwidth Usage" chart is affected by this issue. The vertical scale is much larger than the actual transfer numbers, so the indicator hardly leaves the floor.

Trac:
Username: keb

Trac:
Priority: normal to major

One way to resolve this one for 0.2.2 is to revert the patch that caused it.

Then we could reschedule that patch for 0.2.3, but not merge it in yet because of this bug.

Trac:
Status: new to needs_review

Hrmm, I hope we're not using int32_t's anywhere related to this. We're approaching maxint observed bandwidth on several relays. I guess we'll find out how robust all associated tor and non-tor code is soon :).

I think part of the analysis here is wrong. Specifically, the suggestion to do

          mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;

is good, but doesn't influence the other bug this ticket is about, because the above fix is only relevant when we don't know the maximum. If we know the maximum, we already do

          mv *= NUM_SECS_ROLLING_MEASURE;

Oh, also I think we should not just revert the patch. I have a fix that seems to work in local testing, cleaning it up to add it here.

Trac:
Owner: N/A to Sebastian
Status: needs_review to assigned

bug2704 in my repository

Trac:
Status: assigned to needs_review

We ran into a bug on IRC:

18:56 < nickm> hang on. That loop, where we do "while (start < now)". Is that right? If this interval is not the most recent one, then won't we add the observation far too many times?

and a simplification:

18:57 < nickm> also you can simplify the overflow logic a lot 18:57 < nickm> instead of overflow, have, say, "actual_interval_len". Set it to either s_interval or to now-start. 18:57 < nickm> then compute cur = v / actual_interval_len

pushed an update. tested in private network again. Thanks!

Thanks! Rebased (for the fixup) & merged.

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Trac:
Status: closed to reopened
Resolution: fixed to N/A

Apparently arma says this isn't fixed for him on moria1, and that he's talked about it with sebastian. Somebody please post here?

moria1's current descriptor says bandwidth 512000 10485760 1306223, with the burst well within above the max observed.

Looking at the past days, moria reported very high values shortly after starting up, and the values then dropped. This didn't appear to change significantly with the patch vs without it.

I can't really reproduce the issue with fluxe3, however: I just restarted it (on 0.2.2.24-alpha), and in its descriptor it said bandwidth 921600 10240000 83463884 - so we don't have an inflated value here. hrm.

Eh, disregard the comment above. I guess my maths sucks a little more than usual today

Here's another bug in rep_hist_update_bwhist_state_section(), I think:

  tor_asprintf(&cp, U64_FORMAT, U64_PRINTF_ARG(b->max_total & ~0x3ff));
  smartlist_add(*s_maxima, cp);
}}
It seems we need to divide b->max_total by NUM_SECS_ROLLING_MEASURE, no?

Also, this following change from 105b94b7 seems to be strange to me, but I can't put my finger on why so far.
{{{
   /* Clean up maxima and observed */
-  /* Do we really want to zero this for the purpose of max capacity? */
   for (i=0; i<NUM_SECS_ROLLING_MEASURE; ++i) {
     b->obs[i] = 0;
   }
   b->total_obs = 0;
-  for (i=0; i<NUM_TOTALS; ++i) {
-    b->maxima[i] = 0;
-  }
-  b->max_total = 0;

ok, so I think we should definitely fix that bug soon. This does lead to inflated Maxima history in the state file. bug2704 for the patch.

Trac:
Status: reopened to needs_review

thanks for the merge, and sorry for the silly compile error.

I'm calling this one fixed. This time for sure! Thanks.

Trac:
Resolution: N/A to fixed
Status: needs_review to closed

Trac:
Keywords: 0.2.2.23-alpha bandwidth observed deleted, 0.2.2.23-alpha bandwidth observed tor-relay added

Trac:
Component: Tor Relay to Tor

closed

mentioned in issue #2744 (closed)

mentioned in issue #2987 (moved)

mentioned in issue #3338 (moved)

mentioned in issue #3435 (moved)

moved to tpo/core/tor#2704 (closed)

mentioned in issue tpo/core/tor#2987 (closed)

BandwidthObserved higher than BandwidthBurst

Child items ...

Activity