got this line looking at my relay:
Bandwidth (Max/Burst/Observed - In Bps): 340992 / 1022976 / 30676070
1h20 after an upgrading from 0.2.2.22-alpha to 0.2.2.23-alpha
Trac: Username: keb
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
implies that maxval is a per-second amount, whereas b->maxima is a per-10-second amount. So the Maxima we write to the state file is a per-second amount.
Then when we read it, we
v = tor_parse_uint64(cp, 10, 0, UINT64_MAX, &ok, NULL); if (have_maxima) { maxstr = smartlist_get(s_maxima, cp_sl_idx); mv = tor_parse_uint64(maxstr, 10, 0, UINT64_MAX, &ok_m, NULL); mv *= NUM_SECS_ROLLING_MEASURE; } else { /* No maxima known; guess average rate to be conservative. */ mv = v / s_interval; }
and then
b->max_total = mv;
where max_total is also a per-10-second amount. So
mv = v / s_interval;
ought to be
mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;
But that's not our bug, since it would produce smaller numbers not larger numbers. But it is a bug.
Also, why is the argument for rep_hist_load_bwhist_state_section() called s_begins yet the state line is called "BWHistoryWriteEnds"? That sounds like a recipe for more confusion.
We are reading in a per-900-second value, and then telling our rephist system that at second 'start', we saw all of those bytes get written. I suspect this causes the per-10-second calculation to be 1/10 of the per-900-second value.
(The following redundant note is added to catch user searches.)
Vidalia "Tor Bandwidth Usage" chart is affected by this issue. The vertical scale is much larger than the actual transfer numbers, so the indicator hardly leaves the floor.
Hrmm, I hope we're not using int32_t's anywhere related to this. We're approaching maxint observed bandwidth on several relays. I guess we'll find out how robust all associated tor and non-tor code is soon :).
I think part of the analysis here is wrong. Specifically, the suggestion to do
mv = v / s_internal * NUM_SECS_ROLLING_MEASURE;
is good, but doesn't influence the other bug this ticket is about, because the above fix is only relevant when we don't know the maximum. If we know the maximum, we already do
18:56 < nickm> hang on. That loop, where we do "while (start < now)". Is that
right? If this interval is not the most recent one, then won't
we add the observation far too many times?
and a simplification:
18:57 < nickm> also you can simplify the overflow logic a lot
18:57 < nickm> instead of overflow, have, say, "actual_interval_len". Set it
to either s_interval or to now-start.
18:57 < nickm> then compute cur = v / actual_interval_len
moria1's current descriptor says bandwidth 512000 10485760 1306223, with the burst well within above the max observed.
Looking at the past days, moria reported very high values shortly after starting up, and the values then dropped. This didn't appear to change significantly with the patch vs without it.
I can't really reproduce the issue with fluxe3, however: I just restarted it (on 0.2.2.24-alpha), and in its descriptor it said bandwidth 921600 10240000 83463884 - so we don't have an inflated value here. hrm.
Here's another bug in rep_hist_update_bwhist_state_section(), I think:
tor_asprintf(&cp, U64_FORMAT, U64_PRINTF_ARG(b->max_total & ~0x3ff)); smartlist_add(*s_maxima, cp);}}It seems we need to divide b->max_total by NUM_SECS_ROLLING_MEASURE, no?Also, this following change from 105b94b7 seems to be strange to me, but I can't put my finger on why so far.{{{ /* Clean up maxima and observed */- /* Do we really want to zero this for the purpose of max capacity? */ for (i=0; i<NUM_SECS_ROLLING_MEASURE; ++i) { b->obs[i] = 0; } b->total_obs = 0;- for (i=0; i<NUM_TOTALS; ++i) {- b->maxima[i] = 0;- }- b->max_total = 0;