Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#5235 closed defect (fixed)

550 Hz warn-level log spam on dirauth (“[warn] Weighted bandwidth is 0.000000...”)

Reported by: ln5 Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Keywords: easy tor-auth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

maatuska sees

[warn] Weighted bandwidth is 0.000000 in node selection for rule weight as directory

_very_ often. Today, it's been seen 19602 times.

At one point today I saw 7690 of these in 14 seconds. That's about 550 Hz.

rransom says it might be because all dirauths have bandwidth weight 0
and that bwauths sets dirauths to 0. Haven't yet checked what more it
takes to end up here though.

Child Tickets

Attachments (2)

0001-rate-limit-weighted-bandwidth-is-zero-message.patch (1.5 KB) - added by andrea 7 years ago.
0001-rate-limit-weighted-bandwidth-is-zero-message-alternate.patch (2.2 KB) - added by andrea 7 years ago.
Alternate patch which also modifies message to emit unweighted bandwidth, to distinguish between possible causes

Download all attachments as: .zip

Change History (11)

comment:1 Changed 7 years ago by rransom

Milestone: Tor: 0.2.3.x-final
Summary: Extensive logging on a directory authority550 Hz warn-level log spam on dirauth (“[warn] Weighted bandwidth is 0.000000...”)

comment:2 Changed 7 years ago by nickm

Keywords: easy added

The easiest solution here would be to move the check up higher, or to use ratelim_t to rate-limit how often that warning can occur when it occurs.

comment:3 Changed 7 years ago by andrea

Status: newneeds_review

This patch rate-limits the message; since this function is called very frequently, it would probably be advisable to apply similar rate-limiting to the other two warning messages it can emit ("Round-off error in computing bandwidth had an effect on which router we chose", "Consensus is not listing bandwidths. Defaulting back to old router selection algorithm.").

For possible further investigation: why would this sum to zero? The function exits early if the list is empty, so this message can only be emitted if the list contains one or more nodes which contribute zero bandwidth to the total and none that contribute non-zero bandwidth. A node can contribute zero bandwidth if node->rs->bandwidth is zero, or if all the weight factors applied are zero. Which is it that has occurred here?

Changed 7 years ago by andrea

Alternate patch which also modifies message to emit unweighted bandwidth, to distinguish between possible causes

comment:4 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: unspecified
Status: needs_reviewneeds_information

Applying and merging the second option. Thanks!

I also checked in a whitespace normalization (we don't typically put space inside parenthesis in the rest of the code) and a changes file as 8c9b9a5f0053605f01.

The underlying issue isn't fixed here, but the obnoxious symptom is. I'm putting this ticket into "Tor: unspecified", so we can chase it harder if it ever manifests again.

comment:5 Changed 7 years ago by nickm

oops; I spoke too soon. My extra commit was 8be6058d8f31e578d

comment:6 Changed 7 years ago by rransom

Looks like #5635 did indeed fix the code path that produced this log spam, by making Tor not consider fetching v2 networkstatuses from non-dirauths.

Some of the log snippets in #5635 indicate that Tor was trying to fetch from a different relay every time. If the directory-fetch rate limiting is per-dirserver, rather than per-document, that could explain why Tor was trying to fetch a v2 networkstatus at 550 Hz.

comment:7 Changed 7 years ago by rransom

Resolution: fixed
Status: needs_informationclosed

comment:8 Changed 6 years ago by nickm

Keywords: tor-auth added

comment:9 Changed 6 years ago by nickm

Component: Tor Directory AuthorityTor
Note: See TracTickets for help on using tickets.