Opened 5 months ago

Last modified 2 months ago

#23318 merge_ready defect

compute_weighted_bandwidths: do not add 0.5 to final_weight

Reported by: cypherpunks Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.3-alpha
Severity: Normal Keywords: path-selection, 029-backport, 030-backport, 031-backport
Cc: mikeperry Actual Points:
Parent ID: Points: 1
Reviewer: asn Sponsor: SponsorQ

Description

This extra 0.5 leads to non-zero probability to bypass bandwidth weights limitations. Like to pick Exit+Guard as Guard while there are not enough bandwidth.

Child Tickets

TicketStatusOwnerSummaryComponent
#23406acceptednickmSampled guards are not re-weighted when a new consensus arrivesCore Tor/Tor
#24170closedLog the actual bandwidth total when logging "Generated weighted bandwidths"Core Tor/Tor

Change History (24)

comment:1 Changed 5 months ago by cypherpunks

Component: - Select a componentCore Tor/Tor

comment:2 Changed 5 months ago by cypherpunks

I think in original code 0.5 contradicts to tor_llround(), it should be (int64_t)(weight*this_bw + 0.5) or tor_llround(weight*this_bw)

comment:3 Changed 5 months ago by dgoulet

Cc: mikeperry added
Keywords: path-selection added
Milestone: Tor: 0.3.2.x-final
Priority: MediumHigh

comment:4 Changed 5 months ago by cypherpunks

Priority: HighVery High

comment:5 Changed 5 months ago by cypherpunks

It's not only case when code bypassing limits, as noticed by pastly in IRC. Guard nodes weights only if new guard added by entry_guards_expand_sample. Sample guard or subsets never being re-weighted: once sampled non-Exit Guard will be used as Exit+Guard (if operator change it policy) by client.
(See related #22779)

comment:6 in reply to:  5 Changed 5 months ago by teor

Points: 1
Priority: Very HighMedium
Version: Tor: 0.2.4.3-alpha

Replying to cypherpunks:

I think in original code 0.5 contradicts to tor_llround(), it should be (int64_t)(weight*this_bw + 0.5) or tor_llround(weight*this_bw)

I can confirm this bug: tor_llround(weight*this_bw + 0.5) returns 1 when weight or this_bw are 0, because tor_llround() rounds away from zero. This is probably not what we want.

I split the other bug off into another ticket.

comment:7 Changed 4 months ago by nickm

Owner: set to nickm
Status: newaccepted

So, I've done the obvious patch as bug23318_029. I suggest trying it in 0.3.2 for a while before we backport, in case there are any weird consequences.

Last edited 4 months ago by nickm (previous) (diff)

comment:8 Changed 4 months ago by nickm

Keywords: 029-backport 030-backport 031-backport added

comment:9 Changed 4 months ago by nickm

Status: acceptedneeds_review

comment:10 Changed 4 months ago by asn

Hmm, fix looks reasonable and obvious here.

Help me understand one thing: Where does the tor_llround() happen after compute_weighted_bandwidths()? I think the cypherpunks at comment:2 is implying that it should be done as part of final_weight = weight*this_bw;?

I see tor_llround() activity in smartlist_choose_node_by_bandwidth_weights() using scale_array_elements_to_u64().

However, we don't do it in frac_nodes_with_descriptors(). There we just use the value as a decimal without rounding. Is that OK?

comment:11 Changed 4 months ago by nickm

I think it's okay. The only reason to use llround is that we're converting to a uint64, which we need to do for our randomized constant-time weighted selection logic. In frac_nodes_with_descriptors(), we're not converting to uint64 at all.

comment:12 Changed 3 months ago by nickm

Keywords: review-group-24 added

review-group-24 is now open.

comment:13 Changed 3 months ago by nickm

Sponsor: SponsorQ

comment:14 Changed 3 months ago by dgoulet

Reviewer: asn

Fix lgtm but I think we should give an opportunity to asn to address the response to his comment.

comment:15 Changed 2 months ago by asn

Yep, it's good with me. Nick's comment:11 covered me. Sorry for not clarifying.

comment:16 Changed 2 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final
Status: needs_reviewmerge_ready

Okay. Merged to 0.3.2 and forward; marking for backport.

comment:17 Changed 2 months ago by teor

This change breaks all chutney networks: we should run make test-network on our changes before we merge them.

comment:18 Changed 2 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final
Status: merge_readyneeds_revision

This seems to have broken test-network-all.

comment:19 Changed 2 months ago by nickm

Reverted with 3dc61a5d71423e.

We should fix 01e984870a7e1db2722e85fe43af7bcb4755c2d4 so that it doesn't break test-networks, and then reapply.

comment:20 Changed 2 months ago by teor

Status: needs_revisionneeds_review

The underlying issue here is how tor deals with a consensus that has all-zero bandwidths.

Before the original patch:

  1. Tor calculates 1 as the weighted bandwidth for all the nodes
  2. frac_nodes_with_descriptors() returns a sensible value, so Tor bootstraps
  3. node_sl_choose_by_bandwidth() returns a badly-weighted value (I split off #24169 to deal with this), but Tor works anyway

After the original patch:

  1. Tor calculates 0 as the weighted bandwidth for all the nodes
  2. frac_nodes_with_descriptors() returns 0, so Tor fails to bootstrap

comment:21 Changed 2 months ago by teor

Please see my branch bug23318-redux, which fixes the issue with frac_nodes_with_descriptors(), and the logging issue in #24170.

comment:22 Changed 2 months ago by nickm

Rebased onto maint-0.2.9 as bug23318-redux_029

comment:23 Changed 2 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final
Status: needs_reviewmerge_ready

Merged that into 0.3.2; seems to work. Thanks! Marking for backport.

comment:24 Changed 2 months ago by nickm

Keywords: review-group-24 removed
Note: See TracTickets for help on using tickets.