Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#1863 closed defect (fixed)

Relays should store observed bandwidth in state file

Reported by: mikeperry Owned by:
Priority: Medium Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: Tor: 0.2.2.x-final
Severity: Keywords: tor-relay
Cc: mikeperry Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

arma suggested I write a proposal for this, but I believe it is a flat out bug. When a relay goes down, it looses its 24 hour observed bandwidth window. For fast relays, this can be catastropic, as it takes weeks for them to regain their bandwidth due to our slow descriptor refresh interval. The slow descriptor refresh interval can be shortened, but that requires careful study to make sure we do not increase the descriptor load on clients.

If we just instead stored this info in the state file to survive crashes, high bandwidth relays would still spend a *loooong* time gathering bandwidth (and maybe this is a really good thing?), but will at least not have to restart when they die.

Child Tickets

Change History (13)

comment:1 Changed 9 years ago by nickm

Sounds plausible. Can you outline how you think the solution should look before we go and build it?

comment:2 Changed 9 years ago by nickm

Cc: mikeperry added
Milestone: Tor: 0.2.2.x-final

Assigning to 0.2.2.x final if it isn't to hard, though it can go right on to 0.2.3.x final if it's hard.

mike, it would help if you outlined what value exactly you wanted to store here; I'm being stupid tonight.

comment:3 Changed 9 years ago by mikeperry

The minimal solution is to just save the value from rep_hist_bandwidth_assess() into the state file periodically. Upon startup, tor would read this value in and then fill read_array and write_array with this value instead of zero as it does now.

This way, the value from the state file would stay the max for 24 hours (until the read_array and write_array are filled with observed data and wrap around), or until a new higher bandwidth observation is made, whichever comes first.

comment:4 Changed 9 years ago by mikeperry

Oh shoot, that approach is going to mess with the extra-info descriptors, isn't it?

Hrmm.

comment:5 Changed 9 years ago by mikeperry

It's now a complete mystery to me why rep_hist_update_state() and rep_hist_load_state() aren't doing this job already.

comment:6 Changed 9 years ago by nickm

Yeah; it looks to me like it should totally work. Looking at state files from my cute little test network, it seems like everything is getting written out okay, and I can't see any bugs in the read code.

Maybe it's too harsh to throw out the observations if more than NUM_SECS_BW_SUM_INTERVAL*NUM_TOTALS have passed since they were written? Is that the issue here?

comment:7 Changed 9 years ago by mikeperry

Ahh, yes, that is possible then. What is that, 24 hours?

comment:8 Changed 9 years ago by nickm

[55]% grep '#define NUM_' ~/src/tor/src/or/rephist.c
#define NUM_SECS_ROLLING_MEASURE 10
#define NUM_SECS_BW_SUM_INTERVAL (15*60)
#define NUM_SECS_BW_SUM_IS_VALID (24*60*60)
#define NUM_TOTALS (NUM_SECS_BW_SUM_IS_VALID/NUM_SECS_BW_SUM_INTERVAL)

Sure looks that way. Maybe if we load bw observations, and it's all before the NUM_SECS_BW_SUM_IS_VALID period (and our IP hasn't changed) we should still base our declared bandwidth on them until we've been up for a fixed period? Something like that?

I don't like the idea of adding fake measurements to the observation array; fake or mislabeled data never ends well for us.

comment:9 Changed 9 years ago by nickm

Status: newneeds_review

I went to refactor the code a bit and found out that we aren't actually doing anything to make the maxima persistent, but it is the maxima that we look at in rep_hist_bandwidth_assess. That would explain why even after starting back up immediately, Tor would think it had done nothing for the last 24 hours.

I have a fix for this and some other rephist load/save issues in branch bug1863_bwhist in my public. It needs review and testing.

comment:10 Changed 9 years ago by karsten

See branch bug1863_bwhist in my public repository. Other than that, looks good.

comment:11 Changed 9 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

okay, will merge. damn the torpedos

comment:12 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:13 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.