Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#4142 closed enhancement (wontfix)

Make relays and bridges publish a new descriptor after writing stats

Reported by: karsten Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Relays and bridges write new stats every 24 hours, but publish a new descriptor only every 18 hours. As a result, relays/bridges may wait up to another 18 hours after a stats interval ends before publishing the stats to the directory authorities. It's quite possible that the bridge or relay is no longer online 18 hours later and never uploads these stats.

There are two problems here. The first is that the bridge or relay may not come back at all, so that we'll never learn what stats it had. Or it could be offline for a few days and, when it comes back, decide that the stats have become too old to upload (I'd have to look up after how long it makes that decision).

The second problem is that the delay between measuring stats and submitting them is bad for things like the censorship detector. Having the data up to 18 hours earlier, and 9 hours earlier on average, would be helpful.
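The delay figures above can be checked with a small model. This is a sketch under simplifying assumptions: descriptors go out strictly every 18 hours (no bandwidth-triggered or other republications), and a stats interval ends at a uniformly random point in that cycle. Under those assumptions the delay is uniform between 0 and 18 hours, so it averages 9 hours. The function names are hypothetical.

```python
import random

DESCRIPTOR_PERIOD_H = 18.0  # republish interval mentioned in the ticket


def publication_delay(stats_end_h: float, last_publish_h: float,
                      period_h: float = DESCRIPTOR_PERIOD_H) -> float:
    """Hours from the end of a stats interval until the next periodic
    descriptor, assuming descriptors are published exactly every
    `period_h` hours from `last_publish_h` and never for any other reason."""
    if stats_end_h < last_publish_h:
        raise ValueError("stats interval must end after the last publication")
    elapsed = (stats_end_h - last_publish_h) % period_h
    return 0.0 if elapsed == 0 else period_h - elapsed


def mean_delay(samples: int = 100_000, period_h: float = DESCRIPTOR_PERIOD_H,
               seed: int = 1) -> float:
    """Monte Carlo estimate of the average delay when the stats interval
    ends at a uniformly random point in the publication cycle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += publication_delay(rng.uniform(0.0, period_h), 0.0, period_h)
    return total / samples
```

For example, with a descriptor published at hour 0 and stats ending at hour 24, `publication_delay(24.0, 0.0)` is 12.0 hours, and `mean_delay()` comes out close to 9.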

How about we make relays and bridges publish a new descriptor every time they finish a stats interval? That would be one more descriptor per bridge and relay per day.
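One way to model the proposal, as a hypothetical sketch: publish at whichever comes first, the regular 18-hour deadline or the end of a stats interval. The ticket does not say whether a stats-triggered publication should also restart the 18-hour timer; this sketch assumes it does, which is why it comes out at somewhat fewer than one extra descriptor per day. Without the restart, stats-triggered publications would simply be added on top of the existing schedule, giving about one more per day.

```python
def count_publications(days: int, period_h: float = 18.0, stats_h: float = 24.0,
                       publish_on_stats: bool = True) -> int:
    """Count descriptor publications in [0, days * 24] hours.

    Hypothetical model: the relay publishes at t = 0, then whenever
    `period_h` hours have passed since its last publication, and (if
    `publish_on_stats`) as soon as a stats interval ends. Every publication,
    including a stats-triggered one, restarts the periodic timer.
    """
    horizon = days * 24.0
    t = 0.0
    next_stats = stats_h
    count = 1  # initial publication at t = 0
    while True:
        deadline = t + period_h
        # Stats-triggered publication wins if the interval ends before the deadline.
        t = next_stats if publish_on_stats and next_stats <= deadline else deadline
        if t > horizon:
            return count
        count += 1
        # Advance to the first stats interval ending after this publication.
        while next_stats <= t:
            next_stats += stats_h
```

Over three days, `count_publications(3, publish_on_stats=False)` gives 5 publications (one every 18 hours), while `count_publications(3)` gives 7, with each stats interval published the moment it ends.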


Attachments (1)

delay.png (35.9 KB) - added by karsten 8 years ago.
Delay between finishing a dirreq-stats interval and first publishing its results


Change History (12)

comment:1 Changed 8 years ago by Sebastian

Currently these descriptors would be rejected by the authorities because nothing significant changed.

comment:2 in reply to:  1 Changed 8 years ago by karsten

Replying to Sebastian:

Currently these descriptors would be rejected by the authorities because nothing significant changed.

Right. The problem here is that the server descriptor wouldn't change at all, but the extra-info descriptor would. And we need the new server descriptor to be referenced from the consensus, or we'll never learn that there's a new extra-info descriptor. So, we'll have to make sure the directory authorities accept the new server descriptor whenever the extra-info descriptor has new stats.

One way to achieve this would be to include a timestamp of the last written statistics in the server descriptor.

Are there other ways?
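A minimal sketch of the timestamp idea, assuming a hypothetical, heavily simplified descriptor summary and a hypothetical authority-side acceptance rule. The field `last_stats_written`, the function `is_significant_change`, the 50% bandwidth tolerance, and the 18-hour "always accept" threshold are all illustrative assumptions, not Tor's actual code. It only shows how a newer stats timestamp would turn an otherwise unchanged server descriptor into one the authorities accept.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

REPUBLISH_AFTER = timedelta(hours=18)  # hypothetical: always accept after this long
BANDWIDTH_TOLERANCE = 0.5              # hypothetical: ignore bandwidth changes below 50%


@dataclass(frozen=True)
class DescriptorSummary:
    """Hypothetical, heavily simplified view of a server descriptor."""
    published: datetime
    bandwidth: int  # bytes/s
    platform: str
    # Hypothetical new field: when statistics were last written.
    last_stats_written: Optional[datetime] = None


def bandwidth_changed(old_bw: int, new_bw: int) -> bool:
    if old_bw == 0:
        return new_bw != 0
    return abs(new_bw - old_bw) / old_bw > BANDWIDTH_TOLERANCE


def is_significant_change(old: DescriptorSummary, new: DescriptorSummary) -> bool:
    """Return True if a (hypothetical) authority should accept `new` over `old`."""
    if new.published - old.published >= REPUBLISH_AFTER:
        return True
    if new.platform != old.platform or bandwidth_changed(old.bandwidth, new.bandwidth):
        return True
    # Proposed rule: a newer statistics timestamp alone counts as significant.
    if new.last_stats_written is None:
        return False
    return old.last_stats_written is None or new.last_stats_written > old.last_stats_written
```

Usage with made-up values: a descriptor republished one hour later with identical content is rejected, while the same descriptor carrying a newer stats timestamp is accepted.

```python
old = DescriptorSummary(datetime(2024, 1, 1, 0, 0), 100_000, "Tor 0.2.x",
                        datetime(2023, 12, 31, 0, 0))
unchanged = replace(old, published=old.published + timedelta(hours=1))
new_stats = replace(unchanged, last_stats_written=datetime(2024, 1, 1, 0, 30))
```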

comment:3 Changed 8 years ago by Sebastian

We could change the descriptor generation interval to 24h. Not sure what implications that would have, though.

comment:4 Changed 8 years ago by karsten

You mean increase the interval from 18 to 24 hours? What if the relay uploads a new descriptor, e.g., due to a bandwidth change, a few minutes before the 24-hour stats interval ends? Then we might wait almost 24 hours for the stats to arrive, not 18.
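The worst case described here can be illustrated with a hypothetical timeline, assuming the relay republishes strictly on its periodic schedule after the early descriptor and for no other reason. The function name and dates are illustrative only.

```python
from datetime import datetime, timedelta


def next_periodic_publication(last_publish: datetime, stats_end: datetime,
                              period: timedelta) -> datetime:
    """First periodic publication at or after `stats_end`, assuming the relay
    republishes exactly every `period` after `last_publish` and for no other reason."""
    t = last_publish
    while t < stats_end:
        t += period
    return t


# Hypothetical timeline: a bandwidth-triggered descriptor goes out
# five minutes before the 24-hour stats interval ends.
stats_end = datetime(2024, 1, 2, 0, 0)
last_publish = stats_end - timedelta(minutes=5)

for hours in (18, 24):
    nxt = next_periodic_publication(last_publish, stats_end, timedelta(hours=hours))
    print(f"{hours}h interval: stats published after {nxt - stats_end}")
```

This prints a delay of 17:55:00 for the 18-hour interval and 23:55:00 for the 24-hour interval, so lengthening the interval raises the worst case rather than removing it.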

comment:5 Changed 8 years ago by Sebastian

Ah, I see. The real solution is probably to make it so that we don't have to publish a new server descriptor to publish new stats, so that not everyone has to download them. But that's a lot more complex. Hrm.

comment:6 Changed 8 years ago by karsten

The overhead from additional server descriptors should be low. It's just 1 descriptor per relay every 24 hours. That shouldn't matter much, in particular with microdescriptors. Also, note that if there's no new server descriptor referenced in the consensus, metrics-db has no way to learn that there's a new extra-info descriptor. So, I'd say we shouldn't try to change the fact that relays and bridges publish their extra-info descriptors and server descriptors together. That's way too complex.

I still think my suggestion above would work:

"One way to achieve this would be to include a timestamp of the last written statistics in the server descriptor."

Changed 8 years ago by karsten

Attachment: delay.png added

Delay between finishing a dirreq-stats interval and first publishing its results

comment:7 Changed 8 years ago by karsten

Sebastian and I briefly discussed how long statistics publication is delayed in practice due to relays only publishing descriptors every 18 hours. I ran a quick analysis that compares the dirreq-stats-end timestamp to the published timestamp of the first descriptors containing the stats. See the attached graph (and code). The result is that the delay is about 8 hours on average.
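The comparison described above can be sketched as follows. This is a hypothetical reconstruction, not karsten's analysis code: it assumes the extra-info descriptor keywords `extra-info`, `published`, and `dirreq-stats-end` from the Tor directory protocol specification, keys each statistics interval by relay fingerprint and interval end, and takes the earliest `published` timestamp among descriptors that carry it. Function names and the toy input are illustrative.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

TIMESTAMP = "%Y-%m-%d %H:%M:%S"


def parse_extra_info(text: str) -> Tuple[Optional[str], Optional[datetime], Optional[datetime]]:
    """Extract fingerprint, published time, and dirreq-stats-end from one
    extra-info descriptor. Minimal parser: no signature checks, no validation."""
    fingerprint = published = stats_end = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        if parts[0] == "extra-info":
            fingerprint = parts[2]
        elif parts[0] == "published":
            published = datetime.strptime(f"{parts[1]} {parts[2]}", TIMESTAMP)
        elif parts[0] == "dirreq-stats-end":
            # e.g. "dirreq-stats-end 2024-01-02 00:00:00 (86400 s)"
            stats_end = datetime.strptime(f"{parts[1]} {parts[2]}", TIMESTAMP)
    return fingerprint, published, stats_end


def first_publication_delays(descriptors: Iterable[str]) -> List[float]:
    """Hours between each (relay, dirreq-stats-end) pair and the earliest
    descriptor that contains it."""
    first: Dict[Tuple[str, datetime], datetime] = {}
    for text in descriptors:
        fingerprint, published, stats_end = parse_extra_info(text)
        if fingerprint is None or published is None or stats_end is None:
            continue
        key = (fingerprint, stats_end)
        if key not in first or published < first[key]:
            first[key] = published
    return [(pub - end).total_seconds() / 3600 for (_, end), pub in sorted(first.items())]
```

On a toy input (placeholder fingerprints, made-up times), relay A's first descriptor carrying its stats appears 3 hours after the interval ends and relay B's after 7 hours, so `mean(first_publication_delays(docs))` is 5.0.

```python
docs = [
    "extra-info relayA AAAA\npublished 2024-01-02 03:00:00\n"
    "dirreq-stats-end 2024-01-02 00:00:00 (86400 s)",
    "extra-info relayA AAAA\npublished 2024-01-02 15:00:00\n"
    "dirreq-stats-end 2024-01-02 00:00:00 (86400 s)",
    "extra-info relayB BBBB\npublished 2024-01-02 19:00:00\n"
    "dirreq-stats-end 2024-01-02 12:00:00 (86400 s)",
]
```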

The graph does not show how many statistics we're missing because relays went offline. In principle, we could count relays with 24+ hours of uptime that don't publish a new descriptor before disappearing from the consensus. Those could have published a descriptor with stats if we had changed the timing. I didn't run this analysis now, but I'd guess the number would be non-zero.
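The proposed count could be sketched like this, under heavy assumptions: the `RelayHistory` record and its fields are hypothetical and would have to be assembled from archived consensuses and descriptors, and a relay counts as having missed stats if it had at least 24 hours of uptime and a stats interval ended after its last descriptor but before it left the consensus.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RelayHistory:
    """Hypothetical per-relay summary built from archived consensuses and descriptors."""
    fingerprint: str
    uptime_h: float                # uptime before the relay disappeared
    publish_times_h: List[float]   # when it published descriptors
    stats_ends_h: List[float]      # when its stats intervals ended
    left_consensus_h: float        # when it dropped out of the consensus


def missed_stats(relay: RelayHistory) -> bool:
    """True if a stats interval ended after the relay's last descriptor but
    before it left the consensus, i.e. publishing right after writing the
    stats could have delivered them."""
    if relay.uptime_h < 24 or not relay.publish_times_h:
        return False
    last_publish = max(relay.publish_times_h)
    return any(last_publish < end < relay.left_consensus_h for end in relay.stats_ends_h)


def count_missed(relays: Iterable[RelayHistory]) -> int:
    return sum(1 for r in relays if missed_stats(r))
```

For instance, a relay that published at hours 0 and 18, finished a stats interval at hour 24, and left the consensus at hour 40 is counted; the same relay with an extra publication at hour 24 is not.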

comment:8 Changed 8 years ago by nickm

Milestone: Tor: unspecified

comment:9 Changed 8 years ago by karsten

Resolution: wontfix
Status: new → closed

As part of the report "What fraction of our bridges are not reporting usage statistics?", I looked into the fraction of statistics that go unreported because no descriptor is published shortly after the stats are written. This fraction was tiny. It seems that we could break a lot by changing how Tor decides when to publish a descriptor, so it's probably safer to leave this alone. Closing this ticket as wontfix.

comment:10 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:11 Changed 7 years ago by nickm

Component: Tor Relay → Tor