Opened 6 months ago

Closed 5 months ago

#30218 closed enhancement (fixed)

Add bandwidth files archiving to CollecTor

Reported by: irl
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords: tor-bwauth, tor-dirauth, metrics-roadmap-2019-q2
Cc: teor, metrics-team, starlight@…
Actual Points:
Parent ID: #21378
Points:
Reviewer:
Sponsor:

Description

Bandwidth files are referenced by votes and are available via the directory protocol. Unfortunately there is no "current" URL yet, only "next", so we might have to download them proactively, independently of the relaydescs module.

Change History (11)

comment:1 Changed 6 months ago by teor

The bandwidth file is read by an authority to create its vote at VA-DistSeconds-VoteSeconds:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n302

That's around HH:50 (and when there's no consensus, HH:20) on the authority's UTC clock.

So you should get the bandwidth file that the authority is about to use for the vote if you start fetching it at HH:49 and HH:19.

After you fetch the vote, you can check bandwidth-file-digest, and re-fetch the bandwidth file if it does not match.
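
To illustrate the digest check described above, here is a minimal sketch. It assumes the vote carries a line of the form `bandwidth-file-digest sha256=<base64>` where the value is the base64-encoded SHA-256 of the bandwidth file with trailing `=` padding omitted; please check that against dir-spec before relying on it. The function names are hypothetical and are not CollecTor or metrics-lib code.

```python
import base64
import hashlib
from typing import Optional


def parse_bandwidth_file_digest(vote_text: str, algorithm: str = "sha256") -> Optional[str]:
    """Return the digest listed for `algorithm` on the vote's
    bandwidth-file-digest line, or None if there is no such line."""
    for line in vote_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "bandwidth-file-digest":
            for item in parts[1:]:
                name, sep, value = item.partition("=")
                if sep and name == algorithm:
                    # Assumption: padding is omitted in votes; strip it anyway.
                    return value.rstrip("=")
    return None


def bandwidth_file_digest(content: bytes) -> str:
    """Base64-encoded SHA-256 of the file content, without '=' padding."""
    raw = hashlib.sha256(content).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def needs_refetch(vote_text: str, content: bytes) -> bool:
    """True if the vote references a digest that differs from the file we hold."""
    expected = parse_bandwidth_file_digest(vote_text)
    return expected is not None and expected != bandwidth_file_digest(content)
```

If `needs_refetch` returns True, the file fetched earlier is not the one the authority used for that vote, so it would need to be fetched again.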

comment:2 Changed 6 months ago by irl

We are trying to build more robust archiving solutions to avoid missing data, and having a window of only a few minutes in which the data is going to be available for certain doesn't help towards that goal. We really should have a "current" URL that keeps a copy of the bandwidth file that was used for at least as long as the consensus is fresh.

comment:3 in reply to:  2 Changed 6 months ago by teor

Replying to irl:

> We are trying to build more robust archiving solutions to avoid missing data, and having a window of only a few minutes in which the data is going to be available for certain doesn't help towards that goal. We really should have a "current" URL that keeps a copy of the bandwidth file that was used for at least as long as the consensus is fresh.

I agree. We have ticket #27047 for this feature, but it's not on our roadmap right now.

Unfortunately, we've had to minimise non-sponsored work in 2019.

Is there a current sponsor that wants us to do #27047?
Or should we ask the grants team to find grants for bandwidth authority work?

comment:4 Changed 6 months ago by irl

There is no current sponsor.

comment:5 Changed 6 months ago by karsten

Yesterday, I wrote a little script that ran roughly once per minute for over an hour and fetched moria1's and longclaw's "next" bandwidth files. I received a bandwidth file every time, not just between, say, HH:49 and HH:00. More specifically, here's what I got:

Authority  Timestamp           Digest      First received  Last received  Referenced from vote
longclaw   1555868103 (17:35)  lKTscsfb..  ..              18:40          18:00
longclaw   1555871704 (18:35)  8KuO5fcL..  18:43           19:40          19:00
longclaw   1555875303 (19:35)  laoWH3KH..  19:41           ..             20:00
moria1     1555867341 (17:22)  EAiqle6R..  ..              18:45          18:00
moria1     1555871524 (18:32)  5aZPyxPy..  18:46           19:45          19:00
moria1     1555875627 (19:40)  ZTrHiTtI..  19:46           ..             20:00

Looking at the 19:00 votes, we could fetch referenced bandwidth files from around 18:45 to around 19:45.
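
For anyone who wants to repeat the measurement, the "first received" and "last received" columns can be derived from a list of per-minute observations roughly as follows. This is only a sketch of the bookkeeping, not the script that was actually run; `Sighting` and `summarize` are hypothetical names, and fetching the files is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Sighting:
    first_received: str
    last_received: str


def summarize(
    observations: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Sighting]:
    """Collapse (time, authority, digest) observations, given in
    chronological order, into first/last received times per file."""
    seen: Dict[Tuple[str, str], Sighting] = {}
    for when, authority, digest in observations:
        key = (authority, digest)
        if key in seen:
            # Same file seen again: only the last-received time moves.
            seen[key].last_received = when
        else:
            seen[key] = Sighting(first_received=when, last_received=when)
    return seen
```

Feeding it one observation per fetch, with the truncated digests above used purely as labels, yields one entry per authority and bandwidth file.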

For reference, we're currently fetching votes (and all other relay descriptors) starting at HH:05 and once more at HH:35 in case there was no consensus at HH:00.

How about we simply download "next" bandwidth files at around HH:05 and at around HH:35, knowing that we're really going to receive "previous" bandwidth files? This would be easiest with regard to extending CollecTor's relaydescs module.
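
As a rough illustration of that schedule, the next fetch time after a given instant could be computed like this. It is a sketch only: `FETCH_MINUTES` and `next_fetch_time` are hypothetical names, and CollecTor's real scheduler is configured differently.

```python
from datetime import datetime, timedelta, timezone

# Minutes past the hour at which relay descriptors are fetched (HH:05, HH:35).
FETCH_MINUTES = (5, 35)


def next_fetch_time(now: datetime, minutes=FETCH_MINUTES) -> datetime:
    """Return the first scheduled fetch time strictly after `now`."""
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    candidates = [hour_start + timedelta(minutes=m) for m in sorted(minutes)]
    # Fall through to the earliest slot of the following hour.
    candidates.append(hour_start + timedelta(hours=1, minutes=min(minutes)))
    return next(t for t in candidates if t > now)


if __name__ == "__main__":
    print(next_fetch_time(datetime.now(timezone.utc)))
```

Passing `minutes=(19, 49)` instead would give the HH:19 and HH:49 schedule suggested in comment:1.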

comment:6 Changed 6 months ago by irl

Reviewer: irl
Status: new → needs_review

comment:7 Changed 6 months ago by irl

Reviewer: irl
Status: needs_review → new

This sounds like an OK approach for now, but it doesn't seem to be particularly robust. We will need to revisit this in the future but I don't think that should hold up implementing this now.

comment:8 Changed 6 months ago by karsten

Status: new → needs_review

Please review commit ad1bbc8 in my task-30218 branch.

If you want to try it, be sure to use a metrics-lib-2.6.0-dev.jar built with the #30369 fix.

This branch worked just fine on my local machine for downloading two hours of bandwidth files. But I have to admit that I haven't tested it as thoroughly as I'd usually do. Let's be sure to give it more testing before we deploy this on the server.

comment:9 Changed 5 months ago by karsten

I ran this branch for over a week on my server, and it worked just fine.

When you review the commit above, please also review fixup commit 7d0b9eb in the same branch.

comment:10 Changed 5 months ago by irl

Status: needs_review → merge_ready

Seems to be working. Code looks good.

Checks and tests pass.

comment:11 Changed 5 months ago by karsten

Resolution: fixed
Status: merge_ready → closed

Thanks for looking! Squashed and merged to master. Opening a new ticket for the release and closing this one. Thanks!
