Opened 9 months ago

Closed 9 months ago

Last modified 9 months ago

#29056 closed defect (implemented)

Implement bandwidth file parser and formater

Reported by: juga Owned by: juga
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Normal Keywords: tor-bwauth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

After #21377 is implemented, the bandwidth authorities will start publishing the bandwidth files that they have used in their votes and these will be stored by CollectTor.
This is currently implemented in sbws, but it's possible that we still need to change the format of the bandwidth file.
As discussed in #28684, it would be a problem to parse files already archived if the format of the bandwidth file changes, so it would be better if this implemented in stem.

Child Tickets

Change History (23)

comment:2 Changed 9 months ago by atagar

Thanks juga! Sounds good. Once the bwauth file format and a method of getting them have been added to the tor spec I'd be happy to help write a parser (or merge it if ya have a patch).

comment:3 in reply to:  2 Changed 9 months ago by teor

Replying to atagar:

Thanks juga! Sounds good. Once the bwauth file format

https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt

and a method of getting them have been added to the tor spec

https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2652

I'd be happy to help write a parser (or merge it if ya have a patch).

I'd encourage you to wait for the Tor features #21377 (bandwidth file URL), and maybe also #26698 (bandwidth-file-digest). There might be last-minute code or spec changes.

For completeness:

The spec for #26698 (bandwidth-file-digest) is:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2140

bandwidth-file-headers was implemented in #26222, its spec is:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2120

comment:4 Changed 9 months ago by atagar

Thanks teor! I'd be happy to proceed with the parser part of this now. It's fine if there's spec tweaks up until our next Stem release, which is a very long ways out. That said, I'll hold off on the downloader component until #21377 has been merged since I need that for an end-to-end test.

To proceed with making a parser I need a copy of a bandwidth authority file for our unit tests. Per chance does anyone have one handy?

comment:5 in reply to:  4 Changed 9 months ago by teor

Replying to atagar:

Thanks teor! I'd be happy to proceed with the parser part of this now. It's fine if there's spec tweaks up until our next Stem release, which is a very long ways out. That said, I'll hold off on the downloader component until #21377 has been merged since I need that for an end-to-end test.

To proceed with making a parser I need a copy of a bandwidth authority file for our unit tests. Per chance does anyone have one handy?

You should have three files in your unit tests:

  • a format version 1.0 file from Torflow,
  • a format version 1.1 file from sbws 0.1.0 to 1.0.0
  • a format version 1.2 file from sbws 1.0.3 and later

https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n543

I've noticed some minor spec issues, I'll open child tickets for them.

comment:6 Changed 9 months ago by atagar

You should have three files in your unit tests

Sounds good. Any notion of who can provide them?

comment:7 Changed 9 months ago by teor

In fact, you should have 4 files in your unit tests, because there are two different kinds of 1.2.0 files.

There are sample files in the appendix of the spec here:
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n543

If you want to parse a real file that contains thousands of relays, they are publicly archived by some directory authorities. You should get a sample from longclaw, which runs sbws, and another authority, which all run torflow.

comment:8 in reply to:  7 ; Changed 9 months ago by juga

Replying to teor:

In fact, you should have 4 files in your unit tests, because there are two different kinds of 1.2.0 files.

Since it'll change again with 1.0.3 (not released yet) and since, except for longclaw, there're no other authorities that has used sbws, do you really think it's needed to support the files produced by sbws 1.2.0?.

comment:9 in reply to:  4 Changed 9 months ago by juga

Replying to atagar:

Thanks teor! I'd be happy to proceed with the parser part of this now.

It's basically implemented in sbws [0], but there're some changes to make
I guess it should inherit from stem.descriptor.Descriptor. It's not using regular expressions but after looking how you implemented other documents, might be easier to use them.
Probably there's no need to separate classes for header and bandwidth lines.

[0] https://github.com/torproject/sbws/blob/12ad99d0024cdd65bd73d6ac969cc1e70680f286/sbws/lib/v3bwfile.py#L157,
https://github.com/torproject/sbws/blob/12ad99d0024cdd65bd73d6ac969cc1e70680f286/sbws/lib/v3bwfile.py#L187,
https://github.com/torproject/sbws/blob/12ad99d0024cdd65bd73d6ac969cc1e70680f286/sbws/lib/v3bwfile.py#L362,
https://github.com/torproject/sbws/blob/12ad99d0024cdd65bd73d6ac969cc1e70680f286/sbws/lib/v3bwfile.py#L590,
https://github.com/torproject/sbws/blob/12ad99d0024cdd65bd73d6ac969cc1e70680f286/sbws/lib/v3bwfile.py#L600

comment:10 in reply to:  8 Changed 9 months ago by teor

Replying to juga:

Replying to teor:

In fact, you should have 4 files in your unit tests, because there are two different kinds of 1.2.0 files.

Since it'll change again with 1.0.3 (not released yet) and since, except for longclaw, there're no other authorities that has used sbws, do you really think it's needed to support the files produced by sbws 1.2.0?.

People will expect stem to parse all available format versions.

Here's why we might need to parse each format version:

  • stem will need to parse 1.0.0 format files, because they're produced by Torflow on most authorities.
  • if anyone is keeping an archive of bandwidth files from longclaw, then it may contain 1.1.0 format files, and it will contain 1.2.0 format files.
  • if stem's parser is ready before longclaw deploys sbws 1.0.3, then it should be able to parse 1.2.0 format files.

And more generally:

The bandwidth file format is forward-compatible: a well-written parser should be able to handle 1.0.0, 1.1.0, 1.2.0, and all future 1.x format versions. And it's worth testing the parser with all available format versions, to make sure it produces useful results.

comment:11 in reply to:  7 Changed 9 months ago by juga

Replying to teor:

In fact, you should have 4 files in your unit tests, because there are two different kinds of 1.2.0 files.

There are sample files in the appendix of the spec here:
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n543

If you want to parse a real file that contains thousands of relays, they are publicly archived by some directory authorities. You should get a sample from longclaw, which runs sbws, and another authority, which all run torflow.

Torflow: https://bwauth.ritter.vg/bwauth/bwscan.V3BandwidthsFile
sbws v1.0.2 (bandwidth file version 1.2): https://people.torproject.org/~juga/volatile/latest.v3bw

comment:12 Changed 9 months ago by juga

Status: assignednew

Un-assign myself, since maybe is implemented by atagar

comment:13 Changed 9 months ago by juga

atagar, since i've seen you're already implementing the parser, please look at the wip branch where i started to refactor the current code to separate in different classes the parser logic and the scaling logic: https://github.com/juga0/sbws/blob/wip_refactor_v3bwfile/sbws/lib/bandwidth_file.py

If you think this is a good approach and it's compatible with what you're doing, i could open a PR in stem and you can continue to work on it.

comment:14 Changed 9 months ago by juga

Commit https://github.com/juga0/sbws/commit/d325a000dc7eea3be5b6d3dc9f69559429ba33b0#diff-82cb5602e1659562ab66f2ca0a5e2a34 just create a new class, copying only the v3bwfile.py class methods to parse/generate the file.
Commit https://github.com/juga0/sbws/commit/ba5081cf6bb344eea887db919af75e45f1507012#diff-82cb5602e1659562ab66f2ca0a5e2a34 create other class that inherits also from Descriptor, but it's not finished.

comment:15 Changed 9 months ago by atagar

Hi juga, just pushed stem support for parsing and creating bandwidth files. Would you like anything tweaked?

https://gitweb.torproject.org/stem.git/commit/?id=9cac908

comment:16 in reply to:  15 Changed 9 months ago by teor

Status: newneeds_revision

Replying to atagar:

Hi juga, just pushed stem support for parsing and creating bandwidth files. Would you like anything tweaked?

https://gitweb.torproject.org/stem.git/commit/?id=9cac908

There wasn't any pull request, so I left comments on the commit:
https://github.com/torproject/stem/commit/9cac9085504230e036ff65754d88a349ad88d549

I also updated the spec to clarify some things, it will get merged in #29079:
https://github.com/teor2345/torspec/blob/ticket29079/bandwidth-file-spec.txt

comment:17 Changed 9 months ago by atagar

Thanks teor! Fixes pushed...

https://gitweb.torproject.org/stem.git/commit/?id=bc7975b

Some sbws versions use a 4-character terminator

Fixed in the parser. I would advise for the spec to say which is preferred, explaining that allowance of the other is kept only for backward compatability. The way the spec's being changed here (simply citing both header dividers) makes it confusing which is the right one going forward.

Stem can create bandwidth files, so I need to know which is preferred.

comment:18 in reply to:  17 Changed 9 months ago by juga

Replying to atagar:

Thanks teor! Fixes pushed...

https://gitweb.torproject.org/stem.git/commit/?id=bc7975b

Some sbws versions use a 4-character terminator

Fixed in the parser. I would advise for the spec to say which is preferred, explaining that allowance of the other is kept only for backward compatability. The way the spec's being changed here (simply citing both header dividers) makes it confusing which is the right one going forward.

Stem can create bandwidth files, so I need to know which is preferred.

I'd say 5 because that was the first we wrote in the spec and what sbws currently writes. 4 was a typo bug when it was first implemented in sbws.

comment:19 Changed 9 months ago by juga

atagar, sorry i haven't still looked at your code, will do it before the end of the week, but don't wait for me.
If if find later something missing, we can always open a new ticket.

comment:20 Changed 9 months ago by atagar

I'd say 5 because that was the first we wrote in the spec and what sbws currently writes. 4 was a typo bug when it was first implemented in sbws.

I figured (and that's what the comment in stem says). I meant I hope this gets stated in the spec.

comment:21 Changed 9 months ago by atagar

Resolution: implemented
Status: needs_revisionclosed

If if find later something missing, we can always open a new ticket

Sounds good. Resolving this one.

comment:22 in reply to:  20 Changed 9 months ago by teor

Replying to atagar:

I'd say 5 because that was the first we wrote in the spec and what sbws currently writes. 4 was a typo bug when it was first implemented in sbws.

I figured (and that's what the comment in stem says). I meant I hope this gets stated in the spec.

See https://github.com/teor2345/torspec/commit/7f573b3c5968c3e59a15dfc05c0a21b194373c97#diff-824f4c7404328d8aa1101d74ba36884d

comment:23 Changed 9 months ago by atagar

Perfect! Thanks teor.

Note: See TracTickets for help on using tickets.