Opened 4 weeks ago

Last modified 3 weeks ago

#30255 new task

Add additional bandwidth file headers in sbws 1.2

Reported by: teor Owned by:
Priority: Medium Milestone: sbws: 1.2.x-final
Component: Core Tor/sbws Version:
Severity: Normal Keywords:
Cc: juga, metrics-team Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Parent ticket.

Child Tickets

TicketStatusOwnerSummaryComponent
#29355newInclude scanner nickname and UUID in the bandwidth file headers?Core Tor/sbws
#30196needs_revisionAdd the tor version to the sbws bandwidth file headerCore Tor/sbws
#30229newAutomatically set the scanner country and ASCore Tor/sbws
#30251needs_revisionAdd the operating system to the bandwidth file headersCore Tor/sbws
#30252needs_revisionAdd the tor OpenSSL and NSS versions to the sbws bandwidth file headersCore Tor/sbws
#30254newWork out how to contact bandwidth scanner operatorsCore Tor/sbws
#30256newAdd data_period to the bandwidth file headersCore Tor/sbws
#30406needs_reviewjugaRefactor bandwidth file header constantsCore Tor/sbws

Change History (9)

comment:1 Changed 4 weeks ago by teor

Each of these child tickets needs a spec change, and a code change.

comment:2 Changed 4 weeks ago by juga

The problem with small tickets that change the same parts of the code is the merge conflicts. Any other solution apart of reviewing them one after other in some order?

comment:3 in reply to:  2 Changed 4 weeks ago by teor

Replying to juga:

The problem with small tickets that change the same parts of the code is the merge conflicts. Any other solution apart of reviewing them one after other in some order?

Our priority for sbws is maintaining stable software. That's more important than writing and merging features quickly.

Conflicts are often a sign of bad code structure. For each new key, the current code structure needs:

  • a key in one place
  • the same key in another place
  • the same key in a third place, with the code that implements the key
  • the same key in five other places, with the code that tests the key

Here's a structure for this code that would be easier to change:

  • each header key value is added to the state using a line of code
  • each header key value is tested using a line of code (or perhaps two lines of code, a set and a test)
  • the rest of the code just accepts whatever is in the state

Using this structure, merges add additional key value assignments, at the end of a list of assignments. After the code is refactored, the merge conflicts should be trivial.

While the code is so complicated, we need to split up sbws features into separate pull requests. Small pull requests are easier to review.

If the pull requests are hard to review, we will struggle to find all the bugs in sbws code.

comment:4 Changed 4 weeks ago by teor

Sorry, my advice was not the best advice.

Here is some better advice:

Put the commits in one pull request, in this order:

  1. We want to be able to add each header using 1-2 lines of code. So we need to refactor the header code. Add commits for this refactor.
  2. Add one new header in each commit. If a group of headers makes more sense together, put their commits next to each other.

comment:5 Changed 4 weeks ago by karsten

irl and I discussed the topic of additional bandwidth headers as part of #30216, where we're trying to write a parser for these files.

When reading this ticket and its subtickets I'm slightly concerned that you might consider adding spaces to new header lines. For example, operating system and contact information sound like they might contain spaces. But as far as I see it, that means trouble! One reason is that bandwidth-file-headers in dir-spec doesn't permit spaces in KeyValue parts, so that it wouldn't be possible to copy over header lines to the vote.

Does this make sense?

In fact, if it does, would you mind clarifying in the spec that header lines always have the format HeaderLine ::= KeyValue NL? That would also simplify the parser, because we'd then be able to distinguish header and relay lines by the number of space-separated KeyValue elements.

comment:6 Changed 4 weeks ago by karsten

Cc: metrics-team added

comment:7 in reply to:  5 ; Changed 4 weeks ago by juga

Replying to karsten:

irl and I discussed the topic of additional bandwidth headers as part of #30216, where we're trying to write a parser for these files.

i'm unfamiliar with metrics ecosystem and i thought that implementing the parser in stem (#29056, #30160) would avoid you to re-implementing for metrics. Is there a way to avoid that?.

When reading this ticket and its subtickets I'm slightly concerned that you might consider adding spaces to new header lines.

No, we should not add spaces. I made the mistake of writing the url of a wip branch in a ticket that should not have been reviewed yet.

For example, operating system and contact information sound like they might contain spaces.

For the operating system, this is what the command suggested by teor would generate: Linux-4.9.0-8-amd64-x86_64-with-glibc2.7. It doesn't contain space.
For the contact information, teor said:

Let's not specify another unstructured contact string, please.

And imo, it should just be an email address, if we add contact information.

But as far as I see it, that means trouble! One reason is that bandwidth-file-headers in dir-spec doesn't permit spaces in KeyValue parts, so that it wouldn't be possible to copy over header lines to the vote.

Does this make sense?

Yes

In fact, if it does, would you mind clarifying in the spec that header lines always have the format HeaderLine ::= KeyValue NL? That would also simplify the parser, because we'd then be able to distinguish header and relay lines by the number of space-separated KeyValue elements.

Hmm, that makes sense, but it's not true for the first line Timestamp NL. Maybe we can just clarify that.

comment:8 Changed 4 weeks ago by juga

I created #30311 for the suggested specification change.

comment:9 in reply to:  7 Changed 3 weeks ago by karsten

Replying to juga:

Replying to karsten:

irl and I discussed the topic of additional bandwidth headers as part of #30216, where we're trying to write a parser for these files.

i'm unfamiliar with metrics ecosystem and i thought that implementing the parser in stem (#29056, #30160) would avoid you to re-implementing for metrics. Is there a way to avoid that?.

Nope, there's no way to avoid that. The deployed Tor Metrics services are all implemented in Java, and we're currently starting to re-implement parts in Python. There will be a transition phase in which we'll need to write parsers in Java and Python.

When reading this ticket and its subtickets I'm slightly concerned that you might consider adding spaces to new header lines.

No, we should not add spaces. I made the mistake of writing the url of a wip branch in a ticket that should not have been reviewed yet.

Not sure which URL or branch you refer to, but glad to hear that you agree about the spaces.

For example, operating system and contact information sound like they might contain spaces.

For the operating system, this is what the command suggested by teor would generate: Linux-4.9.0-8-amd64-x86_64-with-glibc2.7. It doesn't contain space.

Okay!

For the contact information, teor said:

Let's not specify another unstructured contact string, please.

And imo, it should just be an email address, if we add contact information.

I'll leave this to you. I mostly wanted to know about the spaces, because that's what affects me when parsing the format.

But as far as I see it, that means trouble! One reason is that bandwidth-file-headers in dir-spec doesn't permit spaces in KeyValue parts, so that it wouldn't be possible to copy over header lines to the vote.

Does this make sense?

Yes

Cool!

In fact, if it does, would you mind clarifying in the spec that header lines always have the format HeaderLine ::= KeyValue NL? That would also simplify the parser, because we'd then be able to distinguish header and relay lines by the number of space-separated KeyValue elements.

Hmm, that makes sense, but it's not true for the first line Timestamp NL. Maybe we can just clarify that.

Yes, I think it should be possible to clarify that. After all, that's a special line anyway.

Thanks a lot!

Note: See TracTickets for help on using tickets.