Opened 7 months ago

Closed 6 months ago

#29333 closed defect (worksforme)

Use the bandwidth-file-spec.txt keywords as BandwidthFile attributes

Reported by: juga Owned by: atagar
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Normal Keywords: tor-bwauth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

BandwidthFile descriptor was implemented in #29056.
Some of name of the header attributes correspond to the keywords in the bandwidth-file-spec.txt, but not all.
Since we might change the keywords, the implementation might be easier to follow the keywords using the same name for the attributes.
Currently these attributes are:

consensus_size
eligible_count
eligible_percent
min_count 
min_percent

Also, measurements should be named bandwidth_lines according to the specification. The bandwidth lines might not be raw measurements, but scaled.

Child Tickets

Change History (8)

comment:1 Changed 7 months ago by atagar

Hi juga. I chose these attribute names to make usage nicer. That is to say...

print('%i / %i (%i%%) of relays are eligible' % (
  desc.eligible_count,
  desc.consensus_size,
  desc.eligible_percent,
))

... rather than...

print('%i / %i (%i%%) of relays are eligible' % (
  desc.number_eligible_relays,
  desc.number_consensus_relays,
  desc.percent_eligible_relays.
))

That said, I'd be happy to defer to whatever you'd like when it comes to BandwidthFiles. To be clear you want the following changes...

* Rename desc.created_at to desc.file_created
* Rename desc.generated_at to desc.generator_started
* Rename desc.consensus_size to desc.number_consensus_relays
* Rename desc.eligible_count to desc.number_eligible_relays
* Rename desc.eligible_percent to desc.percent_eligible_relays
* Rename desc.min_count to desc.minimum_number_eligible_relays
* Rename desc.min_percent to desc.minimum_percent_eligible_relays
* Rename desc.measurements to desc.bandwidth_lines

Is this correct?


On a side note I like the spec, but felt while implementing it there's a few rough edges...

  • Keywords feel unnecessarily verbose to me. That's why I chose different names.
  • Inclusion of the percent attribute is redundant with other stats. I think it's a mistake to include derived stats since they're either redundant or inaccurate (what if 'eligible_count / consensus_size != eligible_percent'?).
  • The measurements is an unstructured dictionary because bandwidth-file spec section 2.3 is pretty sparse. It mandates 'node_id=', 'bw=', and notes 'master_key_ed25519=' but otherwise defers to section 2.4.2.1 which depends on application version rather than bandwidth file versions.

comment:2 in reply to:  1 Changed 6 months ago by juga

Replying to atagar:
[...]

That said, I'd be happy to defer to whatever you'd like when it comes to BandwidthFiles. To be clear you want the following changes...

* Rename desc.created_at to desc.file_created
* Rename desc.generated_at to desc.generator_started
* Rename desc.consensus_size to desc.number_consensus_relays
* Rename desc.eligible_count to desc.number_eligible_relays
* Rename desc.eligible_percent to desc.percent_eligible_relays
* Rename desc.min_count to desc.minimum_number_eligible_relays
* Rename desc.min_percent to desc.minimum_percent_eligible_relays
* Rename desc.measurements to desc.bandwidth_lines

Is this correct?

Yes, it's.
However, you're right that the names you chose might be clearer.
We have a ticket to try to decide better names for keyword #28099, and before April we might add other keywords as part of #28547 (see https://pad.riseup.net/p/sbws-exclude-reasons-keep for the keywords proposed so far).
So, i'd propose that we delay these changes until we solve #28547, and mark this as needs_information. Sounds good?

On a side note I like the spec, but felt while implementing it there's a few rough edges...

[...]

If we change the keywords or add new, we should change also the specification.
Feel free to open tickets for the specification regarding software versions and not specification versions.

comment:3 Changed 6 months ago by atagar

So, i'd propose that we delay these changes until we solve #28547, and mark this as needs_information. Sounds good?

Hi juga. I'd be happy to rename these attributes, keep them as-is, or wait. Your call.

Every other Stem Descriptor subclasses uses nicer attribute names than the spec. Bandwidth files are not unique in this regard. My goal with Stem is to provide the most developer friendly library I can, not to mirror spec terminology. If you feel the names proposed above improve code readability we can go with that but personally I think it's a step backward.

To be clear we can only await the tickets you cite until Stem's next release. That won't be for a long while (rough guess: six months?), but once Stem makes a release these attribute names can no longer change. (*)

How would you care to proceed?

Feel free to open tickets for the specification regarding software versions and not specification versions.

I'll leave this at your discretion. Our spec determines the API I provide in Stem and those struck me as rough edges as I implemented it, but I don't have a personal interest in if the spec changes or not. Just feedback - that's all.

Cheers! -Damian

(*) Not entirely true, but something to be avoided. We can still rename attributes but it's done so via aliases, with the deprecated name remaining around until Stem 2.x which will be released in 2020 when Python 2.7 is deprecated.

comment:4 in reply to:  3 Changed 6 months ago by juga

Replying to atagar:
[...]

To be clear we can only await the tickets you cite until Stem's next release. That won't be for a long while (rough guess: six months?), but once Stem makes a release these attribute names can no longer change. (*)

[...]
Sounds good, the changes i might need to do must be done before April and i'll try to do them even earlier, before we start collecting the bandwidth files.

comment:5 Changed 6 months ago by juga

As a first step to adapt sbws, i was creating from_dict and to_dict methods in the sbws class.
These are some other things i realized that i'm doing in sbws and/or i think for consistency should probably done differently.

  • header attribute doesn't return timestamp, should it rather return a key timestamp and value timestamp?
  • measurements attribute returns a dictionary, should it rather return a list? (the fingerprint is in node_id key)

When using create (should it have an alias from_dict?):

  • timestamp is not required. Is there a way to pass it?
  • should the headers be passed in a key header to be consistent with header attribute?
  • should content be named measurements to be consistent with measurements attribute?

comment:6 Changed 6 months ago by atagar

Sounds good, the changes i might need to do must be done before April and i'll try to do them even earlier, before we start collecting the bandwidth files.

Great! Sounds like a fine plan. Keeping things as-is until I hear otherwise.

header attribute doesn't return timestamp, should it rather return a key timestamp and value timestamp?

Would you like it to? In our spec the timestamp is a weird exception made for backward compatibility with TorFlow. Personally I think we should change it to 'timestamp=<value>' with a note saying 'in bandwidth file version x the 'timetamp=' key MAY not be present, but MUST be present in future versions'. But clearly not a big whoop.

BandwidthFiles have a separate 'timestamp' attribute but you're right that I don't include it in the 'header' dictionary.

measurements attribute returns a dictionary, should it rather return a list? (the fingerprint is in node_id key)

Structuring this as a dictionary gives more flexibility. Callers can process the measurements as a list...

for measurement in bwfile.measurements.values():
  ... do stuff...

... or get the measurement of a specific relay...

myMeasurement = bwfile.measurements.get('3BB34C63072D9D10E836EE42968713F7B9325F66')

if myMeasurement:
  print('My bandwidth authority measurement was: %s' % myMeasurement)
else:
  print('My relay was not measured by the bandwidth authorities')

Modeling this as a list prevents us from doing the later in constant time. Other descriptor objects (such as the Consensus class) model its entries as a 'fingerprint => record' dictionary for this reason too.

When using create (should it have an alias from_dict?):

That's an interesting idea but there's actually two different methods: create() and content(). These are methods of our base Descriptor class...

https://stem.torproject.org/tutorials/mirror_mirror_on_the_wall.html#can-i-create-descriptors

timestamp is not required. Is there a way to pass it?

Yup, provide 'timestamp' as the key. This is included in the pydoc example...

https://stem.torproject.org/api/descriptor/bandwidth_file.html#stem.descriptor.bandwidth_file.BandwidthFile.content

should the headers be passed in a key header to be consistent with header attribute?

I'd rather keep create() and content() as consistent with other descriptor types as I can. Timestamps and content are already weird unavoidable exceptions.

should content be named measurements to be consistent with measurements attribute?

Their differing types would probably cause confusion. Measurements is a parsed dictionary whereas content are raw lines. Their types differ.

comment:7 in reply to:  6 Changed 6 months ago by juga

Replying to atagar:

Would you like it to? In our spec the timestamp is a weird exception made for backward compatibility with TorFlow. Personally I think we should change it to 'timestamp=<value>' with a note saying 'in bandwidth file version x the 'timetamp=' key MAY not be present, but MUST be present in future versions'. But clearly not a big whoop.

BandwidthFiles have a separate 'timestamp' attribute but you're right that I don't include it in the 'header' dictionary.

While is an idea to have timestamp also as a key-value[*] at some point, it'd be not backward compatible with current Tor versions, so probably won't be changed any time soon.
If the change would require that sbws creates bandwidth files with timestamp as key-value too, then we can't do this change yet.

measurements attribute returns a dictionary, should it rather return a list? (the fingerprint is in node_id key)

Structuring this as a dictionary gives more flexibility. Callers can process the measurements as a list...

[...]

Modeling this as a list prevents us from doing the later in constant time. Other descriptor objects (such as the Consensus class) model its entries as a 'fingerprint => record' dictionary for this reason too.

It's more logical a list according to the bandwidth file specification, but i agree that's a good reason and it's the one we use in sbws to also converted to dictionary in some parts of the code.
So fine like dictionary.

When using create (should it have an alias from_dict?):

That's an interesting idea but there's actually two different methods: create() and content(). These are methods of our base Descriptor class...

[...]

Again thinking on my code/spec logic, not on existing descriptor code. It's fine then :)

[...]

should the headers be passed in a key header to be consistent with header attribute?

I'd rather keep create() and content() as consistent with other descriptor types as I can. Timestamps and content are already weird unavoidable exceptions.

Fair enough..

should content be named measurements to be consistent with measurements attribute?

Their differing types would probably cause confusion. Measurements is a parsed dictionary whereas content are raw lines. Their types differ.

ok.

[*] The idea long term is to format bandwidth files as other descriptor documents (no key-values), so maybe take it into account if you need to change the parser.

comment:8 Changed 6 months ago by atagar

Resolution: worksforme
Status: newclosed

Hi juga, think I'm gonna resolve this. Feel free to reopen if there's remaining issues I missed.

Note: See TracTickets for help on using tickets.