#26155 closed defect (fixed)

Bandwidth file Timestamp is the latest scanner result, not the file creation time

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-spec, tor-bwauth
Cc: juga, pastly Actual Points:
Parent ID: #25925 Points:
Reviewer: Sponsor:

Description (last modified by teor)

The bandwidth file timestamp should be the last time a relay was scanned. But we say it's the file creation time, which is wrong.

in torflow, the timestamp is actually the oldest of the most recent timestamps for all scanners:
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py#n409
(This is buggy in small networks, because the unmeasured node scanner may end up in a state where is never has any nodes to scan. But we are not fixing torflow bugs.)

Here's how I suggest we fix the spec issue:

Replace the initial Timestamp with:

  Timestamp NL

    [At start, exactly once.]

    The Unix Epoch time in seconds of the most recent scanner result.

    If there are multiple scanners which can fail independently, implementations
    SHOULD take the most recent timestamp from each scanner and use the
    oldest value. This ensures all the scanners continue running.

    If there are scanners that do not run continuously, they SHOULD be excluded
    from the timestamp calculation,

    It does not follow the KeyValue format for backwards
    compatibility with version 1.0.0.

Add a file creation date:

  "file_created=" DateTime NL

    [Zero or one time.]

    The date and time timestamp in ISO 8601 format and UTC time zone
    when the file was created.

    This Line has been added in version 1.1.0 of this specification.

Add a latest bandwidth in human-readable format:

  "latest_bandwidth=" DateTime NL

    [Zero or one time.]

    The date and time timestamp in ISO 8601 format and UTC time zone
    of the most recent scanner result.

    This time MUST be identical to the initial Timestamp line.

    This duplicate value is included to make the format easier for people
    to read.

    This Line has been added in version 1.1.0 of this specification.

Child Tickets

Change History (20)

comment:1 Changed 16 months ago by teor

Status: assignedneeds_review

comment:2 Changed 16 months ago by teor

Description: modified (diff)

Clarify what we are fixing

comment:3 Changed 16 months ago by juga

Status: needs_reviewneeds_revision

I added what you suggest, just changed scanner by generator and formatted 72 characters per line.
See https://github.com/torproject/torspec/pull/10

comment:4 Changed 16 months ago by juga

Hmm, timestamp and version have fixed positions, but i don't remember now if we commented that software and software_version should be in 3rd and 4th positions or they can be in arbitrary positions.

comment:5 in reply to:  4 Changed 16 months ago by teor

I reviewed your changes.
I noticed a missed "scanner".
I also noticed an ambiguity introduced by using "generator".

Please flip to merge_ready once you're done.

Replying to juga:

Hmm, timestamp and version have fixed positions,

Timestamp needs to be the first line because that's what all tor implementations expect.
Version needs to be the second line because it can change how the rest of the file is parsed.

but i don't remember now if we commented that software and software_version should be in 3rd and 4th positions or they can be in arbitrary positions.

We have always said they can be in arbitrary positions.

Is there any reason that they need to be near the top?
Otherwise, let's not have a restriction.

comment:6 Changed 16 months ago by juga

Is there any reason that they need to be near the top?

i can't think of any. Initially we added them with that order in sbws, i just forgot to change that.

comment:7 Changed 16 months ago by juga

Other question: in the examples i used ISO 8601 in the format "2018-05-08T16:13:26" (with "T"), though Tor uses "2018-05-08 16:13:26" (with space) in other files.

I did that because if we use ISO 8601 with space in the Bandwidth Lines, it would break the logic of having SP as key_value separator. However we are not using in it in Bandwidth Lines, and in the header Lines the separator is new line. Which format should we use?

comment:8 in reply to:  7 ; Changed 16 months ago by teor

but i don't remember now if we commented that software and software_version should be in 3rd and 4th positions or they can be in arbitrary positions.

We have always said they can be in arbitrary positions.

Is there any reason that they need to be near the top?
Otherwise, let's not have a restriction.

i can't think of any. Initially we added them with that order in sbws, i just forgot to change that.

I don't understand what needs to change in sbws.

The software and software version lines can be in any position on or after the 3rd line.
(The first and second lines are reserved for timestamp and version.)
So putting the software and software version lines in the 3rd and 4th positions is ok.

Can you explain what you want to change in sbws?

Other question: in the examples i used ISO 8601 in the format "2018-05-08T16:13:26" (with "T"), though Tor uses "2018-05-08 16:13:26" (with space) in other files.

I did that because if we use ISO 8601 with space in the Bandwidth Lines, it would break the logic of having SP as key_value separator. However we are not using in it in Bandwidth Lines, and in the header Lines the separator is new line. Which format should we use?

The timestamp format without the space, as specified in:
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n93

See nickm's response here:
https://lists.torproject.org/pipermail/tor-dev/2018-May/013141.html

The format with the space is a legacy format that is harder to parse.

More spec questions from https://github.com/pastly/simple-bw-scanner/issues/169#issuecomment-390923878 :

When there are not any scan results (similar to the earliest_bandwith case https://github.com/pastly/simple-bw-scanner/blob/master/sbws/core/generate.py#L136), which should be the timestamp?.

If there are no results in the file, then it really doesn't matter what's in the header.

The timestamp is mandatory, so you can't leave it out.

Here are your options:

  1. specify that the time must be in the past - tor will warn that the file is too old
  2. don't generate the file, delete any existing file - tor will warn that the file is missing
  3. generate an empty file, truncate any existing file - old tors will log stack contents, see #26007

I suggest that we say that generators SHOULD NOT generate a file, and SHOULD delete any existing file, because it is the least confusing option.
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n61

If you want, you can also say that:

  • generators SHOULD wait until enough relays are measured before generating the file (option 2)
  • generators MAY use a placeholder timestamp (option 1j, but the time MUST be at least 5 days in the past to avoid silent failures.
  • generators MUST NOT generate an empty file (option 3), because it triggers a bug in tor.

Please update the spec with these changes.

The earliest_bandwidth and latest_bandwidth are optional.
If there is no valid value for these lines, the generator SHOULD leave them out.
Please update the spec with these changes.

I don't understand well what do you mean by "continuously" and "timestamp calculation" in the trac ticket:

If there are scanners that do not run continuously, they SHOULD be excluded from the timestamp calculation

Some torflow scanners do not run continuously. They only run when there are unmeasured relays.

In a small network, if all relays are measured by the other scanners, the latest timestamp for the unmeasured scanner can become too old. Then the timestamp in the file becomes too old, and tor stops voting using the file. But the results are still valid, because the other scanners are still measuring all the relays.

This is a bug in torflow that we won't fix.

But we should document the issue in the spec so future scanners should not implement the same bug.

Replacing "scanners" with "generators" makes this sentence even more confusing. See my comments in the pull request.

AFAIU, sbws run continuously and to generate the bandwidth file use results from a date period https://github.com/pastly/simple-bw-scanner/blob/master/sbws/core/generate.py#L134.
Should we in this case search for older result files than that date period?

I'm not sure I understand what you mean here.
Does "this case" mean "when the scanner has stopped producing results"?

If the scanner stops producing results, then the generator should stop including stale results in the file.

When there are not enough results in the file, the generator SHOULD delete the file, and tor will warn it is missing (See option 2 above.)
If the latest result is older than Tor's voting threshold, then tor will stop voting using the file, and warn it is too old.

comment:9 Changed 16 months ago by juga

Status: needs_revisionneeds_review

Three new commits in the same branch should solve what commented here.
Changing to needs_review and not merge_ready, in case i made some other mistake.

comment:10 Changed 16 months ago by juga

Parent ID: #25925

comment:11 in reply to:  8 ; Changed 16 months ago by teor

Status: needs_reviewneeds_revision

Please see my comments on those commits.

Also, it looks like the changes below weren't made in the spec.

Can you tell me what you decided to do about these issues:

Replying to teor:


When there are not any scan results (similar to the earliest_bandwith case https://github.com/pastly/simple-bw-scanner/blob/master/sbws/core/generate.py#L136), which should be the timestamp?.

If there are no results in the file, then it really doesn't matter what's in the header.

The timestamp is mandatory, so you can't leave it out.

Here are your options:

  1. specify that the time must be in the past - tor will warn that the file is too old
  2. don't generate the file, delete any existing file - tor will warn that the file is missing
  3. generate an empty file, truncate any existing file - old tors will log stack contents, see #26007

I suggest that we say that generators SHOULD NOT generate a file, and SHOULD delete any existing file, because it is the least confusing option.
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt#n61

If you want, you can also say that:

  • generators SHOULD wait until enough relays are measured before generating the file (option 2)
  • generators MAY use a placeholder timestamp (option 1j, but the time MUST be at least 5 days in the past to avoid silent failures.
  • generators MUST NOT generate an empty file (option 3), because it triggers a bug in tor.

Please update the spec with these changes.

The earliest_bandwidth and latest_bandwidth are optional.
If there is no valid value for these lines, the generator SHOULD leave them out.
Please update the spec with these changes.

comment:12 in reply to:  11 ; Changed 16 months ago by juga

Replying to teor:

Can you tell me what you decided to do about these issues:

  1. don't generate the file, delete any existing file - tor will warn that the file is missing

This is what i decided. In the code is in https://github.com/pastly/simple-bw-scanner/pull/175/files#diff-5d572187bd8e5853f29b5e173a4fa8efR119.

However, i don't delete any existing files because i don't understand the logic.
If previous files were generated by the same code, they will have the correct timestamp of previous results.

I suggest that we say that generators SHOULD NOT generate a file,

i say MUST NOT: https://github.com/torproject/torspec/pull/10/files#diff-824f4c7404328d8aa1101d74ba36884dR132

and SHOULD delete any existing file, because it is the least confusing option.

but not this because of what commented above.

If you want, you can also say that:

  • generators SHOULD wait until enough relays are measured before generating the file (option 2)
  • generators MAY use a placeholder timestamp (option 1j, but the time MUST be at least 5 days in the past to avoid silent failures.
  • generators MUST NOT generate an empty file (option 3), because it triggers a bug in tor.

i say "MUST NOT generate the file", though i don't mention "empty" nor that it triggers a bug. Is it needed?.

The earliest_bandwidth and latest_bandwidth are optional.

Optional is implicit by "zero or one time": https://github.com/torproject/torspec/pull/10/files#diff-824f4c7404328d8aa1101d74ba36884dR173

If there is no valid value for these lines, the generator SHOULD leave them out.

If file is not created when there are not results (no "latest_bandwidth"), then "file_created" won't exist. Same for "latest_bandwidth". Is it needed to say so?

comment:13 in reply to:  12 Changed 16 months ago by teor

Replying to juga:

Replying to teor:

Can you tell me what you decided to do about these issues:

  1. don't generate the file, delete any existing file - tor will warn that the file is missing

This is what i decided. In the code is in https://github.com/pastly/simple-bw-scanner/pull/175/files#diff-5d572187bd8e5853f29b5e173a4fa8efR119.

However, i don't delete any existing files because i don't understand the logic.
If previous files were generated by the same code, they will have the correct timestamp of previous results.

Yes, you're right. So it is ok to leave the old file as it is.

I suggest that we say that generators SHOULD NOT generate a file,

i say MUST NOT: https://github.com/torproject/torspec/pull/10/files#diff-824f4c7404328d8aa1101d74ba36884dR132

and SHOULD delete any existing file, because it is the least confusing option.

but not this because of what commented above.

Ok.

If you want, you can also say that:

  • generators SHOULD wait until enough relays are measured before generating the file (option 2)
  • generators MAY use a placeholder timestamp (option 1j, but the time MUST be at least 5 days in the past to avoid silent failures.
  • generators MUST NOT generate an empty file (option 3), because it triggers a bug in tor.

i say "MUST NOT generate the file", though i don't mention "empty" nor that it triggers a bug. Is it needed?.

I'm not sure. The current version is ok.

The earliest_bandwidth and latest_bandwidth are optional.

Optional is implicit by "zero or one time": https://github.com/torproject/torspec/pull/10/files#diff-824f4c7404328d8aa1101d74ba36884dR173

That is good..

If there is no valid value for these lines, the generator SHOULD leave them out.

If file is not created when there are not results (no "latest_bandwidth"), then "file_created" won't exist. Same for "latest_bandwidth". Is it needed to say so?

That is good.

comment:15 Changed 16 months ago by juga

Status: needs_revisionneeds_review

Modified according to your comments. Feel free to squash if needed.

comment:16 Changed 16 months ago by juga

Status: needs_reviewmerge_ready

comment:17 Changed 15 months ago by nickm

Hi! There's a conflict in the example, and I'm not sure how to resolve it. Could one of you please give it a try?

comment:18 Changed 15 months ago by teor

Status: merge_readyneeds_revision

comment:19 Changed 15 months ago by juga

Status: needs_revisionmerge_ready

Rebased to master, should be fine now

comment:20 Changed 15 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

Thanks! Merged to torspec.

Note: See TracTickets for help on using tickets.