Opened 2 years ago

Closed 22 months ago

Last modified 22 months ago

#16020 closed enhancement (implemented)

new field: measured flag

Reported by: cypherpunks Owned by:
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Keywords:
Cc: tyseom, aagbsn Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Hi Karsten,

I would find it useful to have a measured/unmeasured flag in relay details objects.

Use cases:

  • make graphs of the fraction of unmeasured relays (-> detect spikes and BWAuth problems)
  • Atlas would be able to display the flag, helping the operator with debugging

https://lists.torproject.org/pipermail/tor-relays/2015-May/006946.html

Child Tickets

Attachments (2)

unmeasured.py (2.2 KB) - added by aagbsn 23 months ago.
Script to parse votes and count unmeasured relays
vote_test.tar.xz (1.8 MB) - added by leeroy 23 months ago.
An archive containing a current vote to test parsing performance, optionally run unxz to get .tar


Change History (35)

comment:1 Changed 2 years ago by tyseom

  • Cc tyseom added

comment:2 Changed 2 years ago by cypherpunks

Instead of making this field a boolean, I'd make it an int that shows the measured bw, with a value of -1 if the relay is unmeasured.

comment:3 Changed 2 years ago by aagbsn

  • Cc aagbsn added

comment:4 follow-up: Changed 23 months ago by leeroy

The Consensus Weight is the measured bandwidth from bandwidth authorities. If you're going to add a key based on measured status it might also be worth considering a last_measured based on the last time the relay had a weight line in consensus without Unmeasured.

The graphs you are interested in wouldn't be as easy to produce based on a measured key because then you need to check every hour to see if the status for measured changed for every relay. It's also not exactly deterministic when a measurement is taken (based on ticket review). You're asking Onionoo to do what you could just as easily do with CollecTor and basic system tools.

If you're interested in graphing bandwidth data over time you might be better served by the Onionoo weights document which tracks Consensus Weight.

comment:5 in reply to: ↑ 4 Changed 23 months ago by cypherpunks

Replying to leeroy:

You're asking Onionoo to do what you could just as easily do with CollecTor and basic system tools.

Correct, but isn't almost every other field sourced from CollecTor data as well?

If the number of bw authorities remains as low as it currently is, it would even be handy to include the measurement per bw auth.

Last edited 23 months ago by cypherpunks

comment:6 Changed 23 months ago by leeroy

I just mean: why ask for something to be implemented sometime in the next year when you can just as easily do it yourself without imposing a load on Onionoo? It's not deterministic when a measurement occurs, so you cannot rely on it until the BWAuth tickets are addressed. When they will be fixed is the better question. If you're looking for measurement per BWAuth, Onionoo is the wrong place to ask for an enhancement. As you say, the source is CollecTor, which is based on live network data. Is the measurement per BWAuth a part of live network data? Then the component for this ticket is not Onionoo.

comment:7 follow-ups: Changed 23 months ago by karsten

Adding a measured flag to details documents shouldn't be difficult. Though the primary purpose would be for operators to find out easily why their relay is not used, not for developers to graph the fraction of unmeasured relays (using CollecTor's data directly is indeed the better way for that).

A few remarks on the possible specification of such a field:

  • This field shouldn't be combined with the existing "consensus_weight" field by setting that to -1 if a relay is unmeasured, because that would override the (unmeasured) bandwidth value contained in the consensus. This change to the current field semantics would also constitute a major protocol change, requiring client developers to update their clients.
  • Turning this field into a "last_measured" field is harder to implement and would require re-parsing descriptor archives. I'd rather avoid that.
  • Including all bwauth measurements would certainly be handy, but that would require parsing votes which we don't do right now. Onionoo is already choking on parsing all the descriptors published every hour, and votes are not exactly tiny. I'd say don't expect this to happen anytime soon. But I agree that it would be really useful to have.

I'm sketching out the implementation off the top of my head, in case somebody else wants to try hacking on this:

  • Extend NodeDetailsStatusUpdater to include whether a relay was measured in the relay's NodeStatus and to later include that in the relay's DetailsStatus. Look out for NodeStatus.getConsensusWeight() and NodeStatus.setConsensusWeight() to get the idea.
  • Extend DetailsDocumentWriter to include this information in the relay's DetailsDocument. Similarly, look out for DetailsStatus.getConsensusWeight() for an example.
  • Include new "measured" field in the ResponseBuilder hack in writeDetailsLines() to support the fields parameter.
  • Raise protocol version to 2.5 or whatever the next minor version is. Do this in build.xml and ResponseBuilder and web/protocol.html. Describe new field in web/protocol.html.

I probably won't get to it this week, but I'll try next week if nobody else has coded this until then. It shouldn't be hard. Happy to review (partial) patches.

comment:8 Changed 23 months ago by aagbsn

I wrote a simple script that uses stem to produce csv formatted measurement counts from votes (archived or from the Directory Authorities directly) - it could be easily extended to provide per-bwauth values for each relay, per vote.

Note that parsing the votes seems to take ages - I have a really anemic system and the vote parsing script is only parallelized so far as stem DescriptorReader spawns a separate thread.

I had been thinking along the lines of extracting the relevant attributes from the votes and providing an API endpoint that would emit JSON, to produce client-side graphs using d3.js, rather than trying to render graphs for every relay, every consensus period. Thoughts?

Changed 23 months ago by aagbsn

Script to parse votes and count unmeasured relays

comment:9 Changed 23 months ago by leeroy

Karsten, implementing last_measured is actually trivial. Onionoo already parses consensus so the time for parsing is already used. Unmeasured is just another piece of data in the weight line so last_measured is based on the last time it was missing. It's trivial because it only requires comparing the timestamp stored (and can be done without string comparison using a hash if needed). It's up to you if it should be implemented though.
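The bookkeeping described above can be sketched as follows. This is only an illustration in Python, not Onionoo code (Onionoo itself is written in Java); the function names and the dictionary layout are hypothetical, and the sketch assumes the caller has already extracted each relay's "w" line from a consensus with a known valid-after time.

```python
from typing import Dict


def is_measured(w_line: str) -> bool:
    """Return True if a consensus "w" line lacks the Unmeasured=1 keyword."""
    return "Unmeasured=1" not in w_line.split()


def update_last_measured(
    last_measured: Dict[str, str],
    valid_after: str,
    w_lines: Dict[str, str],
) -> Dict[str, str]:
    """Record valid_after for every relay whose weight is measured.

    last_measured maps fingerprint -> latest valid-after timestamp seen.
    Timestamps use "YYYY-MM-DD HH:MM:SS", so plain string comparison
    orders them chronologically.
    """
    for fingerprint, w_line in w_lines.items():
        if not is_measured(w_line):
            continue
        previous = last_measured.get(fingerprint)
        if previous is None or valid_after > previous:
            last_measured[fingerprint] = valid_after
    return last_measured
```

A relay that later shows up with Unmeasured=1 keeps its earlier timestamp, which is what a last_measured value would need to express.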

A node can have consensus weight without being measured, in which case I don't see how this could possibly help for debugging. Either the BWAuths are experiencing a known problem (and the problem is easy to identify), or not (and the field does nothing to help). On the other hand, an operator can use the weights history document to see changes in bandwidth weight (besides other goodness) in hourly intervals.

Only relays that are new, or reset, and awaiting a BWAuth scan would have a problem obtaining consensus weight. There's also no guarantee that per-BWAuth data will remain obtainable in the future. Consensus is computed on meeting the threshold number of measurements. According to the BWAuth tickets, dedicated BWAuths scan certain percentiles. So measurements may come from the same BWAuth rather than from distinct ones. In this case you would be parsing votes just to get a different view of bandwidth, from the same BWAuth, at a different DirAuth.

But if you're still determined to do this I would propose an optimization. Only parse the current vote, don't parse archives. As a comparison, parsing the current consensus takes ~880ms while parsing the current votes takes 11s. This is based on the ideal read benchmark I posted elsewhere.

Last edited 23 months ago by leeroy

comment:10 Changed 23 months ago by aagbsn

Each Bandwidth Authority scans the whole network, but the task is split up across 4 (now 8) processes that run on the same physical machine.

comment:11 Changed 23 months ago by aagbsn

I'm also interested to see (e.g. visualize) how different measurements look as seen by each Bandwidth Authority for a given relay - I think this kind of graph would be helpful to think about how geographic diversity in measurement endpoints affects the bandwidth distribution of the Tor network.

comment:12 Changed 23 months ago by leeroy

What about the separate tickets for BWAuths, in particular having a dedicated BWAuth for certain percentiles? Unless the two changes are coordinated you will eventually end up obtaining the same BWAuth data.

comment:13 Changed 23 months ago by aagbsn

I'm not sure if I understand what you are suggesting.
Currently each bandwidth voting Directory Authority must receive a complete (>60% of the network) set of measurements, which are obtained independently from the other Directory Authority operators. Each Bandwidth Authority operator measures the entire Tor network (in the ideal case).

The tickets about splitting up or balancing Bandwidth Authorities have to do with making the separate scanner processes complete their fraction of the network at the same time as the other scanner processes (in the same Bandwidth Authority installation), so that the whole network is measured on approximately the same time interval. This isn't very pretty -- hope this makes a bit more sense now?

comment:14 Changed 23 months ago by leeroy

I appreciate the clarification. So the only concern would be if an uncoordinated change to BWAuth behavior occurs. And BWAuths don't have a spec, so it can happen. Wouldn't your desire to visualize measurements be out of scope for Onionoo? It's closer to metrics-web.

comment:15 Changed 23 months ago by aagbsn

The spec is hidden here:
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt

My desire is to provide the appropriate amount of data so that it can be visualized, and yes, I'm not sure where the best home for these metrics might be. Perhaps this is an off-topic aside for this ticket, but perhaps there is a way for onionoo to provide queryable bandwidth history?

comment:16 Changed 23 months ago by leeroy

Oh, a spec I've not read. Thank you. Why's it hidden?

Visualizing measurements is off-topic. Providing the current measurements data, from votes, just so someone can make a graph is out of scope (unless you want to change Onionoo's scope). The BWAuth measurements, and when they occur, are a property of BWAuths, not of a running relay. Visualizing them is better done from metrics-web. Onionoo already provides the Consensus Weight and Weights history document.

That being said, as long as Onionoo doesn't try to build a history object from archives (because that's bad news) it looks doable on an hourly basis. If you really intend to expect an operator to parse vote archives then Onionoo needs to make producing the bandwidth measurements optional to set up.

To reinforce feasibility perhaps it would be better to test it first. An extension of the previous parsing result would be "Parse the current interval vote, obtain an `Iterable` mapping fingerprint to set of measurements", and see what kind of memory/time is required. Much better than blindly hacking in new code. But it's up to you.
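A rough sketch of such a test, in plain Python with no third-party parser, might look like the following. It relies on the "w" line of a vote carrying an optional Measured= keyword; the function names are hypothetical, and for brevity the mapping is keyed by the base64 identity exactly as it appears in the "r" line rather than by hex fingerprint. Timing and memory use on a real vote would still need to be measured separately.

```python
from typing import Dict, Iterable, Tuple


def measured_values(vote_text: str) -> Dict[str, int]:
    """Map each relay identity (base64, from the "r" line) to its Measured= value.

    Relays whose "w" line has no Measured= keyword are left out.
    """
    result: Dict[str, int] = {}
    identity = None
    for line in vote_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 3:
            # Start of a new router status entry.
            identity = parts[2]
        elif parts[0] == "w" and identity is not None:
            for item in parts[1:]:
                key, _, value = item.partition("=")
                if key == "Measured" and value.isdigit():
                    result[identity] = int(value)
    return result


def measurements_by_relay(
    votes: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """Combine several votes into identity -> {authority name: Measured value}."""
    combined: Dict[str, Dict[str, int]] = {}
    for authority, vote_text in votes:
        for identity, value in measured_values(vote_text).items():
            combined.setdefault(identity, {})[authority] = value
    return combined
```

Calling `measurements_by_relay([("authA", text_a), ("authB", text_b)])`, where the authority labels are whatever the caller chooses, yields one small dictionary per relay, which should keep memory proportional to the number of measured relays rather than to the size of the votes.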

Last edited 23 months ago by leeroy

comment:17 Changed 23 months ago by atagar

Note that parsing the votes seems to take ages...

Quick thing Aaron: if you're not using Stem 1.4.1 then I'd suggest upgrading since it introduces lazy loading.

comment:18 Changed 23 months ago by aagbsn

I am using stem from git tip.

I was joking about it being hidden, the spec lives in the torflow repository, next to all the BandwidthAuthority code.

The relevant tor-spec.git proposals are:

https://gitweb.torproject.org/torspec.git/tree/proposals/160-bandwidth-offset.txt
https://gitweb.torproject.org/torspec.git/tree/proposals/161-computing-bandwidth-adjustments.txt

Hope this is interesting/useful.

comment:19 Changed 23 months ago by leeroy

Okay, but why isn't it in torspec with the rest? Isn't that where spec documents belong? That would be like having dir-auth's spec in a folder with tor's dir spec code.

Yes, interesting, thank you.

Last edited 23 months ago by leeroy

comment:20 follow-up: Changed 23 months ago by aagbsn

I think it's because tor-spec is for changes to little-t-tor (the program written in C), and the bwauth implementation and other scanner tools complement but are not part of tor.

Changed 23 months ago by leeroy

An archive containing a current vote to test parsing performance, optionally run unxz to get .tar

comment:21 in reply to: ↑ 20 Changed 23 months ago by leeroy

Replying to aagbsn:

Revisiting the recent measurements-team meeting, I added an archive with a current vote for testing parsing. You mentioned that the parsing method you used wasn't too good on your machine. If you put the attached archive (either as tar.xz, or run unxz to get a tarball closer to parsing recent data) in the folder with the benchmark in #16424, you would be able to see if using metrics-lib would be a better choice for your development system.

comment:22 in reply to: ↑ 7 Changed 23 months ago by cypherpunks

Replying to karsten:

I probably won't get to it this week, but I'll try next week if nobody else has coded this until then. It shouldn't be hard. Happy to review (partial) patches.

awesome!

comment:23 Changed 23 months ago by tom

My historical bwauth votes (for maatuska) are at https://bwauth.ritter.vg/bwauth/

comment:24 in reply to: ↑ 7 Changed 22 months ago by cypherpunks

Replying to karsten:

I probably won't get to it this week, but I'll try next week if nobody else has coded this until then. It shouldn't be hard. Happy to review (partial) patches.

Any update on this?

comment:25 Changed 22 months ago by karsten

  • Status changed from new to needs_review

I just implemented this in branch task-16020 in my public repository, and I'm now testing it locally.

That branch contains a new field "measured" with the following specification: "Boolean field saying whether the consensus weight of this relay is based on a threshold of 3 or more measurements by Tor bandwidth authorities. Omitted if the network status consensus containing this relay does not contain measurement information."

If it doesn't explode here and I don't hear any objections until tomorrow, I'll merge to master and deploy.

comment:26 Changed 22 months ago by karsten

Local test was successful, but see also the fixup commit in the branch. Will squash, merge, and deploy tomorrow unless I hear objections.

comment:27 Changed 22 months ago by karsten

Changing the order of steps a bit: now deployed on https://onionoo.thecthulhu.com/, will merge to master and also deploy on https://onionoo.torproject.org/ in the next couple of days.

comment:28 Changed 22 months ago by cypherpunks

Thanks a lot Karsten!

How likely would you consider including each voted measurement in onionoo data if it stays below 10 measurements per consensus? (The per-entry record length is negligible compared to the length of family data.)

Last edited 22 months ago by cypherpunks

comment:29 Changed 22 months ago by cypherpunks

I've got a question about the semantics of the measured flag for non-running relays.

Total stats from 2015-08-16 04:00:

+----------+---------+
| measured | #relays |
+----------+---------+
|     NULL |     956 |
|        0 |     338 |
|        1 |    6659 |
+----------+---------+

First I thought onionoo does not provide the measured flag for all non-running relays, but there are also non-running relays with measured set to 'True' and 'False'.
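A breakdown like the one above, split by running state, could be produced along these lines. This is a minimal sketch that assumes a details document has already been downloaded and parsed; the function name is made up for illustration, and the inline sample is invented data, not real relays.

```python
import json
from collections import Counter
from typing import Any, Dict


def count_measured(details: Dict[str, Any]) -> Counter:
    """Count relays by (running, measured).

    measured is True, False, or None when the field is absent from
    the relay's details object.
    """
    counts: Counter = Counter()
    for relay in details.get("relays", []):
        counts[(relay.get("running"), relay.get("measured"))] += 1
    return counts


# Invented sample in the shape of an Onionoo details document.
SAMPLE = json.loads(
    '{"relays": ['
    '{"running": true, "measured": true},'
    '{"running": false, "measured": false},'
    '{"running": false},'
    '{"running": true, "measured": true}'
    ']}'
)

if __name__ == "__main__":
    for (running, measured), n in sorted(
        count_measured(SAMPLE).items(), key=str
    ):
        print(f"running={running} measured={measured}: {n}")
```

Using `relay.get("measured")` rather than indexing keeps the omitted case distinct from an explicit false, which matters for the question here.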

Let's look at an example (non-running relay, no measured flag data in details document, but the last known bw entry was actually based on bwauth data) from relays_published '2015-08-16 04:00:00':

{"nickname":"96ed5c5","fingerprint":"C8FB6DB9BE327A05F5B29A2FFC240DACDB3C2967","or_addresses":["62.108.36.173:443"],"last_seen":"2015-08-14 15:00:00","last_changed_address_or_port":"2015-04-20 14:00:00","first_seen":"2014-05-27 22:00:00","running":false,"flags":["Fast","Running","Stable","Valid"],"country":"de","country_name":"Germany","latitude":51.0,"longitude":9.0,"as_number":"AS30962","as_name":"comtrance GmbH","consensus_weight":10500,"host_name":"62.108.36.173","last_restarted":"2015-08-04 14:40:05","bandwidth_rate":1073741824,"bandwidth_burst":1073741824,"observed_bandwidth":0,"advertised_bandwidth":0,"exit_policy":["reject *:*"],"exit_policy_summary":{"reject":["1-65535"]},"contact":"admin <tor-d0t-p-at-trickhieber.de>","platform":"Tor 0.2.6.10 on Linux","recommended_version":true,"hibernating":true},

From collector data: (taken from last_seen timestamp)
https://collector.torproject.org/recent/relay-descriptors/consensuses/2015-08-14-15-00-00-consensus

r 96ed5c5 yPttub4yegX1spov/CQNrNs8KWc Cv7197uSXYANO54Oz+WfYMCZtK8 2015-08-14 14:33:02 62.108.36.173 443 0
s Fast Running Stable Valid
v Tor 0.2.6.10
w Bandwidth=10500

5 BWAuths provided measurement data.

Why does that relay have no 'measured' flag in onionoo's details document?
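To cross-check a consensus entry like the one quoted above against a details document, the base64 identity in the "r" line has to be converted to the hex fingerprint. The sketch below does that and also derives a measured state from the "w" line; treating a missing Unmeasured=1 keyword as "measured" is an assumption based on the field description in this ticket, not a copy of Onionoo's logic, and the helper names are hypothetical.

```python
import base64
from typing import Dict


def identity_to_fingerprint(identity: str) -> str:
    """Convert the unpadded base64 identity from an "r" line to uppercase hex."""
    padded = identity + "=" * (-len(identity) % 4)
    return base64.b64decode(padded).hex().upper()


def parse_entry(entry: str) -> Dict[str, object]:
    """Extract fingerprint, bandwidth and measured state from one status entry.

    measured stays None when the entry has no "w" line with Bandwidth=.
    """
    info: Dict[str, object] = {
        "fingerprint": None,
        "bandwidth": None,
        "measured": None,
    }
    for line in entry.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 3:
            info["fingerprint"] = identity_to_fingerprint(parts[2])
        elif parts[0] == "w":
            # Turn "Key=Value" items into a dictionary.
            keywords = dict(item.partition("=")[::2] for item in parts[1:])
            if "Bandwidth" in keywords:
                info["bandwidth"] = int(keywords["Bandwidth"])
                info["measured"] = keywords.get("Unmeasured") != "1"
    return info


# The consensus entry quoted in this comment.
ENTRY = """r 96ed5c5 yPttub4yegX1spov/CQNrNs8KWc Cv7197uSXYANO54Oz+WfYMCZtK8 2015-08-14 14:33:02 62.108.36.173 443 0
s Fast Running Stable Valid
v Tor 0.2.6.10
w Bandwidth=10500
"""

if __name__ == "__main__":
    print(parse_entry(ENTRY))
```

For this entry the converted fingerprint matches the one in the details document, and the "w" line carries no Unmeasured=1 keyword.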

thanks!

comment:30 Changed 22 months ago by cypherpunks

After looking at the data ordered by last_seen, I saw that no relay lacking the measured flag has a last_seen newer than '2015-08-14 15:00:00'. So I assume this is just because the feature was introduced recently: the 'problem' will go away, and only relays without a 'w Bandwidth=' consensus entry will lack the measured flag.

comment:31 Changed 22 months ago by karsten

Yes, this problem will go away over the next week or so. (If it does not, please let me know, and I'll take another look.)

Regarding including measured bandwidths from votes, that's going to be tricky. The problem is not that details documents would grow. The problem is that we'd have to start parsing vote documents for that, and those are pretty large. I'm not sure whether the hourly updater could handle the load. Happy to keep this ticket open (or create a new ticket for this), but I'd rather not promise an ETA for this feature.

comment:32 Changed 22 months ago by karsten

  • Status changed from needs_review to needs_information

Merged to master and deployed on the tpo instance (which will soon come back).

So, what's left to do here? Should we update the summary or create a new ticket? Thanks.

comment:33 Changed 22 months ago by cypherpunks

  • Resolution set to implemented
  • Status changed from needs_information to closed

Karsten, thanks for implementing this feature request!
The 'include all bwauth measurements' topic will go into a separate ticket #16843.

Last edited 22 months ago by cypherpunks