Opened 22 months ago

Last modified 16 months ago

#24229 assigned enhancement

Provide BGP Data Collection on Tor Metrics

Reported by: iwakeh Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Website Version:
Severity: Normal Keywords:
Cc: metrics-team Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by iwakeh)

Yixin's data description:

data&spec

  • Data (06/2016 - 08/2017): We put each month's BGP updates into a single txt file, compressed with xz -9e into [year]-[month]-updates.txt.xz. These are the files under all-updates/. The all-updates.tar is basically a tarball of the all-updates/ directory.
  • Software (detection.py): This is our script to analyze the data (also linked in the html page).
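A minimal sketch of how the monthly files described above could be enumerated and read, assuming the names follow the yyyy-mm-updates.txt.xz pattern with zero-padded months. The function names are hypothetical and not part of detection.py, and the layout of each update line is not specified in this ticket, so lines are returned unparsed.

```python
import lzma
from typing import Iterator, List, Tuple


def monthly_update_names(start: Tuple[int, int] = (2016, 6),
                         end: Tuple[int, int] = (2017, 8)) -> List[str]:
    """List the expected file names for every month from start to end, inclusive.

    Assumption: names are [year]-[month]-updates.txt.xz with a two-digit month.
    """
    names = []
    year, month = start
    while (year, month) <= end:
        names.append(f"{year:04d}-{month:02d}-updates.txt.xz")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def iter_update_lines(path: str) -> Iterator[str]:
    """Stream the lines of one xz-compressed monthly file without unpacking it to disk."""
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Streaming through lzma.open avoids holding a whole month of updates in memory, which matters because each file bundles all updates for that month.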

====

Next steps (semi-random order):

  • find a place for the data, i.e., a path on CollecTor
  • determine a data update process
  • integrate html description into Metrics' site

Child Tickets

Ticket  Type         Status          Owner         Summary
#25402  enhancement  closed          metrics-team  Add `contrib` section to CollecTor's file structure protocol
#25403  enhancement  needs_revision  metrics-team  Create jsp with bgp documentation
#25624  enhancement  closed          iwakeh        Index 'contrib' directory

Change History (11)

comment:1 Changed 22 months ago by iwakeh

Description: modified (diff)
Owner: changed from metrics-team to iwakeh
Status: changed from new to accepted

comment:2 Changed 18 months ago by iwakeh

Status: changed from accepted to needs_information

Suggested file structure for external data provided by CollecTor:

https://collector.torproject.org/third-party-archive/

The next level will consist of the third party main name, i.e., the path for BGP Data:

https://collector.torproject.org/third-party-archive/<bgp|counter-raptor>

(<bgp|counter-raptor>: meaning one or the other name or another suggestion.)

Below the third-party main level there could be archive, source-code, and docs. For example: https://collector.torproject.org/third-party-archive/<bgp|counter-raptor>/archive will contain the files yyyy-mm-updates.txt.xz, https://collector.torproject.org/third-party-archive/<bgp|counter-raptor>/source-code will contain just the detection.py script, and the provided documentation counter-raptor.html will be in https://collector.torproject.org/third-party-archive/<bgp|counter-raptor>/doc, slightly adapted to prevent broken links, if possible.

The official documentation will be a jsp file based on the provided documentation. The path for the documentation integrated into metrics.tp.o could be https://metrics.torproject.org/counter-raptor.html, which will be linked from Other sources.

If the above paths are agreed on:

  • the jsp and referring links need to be created and
  • files need to be copied to the new structure below https://collector.torproject.org/third-party-archive/.

Thoughts, additions, suggestions?

Last edited 18 months ago by iwakeh

comment:3 Changed 18 months ago by iwakeh

Cc: metrics-team added

comment:4 Changed 18 months ago by iwakeh

Addition to comment:2: The data update process still needs to be defined.
(Not in the scope of this ticket.)

Last edited 18 months ago by iwakeh

comment:5 Changed 18 months ago by iwakeh

Addendum to comment:2:

The data should be verifiable in some way. There are no checksums or signatures for other CollecTor data, but for third-party data it might even be useful to have signatures from the providing party.
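As an illustration of the simplest form of verification, the sketch below computes a SHA-256 digest of a downloaded file and compares it with a digest published alongside it. No such digests are published today; the function names and the idea of a published hex digest are assumptions made only for this example.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a (hypothetical) published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A checksum like this only detects corrupted or altered files. It does not show who produced the data, which is what the signatures from the providing party would add.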

comment:6 Changed 18 months ago by karsten

Some quick thoughts on paths:

  • https://collector.torproject.org/third-party-archive/ feels unnecessarily long. Why not pick something shorter like https://collector.torproject.org/contrib/?
  • The subdirectories archive, source-code, and docs could be named by the contributor or as part of the contribution process. That is, we don't really have to settle on names for future contributions there. Personally, I'd keep those short, too, and say src rather than source-code, but I'm fine with the longer variant, too.
  • We should probably have a contrib subdirectory on Tor Metrics, too, as in https://metrics.torproject.org/contrib/counter-raptor.html. Otherwise we're limiting future contributions to names that are not already taken by our pages, and likewise limiting ourselves to names that are not already taken by contributions.
  • We should pick the same contribution subdirectory name for both CollecTor and Tor Metrics. That is, not bgp for CollecTor and counter-raptor for Tor Metrics.

Regarding data being verifiable, I agree that this would be good. We could start without that requirement if we have to, but then we should put it on a list and do it soon after.

comment:7 Changed 18 months ago by iwakeh

Yes, I think contrib for all third-party contributions is the right choice! Let's choose bgp, because it is short.

Thus, we would have:

Suggested file structure for external data provided by CollecTor:

https://collector.torproject.org/contrib/

The next level will consist of the third party main name, i.e., the path for BGP Data:

https://collector.torproject.org/contrib/bgp

Agreed, below the third-party main level we can use whatever is provided by the contributing third party, as long as it stays within reasonable naming conventions (sort of *nix server style).

The official documentation will be a jsp file based on the provided documentation. The path for the documentation integrated into metrics.tp.o could be https://metrics.torproject.org/contrib/bgp.html, which will be linked from Other sources.
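The naming scheme above can be sketched as two small helpers that derive both locations from one contribution name, so CollecTor and Tor Metrics cannot drift apart. The helper names are hypothetical, and the URLs are the paths proposed in this ticket, not confirmed live locations.

```python
COLLECTOR_CONTRIB = "https://collector.torproject.org/contrib/"
METRICS_CONTRIB = "https://metrics.torproject.org/contrib/"


def collector_contrib_url(name: str, *parts: str) -> str:
    """Build a CollecTor path for a contribution, optionally with sub-path parts."""
    return COLLECTOR_CONTRIB + "/".join((name,) + parts)


def metrics_doc_url(name: str) -> str:
    """Build the Tor Metrics documentation path for the same contribution name."""
    return f"{METRICS_CONTRIB}{name}.html"
```

For example, collector_contrib_url("bgp") and metrics_doc_url("bgp") give the two paths named in this comment. Anything passed as further parts, such as an archive subdirectory, would be up to the contributor and is not fixed here.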

Next steps:

  • the jsp and referring links need to be created and
  • files need to be copied to the new structure below https://collector.torproject.org/contrib/.

comment:8 Changed 18 months ago by karsten

Sounds good to me!

comment:9 Changed 17 months ago by iwakeh

Should the 'contrib' path be part of index.json?

comment:10 in reply to:  9 Changed 17 months ago by karsten

Replying to iwakeh:

Should the 'contrib' path be part of index.json?

I think that makes sense, yes.
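To illustrate what indexing the 'contrib' path could mean for a client, here is a sketch that walks a nested index and lists the files below a top-level contrib directory. It assumes index.json nests "directories" and "files" entries that each carry a "path" key; the sample data, including the archive subdirectory, is made up for the example and does not reflect the real index.

```python
from typing import List, Optional


def find_directory(node: dict, name: str) -> Optional[dict]:
    """Return the direct child directory called name, or None if it is absent."""
    for child in node.get("directories", []):
        if child.get("path") == name:
            return child
    return None


def list_files(node: dict, prefix: str = "") -> List[str]:
    """Collect the relative paths of all files below node, depth first."""
    paths = [prefix + entry["path"] for entry in node.get("files", [])]
    for child in node.get("directories", []):
        paths.extend(list_files(child, prefix + child["path"] + "/"))
    return paths


# Illustrative data only; the structure and names are assumptions.
SAMPLE_INDEX = {
    "path": "https://collector.torproject.org",
    "directories": [
        {"path": "archive", "directories": []},
        {"path": "contrib", "directories": [
            {"path": "bgp", "directories": [
                {"path": "archive", "files": [
                    {"path": "2016-06-updates.txt.xz"},
                    {"path": "2016-07-updates.txt.xz"},
                ]},
            ]},
        ]},
    ],
}
```

With this sample, calling find_directory(SAMPLE_INDEX, "contrib") and passing the result to list_files with the prefix "contrib/" yields the two monthly files under contrib/bgp/archive/.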

comment:11 Changed 17 months ago by iwakeh

Owner: changed from iwakeh to metrics-team
Status: changed from needs_information to assigned

Assigning to metrics-team as this ticket is simply used for grouping the child tickets.
