#22827 closed enhancement (implemented)

Formalise CollecTor spec for sanitised bridge descriptors and put in torspec

Reported by: isis Owned by: karsten
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-spec, tor-docs, tor-bridges
Cc: isis, iwakeh, karsten Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Perhaps we should make https://collector.torproject.org/#type-bridge-network-status a bit more formal and put it into the torspec repo somewhere?

See this comment by iwakeh on #22207 and this comment by karsten on #18797 and karsten's ANTLR 4 grammar for bridge networkstatuses (note that the grammar will work for both sanitised and unsanitised bridge networkstatuses).

The related ticket, for making a spec for unsanitised bridge descriptors, is #22826.

Child Tickets

Attachments (5)

san-bridge-spec.xml (28.9 KB) - added by karsten 22 months ago.
san-bridge-spec.raw.txt (24.2 KB) - added by karsten 22 months ago.
san-bridge-spec-v1-comments.tgz (18.0 KB) - added by iwakeh 21 months ago.
xml, html with comments etc.
san-bridge-spec.html (49.4 KB) - added by karsten 21 months ago.
New draft of this specification document
bridge-descriptors.jpg (221.9 KB) - added by karsten 21 months ago.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 22 months ago by karsten

Agreed on the idea of making that specification more formal.

The part that I'm not yet sure about is whether that document should live in the torspec repository. In theory, it would be useful to have more formal specifications for all descriptor types available in CollecTor, but I'm not sure if all of them would fit into torspec. Maybe it's better to keep the more formal sanitized bridge descriptors specification in a metrics repository, because that's where the sanitizing step takes place.

comment:2 in reply to:  1 Changed 22 months ago by isis

Replying to karsten:

Maybe it's better to keep the more formal sanitized bridge descriptors specification in a metrics repository, because that's where the sanitizing step takes place.


Sure, that makes sense!

comment:3 Changed 22 months ago by dgoulet

Milestone: Tor: unspecified

comment:4 Changed 22 months ago by nickm

This can probably refer to torspec.git/attic/dir-spec-v2.txt , where the v2 networkstatus format is documented. All that would leave to document is the ways in which the bridge networkstatus document differs.

comment:5 Changed 22 months ago by karsten

Owner: set to karsten
Status: newaccepted

FWIW, I started writing a specification document for sanitized bridge descriptors. Grabbing this ticket. If somebody else wants to work on this, feel free to ask how far I got to avoid duplicating efforts. I'll comment here when I have something to show/discuss.

Changed 22 months ago by karsten

Attachment: san-bridge-spec.xml added

Changed 22 months ago by karsten

Attachment: san-bridge-spec.raw.txt added

comment:6 Changed 22 months ago by karsten

Status: acceptedneeds_review

Alright, I finished a first draft of a specification document for sanitized bridge descriptors. And the cool part is that it's written in XML and compiled using xml2rfc! This is certainly not going to be the final draft, but I seriously need to switch over to other tasks for at least the next 48 hours or risk serious specification sickness. This draft is also in a good stage to get coarse feedback. If there's anything I should take into account when picking this up in a few days, please be sure to let me know by leaving a comment here! Thanks in advance!

Changed 21 months ago by iwakeh

xml, html with comments etc.

comment:7 Changed 21 months ago by iwakeh

Yes, there is some risk of specification sickness :-)

I added my comments as cref elements, which should make easy to spot them. I tried to change all references to dir-spec into "Tor directory protocol, version 3", removed very few typos, and replaced tabs by spaces.

The attached xml can be processed into html using xslt; this makes things a little prettier and is more adjustable than 'pure' rfc. The xml almost the same, but 'pure' rtf refused to insert the comments into the text and didn't allow eref tags inside a comment.
The tools for xslt processing are listed in a comment. But, no need to fiddle with that immediately as I also attached the generated html.

comment:8 Changed 21 months ago by karsten

iwakeh, thanks for the review and the comments! Using XSLT instead of the RFC tool sounds reasonable to me. I have a few responses to your comments and a couple of unrelated changes from re-reading the draft. How about we add the XML file to an existing Git repository somewhere, so that we can track changes? Maybe somewhere in metrics-web, with the intention to produce a HTML output and put it on the Tor Metrics website once it's ready? If this makes sense, do you want to put my first draft and your modified XML file there?

comment:9 Changed 21 months ago by karsten

I started a new metrics-web branch for tracking changes to this specification document. I can't finish this right now, but I'm planning to include more changes and post a link here. If you were planning to work on this, better wait another 12 hours. Thanks.

Changed 21 months ago by karsten

Attachment: san-bridge-spec.html added

New draft of this specification document

comment:10 Changed 21 months ago by karsten

Obviously, I didn't mean 12 hours but 48 hours! Please find my task-22827 metrics-web branch with an updated XML file. I also attached the HTML output to this ticket.

comment:11 Changed 21 months ago by iwakeh

Looks fine!

Maybe keep the comment section small by removing all resolved comments, like c00, c00a, c01,c01a (actually the entire xslt discussion), c07, c07a.
c03&c03a could be replaced by a remark that ANTLR is too much overhead for the moment.
Regarding c02, c02a: There is no list of references in the new html. If that resolves the issue, it could also be removed.

Minor format issue/nitpick:
I don't find the choice of [:n] very intuitive as operator (and also applying it from the right. Maybe just use a function notation, e.g., Bn. For example:
KeyedHash = SHA256(Ipv4Address | Fingerprint | Secret)[:3] could become
KeyedHash = B3(SHA256(Ipv4Address | Fingerprint | Secret)).

Shouldn't the additional xslt and awk files used for generating html be also added to git?

comment:12 in reply to:  11 ; Changed 21 months ago by karsten

Replying to iwakeh:

Looks fine!

Maybe keep the comment section small by removing all resolved comments,

Ah, yes. In fact, I thought we'd remove the entire WIP section and the build instructions before publishing.

like c00, c00a, c01,c01a (actually the entire xslt discussion), c07, c07a.

Yep, they can all go away.

c03&c03a could be replaced by a remark that ANTLR is too much overhead for the moment.

Yep, good idea.

Regarding c02, c02a: There is no list of references in the new html. If that resolves the issue, it could also be removed.

Works for me.

Minor format issue/nitpick:
I don't find the choice of [:n] very intuitive as operator (and also applying it from the right. Maybe just use a function notation, e.g., Bn. For example:
KeyedHash = SHA256(Ipv4Address | Fingerprint | Secret)[:3] could become
KeyedHash = B3(SHA256(Ipv4Address | Fingerprint | Secret)).

This notation is being used in other Tor specification documents. That's why I'd rather not touch it without suggesting a more formal and hopefully at the same time more intuitive notation for the remaining specification parts. Do you mind if we keep this unchanged for now?

Shouldn't the additional xslt and awk files used for generating html be also added to git?

Yes. I just wasn't sure whether this will be the final place for putting the specification file, so I only checked in the source file. But they can certainly go into the repo.

How do we proceed (assuming we can resolve the remaining parts above)? How would we include the generated HTML in the Tor Metrics website? Do we edit the XSLT template for that, or do we write a shell script for this that rips out the pieces of the generated HTML file that we want, or what?

comment:13 in reply to:  12 Changed 21 months ago by iwakeh

Replying to karsten:

Replying to iwakeh:

Looks fine!

Maybe keep the comment section small by removing all resolved comments,

Ah, yes. In fact, I thought we'd remove the entire WIP section and the build instructions before publishing.

The instructions could be turned into a comment in the xml, if the xslt-processing doesn't complain.

like c00, c00a, c01,c01a (actually the entire xslt discussion), c07, c07a.

Yep, they can all go away.

c03&c03a could be replaced by a remark that ANTLR is too much overhead for the moment.

Yep, good idea.

Regarding c02, c02a: There is no list of references in the new html. If that resolves the issue, it could also be removed.

Works for me.

Minor format issue/nitpick:
I don't find the choice of [:n] very intuitive as operator (and also applying it from the right. Maybe just use a function notation, e.g., Bn. For example:
KeyedHash = SHA256(Ipv4Address | Fingerprint | Secret)[:3] could become
KeyedHash = B3(SHA256(Ipv4Address | Fingerprint | Secret)).

This notation is being used in other Tor specification documents. That's why I'd rather not touch it without suggesting a more formal and hopefully at the same time more intuitive notation for the remaining specification parts. Do you mind if we keep this unchanged for now?

I don't insist on the change.

Shouldn't the additional xslt and awk files used for generating html be also added to git?

Yes. I just wasn't sure whether this will be the final place for putting the specification file, so I only checked in the source file. But they can certainly go into the repo.

How do we proceed (assuming we can resolve the remaining parts above)? How would we include the generated HTML in the Tor Metrics website? Do we edit the XSLT template for that, or do we write a shell script for this that rips out the pieces of the generated HTML file that we want, or what?

We found a place for the page (cf. team meeting).
I think it would be best to adapt the XSLT. That would complete the task in one processing step.
Maybe, first define the target page (incl. colors etc.) and then give it a try with XSLT?
shell scripting should be last resort.

Changed 21 months ago by karsten

Attachment: bridge-descriptors.jpg added

comment:14 Changed 21 months ago by karsten

I made progress on integrating the output into Tor Metrics. Please admire this screenshot of the result, and please review my task-22827 branch with three commits:

  • 80fb379 is simply cherry-picked from master in order to use libraries from Debian stretch.
  • b8d2dc8 is where I made most of the changes and which deserves a careful review.
  • 816e245 adds the RFC 2629 files without modification.

Things left to do:

  • Go through the comments to see which need more work, and ideally remove most or all of them afterwards.
  • Put a link to the new page to sources.html under a new short section "Specifications" or similar.
  • Remove duplicate parts from collector.html.
  • What else?
Last edited 21 months ago by karsten (previous) (diff)

comment:15 Changed 21 months ago by iwakeh

Status: needs_reviewneeds_revision

According to our project structure I'd prefer to have the folder spec in src/main/resources.

The saxon-packages I can find in debian jessie or stretch only supply saxon-xslt, but that doesn't work either when calling convert.sh. What could be amiss?

The 'things left to do' seems complete (except for making the process work).

And, the screenshot looks fine :-)

comment:16 in reply to:  15 ; Changed 21 months ago by karsten

Status: needs_revisionneeds_review

Thanks for the review!

Replying to iwakeh:

According to our project structure I'd prefer to have the folder spec in src/main/resources.

Ah, you mean move from src/main/spec/ to src/main/resources/spec/? Sure.

The saxon-packages I can find in debian jessie or stretch only supply saxon-xslt, but that doesn't work either when calling convert.sh. What could be amiss?

Ah, the reason is probably that I'm using the Saxon that comes with brew, not the one in Debian. I'll create a symlink here and change it in the script.

The 'things left to do' seems complete (except for making the process work).

And, the screenshot looks fine :-)

Okay, cool! I made the remaining edits and pushed a few more commits to my task-22827 branch.

I'd like to squash these commits into one for the new specification content and another one for the RFC files. Do you mind?

After that I'll merge into master and deploy.

comment:17 in reply to:  16 Changed 21 months ago by iwakeh

Replying to karsten:

Thanks for the review!

Replying to iwakeh:

According to our project structure I'd prefer to have the folder spec in src/main/resources.

Ah, you mean move from src/main/spec/ to src/main/resources/spec/? Sure.

Thanks!

The saxon-packages I can find in debian jessie or stretch only supply saxon-xslt, but that doesn't work either when calling convert.sh. What could be amiss?

Ah, the reason is probably that I'm using the Saxon that comes with brew, not the one in Debian. I'll create a symlink here and change it in the script.

There might be more to it:
Running convert.sh leads to:

website/src/main/resources/spec$ ./convert.sh 
INFO: unsupported rfc pseudo-attribute 'topblock'
line 359 column 700 - Error: <header> is not recognized!
line 359 column 700 - Warning: discarding unexpected <header>
line 359 column 765 - Warning: discarding unexpected </header>
line 359 column 774 - Error: <section> is not recognized!
line 359 column 774 - Warning: discarding unexpected <section>
line 359 column 2004 - Warning: discarding unexpected </section>
line 359 column 2014 - Error: <section> is not recognized!
line 359 column 2014 - Warning: discarding unexpected <section>
line 359 column 2391 - Error: <section> is not recognized!
line 359 column 2391 - Warning: discarding unexpected <section>
line 359 column 3031 - Warning: discarding unexpected </section>
line 359 column 3041 - Error: <section> is not recognized!
line 359 column 3041 - Warning: discarding unexpected <section>
line 359 column 4039 - Warning: discarding unexpected </section>
line 359 column 4049 - Error: <section> is not recognized!

and only a zero byte file 'bridge-descriptors.jsp'.

The 'things left to do' seems complete (except for making the process work).

And, the screenshot looks fine :-)

Okay, cool! I made the remaining edits and pushed a few more commits to my task-22827 branch.

I'd like to squash these commits into one for the new specification content and another one for the RFC files. Do you mind?

After that I'll merge into master and deploy.

comment:18 Changed 21 months ago by karsten

Looks like we can suppress that first INFO message by a simple change that you'll now find in my branch. The other warnings result from you not having tidy installed. Can you try to install that and re-run the script?

comment:19 Changed 21 months ago by iwakeh

Status: needs_reviewmerge_ready

I did have tidy installed, but an older version HTML Tidy for Linux released on 25 March 2009.
The script needs HTML Tidy for Linux version 5.2.0.
Now, convert.sh runs without complaints and generates a valid 'bridge-descriptors.jsp.

If the metrics-web host has the new tidy, things should be fine.

Last edited 21 months ago by iwakeh (previous) (diff)

comment:20 Changed 21 months ago by karsten

Glad to hear that it works now!

Regarding the tidy version issue, how about we put the generated JSP file under version control instead? Then we don't have to worry about the tidy version on the metrics-web host. Another benefit is that we can track changes to the JSP file, including how changes to the XML affect the JSP. Any concerns against that?

comment:21 in reply to:  20 Changed 21 months ago by iwakeh

Replying to karsten:

Glad to hear that it works now!

Regarding the tidy version issue, how about we put the generated JSP file under version control instead? Then we don't have to worry about the tidy version on the metrics-web host. Another benefit is that we can track changes to the JSP file, including how changes to the XML affect the JSP. Any concerns against that?

Adding the JSP is a good idea.

comment:22 Changed 21 months ago by karsten

Alright, I'll add the JSP and squash and merge and deploy and close. Probably later today. Thanks again!

comment:23 Changed 21 months ago by karsten

Resolution: implemented
Status: merge_readyclosed

And now it's deployed! Closing.

Note: See TracTickets for help on using tickets.