Opened 20 months ago

Last modified 5 months ago

#18856 new enhancement

Talk with tor's ORPort

Reported by: atagar
Owned by: atagar
Priority: Low
Milestone:
Component: Core Tor/Stem
Version:
Severity: Minor
Keywords: descriptor
Cc: yawning, teor
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by atagar)

Long ago tor served and received descriptors on its DirPort, but nowadays it uses single-hop circuits on its ORPort instead. In fact, the DirPort is essentially unused by anything besides Stem.

It would be nice if Stem learned how to download descriptors from tor's ORPort. This would allow us to better test relays and the directory authorities. This is a very minor benefit and not worth the amount of effort it would likely take, but we might be able to make this easier by taking advantage of other tor implementations written in python...

A list of all known tor implementations can be found here.

Child Tickets

Ticket  Type    Status    Owner  Summary
#22882  defect  closed           The v4 link protocol requires the initiator to set the most significant bit
#22918  defect  closed    nickm  Add link protocol 5 throughout torspec
#22929  defect  closed           What cells can be sent before a VERSIONS cell, and what is their CIRCID_LEN?
#22931  defect  closed           What happens when a VERSIONS cell is sent outside a handshake?
#22934  defect  closed    nickm  PADDING cells can't be sent immediately after a VERSIONS cell
#22937  defect  closed           Clarify how resolved values are encoded in cells
#22948  defect  accepted  isis   Padding, Keepalive and Drop cells should have random payloads
#22951  defect  closed           NETINFO cells are mandatory, but tor-spec says "may"
#22961  defect  closed           Should tor-spec say that nodes MUST NOT use TLS compression?
#22987  defect  closed    teor   TAP Hybrid Encryption case 1 is used when the payload is equal to the maximum length
#22994  defect  new              Use consistently named constants for relay command specifications
#23009  defect  closed           Make it clear that RELAY_SENDME cells don't have a payload
#23276  defect  closed           RELAY_CONNECTED cells responding to RELAY_BEGIN_DIR cells don't have a payload

Change History (13)

comment:1 Changed 20 months ago by atagar

Description: modified (diff)

comment:2 Changed 18 months ago by yawning

Cc: yawning added

comment:3 Changed 18 months ago by teor

Cc: teor added

comment:4 Changed 5 months ago by teor

Another alternative would be to teach Trunnel to output python bindings for binary parsing and construction. That would let us re-use Tor's definitions.

This might be a good idea because it involves less duplication of effort. It might be a bad idea because we're not actually programming from the spec.

comment:5 Changed 5 months ago by atagar

Sorry, I might be missing something. Quick look seems to indicate Trunnel is some C parser thing?

http://www.wangafu.net/~nickm/trunnel-manual.html

How is this related to this? And are you advocating for stem to distribute native code? If so that would be a big wrinkle for distribution.

Is the argument that the ORPort protocol is so tricky only native code can talk with it? If so then this ticket indeed might be moot.

comment:6 in reply to:  5 ; Changed 5 months ago by teor

Replying to atagar:

> Sorry, I might be missing something. Quick look seems to indicate Trunnel is some C parser thing?
>
> http://www.wangafu.net/~nickm/trunnel-manual.html
>
> How is this related to this?

Trunnel takes a binary format definition, and produces C code that generates and parses that definition.

Tor comes with a set of trunnel definitions for some of its binary formats, and is migrating to trunnel for the rest.

> And are you advocating for stem to distribute native code? If so that would be a big wrinkle for distribution.

No, we could teach trunnel to produce python code that generates and parses binary format definitions, then use the tor trunnel definitions to generate them.

> Is the argument that the ORPort protocol is so tricky only native code can talk with it? If so then this ticket indeed might be moot.

No, I'm sure it can be done in python. It's just bit and byte manipulation.

comment:7 in reply to:  6 Changed 5 months ago by teor

Replying to teor:

> Replying to atagar:
>
> > ...
> > Is the argument that the ORPort protocol is so tricky only native code can talk with it? If so then this ticket indeed might be moot.
>
> No, I'm sure it can be done in python. It's just bit and byte manipulation.

Turns out it's possible to speak tor on the command-line.

If you run tor like this:

tor ORPort 12345 PublishServerDescriptor 0 AssumeReachable 1 Log "info stderr"

You can do a CREATE_FAST handshake like this:
(The lines are the VERSIONS, NETINFO, and CREATE_FAST cells.)

echo 0000 07 0004 0003 0004 \
     00000000 08 59649845 04 04 7f000001 00 $(head -c 498 /dev/zero | xxd -p) \
     80000000 05 000102030405060708090a0b0c0d0e0f10111213 $(head -c 489 /dev/zero | xxd -p) \
     | xxd -r -p | openssl s_client -connect 127.0.0.1:12345 -quiet | xxd

Want to help me write a python version of this, atagar?
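
As a starting point, here is a rough python sketch that builds the same three cells as the shell command above. The helper names (versions_cell, fixed_cell, netinfo_cell, create_fast_cell) are made up for illustration, and the layout assumes link protocol 4 is negotiated, so that NETINFO and CREATE_FAST use a 4-byte circuit ID while VERSIONS still uses 2 bytes.

```python
import struct

PAYLOAD_LEN = 509                            # payload size of a fixed-length cell
VERSIONS, NETINFO, CREATE_FAST = 7, 8, 5     # cell command numbers used above


def versions_cell(versions):
    """Variable-length cell: 2-byte CircID, command, 2-byte length, payload."""
    payload = b''.join(struct.pack('!H', v) for v in versions)
    return struct.pack('!HBH', 0, VERSIONS, len(payload)) + payload


def fixed_cell(circ_id, command, payload):
    """Fixed-length cell with a 4-byte CircID and a zero-padded payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError('payload too long for a fixed-length cell')
    return struct.pack('!LB', circ_id, command) + payload.ljust(PAYLOAD_LEN, b'\x00')


def netinfo_cell(timestamp, other_ipv4):
    """Timestamp, the other side's IPv4 address (type 4, length 4), no own addresses."""
    payload = struct.pack('!LBB4sB', timestamp, 4, 4, other_ipv4, 0)
    return fixed_cell(0, NETINFO, payload)


def create_fast_cell(circ_id, key_material):
    """CREATE_FAST carries 20 bytes of key material (X) from the initiator."""
    if len(key_material) != 20:
        raise ValueError('CREATE_FAST key material must be 20 bytes')
    return fixed_cell(circ_id, CREATE_FAST, key_material)
```

Concatenating versions_cell([3, 4]), netinfo_cell(0x59649845, bytes([127, 0, 0, 1])) and create_fast_cell(0x80000000, bytes(range(20))) should give the same bytes the echo pipeline feeds into openssl. Outside of a test, the key material would come from os.urandom(20) rather than a fixed pattern.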

The next step is to implement KDF-TOR [0], and that's a little beyond my shell scripting skills. Or use a proper CREATE/2 handshake, and implement KDF-RFC5869 [1]. Either way, getting beyond CREATE_FAST needs a cell encryption implementation.

[0]: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n995
[1]: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1026
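
For reference, a minimal python sketch of KDF-TOR as I read it in tor-spec: K = H(K0 | [00]) | H(K0 | [01]) | ... with SHA-1, where K0 is X | Y for CREATE_FAST. The function names and the dictionary layout are my own, and the split into KH, Df, Db, Kf, Kb follows my reading of the spec, so check it against [0] before relying on it.

```python
import hashlib

HASH_LEN = 20   # SHA-1 digest length
KEY_LEN = 16    # AES-128 key length


def kdf_tor(k0, length):
    """Return `length` bytes of H(K0 | [00]) | H(K0 | [01]) | ... using SHA-1."""
    out = b''
    counter = 0
    while len(out) < length:
        if counter > 255:
            raise ValueError('KDF-TOR cannot produce more than HASH_LEN * 256 bytes')
        out += hashlib.sha1(k0 + bytes([counter])).digest()
        counter += 1
    return out[:length]


def create_fast_keys(x, y):
    """Split KDF-TOR output for CREATE_FAST (K0 = X | Y) into named pieces."""
    k = kdf_tor(x + y, 3 * HASH_LEN + 2 * KEY_LEN)
    keys, offset = {}, 0
    for name, size in (('KH', HASH_LEN), ('Df', HASH_LEN), ('Db', HASH_LEN),
                       ('Kf', KEY_LEN), ('Kb', KEY_LEN)):
        keys[name] = k[offset:offset + size]
        offset += size
    return keys
```

Here x is the key material we sent in CREATE_FAST and y is what the relay returns in CREATED_FAST. KH should match the derivative key data in the relay's reply.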

P.S. If we want a cell-related pun for a project name, endosome, cytotype, chromatid, or centisome seem to have no conflicts on GitHub. Or we could put it in stem.

comment:8 Changed 5 months ago by atagar

Interesting! The module which downloads descriptors from DirPorts doesn't require a tor binary and things like DocTor don't presently have one. However, if you think this is the best way to go it certainly would be a neat capability to have even if it does have that dependency.

Ideal would be a pure python example of downloading a descriptor from a relay's ORPort but if that's extra tricky certainly, a python example of doing this via the tor binary would be much appreciated. Thanks!

comment:9 in reply to:  8 Changed 5 months ago by teor

Replying to atagar:

> Interesting! The module which downloads descriptors from DirPorts doesn't require a tor binary and things like DocTor don't presently have one. However, if you think this is the best way to go it certainly would be a neat capability to have even if it does have that dependency.

I was demonstrating how you could send Tor cells using a small amount of shell script, that could easily be translated into python.

The tor binary is only used to test the code: in my example, I launched a tor relay so that I could get decent logging when my attempts to set up a circuit didn't work. (And because it's impolite to test against other people's relays.)

> Ideal would be a pure python example of downloading a descriptor from a relay's ORPort but if that's extra tricky

It's really not that tricky, unless you're trying to do it in bash. It just needs a decent crypto library, and stem already has a dependency on "cryptography".

> certainly, a python example of doing this via the tor binary would be much appreciated. Thanks!

A running tor binary automatically downloads descriptors; you can get them via 'GETINFO {desc,md}/id/<fingerprint>'. You can also download individual networkstatus entries via 'GETINFO ns/id/<fingerprint>'.

So it would be easy to use the control port to download descriptors. But we should open a separate ticket for that: this ticket is about using the ORPort to download descriptors.
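
For completeness, a small sketch of the control port route using stem. The helper names are hypothetical, and the control port number 9051 is an assumption about the local torrc. Which of the three keys return data depends on what the running tor has downloaded, so missing entries come back as None here.

```python
import re

FINGERPRINT = re.compile(r'^[0-9A-Fa-f]{40}$')


def getinfo_keys(fingerprint):
    """Build the GETINFO keys mentioned above for one relay fingerprint."""
    if not FINGERPRINT.match(fingerprint):
        raise ValueError('expected a 40 character hex fingerprint')
    return ['%s/id/%s' % (prefix, fingerprint) for prefix in ('desc', 'md', 'ns')]


def fetch_via_control_port(fingerprint, control_port=9051):
    """Ask a running tor for the descriptors it already has for this relay."""
    # Imported here so getinfo_keys() stays usable without stem installed.
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate()
        # Passing a default makes get_info() return None instead of raising.
        return {key: controller.get_info(key, None) for key in getinfo_keys(fingerprint)}
```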

comment:10 Changed 5 months ago by teor

Here's what I must implement to do this:

  • implement VERSIONS, NETINFO, and CREATE_FAST in python
  • implement KDF-TOR in python
  • implement "hybrid encryption" in python
  • send a BEGINDIR cell to open a directory stream, then send a directory request over it
  • decode RELAY_DATA cells
  • make sure the ORPort and DirPort responses match
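
To make the relay cell items above more concrete, here is a sketch of the plaintext relay payload layout as I understand it from tor-spec: command (1 byte), recognized (2), stream ID (2), digest (4), length (2), then data and padding up to 509 bytes. The function names are invented for this example. It covers only packing and unpacking: the running digest and the per-hop encryption are left out, and the sketch assumes the HTTP request travels in RELAY_DATA cells once the BEGIN_DIR stream is open.

```python
import struct

RELAY_PAYLOAD_LEN = 509
RELAY_HEADER = struct.Struct('!BHH4sH')   # command, recognized, stream id, digest, length
MAX_DATA_LEN = RELAY_PAYLOAD_LEN - RELAY_HEADER.size
RELAY_DATA, RELAY_BEGIN_DIR = 2, 13


def pack_relay_payload(command, stream_id, data=b'', digest=b'\x00' * 4):
    """Build an unencrypted relay payload, zero-padded to 509 bytes."""
    if len(data) > MAX_DATA_LEN:
        raise ValueError('relay data is limited to %d bytes' % MAX_DATA_LEN)
    header = RELAY_HEADER.pack(command, 0, stream_id, digest, len(data))
    return (header + data).ljust(RELAY_PAYLOAD_LEN, b'\x00')


def unpack_relay_payload(payload):
    """Return (command, recognized, stream_id, digest, data) from a decrypted payload."""
    command, recognized, stream_id, digest, length = RELAY_HEADER.unpack_from(payload)
    if length > MAX_DATA_LEN:
        raise ValueError('length field exceeds the relay payload')
    start = RELAY_HEADER.size
    return command, recognized, stream_id, digest, payload[start:start + length]
```

A directory fetch would then be roughly pack_relay_payload(RELAY_BEGIN_DIR, 1) followed by pack_relay_payload(RELAY_DATA, 1, b'GET /tor/server/authority HTTP/1.0\r\n\r\n'), each encrypted before it goes into a RELAY cell.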

Here's what I really should implement:

  • sendme cells, so we can download more than 250kB of data
  • closing the stream, circuit (DESTROY cell), and connection properly
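
The 250kB figure comes from the stream window: 500 cells of at most 498 data bytes each is 249,000 bytes. A deliberately simplified sketch of the bookkeeping follows, assuming the tor-spec defaults of a 500 cell stream window topped up in steps of 50 (and 1000 and 100 for the circuit window). The class name is made up, and real implementations track more state than this.

```python
class DeliverWindow:
    """Count incoming RELAY_DATA cells and say when a SENDME is due."""

    def __init__(self, start, increment):
        self.start = start
        self.increment = increment
        self.window = start

    def received_data_cell(self):
        """Record one data cell; return True if we should now send a SENDME."""
        if self.window <= 0:
            raise ValueError('peer sent more cells than the window allows')
        self.window -= 1
        if self.window <= self.start - self.increment:
            # Sending a SENDME lets the other side send `increment` more cells.
            self.window += self.increment
            return True
        return False


stream_window = DeliverWindow(start=500, increment=50)
circuit_window = DeliverWindow(start=1000, increment=100)
```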

Here are optional things that would be nice:

  • parse error responses rather than ignoring them
  • use the v5 link protocol to disable link padding
  • verify relay hashes match the fingerprint
  • other certificate verification
  • do TAP or ntor (needs onion keys)
  • use cryptography.hazmat.primitives.kdf.hkdf.HKDF for KDF-RFC5869
  • other protocol variations from tor-spec
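
For the KDF-RFC5869 item, the cryptography HKDF class is the sensible choice. Purely to show what it computes, here is a standard-library sketch of RFC 5869 extract and expand with HMAC-SHA256. The function names are mine, and which salt and info strings tor uses is left to tor-spec.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes


def hkdf_extract(salt, input_key_material):
    """RFC 5869 extract step: PRK = HMAC(salt, IKM)."""
    return hmac.new(salt, input_key_material, hashlib.sha256).digest()


def hkdf_expand(prk, info, length):
    """RFC 5869 expand step: T(i) = HMAC(PRK, T(i-1) | info | i)."""
    if length > 255 * DIGEST_LEN:
        raise ValueError('HKDF output is limited to 255 blocks')
    output, block, counter = b'', b'', 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]
```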

comment:11 in reply to:  10 Changed 5 months ago by teor

Replying to teor:

> Here's what I must implement to do this:
>
>   • implement VERSIONS

I have implemented SSL and the VERSIONS cell in python.

You can follow along on github if you'd like:

https://github.com/teor2345/endosome
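
If you just want the general shape without reading the repository, the following is a rough sketch and is not taken from endosome. It opens a TLS connection with certificate checks disabled (relays present self-signed link certificates), sends a VERSIONS cell, and reads the reply. The function names are invented, and exchange_versions has not been run against a live relay.

```python
import socket
import ssl
import struct


def make_tls_context():
    """TLS client context that accepts a relay's self-signed certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def read_exact(sock, count):
    """Read exactly `count` bytes, since recv() may return fewer."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError('connection closed before the cell was complete')
        chunks.append(chunk)
        count -= len(chunk)
    return b''.join(chunks)


def exchange_versions(host, port, versions=(3, 4)):
    """Send a VERSIONS cell and return (command, payload) of the first reply."""
    payload = struct.pack('!%dH' % len(versions), *versions)
    cell = struct.pack('!HBH', 0, 7, len(payload)) + payload
    with socket.create_connection((host, port), timeout=10) as raw:
        with make_tls_context().wrap_socket(raw) as tls:
            tls.sendall(cell)
            # The reply header uses the same 2-byte CircID, command, 2-byte length layout.
            _circ_id, command, length = struct.unpack('!HBH', read_exact(tls, 5))
            return command, read_exact(tls, length)
```

Against the local test relay from comment 7 this would be called as exchange_versions('127.0.0.1', 12345).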

comment:12 Changed 5 months ago by atagar

Thanks Tim! Sorry, bit naive - are version cells for getting the tor version? Is there anything end users may want to take advantage of that I should start integrating?

comment:13 in reply to:  12 Changed 5 months ago by teor

Replying to atagar:

> Thanks Tim! Sorry, bit naive - are version cells for getting the tor version?

Versions cells contain the link version, which doesn't change very often, and is mostly hidden from users:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n503
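
To illustrate, here is a small sketch of what a client does with the payload of a VERSIONS cell: decode the list of 2-byte link versions, pick the highest one both sides support, and from that decide how wide circuit IDs are in later cells. The function names are made up for this example.

```python
import struct


def parse_versions_payload(payload):
    """Decode a VERSIONS payload into a list of link protocol versions."""
    if len(payload) % 2:
        raise ValueError('VERSIONS payload must be a whole number of 2-byte entries')
    return list(struct.unpack('!%dH' % (len(payload) // 2), payload))


def negotiate_link_version(ours, theirs):
    """Both sides use the highest version that appears in both lists."""
    common = set(ours) & set(theirs)
    if not common:
        raise ValueError('no link protocol version in common')
    return max(common)


def circ_id_len(link_version):
    """Circuit IDs are 4 bytes from link protocol 4 onwards, 2 bytes before."""
    return 4 if link_version >= 4 else 2
```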

> Is there anything end users may want to take advantage of that I should start integrating?

Not at this point, as far as I can tell: the interesting stuff happens once I write the circuit encryption code and can exchange data with a remote relay. Until then it's just boring byte packing and spec queries.
