HSv3 descriptor support in stem [decoding]

Trac:
Parent Ticket: #26768 (moved)

added actualpoints::2.5 component::archived/stem onionbalance owner::atagar parent::26768 points::4 priority::medium resolution::implemented scaling severity::normal sponsor::27-must status::closed tor-hs type::defect labels

Hi asn, hidden service v3 descriptor support pushed. It only includes the outer later, and does not validate signatures. If this doesn't do the trick for your use case please let me know. :)

Trac:
Status: new to closed
Resolution: N/A to implemented

Replying to atagar:

Hi asn, hidden service v3 descriptor support pushed. It only includes the outer later, and does not validate signatures. If this doesn't do the trick for your use case please let me know. :)

Wow atagar! This is great! I'm returning from camp tomorrow and I will be able to look at the code here.

For onionbalance we will need to decode/encode also the inner-layer since that's where the introduction points are. We will also need to generate valid signatures, and I'm pretty sure we will also need to verify them (but I will double check to make sure).

Are you interested in doing the above, or should I start doing them using your code as skeleton? I'm reopening the ticket for this reason, or I can make new one.

Thanks again! :)

Trac:
Resolution: implemented to N/A
Status: closed to reopened

Sounds great! I'd be happy to help, but patches to do the crypto would absolutely be appreciated. I gave signature verification a quick try yesterday but no dice. Tips welcome. :)

Hello atagar,

I took a look at the code and this looks like a great start!

We do need more stuff to make this work with onionbalance tho. In particular:

a) We need to parse deeper into the descriptor so that we get into the final layer and extract the intro points a1) On the way there we need to verify various types of crypto certificates. a2) Furthermore, we need to implement the key blinding logic of HSv3 to be able to verify some of those certificates.

b) We will need to be able to generate valid and useful HSv3 certificates down to the bottom layer. This involves being able to generate keys and certificates in a way that can be verified by Tor.

From the above, everything except from (a) contains crypto stuff. I will be working on the crypto parts of all the other tasks, but there is a learning curve involved here with learning how stem handles ed25519 certs (stem/certificate.py) and how it handles ed25519 sig verification. I have already started implementing the ed25519 cert parsing that v3 introduces, but I still need to see how the actual crypto is done.

Damian, would you be interested in moving forward with (a) if I give you a full unencrypted descriptor to play with, while I'm doing the crypto parts above? Also would it be possible to split the hidden_service.py file into two files (v2 and v3) so that the two codebases are isolated from each other?

I'd also appreciate any hints about how to handle ed25519 certs and ed25519 sig verification in stem.

Thanks! :)

I will be working on the crypto parts of all the other tasks

Wonderful, thanks asn!

Unsure if it helps but here's where we validate v2 hidden service digests...

https://gitweb.torproject.org/stem.git/tree/stem/descriptor/hidden_service.py#n311 https://gitweb.torproject.org/stem.git/tree/stem/descriptor/init.py#n1036

If it would be easier for you I'm happy to integrate working demos. I can easily make code pretty and add tests - I just need a working example of doing the crypto. :P

Damian, would you be interested in moving forward with (a) if I give you a full unencrypted descriptor to play with, while I'm doing the crypto parts above?

Certainly, delighted to!

Also would it be possible to split the hidden_service.py file into two files (v2 and v3) so that the two codebases are isolated from each other?

Sure, we can but honestly I don't think it makes much difference either way and imports are nicer as this is...

from stem.descriptor.hidden_service import (
  HiddenServiceDescriptorV2,
  HiddenServiceDescriptorV3,
)

# ^ these lines above would become the following if we split the module
#
#   from stem.descriptor.hidden_service_v2 import HiddenServiceDescriptorV2
#   from stem.descriptor.hidden_service_v3 import HiddenServiceDescriptorV3

print('=' * 80)
print('Hidden Service v2 example:')
print('=' * 80)
print('')

print(HiddenServiceDescriptorV2.create())

print('')
print('=' * 80)
print('Hidden Service v3 example:')
print('=' * 80)
print('')

print(HiddenServiceDescriptorV3.create())

I'd also appreciate any hints about how to handle ed25519 certs and ed25519 sig verification in stem.

Sadly I'm pretty green in this area. You already spotted our certificate.py's validate() method, and that's about all I know. Sorry. :(

asn / atagar,

I'm happy to answer questions about certificate validation.

Here's what we have already:

Ed25519 certificate validation in stem's certificate.py. Here's how to validate a signed hash using python's cryptography module: https://gitweb.torproject.org/stem.git/tree/stem/descriptor/certificate.py#n256

Ed25519 key blinding in tor's unit tests. Here's how we blind a key: https://gitweb.torproject.org/tor.git/tree/src/test/ed25519_exts_ref.py#n34 We might be able to rewrite that code better if we have access to some low level functions in python cryptography.

There are also some other useful tor unit tests in python:

HSv3 hash ring indexes: https://gitweb.torproject.org/tor.git/tree/src/test/hs_indexes.py

HSv3 addresses: https://gitweb.torproject.org/tor.git/tree/src/test/hs_build_address.py

Maybe the HSv3 ntor or plain ntor implementations could also help: https://gitweb.torproject.org/tor.git/tree/src/test/hs_ntor_ref.py https://gitweb.torproject.org/tor.git/tree/src/test/ntor_ref.py ntor uses curve25519.

Let me know what you're missing, and I'll do my best to help.

Replying to atagar:

I will be working on the crypto parts of all the other tasks

Wonderful, thanks asn!

Unsure if it helps but here's where we validate v2 hidden service digests...

https://gitweb.torproject.org/stem.git/tree/stem/descriptor/hidden_service.py#n311 https://gitweb.torproject.org/stem.git/tree/stem/descriptor/init.py#n1036

If it would be easier for you I'm happy to integrate working demos. I can easily make code pretty and add tests - I just need a working example of doing the crypto. :P

ACK. Thanks!

Damian, would you be interested in moving forward with (a) if I give you a full unencrypted descriptor to play with, while I'm doing the crypto parts above?

Certainly, delighted to!

OK here is an unencrypted version of the descriptor. The stuff below "encrypted" belongs to the inner layer of the descriptor, while the stuff below "superencrypted" belongs to the middle layer of the descriptor. I also provide a splitted version where it's decomposed: https://gist.github.com/asn-d6/bf5f4da3d6f9fd1e66e630a7b234ddd0

The descriptor decryption/encryption logic is fairly complicated so it will take me some time to implement properly, so feel free to work on this unencrypted thing if it's helpful to you :)

Also would it be possible to split the hidden_service.py file into two files (v2 and v3) so that the two codebases are isolated from each other?

Sure, we can but honestly I don't think it makes much difference either way and imports are nicer as this is...

Hmm, IMO it would be great to have them separated. The whole v3 logic will be a fair amount of code (especially when the crypto parts get in), and it would be helpful to me to know that all the things in the file are relevant to v3.

For personal notekeeping here is the crypto stuff that needs to be implemented in python:

Compute descriptor signing certificate and descriptor signature.
Compute the blinded key from the permanent key using a consensus and the SRV.
Validate descriptor signing certificate using the blinded key.
Validate descriptor signature using the descriptor signing key.
Implement encryption and decryption of descriptor layers.

Thanks for all the pointers to the crypto stuff Teor. I will use them when I start working on this.

FWIW, I learned that Joe Landers started working on onionbalance v3 a few months ago and have some stem code and OB code that could be useful to us: https://github.com/joelanders/stem/commit/e8455584cf50d7a398f994a7ea761baf3c7d6c00 https://github.com/joelanders/onionbalance/commit/1d30e6c5076ec2ee17e4b7a2a63ed72d0c32a670

Damian, perhaps the stem code might be worth checking out in case you can steal stuff. Joe is more than happy for us to use his code.

Trac:
Parent: N/A to #26768 (moved)

Trac:
Points: N/A to 9
Sponsor: Sponsor27-can to Sponsor27-must

Hello atagar,

I have a branch which implements the crypto parts needed to decrypt a descriptor. I made a PR just so that you can do a review, but please don't merge as this is very dirty and also breaks tests: https://github.com/torproject/stem/pull/20

Some comments:

We need to add a mandatory onion_address argument to the parsing function since we can't decrypt any layers of the descriptor without the full onion address. This breaks the tests and we need to fix them.
I made an hsv3_crypto.py file to store all the various crypto utilities but I will move these to the hsv3 file when we make one.

I also added some ATAGAR XXX notes with stem things I need help:

There is some pre-processing that needs to happen while parsing some objects (e.g. certs and encrypted bodies) which I do inline atm, but we need to move them to the parsing layer.
I don't know how to parse the resulting plaintext using the stem parsing functions, so I made a super dirty parse_superencrypted_plaintext() which needs to be made more stem-y.

I'm pretty confident that this can serve as a reasonable basis for more decoding work, so I will be moving to encoding work for now so that I do the crypto parts there too.

Thanks for all the help!

Hello atagar again,

here is another post but this time about descriptor encoding. I started looking at how to encode an HSv3 descriptor which involves putting valid (and user-defined) values in the right fields plus encrypting and signing the descriptor.

It's a pretty big job and I started by doing ed25519 certificate encoding which is certainly needed for this. I had a bit of trouble adapting the certificate.py code to do encoding, since it seems like the Ed25519CertificateV1 class has been made with parsing in mind, but in this case I will need to provide its raw attributes (keys, etc.) and have it encode them into an actual certificate. How would you do that in terms of changing the class logic?

I took a stab at it here: https://github.com/torproject/stem/pull/21 (see last commit). The whole thing works pretty well (see the unittest) but it's very dirty because I made my own class (for encoding) and use the old class for decoding. We need to unify these two classes but I would need your advice as the stem architect here.

[Carrying over from IRC discussion]

Hello atagar! Thanks for the great work here! I really appreciate it!

BTW, any plans on putting this stuff into github so that I can do some inline comments?

20:01 <+atagar> asn: Hi, I'm attempting to understand a comment in certificate.py:
20:01 <+atagar> "ATAGAR XXX certificates are generic and not just for descriptor, however this function assumes they are. this needs to be moved to the descriptor module. the new verify() function is more generic and should be used."
20:01 <+atagar> You are absolutely right that the method is specific to server descriptors and needs to be changed, however perhapse I'm blind but I'm not finding the 'verify()' function you're referencing.

You are right that I didn't add a verify() function. I'm not sure what I was refering to. I think I skipped that for now, since Tor already does this verification as part of HSFETCH.

But yes, IMO we should have a generic certificate validate (or verify?) function that just verifies that the certificate correctly signs the included key using the intended public key (see tor_cert_checksig() in little-t-tor): this is what the current validate() function does up to the signing_key.verify() call. And then we can make server_descriptor_validate() or hsv3_validate() that do more domain-specific things.

22:48 <+atagar> asn: Updated my hsv3 branch with the twiddling I got in today...
22:48 <+atagar> https://gitweb.torproject.org/user/atagar/stem.git/log/?h=hsv3

Great! I see that you have disabled the checksum verification but this actually passes for me. That is, if I remove the comments you added, the test_for_decrypt() unittest passes fine. Could it be something with your sha3 API if you have an older version of Python? I'm using Python 3.7.4+.

22:56 <+atagar> At this point I'm beginning to lose track of all the balls we have in the air. On the ticket you began moving on to descriptor encoding but it would be nice if we got decryption into a mergeable state first. Would you like for me to begin digging into hsv3_crypto or should I move back to the HSv3 descriptor parsing branch I began before this pull request?
23:03 <+atagar> asn: Concerning your onion_address question I moved it from the constructor into the decode method (https://gitweb.torproject.org/user/atagar/stem.git/commit/?h=hsv3&id=14a44b1). At this point what I need is a working unit test that calls "_decrypt()" to get the plaintext inner descriptor. Is that what you had? Since the test didn't make an assertion on the plaintext content I was unsure how far along the pull request is.

Sounds good about the decode() method. That makes sense! Thanks!

Yep there are many balls. I transitioned from decoding to encoding while I was waiting for feedback on my questions, but also because I wanted the encoding done so that I build a proper encode-to-decode unittest for HSv3 descriptors.

The current test_for_decrypt() unittest indeed gets the inner plaintext of the descriptor (and also the middle layer on the process), so we should be good to go on parsing up to that level. Would you be interested in digging more there; that is parsing the middle and inner layers of the descriptor? I think that would be great since I'm not sure what's the best way for stem to add more parsing handlers (for the extra layers) in the middle of the decode function.

In the meanwhile, I will be working on the encoding part, if you don't think that's too chaotic to do.

Two more things:

Commit 14a44b1c6e1438abdf5687a1c468536d88481f81 killed a few XXXs I had added to remind myself in the future (so that we don't forget them before merging).

Also, I see you deleted the onion_address argument from the constructor and that's fine with me, but FYI the encoding methods will also need an extra argument during descriptor encoding. See https://github.com/torproject/stem/pull/21/commits/55bb32c534a35430d6ba506ecf27340f1fd97bbc .

As an extra, I'm splitting this ticket into two. This one (for descriptor decoding) and #31823 (closed) (for descriptor encoding) since both of these will require discussion, and it might be too complex to fit into one ticket.

Trac:
Points: 9 to 4
Actualpoints: N/A to 2.5
Summary: HSv3 descriptor support in stem to HSv3 descriptor support in stem [decoding]

Perfect, thanks asn!

But yes, IMO we should have a generic certificate validate (or verify?) function that just verifies that the certificate correctly signs the included key using the intended public key

Agreed. Thought I just had: lets use isinstance checks here. That is to say...

def validate(self, descriptor):
  if isinstance(descriptor, stem.descriptor.server_descriptor.RelayDescriptor):
    # server descriptor specific things
  elif isinstance(descriptor, stem.descriptor.hidden_service.HiddenServiceDescriptorV3):
    # hidden service specific things
  else:
    raise ValueError('ED25519 certificate validation only presently supports server and hidden service descriptors, not %s' % type(descriptor).__name__)

Great! I see that you have disabled the checksum verification but this actually passes for me.

Interesting! Yes, likely is a pysha3 vs python 3.6 difference. I'll look into it.

Would you be interested in digging more there; that is parsing the middle and inner layers of the descriptor?

Yup! At this point I should have everything I need to run with this. Many thanks asn!

Give me a bit and hopefully I'll have a mergeable branch for you to take a peek at.

In the meanwhile, I will be working on the encoding part, if you don't think that's too chaotic to do.

Nope. Feel free, but I won't be able to give much significant thought to it until this end is wrapped up.

Commit 14a44b1c6e1438abdf5687a1c468536d88481f81 killed a few XXXs I had added to remind myself in the future (so that we don't forget them before merging).

If that's the case would you mind this before encoding so we can get this branch into a mergeable state?

To be clear at this point my plan is...

Productionize hsv3_crypto.py. This will probably include some refactoring.
Add parsing support for the other layers.
Unit test decryption and parsing.
Consider merging the branch.

If there's any additional verifications we'd care to get in prior to merging it would be preferable (but not a big whoop) to get them in during this.

FYI the encoding methods will also need an extra argument during descriptor encoding

Gotcha. That's perfectly fine. Descriptor creation can take additional arguments. For example server_descriptor.py has...

def content(cls, attr = None, exclude = (), sign = False, signing_key = None, exit_policy = None):

BTW, any plans on putting this stuff into github so that I can do some inline comments?

Not at present. I use my personal TPO repo for in flight branches (GitHub is simply a mirror of the official repo). That said, I'm leaning toward migrating to GitHub when the rest of tor moves to GitLab so in the future I'll likely fully be on that platform.

I'm splitting this ticket into two.

Perfect! Thanks.

Replying to atagar:

Perfect, thanks asn!

But yes, IMO we should have a generic certificate validate (or verify?) function that just verifies that the certificate correctly signs the included key using the intended public key

Agreed. Thought I just had: lets use isinstance checks here. That is to say...

{{{ def validate(self, descriptor): if isinstance(descriptor, stem.descriptor.server_descriptor.RelayDescriptor): # server descriptor specific things elif isinstance(descriptor, stem.descriptor.hidden_service.HiddenServiceDescriptorV3): # hidden service specific things else: raise ValueError('ED25519 certificate validation only presently supports server and hidden service descriptors, not %s' % type(descriptor).name) }}}

I'm fine with doing this if you prefer this approach. That said, IMO it's a layer violation: It shouldn't be the certificate's job to validate the descriptor; it should be the descriptor's job to validate the certificate. So IMO, the certificate codebase should be simple (just validate the sig with a public key and make sure it's not expired), and the complicated stuff should happen at the HS subsystem (e.g. for HSv3 we have to validate 4 certificates inside the descriptor).

If you think we should go with this isintance() approach anyhow, that's fine with me.

Would you be interested in digging more there; that is parsing the middle and inner layers of the descriptor?

Yup! At this point I should have everything I need to run with this. Many thanks asn!

Give me a bit and hopefully I'll have a mergeable branch for you to take a peek at.

Fantastic!

Commit 14a44b1c6e1438abdf5687a1c468536d88481f81 killed a few XXXs I had added to remind myself in the future (so that we don't forget them before merging).

If that's the case would you mind this before encoding so we can get this branch into a mergeable state?

What do you mean with this? I'm fine with not having the XXXs in the code for now, but we should not forget about them.

To be clear at this point my plan is...

Productionize hsv3_crypto.py. This will probably include some refactoring.

Add parsing support for the other layers.

Unit test decryption and parsing.

Consider merging the branch.

That's great!

It would also be very very useful to me if we split hidden_service.py to an hsv2 and an hsv3 module. I'd really like to know that all the code in the file is hsv3 related if possible. The hsv3 codebase will grow even larger as we do more of decoding and encoding, and I think it would definitely enjoy its own file.

If there's any additional verifications we'd care to get in prior to merging it would be preferable (but not a big whoop) to get them in during this.

There are more certificate verifications inside the v3 descriptor (auth-key, enc-key, enc-key-cert). We can skip them for an MVP of onionbalance (since tor already verifies them during HSFETCH), but they should ideally be done for correctness and completeness. Perhaps we can do them after the MVP of decoding/encoding is done?

FYI the encoding methods will also need an extra argument during descriptor encoding

Gotcha. That's perfectly fine. Descriptor creation can take additional arguments. For example server_descriptor.py has...

{{{ def content(cls, attr = None, exclude = (), sign = False, signing_key = None, exit_policy = None): }}}

ACK.

BTW, any plans on putting this stuff into github so that I can do some inline comments?

Not at present. I use my personal TPO repo for in flight branches (GitHub is simply a mirror of the official repo). That said, I'm leaning toward migrating to GitHub when the rest of tor moves to GitLab so in the future I'll likely fully be on that platform.

Sounds good to me. Worst case, if we need to inline reviews, I will make a GH PR of my own.

If you think we should go with this isintance() approach anyhow, that's fine with me.

Yup, I do.

What do you mean with this? I'm fine with not having the XXXs in the code for now, but we should not forget about them.

I mean lets either do those 'XXX' things or not. I'm unsure what I should do with those 'XXX' on my end so as I productionize the branch we need to either drop them, do them, or change them into a 'TODO' item for me.

It would also be very very useful to me if we split hidden_service.py to an hsv2 and an hsv3 module.

We might end up going this route. Please give me a while to refactor this branch - when I'm done rejiggering it I'd be delighted to chat about if this would be better as one module or two.

Perhaps we can do them after the MVP of decoding/encoding is done?

Certainly, sounds good.

Hi George, just merged HSv3 decryption support. Anything we should adjust?

https://gitweb.torproject.org/stem.git/commit/?id=f1e1902

Trac:
Status: reopened to needs_information

HSv3 descriptor support in stem [decoding]

Child items ...

Activity