HSv3 descriptor support in stem [encoding]

Trac:
Parent Ticket: #26768 (moved)

added actualpoints::2 component::archived/stem milestone::Tor: unspecified network-team-roadmap-september onionbalance owner::atagar parent::26768 points::5 priority::medium resolution::fixed scaling severity::normal status::closed tor-hs tor-spec type::defect labels

OK I present a draft version of this functionality at: https://github.com/torproject/stem/pull/24

It's a big branch with over 1k lines of code. It's not yet ready for production, but I posted it here to get some feedback from you atagar on what needs to be changed and improved. Also the interface of HiddenServiceV3Descriptor has changed (since we require more data to be able to minimally encode them) and this broke some of the simpler tests in the test suite. I can fix those too.

The branch basically builds up from small things (key blinding), to encoding descriptors and finally to being able to do a full on encode-to-decode unittest using the branch from #31369 (closed). The format of this unittest is the same logic I'm planning to use in onionbalance when encoding v3 descriptors. Also, I passed a stem descriptor to little-t-tor and verified that little-t-tor parses it well.

Starting tomorrow I will be offline for two weeks, so feel free to take your time reading this code. When I get back I will have more time to revise this branch and do any changes you might like.

Enjoy and thanks for the feedback!

Trac:
Status: new to needs_review
Cc: s7r, gk to s7r, gk, atagar

FWIW, my plan for after the branch stabilizes is to create test vectors for little-t-tor to make sure that little-t-tor correctly parses stem descriptors. I've done some rudimentary testing with the version of the branch from comment:1 and it seemed to work but need to do it proper.

Hi asn, swapping our tor-dev@ discussion here as requested.

What is your plan with the hsv3 branch? Should I start reviewing your changes already, or give you more time to do more?

Please give me another week. Ball's in my court, I'll let ya know once it's ready.

My only feedback so far is that the python2 port commits have broken python3 for me (particularly the ed25519 blinding implementation).

Gotcha. For the moment tests pass with both python2 and python3 but I still break them occasionally as I work. If you have python3 issues with the final result please let me know and I'll address them.

Would it be egregious to provide hsv3 support only for python3 users so that we can use python3 features as we wish?

This is actually a particularly interesting question right now. Here's my plans...

PSF plans to drop Python 2.x support in January 2020 and Stem will do the same.
Stem's 1.x series supports python 2.6 and above. In December I plan to release Stem 1.8 which will be the last in the 1.x series and last with python 2.x support.
Stem 2.x will remove python 2.x support, drop deprecated functions, etc. This is our opportunity for cleanup that breaks backward compatibility so I plan to take my time and make our codebase more consistent.
So finally for your question of requiring python3 for hsv3 the answer is "not yet unless absolutely necessary". That said, I would appreciate 'TODO' comments that indicate where we can simplify ourselves when dropping 2.x support. We already have quite a few...

% grep -R TODO stem | grep -i 'python 2'
stem/descriptor/reader.py:    # TODO: When dropping python 2.6 support go back to using 'with' for
stem/descriptor/__init__.py:  # TODO: use 'with' for tarfile after dropping python 2.6 support
stem/client/datatype.py:    # TODO: Python 2.6's struct module behaves a little differently in a couple
stem/client/datatype.py:    # TODO: When we drop python 2.x support this can be simplified via
stem/interpreter/__init__.py:        # TODO: we can use a lambda here when dropping python 2.x support, but
stem/util/test_tools.py:# TODO: Providing a copy of SkipTest that works with python 2.6. This will be
stem/util/test_tools.py:        # TODO: remove and drop unnecessary 'returns' when dropping python 2.6
stem/util/test_tools.py:        # TODO: remove when dropping python 2.6 support
stem/util/log.py:    # TODO: At least in python 2.6 logging.Handler has a bug in that it doesn't
stem/util/__init__.py:    # TODO: I hate doing this but until Python 2.x support is dropped we

Replying to atagar:

Hi asn, swapping our tor-dev@ discussion here as requested.

What is your plan with the hsv3 branch? Should I start reviewing your changes already, or give you more time to do more?

Please give me another week. Ball's in my court, I'll let ya know once it's ready.

Sounds good! Have fun! I will be working on other things in the meanwhile!

My only feedback so far is that the python2 port commits have broken python3 for me (particularly the ed25519 blinding implementation).

Gotcha. For the moment tests pass with both python2 and python3 but I still break them occasionally as I work. If you have python3 issues with the final result please let me know and I'll address them.

Great! FWIW I use Python 3.7.4+ and the descriptor.hidden_service_v3 unittest fails for me in deba20b1.

Would it be egregious to provide hsv3 support only for python3 users so that we can use python3 features as we wish?

This is actually a particularly interesting question right now. Here's my plans...

PSF plans to drop Python 2.x support in January 2020 and Stem will do the same.

Hmm, if we drop support for python2 in 3 months, wouldn't it be OK to write this feature in Python3? I imagine that the feature will be finalized in December or so, given bugfixes and stuff. So this leaves us about a month or even less. We could even delay shipping it in a release before January if needed (and I can work on Onionbalance by using unofficial branches).

The reasons I suggested doing Python3-only is:

The main reason for me is we won't need to spend time hunting down bugs and incompatibilities in Python2. A non-trivial part of your changes are Python2 compatibility improvements, and these usually take time to fix (and find).
We will be able to use nice python3 methods like to_bytes(), fromhex() and hex().
We don't need to worry about str <-> bytes <-> unicode compatibility between python2 and python3
We might be able to make stronger assumptions on the crypto libraries being present.
We won't need to fix the ed25519 python3 code which is hairy crypto code.

I imagine that the feature will be finalized in December

Are we discussing this branch or something else? I expect to finish this branch within a week or so. HSv3 descriptor support will be part of Stem 1.8.

The main reason for me is we won't need to spend time hunting down bugs and incompatibilities in Python2.

Eh. This work is already done and I've dealt with cross version compatibility issues for years. Feel free to leave this to me. It's no big whoop. :)

We will be able to use nice python3 methods like to_bytes(), fromhex() and hex().

We already normalize all throughout our codebase...

% grep -R 'to_bytes\|to_unicode' stem | wc -l
184

We might be able to make stronger assumptions on the crypto libraries being present.

Our cryptography requirement is orthogonal to python 2/3 support. Regardless of our interpreter we need to perform prereq checks.

We won't need to fix the ed25519 python3 code which is hairy crypto code.

Gotcha. If this truly becomes a problem I'll be happy to discuss but for the moment at least I don't envision difficulty in making this branch to work with all python versions.

Replying to atagar:

I imagine that the feature will be finalized in December

Are we discussing this branch or something else? I expect to finish this branch within a week or so. HSv3 descriptor support will be part of Stem 1.8.

That's true, what I meant is that there is still things to be done for the actual hsv3 support to be proper and complete. In particular, support for encoding legacy keys and the OPE scheme in the revision counter are definitely needed for the code to be useful. Plus I imagine that bugs will be found as I use the decoding/encoding functionality in stem, so even tho this will be merged soon, I imagine that the branch will actually be in good shape at some point in late November or early December.

Hello Damian,

I'm back for good now. I see you are still working on the hsv3 branch so I'm not gonna bother you on that. I'm also waiting for you to finish before I jump in and review. In the meanwhile, I will be working on the core onionbalance code.

I see you've been changing a big part of the code. For me, the most important part is not to change the interface of the test_encode_decode_descriptor() unittest, since I'm gonna be using a similar interface in onionbalance.

So far you haven't done any major changes in the test, however you are using the same keys for all introduction points:

    intro_points = [
      IntroductionPointV3.create('1.1.1.1', 9001, expiration, onion_key, enc_key, auth_key, signing_key),
      IntroductionPointV3.create('2.2.2.2', 9001, expiration, onion_key, enc_key, auth_key, signing_key),
      IntroductionPointV3.create('3.3.3.3', 9001, expiration, onion_key, enc_key, auth_key, signing_key),
    ]

For the test to be proper, each introduction point should have unique keys which is what the old _helper_get_intro() function was doing.

Other than that, looks good to me! Keep on doing it! Thanks!

For me, the most important part is not to change the interface

Hi asn. Actually, adjusting the interface is exactly what I did today but don't worry - onionbalance can still do the same.

For example, here's what a descriptor with custom introduction points and random key material looks like...

from stem.descriptor.hidden_service import (
  HiddenServiceDescriptorV3,
  InnerLayer,
  IntroductionPointV3,
)

print(HiddenServiceDescriptorV3.content(
  inner_layer = InnerLayer.create(
    introduction_points = [
      IntroductionPointV3.create('1.1.1.1', 9001),
      IntroductionPointV3.create('2.2.2.2', 9001),
      IntroductionPointV3.create('3.3.3.3', 9001),
    ],
  ),
))

% python demo.py
hs-descriptor 3
descriptor-lifetime 180
descriptor-signing-key-cert
-----BEGIN ED25519 CERT-----
AQgABqvHAX8wXzJY+FqoJQPXNZ8u+SQGPZ1WN/r3hUna0R2AXQnEAQAgBAAuqibl
ALcKa/4nHtLZn2zKV8L4XIpkRyRm7btWPLpYN5Gseb03H5exL+I3SqfG3uNDw5QK
CmPlCQUy3usouSwhO/qWgdy0//bP5kRDma5GDXXWoi3+xTKM6Jez7TGxPAU=
-----END ED25519 CERT-----
revision-counter 1573695064
superencrypted
-----BEGIN MESSAGE-----
aDJodcMjhCvz1K7JCJEAH1H24hvoZ7gZw53AhPdvpHu+5d1Ogwio4qcIXEK1pEgy
QFF1fE6tnCzsk++eMa2WaKwIJYGLPoCnta78H5Ve6VoMj+Pyb5rE6wPTMTPSVm6M
UjllArr7DS8YcofloDxu3iwC3JZYFt/LB6ahq6lBKeot2BD/11pNggkZrZOCLgNQ
pUVyQau7K8ynagVlNNESnI3FccOBaBB4Xa5mObK2ylyiLQ08MqaImW7X2gxeZltT
/C/xtiJXGm2CzkjPpBpMWm09p7/a97GEWca5e8+fhpmGrN7zjAwjYInTvQHS5AyU
7eUFg8ItrRxAiRq4fbe/zepiq2vgfj1Pt7uxC0KCTcLWpd9O/FIvcFSk27Yrtniw
... etc...
-----END MESSAGE-----
signature VDDXXLvgU6qjRI4zfJR3GbQuVjz98qO0LI5gsI60LtGXK2POZ4E+3YVVWuVaEkvMsZaku5qCutIcu74/WQMxCQ

For the test to be proper, each introduction point should have unique keys

The keys are arguments. If you want them to have unique keys we can simply drop them to do the same thing as your helper (I only included the keys in the test to demonstrate how key arguments are provided).

Pretty close to finishing this branch but not there yet. I'll let ya know once I'm ready for a review.

Replying to atagar:

For me, the most important part is not to change the interface

Pretty close to finishing this branch but not there yet. I'll let ya know once I'm ready for a review.

Sounds great! I will be waiting for the branch to start reviewing! :) I also need to make sure that Tor can parse stem-generated descriptors properly.

[PS: There is no padding in the superencrypted layer in the example above.]

Hi asn, finally merged HSv3 descriptor creation support. Many thanks for all the help!

Blinding is disabled by default to avoid its lengthy runtime. I'd be delighted to re-enable this default once we have a performant implementation. To create a descriptor with blinding simply supply a nonce...

HiddenServiceDescriptorV3(blinding_nonce = os.urandom(32))

Is there anything you'd care to have adjusted?

Trac:
Status: needs_review to needs_information

Great work atagar! My first step here is to review the code, and also pass a descriptor created by stem to Tor to make sure that Tor can decode it correctly.

However, the v3 tests fail with errors with Python3.7.5 which is the default python3 for Debian, and this doesn't allow me to proceed with the testing plan.

Most of the errors are bytes/str conversion issues. I tried to manually fix some of those, but they are not easy to fix because the same line sometimes get called with a byte and other times with a str.

The first round of errors can be found here: https://gist.github.com/asn-d6/f00ba099acb8ffd312c8f93648707cc1 but after fixing some of those, more appear.

Even more errors appear if key blinding is enabled and those are hard to fix as discussed above in comment:4.

It would be great if you could check those out.

At the same time, please also see https://github.com/asn-d6/stem/pull/2 for a bit of code review. Most of the comments there make stem descriptors unparseable by Tor, so they are important to fix.

Thanks again for all your work!

Trac:
Status: needs_information to needs_revision

Thanks asn! Finally got my openssl bindings working. Fixed python3 support.

Hm. These fields are not actually optional. They are actually requred by rend-spec-v3.txt for the descriptor to be parseable.

Good point. Fixed.

There is an str/bytes issue with Python3.7.5 here, where stem will actually produce a descriptor that says: enc-key ntor b'sCII8wNTkbwaIReqUWR86TCR==' here.

Another good point. Fixed.

One of the goals of my old _get_middle_descriptor_layer_body() function here was to make the output of stem indistinguishable from the output of Tor.

Sorry, I'm not quite understanding this. The _descriptor_content() arguments are simply default values for mandatory fields - if a caller would care to provide their own desc-auth-ephemeral-key (rather than use a default value) they simply need to provide it...

% cat demo.py 
from stem.descriptor.hidden_service import HiddenServiceDescriptorV3, InnerLayer, IntroductionPointV3

print(HiddenServiceDescriptorV3.content({'desc-auth-ephemeral-key': 'I want a pony...'}))

% python demo.py 
hs-descriptor 3
descriptor-lifetime 180
descriptor-signing-key-cert
-----BEGIN ED25519 CERT-----
AQgABqxVAT0aH48dgIhVpHORH/KnytuFmK+lU3I7LLDs0gKosTXqAQAgBABhYWFh
YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJiYmJi
YmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmI=
-----END ED25519 CERT-----
revision-counter 1574205680
superencrypted
-----BEGIN MESSAGE-----
54+p38IDSEm0ZaOPB4WpTXvAPOt2M9tLeDclSQNJ2PXPhVzsxG3YhO8M4fyQjySm
MiVDypHcDJPEeKDAkG2rq22Iwn2TPNCpcggDDLueOMHxMmWSnbqnkWsIU1nEgQKx
41rI11LrU0JtJ1jRUGeYITxPn+62LZ5vAKFcaW86m2k8snSI0jA0+gpreEllMI1F
fZPRIMlT+lcs8qi1oIh1GxD3dlOiqj1bliEYF9ZD1EL4xXCg2WjZ0UOPA6PTbL/y
cfuAg+5bOIeIVwZEuWYtAbWQjQncL3or42JflExvSqfRhtRdWT5bnGGS4SWxcn0A
NJTfJTk4HOatA4fz0x4wn6PsAoel+gCcsWdGV66zo0jhjJwPgkfkMl724GtL/zqF
cwSPV6zC1gunnZ9IOiYIfAG5l2cKxtwErlAQYuAh+6eGR924AgaX9K+a9Hg/nv3r
9nmvx4+hMhiV9vAjr+W1vBcMb0ScAfJO2XfFHjDJZJLntcYdqaaUsGy/p7MpgO1b
8CKcdx5Fftq2Sr/2QKLFs76o7+uIQfbF8dGDc5wq+m5nizo4FG8Ysedgvblvzx1w
771i3MW4f/jI3DmR7Ib4W3F7v96oMBS2jotUhYSAhk8niIQPwe5eYucCPsOqPRZh
n7Tvgi3q8HKgFhaL0OEg2z9MH54P8YIYuSfQAnkTjzZNgu3VcEfgeg==
-----END MESSAGE-----
desc-auth-ephemeral-key I want a pony...
signature sTxFMYH2iGlqzqioPY76jlq0ByyRbUBC6/coXeZ/PBSbYUNlOUtSkVwQrmkb9+scgxHgMz++qwv5A/2zZnFjCQ

Ball's back in your court.

Trac:
Status: needs_revision to needs_information

Thanks for the fixes atagar! They seem to work just fine, so I resolved all previous comments on the GH PR and added three new ones. Good news! I'm testing a stem created descriptor with little-t-tor and parsing seems to work just fine! :)

One of the goals of my old _get_middle_descriptor_layer_body() function here was to make the output of stem indistinguishable from the output of Tor. Sorry, I'm not quite understanding this. The _descriptor_content() arguments are simply default values for mandatory fields - if a caller would care to provide their own desc-auth-ephemeral-key (rather than use a default value) they simply need to provide it...

Hm. I understand that _descriptor_content() arguments are default values, but what defines a good default value here? In particular, the desc-auth-ephemeral-key and auth-client fields of the outerlayer are usually fake data in little-t-tor which are easy to generate (see my old _get_middle_descriptor_layer_body()). It's true that a caller can provide their own fake data, but why not make the default value more sensible (so that it matches with the one from little-t-tor)? The benefit here will be that stem descriptors will be closer to identical with little-t-tor's, so that a client cannot tell if a descriptor was made by stem or little-t-tor (without the individual applications having to explicitly do this themselves).

In any case and regardless of the above, we do need to provide a single fake auth-client line otherwise the descriptor won't get parsed by Tor (it's a T1N() element in hs_descriptor.c). I left a comment in the PR about this.

Back to you now! Over the next few days I will be testing this deeper through onionbalance (and as key blinding becomes usable again)!

Trac:
Status: needs_information to needs_revision

Hm. I understand that _descriptor_content() arguments are default values, but what defines a good default value here?

Hi George. Yup, I'm delighted to improve our defaults and this is probably simply a matter of us misunderstanding each other's code. Here's what you had...

def _get_fake_clients_bytes():
  """  
  Generate fake client authorization data for the middle layer
  """

  final_bytes = b''
  num_fake_clients = 16  # default for when client auth is disabled

  for _ in range(num_fake_clients):
    client_id = base64.b64encode(os.urandom(8)).rstrip(b'=')
    client_iv = base64.b64encode(os.urandom(16)).rstrip(b'=')
    descriptor_cookie = base64.b64encode(os.urandom(16)).rstrip(b'=')

    final_bytes += b'%s %s %s %s\n' % (b'auth-client', client_id, client_iv, descriptor_cookie)

  return final_bytes


def _get_middle_descriptor_layer_body(encrypted):
  """  
  Get the middle descriptor layer as bytes
  (It's just fake client auth data since client auth is disabled)
  """

  from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
  from cryptography.hazmat.primitives import serialization

  fake_pub_key = X25519PrivateKey.generate().public_key()
  fake_pub_key_bytes = fake_pub_key.public_bytes(encoding = serialization.Encoding.Raw, format = serialization.PublicFormat.Raw)
  fake_pub_key_bytes_b64 = base64.b64encode(fake_pub_key_bytes)
  fake_clients = _get_fake_clients_bytes()

  return b'desc-auth-type x25519\n' \
    b'desc-auth-ephemeral-key %s\n' \
    b'%s' \
    b'%s' % (fake_pub_key_bytes_b64, fake_clients, encrypted)

This didn't explain why values were fabricated this way so I figured it was simply overly complicated and inflexible (callers should have a mechanism to override all descriptor values).

Since you're calling out desc-auth-ephemeral-key in particular here's your version of just that...

fake_pub_key = X25519PrivateKey.generate().public_key()
fake_pub_key_bytes = fake_pub_key.public_bytes(encoding = serialization.Encoding.Raw, format = serialization.PublicFormat.Raw)

b'desc-auth-ephemeral-key %s' % base64.b64encode(fake_pub_key_bytes)

As such, to do the same as your code I could simply change...

return _descriptor_content(attr, exclude, (
  ('desc-auth-type', 'x25519'),
  ('desc-auth-ephemeral-key', base64.b64encode(os.urandom(32))),
), (
  ('encrypted', b'\n' + inner_layer._encrypt(revision_counter, subcredential, blinded_key)),
))

... to...


return _descriptor_content(attr, exclude, (
  ('desc-auth-type', 'x25519'),
  ('desc-auth-ephemeral-key', base64.b64encode(stem.util._pubkey_bytes(X25519PrivateKey.generate()))),
), (
  ('encrypted', b'\n' + inner_layer._encrypt(revision_counter, subcredential, blinded_key)),
))

Is that what you would like? The base64 encoding of a random key? If so that's a very simple tweak to make. Or are you requesting something else?

Finished for today. Few updates...

Fixed python 3.x blinding.
Replied to the pull request comments.
It sounds like Paul might be able to help with our cryptography blinding performance issue.

Replying to atagar:

Is that what you would like? The base64 encoding of a random key? If so that's a very simple tweak to make. Or are you requesting something else?

Yep, that's good for the desc-auth-ephemeral-key part!

Yep, my wrong for not documenting why client auth values were fabricated like that. The reason is that's actually also what little-t-tor does, so ideally we would blend in as well. Here is how tor.git makes fake auth-client entries (https://github.com/torproject/tor/blob/master/src/feature/hs/hs_descriptor.c#L2852) and here how it calculates the number of fakes to add (https://github.com/torproject/tor/blob/master/src/feature/hs/hs_service.c#L1794). For stem we can just always add 16 fake auth client lines and that's good enough for now (since stem does not have support for actual client authorization yet).

Other than that, key blinding works just fine now over here! I also replied on the other PR comments! Paul's link to the optimized ed25519 implementation also seems very interesting for our case!

I will keep on testing and reviewing in the meanwhile while I develop onionbalance. Thanks for the help!

Thanks asn! Quick update...

If ya spot anything else please let me know.

Damian, this looks really good now!

I think we can safely close this ticket and I will open new tickets if I find any new bugs, or need any changes.

Thanks a lot for all your work!!!!

Trac:
Resolution: N/A to fixed
Status: needs_revision to closed

Opening ticket again to report a bug with https://gitweb.torproject.org/stem.git/commit/?id=f5c8c96cdd2cdc967d44ff017b68ea58ce77ca36 :

Right now as the code stands we get the bytes b' prefix in the descriptor. I suggest you change the code to something like:

          client_id = stem.util.str_tools._to_unicode(base64.b64encode(os.urandom(8)).rstrip(b'='))
          iv = stem.util.str_tools._to_unicode(base64.b64encode(os.urandom(16)).rstrip(b'='))
          cookie = stem.util.str_tools._to_unicode(base64.b64encode(os.urandom(16)).rstrip(b'='))

Please let me know if you would prefer me to open a new ticket about this (or any other issues I find)

Trac:
Status: closed to reopened
Resolution: fixed to N/A

Oops! Thanks asn, fixed.

On tor-dev you asked how I'd prefer for issues to be reported. Everything's fine, but my preference would be a new trac ticket. When tor moves to GitLab I'll likely move to GitHub, but until then I'm remaining on trac.

Trac:
Resolution: N/A to fixed
Status: reopened to closed

closed

changed time estimate to 40h

added 16h of time spent

mentioned in issue #31909 (closed)

mentioned in issue #32049 (closed)

HSv3 descriptor support in stem [encoding]

Child items ...

Activity