It's a big branch with over 1k lines of code. It's not yet ready for production, but I posted it here to get some feedback from you atagar on what needs to be changed and improved. Also the interface of HiddenServiceV3Descriptor has changed (since we require more data to be able to minimally encode them) and this broke some of the simpler tests in the test suite. I can fix those too.
The branch basically builds up from small things (key blinding), to encoding descriptors and finally to being able to do a full on encode-to-decode unittest using the branch from #31369 (closed). The format of this unittest is the same logic I'm planning to use in onionbalance when encoding v3 descriptors. Also, I passed a stem descriptor to little-t-tor and verified that little-t-tor parses it well.
Starting tomorrow I will be offline for two weeks, so feel free to take your time reading this code. When I get back I will have more time to revise this branch and do any changes you might like.
Enjoy and thanks for the feedback!
Trac: Status: new to needs_review Cc: s7r, gk to s7r, gk, atagar
FWIW, my plan for after the branch stabilizes is to create test vectors for little-t-tor to make sure that little-t-tor correctly parses stem descriptors. I've done some rudimentary testing with the version of the branch from comment:1 and it seemed to work but need to do it proper.
Hi asn, swapping our tor-dev@ discussion here as requested.
What is your plan with the hsv3 branch? Should I start reviewing your changes already, or give you more time to do more?
Please give me another week. Ball's in my court, I'll let ya know once it's ready.
My only feedback so far is that the python2 port commits have broken python3 for me (particularly the ed25519 blinding implementation).
Gotcha. For the moment tests pass with both python2 and python3 but I still break them occasionally as I work. If you have python3 issues with the final result please let me know and I'll address them.
Would it be egregious to provide hsv3 support only for python3 users so that we can use python3 features as we wish?
This is actually a particularly interesting question right now. Here's my plans...
Stem's 1.x series supports python 2.6 and above. In December I plan to release Stem 1.8 which will be the last in the 1.x series and last with python 2.x support.
Stem 2.x will remove python 2.x support, drop deprecated functions, etc. This is our opportunity for cleanup that breaks backward compatibility so I plan to take my time and make our codebase more consistent.
So finally for your question of requiring python3 for hsv3 the answer is "not yet unless absolutely necessary". That said, I would appreciate 'TODO' comments that indicate where we can simplify ourselves when dropping 2.x support. We already have quite a few...
% grep -R TODO stem | grep -i 'python 2'stem/descriptor/reader.py: # TODO: When dropping python 2.6 support go back to using 'with' forstem/descriptor/__init__.py: # TODO: use 'with' for tarfile after dropping python 2.6 supportstem/client/datatype.py: # TODO: Python 2.6's struct module behaves a little differently in a couplestem/client/datatype.py: # TODO: When we drop python 2.x support this can be simplified viastem/interpreter/__init__.py: # TODO: we can use a lambda here when dropping python 2.x support, butstem/util/test_tools.py:# TODO: Providing a copy of SkipTest that works with python 2.6. This will bestem/util/test_tools.py: # TODO: remove and drop unnecessary 'returns' when dropping python 2.6stem/util/test_tools.py: # TODO: remove when dropping python 2.6 supportstem/util/log.py: # TODO: At least in python 2.6 logging.Handler has a bug in that it doesn'tstem/util/__init__.py: # TODO: I hate doing this but until Python 2.x support is dropped we
Hi asn, swapping our tor-dev@ discussion here as requested.
What is your plan with the hsv3 branch? Should I start reviewing your changes already, or give you more time to do more?
Please give me another week. Ball's in my court, I'll let ya know once it's ready.
Sounds good! Have fun! I will be working on other things in the meanwhile!
My only feedback so far is that the python2 port commits have broken python3 for me (particularly the ed25519 blinding implementation).
Gotcha. For the moment tests pass with both python2 and python3 but I still break them occasionally as I work. If you have python3 issues with the final result please let me know and I'll address them.
Great! FWIW I use Python 3.7.4+ and the descriptor.hidden_service_v3 unittest fails for me in deba20b1.
Would it be egregious to provide hsv3 support only for python3 users so that we can use python3 features as we wish?
This is actually a particularly interesting question right now. Here's my plans...
Hmm, if we drop support for python2 in 3 months, wouldn't it be OK to write this feature in Python3? I imagine that the feature will be finalized in December or so, given bugfixes and stuff. So this leaves us about a month or even less. We could even delay shipping it in a release before January if needed (and I can work on Onionbalance by using unofficial branches).
The reasons I suggested doing Python3-only is:
The main reason for me is we won't need to spend time hunting down bugs and incompatibilities in Python2. A non-trivial part of your changes are Python2 compatibility improvements, and these usually take time to fix (and find).
We will be able to use nice python3 methods like to_bytes(), fromhex() and hex().
We don't need to worry about str <-> bytes <-> unicode compatibility between python2 and python3
We might be able to make stronger assumptions on the crypto libraries being present.
We won't need to fix the ed25519 python3 code which is hairy crypto code.
We might be able to make stronger assumptions on the crypto libraries being present.
Our cryptography requirement is orthogonal to python 2/3 support. Regardless of our interpreter we need to perform prereq checks.
We won't need to fix the ed25519 python3 code which is hairy crypto code.
Gotcha. If this truly becomes a problem I'll be happy to discuss but for the moment at least I don't envision difficulty in making this branch to work with all python versions.
I imagine that the feature will be finalized in December
Are we discussing this branch or something else? I expect to finish this branch within a week or so. HSv3 descriptor support will be part of Stem 1.8.
That's true, what I meant is that there is still things to be done for the actual hsv3 support to be proper and complete. In particular, support for encoding legacy keys and the OPE scheme in the revision counter are definitely needed for the code to be useful. Plus I imagine that bugs will be found as I use the decoding/encoding functionality in stem, so even tho this will be merged soon, I imagine that the branch will actually be in good shape at some point in late November or early December.
I'm back for good now. I see you are still working on the hsv3 branch so I'm not gonna bother you on that. I'm also waiting for you to finish before I jump in and review. In the meanwhile, I will be working on the core onionbalance code.
I see you've been changing a big part of the code. For me, the most important part is not to change the interface of the test_encode_decode_descriptor() unittest, since I'm gonna be using a similar interface in onionbalance.
So far you haven't done any major changes in the test, however you are using the same keys for all introduction points:
For the test to be proper, each introduction point should have unique keys
The keys are arguments. If you want them to have unique keys we can simply drop them to do the same thing as your helper (I only included the keys in the test to demonstrate how key arguments are provided).
Pretty close to finishing this branch but not there yet. I'll let ya know once I'm ready for a review.
Blinding is disabled by default to avoid its lengthy runtime. I'd be delighted to re-enable this default once we have a performant implementation. To create a descriptor with blinding simply supply a nonce...
Great work atagar! My first step here is to review the code, and also pass a descriptor created by stem to Tor to make sure that Tor can decode it correctly.
However, the v3 tests fail with errors with Python3.7.5 which is the default python3 for Debian, and this doesn't allow me to proceed with the testing plan.
Most of the errors are bytes/str conversion issues. I tried to manually fix some of those, but they are not easy to fix because the same line sometimes get called with a byte and other times with a str.
Even more errors appear if key blinding is enabled and those are hard to fix as discussed above in comment:4.
It would be great if you could check those out.
At the same time, please also see https://github.com/asn-d6/stem/pull/2 for a bit of code review. Most of the comments there make stem descriptors unparseable by Tor, so they are important to fix.
Hm. These fields are not actually optional. They are actually requred by rend-spec-v3.txt for the descriptor to be parseable.
Good point. Fixed.
There is an str/bytes issue with Python3.7.5 here, where stem will actually produce a descriptor that says:
enc-key ntor b'sCII8wNTkbwaIReqUWR86TCR==' here.
Another good point. Fixed.
One of the goals of my old _get_middle_descriptor_layer_body() function here was to make the output of stem indistinguishable from the output of Tor.
Sorry, I'm not quite understanding this. The _descriptor_content() arguments are simply default values for mandatory fields - if a caller would care to provide their own desc-auth-ephemeral-key (rather than use a default value) they simply need to provide it...
% cat demo.py from stem.descriptor.hidden_service import HiddenServiceDescriptorV3, InnerLayer, IntroductionPointV3print(HiddenServiceDescriptorV3.content({'desc-auth-ephemeral-key': 'I want a pony...'}))% python demo.py hs-descriptor 3descriptor-lifetime 180descriptor-signing-key-cert-----BEGIN ED25519 CERT-----AQgABqxVAT0aH48dgIhVpHORH/KnytuFmK+lU3I7LLDs0gKosTXqAQAgBABhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmJiYmI=-----END ED25519 CERT-----revision-counter 1574205680superencrypted-----BEGIN MESSAGE-----54+p38IDSEm0ZaOPB4WpTXvAPOt2M9tLeDclSQNJ2PXPhVzsxG3YhO8M4fyQjySmMiVDypHcDJPEeKDAkG2rq22Iwn2TPNCpcggDDLueOMHxMmWSnbqnkWsIU1nEgQKx41rI11LrU0JtJ1jRUGeYITxPn+62LZ5vAKFcaW86m2k8snSI0jA0+gpreEllMI1FfZPRIMlT+lcs8qi1oIh1GxD3dlOiqj1bliEYF9ZD1EL4xXCg2WjZ0UOPA6PTbL/ycfuAg+5bOIeIVwZEuWYtAbWQjQncL3or42JflExvSqfRhtRdWT5bnGGS4SWxcn0ANJTfJTk4HOatA4fz0x4wn6PsAoel+gCcsWdGV66zo0jhjJwPgkfkMl724GtL/zqFcwSPV6zC1gunnZ9IOiYIfAG5l2cKxtwErlAQYuAh+6eGR924AgaX9K+a9Hg/nv3r9nmvx4+hMhiV9vAjr+W1vBcMb0ScAfJO2XfFHjDJZJLntcYdqaaUsGy/p7MpgO1b8CKcdx5Fftq2Sr/2QKLFs76o7+uIQfbF8dGDc5wq+m5nizo4FG8Ysedgvblvzx1w771i3MW4f/jI3DmR7Ib4W3F7v96oMBS2jotUhYSAhk8niIQPwe5eYucCPsOqPRZhn7Tvgi3q8HKgFhaL0OEg2z9MH54P8YIYuSfQAnkTjzZNgu3VcEfgeg==-----END MESSAGE-----desc-auth-ephemeral-key I want a pony...signature sTxFMYH2iGlqzqioPY76jlq0ByyRbUBC6/coXeZ/PBSbYUNlOUtSkVwQrmkb9+scgxHgMz++qwv5A/2zZnFjCQ
Thanks for the fixes atagar! They seem to work just fine, so I resolved all previous comments on the GH PR and added three new ones. Good news! I'm testing a stem created descriptor with little-t-tor and parsing seems to work just fine! :)
One of the goals of my old _get_middle_descriptor_layer_body() function here was to make the output of stem indistinguishable from the output of Tor.
Sorry, I'm not quite understanding this. The _descriptor_content() arguments are simply default values for mandatory fields - if a caller would care to provide their own desc-auth-ephemeral-key (rather than use a default value) they simply need to provide it...
Hm. I understand that _descriptor_content() arguments are default values, but what defines a good default value here? In particular, the desc-auth-ephemeral-key and auth-client fields of the outerlayer are usually fake data in little-t-tor which are easy to generate (see my old _get_middle_descriptor_layer_body()). It's true that a caller can provide their own fake data, but why not make the default value more sensible (so that it matches with the one from little-t-tor)? The benefit here will be that stem descriptors will be closer to identical with little-t-tor's, so that a client cannot tell if a descriptor was made by stem or little-t-tor (without the individual applications having to explicitly do this themselves).
In any case and regardless of the above, we do need to provide a single fake auth-client line otherwise the descriptor won't get parsed by Tor (it's a T1N() element in hs_descriptor.c). I left a comment in the PR about this.
Back to you now! Over the next few days I will be testing this deeper through onionbalance (and as key blinding becomes usable again)!
Hm. I understand that _descriptor_content() arguments are default values, but what defines a good default value here?
Hi George. Yup, I'm delighted to improve our defaults and this is probably simply a matter of us misunderstanding each other's code. Here's what you had...
def _get_fake_clients_bytes(): """ Generate fake client authorization data for the middle layer """ final_bytes = b'' num_fake_clients = 16 # default for when client auth is disabled for _ in range(num_fake_clients): client_id = base64.b64encode(os.urandom(8)).rstrip(b'=') client_iv = base64.b64encode(os.urandom(16)).rstrip(b'=') descriptor_cookie = base64.b64encode(os.urandom(16)).rstrip(b'=') final_bytes += b'%s %s %s %s\n' % (b'auth-client', client_id, client_iv, descriptor_cookie) return final_bytesdef _get_middle_descriptor_layer_body(encrypted): """ Get the middle descriptor layer as bytes (It's just fake client auth data since client auth is disabled) """ from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey from cryptography.hazmat.primitives import serialization fake_pub_key = X25519PrivateKey.generate().public_key() fake_pub_key_bytes = fake_pub_key.public_bytes(encoding = serialization.Encoding.Raw, format = serialization.PublicFormat.Raw) fake_pub_key_bytes_b64 = base64.b64encode(fake_pub_key_bytes) fake_clients = _get_fake_clients_bytes() return b'desc-auth-type x25519\n' \ b'desc-auth-ephemeral-key %s\n' \ b'%s' \ b'%s' % (fake_pub_key_bytes_b64, fake_clients, encrypted)
This didn't explain why values were fabricated this way so I figured it was simply overly complicated and inflexible (callers should have a mechanism to override all descriptor values).
Since you're calling out desc-auth-ephemeral-key in particular here's your version of just that...
Other than that, key blinding works just fine now over here! I also replied on the other PR comments! Paul's link to the optimized ed25519 implementation also seems very interesting for our case!
I will keep on testing and reviewing in the meanwhile while I develop onionbalance. Thanks for the help!
On tor-dev you asked how I'd prefer for issues to be reported. Everything's fine, but my preference would be a new trac ticket. When tor moves to GitLab I'll likely move to GitHub, but until then I'm remaining on trac.
Trac: Resolution: N/Ato fixed Status: reopened to closed