The scope of this code review includes literally the code in stem.client, but also any supporting code or consumers of it, such as stem.descriptor.remote.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
atagar: commits are ready for your review in [pull request on GitHub]. (Branch head 79396f510ba5713ec5aea78dcaee66d9dd045399.)
The github master was 1 commit behind the git.tpo master, so I rebased to the github master in case github would otherwise get confused with the PR.
The commits should be pretty self-explanatory, but feel free to ask any questions or comment. If you've code code changes you'd suggest/require, let's try out the GitHub review features!
I've got more comments that didn't make it into changes, and I'll be adding those within the next day. Sorry for the delay!
Trac: Status: assigned to needs_review Reviewer: N/Ato atagar
[...] within the next day. Sorry for the delay!
Sorry //again// for the delay!
Here's comments I've collected from my code review.
Some of them are pretty straightforward and probably need no discussion. I ran out of time with the commits I pushed, but these could probably be addressed without much review. I'll note that as Straightforward; you can probably skip over reading those ;).
For others, where I note Suggestion or Question, I'd like to get some feedback on whether you think such a change is appropriate. Feel free to respond with a quick "Go ahead" or "Let's not do that" for various suggestions!
I can then make tickets or commits to the code-review branch - whichever is most appropriate!
== Maybe //slightly// out of stem.client scope, but mentioning anyways... (lest I forget)
==== stem.descriptor.remote _download_from_*port() signatures
Straightforward
Instead of url, this method should take an endpoint and resource - having the same signature as _download_from_orport()
url creation should be thus done within the _download_from_dirport() method
==== Assume DirPort for _pick_endpoint(), or...?
Question: Should we assume DirPort for the directory authorities by default, or should it switch to ORPort, or have it be specifiable?
Right now it assumes DirPort.
(Note that the last retry for _download_descriptors() will use directory authorities (and thus DirPort) if Query.fall_back_to_authority is True.)
Suggestion, if have it be specifiable:
We could allow fall_back_to_authority in the constructor to take in an Endpoint //class// to make it clear which type to use, and default to DirPort if specified as True (backwards-compatible).
== stem.client and consumers
==== New Relay connection per _download_from_orport() call
Summary: each _download_from_orport() call creates a new Relay connection; ideally these should be multiplexed
Suggestion (for future)
At a future point, we may want to refactor this to share connections and possibly circuits with a stem.client singleton.
This is discussed more in the architecture ticket (#26227 (closed))
At that point, we'd want to avoid hardcoding the stream_id.
==== Protocol constants: stem.client.RelaySuggestion
Somewhere I think we want to define a SUPPORTED_LINK_PROTOCOLS constant
We likely won't have full support for any protocol listed, so that could be a bit disingenuous for a while. But in order to move forward with development, it's a chicken-and-the-egg problem ;).
Suggestion
In similar fashion to defining the SUPPORTED_LINK_PROTOCOLS constant, we should define constants for all the subprotocols. See [section 9].
==== Parameter naming: stem.client.Relay.connect(), link_protocolsSummary: The method takes a link_protocols parameter. This should be named a bit clearer for its purpose.
Suggestion: It could be named something like link_protocols_to_advertise and default to SUPPORTED_LINK_PROTOCOLs.
Rationale:
I can see that it appears to already generate confusion with its current name, because _download_from_orport() attempts to use link_protocol values associated with the endpoint, i.e. the //remote// relay's advertised link protocols.
==== Constants/magics for link_protocol values
Straightforward
Based on the above, refactor these
There's a few constants and a few magic numbers, e.g. [3].
==== Naming: stem.client.datatype.Size subclasses/attributes
Suggestion:
It might be good to switch CHAR/SHORT/etc. to UCHAR/USHORT/etc.
I don't know what the convention is here, but it may help for readability.
I'm personally used to U<size> to signify unsigned and <size> to signify signed. I think switching to U<size> would make the code //potentially// easier to read for newcomers from various backgrounds.
Suggestion:
Similarly, it may help to put the bits length in it, too - for the most immediate readability.
So, e.g.:
==== RelaySocket recv() may give us partial cells
Summary: our socket recv(), by nature of network communication, may unblock at "incomplete" cell lengths.
(I believe so, at least.)
We don't account for mid-Cell reads/recv()s. We try to pop a Cell at a time, and I feel it's probably possible to get a buffer read that is between cell boundaries from our lower-level I/O, and thus error out. Variable-width cells could potentially exacerbate this.
stem.client should be designed to handle that at a higher layer
Suggestion: new exception type named* IncompleteCell, raised from pop()/unpack() instead of ValueError
That would let us catch it specifically and re-block on the recv continuously, appending buffers until we have received the complete cell.
ValueError then would be used for malformed cells in which the code believe it's seen all the content.
//* exact naming TBD//
==== RelaySocket recv() and send() use/handling
Summary: recv() is directly used by consumers at different layers. send() too
We furthermore don't have an architecture with a centralized recv(). For instance, it looks like Relay.__init__(), Relay.create_circuit(), and Circuit.send() all directly use orport recv(), instead of e.g. having a thread delegated to do the recv and sorting of cells into buckets or whatnot.
The problem with these disparate consumers doing recv() so directly is that they are dropping cells that don't pertain to what they're trying to do. (And per above, maybe dropping portions of a cell, corrupting the connection "framing".)
There may also be something similar with send(), but since underneath it uses a socket //file// and flushes, I think it might be ok. (Though I've read a lot of different documentation on receive/send methods - I could be mixing things up.)
Suggestion: Manage these buffers in a centralized fashion, in a thread
More discussion thereon in the architecture ticket (#26227 (closed)).
==== Decryption/encryption of RelayCell - wrong abstraction layer?
Summary: it looks like our abstraction is breaking down a bit
Much of what's happening in [[https://gitweb.torproject.org/stem.git/tree/stem/client/init.py?id=4306013ab6e868e99553fbb78ed14ef51a896b7d#n223|stem.client.Circuit.send()]] is more directly handling cell content than it should be.
Suggestion: move much of this into RelayCell methods. The forward-key and backward-key data will need to reside in the Circuit, but much of the rest of this can be encapsulated in the Cell.
==== Nonuniform integrity/content/etc. checking within stem.client.cellSummary: we do cell content checks in a variety of methods, but not uniformly
Often this is in __init__ //or// pack //and/or// unpack.
Mostly it looks like the validation only happens in a subset of these, for a given cell type. Sometimes the checks may be inconsistent.
//(For example: AuthChallengeCell seems to allow for 0 methods to be specified, which would be bad (the unpacking expects at least 1 method, iiuc))//
I think ideally we'd want it to happen for all of these.
That would allow a cell's contents to be changed e.g.:
after it's unpacked, before it's packed
after it's constructed, before it's packed
etc.
... without bypassing the checks
Suggestion: Cells define a more generic validate() method for each cell type; call this in each of __init__, pack, and unpack
Side suggestion:
To allow delayed validation and improper cell creation, we might want to provide some kwargs to the constructor and pack methods; maybe:
__init__(..., delay_validation=False)
pack(..., allow_invalid=False)
And similarly, if the parsing allows for it, we might have something similar (allow_invalid=False) for the unpack method.
Using these in non-default situations would be rare and probably mostly useful for integration tests of edge cases with tor, seeing how it responds to bad input.
==== Incongruency with Cell.unpack() and Field.unpack()
Summary: Cell.unpack() creates a generator whereas Field.unpack() ensures that there is //no// remainder content.
Suggestion: we might want to standardize these
Perhaps to...:
unpack() creates a generator; unpack_one() ensures there is no remainder content
unpack_many() creates a generator; unpack() ensures there is no remainder content
==== CircuitCell subclass; what about ConnectionCell subclass?
Summary: We have a CircuitCell subclass, for cells that must pertain to a circuit. Can we have a ConnectionCell subclass for cells that //must not// pertain to a circuit?
I believe* for Cells, they would be either CircuitCell (circuit id must be > 0) or ConnectionCell (circuit id must be 0).
Currently we enforce in _pack() that CircuitCells have a circuit id, but we don't enforce anything for the other cells.
Suggestion: subclass all other Cells into a ConnectionCell subclass; refactor a bit for circuit id checks
(Looking for your feedback here on //whether// you think we should do this. The //how// is very straightforward.)
*This division is actually not completely clear to me in the spec, but I'm pretty confident about it. I can hunt down spec wording while implementing, and if it's not really not clear, track an edit to make this division clearer.
==== Define a max-payload-length security parameter?
Summary: variable-length cell payloads have a length stored in a SHORT but no specified max size
(Open) Question: does little-t tor define a security parameter here?
Potentially this should have a security parameter.
A SHORT allows a payload up to 65535 bytes, which isn't that bad (just under 64KBytes), but I'd wonder if little-t tor actually defines something smaller. There certainly isn't anything defined in the spec.
==== Not all the ValueError exceptions have tests
Suggestion: employ a code coverage tool.
Question: Possibly coverage?
==== No timeout specifiable for _download_from_orport()Summary: _download_from_dirport() takes a timeout parameter, but _download_from_orport() doesn't
Suggestion: should have a timeout argument, ideally
This in practice may be a bit harder to implement without rearchitecting a bit. But probably can be done with a different architecture (see #26227 (closed)). Low priority in the interim?
== ...
//splitting here because the spam filter is complaining I have too many links//
==== HTTP, potential bugs
Summary: HTTP parsing has a lot of edge cases
Suggestion: use a library for this (see architecture ticket, #26227 (closed))
There are a number of potential bugs with the HTTP handling.
HTTP is a nuanced spec. It's very possible that these bugs will never be exercised due to tor's implementation, but it's also hard to know how tor will evolve in the future.
Both of the above may mean that the content encoding may not be correctly passed to _decompress().
We're using HTTP/1.0. I'm not super-familiar with the specifics of HTTP/1.0, but there may be some restrictions that we run into, as far as e.g. [encodings allowed]. Might want to be using HTTP/1.1 to be "safe". (I haven't looked at the dir spec or tor implementation to know if there's any functional impact.)
fixed in [ba275219bd50dfc408befca81ef4d3116008dbcc]: the Reason-Phrase of the HTTP response is only recommended to be OK for 200, but that is not required. The parsing code did require it to be OK.
So there are ways to fix all these cases, but my preference would be to not reinvent the wheel here. This is discussed more in the architecture ticket (#26227 (closed)).
== Phew
That's a lot. Sorry - still working on my verbosity! Hopefully it's at least well organized!
Thanks Dave. Finally cobbled together some time to begin reviewing this and looks great! I really love how you broke these up - that's making it far easier to review.
I've pushed your first five commits with some tweaks, most notably that I added a LinkProtocol class that centralizes the constants which vary by version. Mind taking a peek to see what you think?
I've run out of steam for today, and tomorrow I'm hitting the Fremont Fair with my dad, so I'll continue chewing on these over the course of next week.
Thanks Dave. Finally cobbled together some time to begin reviewing this and looks great! I really love how you broke these up - that's making it far easier to review.
Thanks! I'll keep trying to do granular commits. (And hopefully push them more often!)
I've pushed your first five commits with some tweaks, most notably that I added a LinkProtocol class that centralizes the constants which vary by version. Mind taking a peek to see what you think?
Overall: Looks good!
Also regarding the LinkProtocol class - do you envision we might turn it into something beyond a NamedTuple in the future, for version-specific behavior? Or do you expect we'll keep it limited to constants? (this touches a bit on #26226 (closed))
==== circ_id allocation
I also saw a change to the circ_id allocation algorithm. The spec says this is arbitrary ("[CircID for a CREATE cell is an arbitrarily chosen nonzero integer,]"), but I think it might be useful to factor this out. It is trivially network-observable behavior to the guard, and thus might be used to distinguish stem.client connections and treat them differently.
Sure. All attributes of the LinkProtocol are derived from its integer version, so it would be perfectly fine to do equality and hash on that.
do you envision we might turn it into something beyond a NamedTuple in the future, for version-specific behavior?
Good question. I don't have any plans to extend it at present but I'd be happy to chat if you think it would be useful to do so.
I also saw a change to the circ_id allocation algorithm.
If you the LinkProtocol addition then nope, I didn't change it. Just moved it. The part that concerns us is a few paragraphs later...
With protocol version 3 or lower, a client with no public keyMAY choose any CircID it wishes, since clients never need to process aCREATE cell.In link protocol version 4 or higher, whichever node initiated theconnection sets its MSB to 1, and whichever node didn't initiate theconnection sets its MSB to 0.
I really liked 84e4e657b4785e4888e567fbc04c8ea29fd43cc4 - I was contemplating how we might do that, and having an unused attribute makes a lot of sense!
Neat, glad ya like it!
Do you think it might make more sense to call it padding, though?
Actually, initially I did call it that until I realized the name conflicted with PADDING and VPADDING cells. I kinda like the name 'unused' since it makes it clear that it's bytes that have no impact.
I also saw a change to the circ_id allocation algorithm.
If you the LinkProtocol addition then nope, I didn't change it. Just moved it. The part that concerns us is a few paragraphs later...
[snip]
Ah, sorry, I meant that the allocation algorithm switched in how it would assign circ_ids.
Previous code was:
# __init__.py circ_id = 0x80000000 if self.link_protocol > 3 else 0x01 while circ_id in self._circuits: circ_id += 1
and now is:
# __init__.py circ_id = max(self._circuits) + 1 if self._circuits else self.link_protocol.first_circ_id# datatype.py - LinkProtocol.first_circ_id first_circ_id = 0x80000000 if version > 3 else 0x01
There's a subtle difference here - the former would reuse circuit ids after a circuit closed, grabbing from the lowest available.
The latter still reuses circuit ids, but only if //all circuits are closed// - then it resets because self._circuits is empty.
Really, the spec says this is arbitrary, especially the MAY as you pointed out for link_protocol <= 3, but it does represent network-observable behavior that could be used to distinguish different client implementations, fairly trivially.
(Hence: maybe the spec ought to specify a SHOULD here?)
That's starting to get into threat modeling, for stem.client and for the spec at large. I think this should be covered within a broader ticket. Filed one for stem.client: #26431 (closed). We can decide after resolving that whether the spec needs updates.
I really liked 84e4e657b4785e4888e567fbc04c8ea29fd43cc4 - I was contemplating how we might do that, and having an unused attribute makes a lot of sense!
Neat, glad ya like it!
Do you think it might make more sense to call it padding, though?
Actually, initially I did call it that until I realized the name conflicted with PADDING and VPADDING cells. I kinda like the name 'unused' since it makes it clear that it's bytes that have no impact.
While I agree that unused is very clear that those bytes are incidental to the cell, I guess what I was trying to say by referencing #26228 (moved) is that ... whether PADDING and VPADDING carry a payload or whether what they carry should be padding ... was discussed a bit therein; see my opening post, [[ticket:26228#comment:1|teor's comment]], and [[ticket:26228#comment:3|arma's comment]].
This is a bit bikeshed-y and really doesn't affect the implementation much, but I just wanted to make sure I represented those voices. I'm fine with moving forward with payload and unused, and re-assessing this later if needed.
Hi Dave, sorry about the delay! Finally done chewing through these and merged.
Work looks great! Many thanks for the pull request. Opted to pass on three of the smaller commits...
Commit 8353d83 (Remove redundant content-length checks): Think I'd prefer to pass on this one. Those conditionals are there so we provide callers with nicer error messages. True, they might not be necessary but not spotting a need to drop 'em.
Commit 819c1a2 (Refactor fixed_cell_len determination into a @staticmethod): No longer necessary due to the LinkProtocol class addition.
Commit 79396f5 (Add TODO notes for Cell types without a unit test definition): Eh. Happy to add TODOs but these didn't add much.
Everything else has been merged. Feel free to reopen if there's anything I missed here.
Hi Dave, sorry about the delay! Finally done chewing through these and merged.
Work looks great! Many thanks for the pull request.
Hey Damian! Thanks for the review! Glad you got through it - sorry again that it was such a big chunk to chew on.
I'm glancing through the commits you've added in between - they look pretty self-explanatory, but I'll let you know if I have any questions, and will definitely let you know if I have any comments.
Opted to pass on three of the smaller commits...
Commit 8353d83 (Remove redundant content-length checks): Think I'd prefer to pass on this one. Those conditionals are there so we provide callers with nicer error messages. True, they might not be necessary but not spotting a need to drop 'em.
Commit 819c1a2 (Refactor fixed_cell_len determination into a @staticmethod): No longer necessary due to the LinkProtocol class addition.
Commit 79396f5 (Add TODO notes for Cell types without a unit test definition): Eh. Happy to add TODOs but these didn't add much.
Everything else has been merged.
Cool! At a glance, it seems like these all make sense to leave out.
In the case of the NetinfoCell, the code is practically unreachable, so I'm not sure the check/message really adds much.
Either way, I think a good follow-up commit would be to annotate the if check with a comment saying the purpose is for a nicer exception message. (That way it won't look like the code //otherwise wouldn't// check size.)
Feel free to reopen if there's anything I missed here.
Hey, in particular I've reopened this for the review comments in this ticket!
Namely:
[[comment:4|comment 4]]
[[comment:5|comment 5]] (separate comment, 'cause [[comment:4]] was too long)
[[comment:9|comment 9]] (pending discussions: circ_id allocation; padding, esp. comments in #26228 (moved) as called out above)
Setting to needs_review since that seems to be the most relevant thing here.
I just noticed that several other places relied on the Size pop() for the exception behavior; these few seemed inconsistent.
Ok, you persuaded me. Consistency is important - pushed.
Suggestion: use a library for this (see architecture ticket, #26227 (closed))
I'm confused. Is this ticket citing itself?
This conversation has grown way too long for me to keep track of. Please file a separate ticket to discuss the HTTP edge cases you pointed out. The approach Stem takes with libraries is...
Use builtins if at all possible.
If a library is vital then we can take a soft dependency on it. An example of this is cryptography - if unavailable Stem still works, but lacks certain features like descriptor signature validation.
On first glance maybe we can use builtins to avoid the gotchas you mentioned?
What in particular from comment #4 (closed) would you like for me to reply to? As you mention, it's veeeeeery long.
circ_id allocation
Yup, we subtly changed how circ_ids are allocated but unless I'm missing something it still conforms with the spec.
We are not attempting to be indistinguishable from the C tor codebase on the wire. That is a different and much, much harder project that would involve wireshark, a moving target, and tears. Rather, our goal is to make a compatible tor implementation that conforms with the spec.
I'm glancing through the commits you've added in between - they look pretty self-explanatory, but I'll let you know if I have any questions, and will definitely let you know if I have any comments.
I'm using some "warm up" time to glance through these gradually, since review isn't a large focus of the project.
I'll also fix bugs / improve things gradually for them, too.
I'm using some "warm up" time to glance through these gradually, since review isn't a large focus of the project.
I'll also fix bugs / improve things gradually for them, too.
Not just "warm up" time like I had thought - a decent effort! - but here's another:
#26766 (closed)