Opened 4 months ago

Closed 2 months ago

Last modified 5 weeks ago

#26227 closed task (implemented)

Review existing stem.client code

Reported by: dmr Owned by: dmr
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Normal Keywords: client
Cc: atagar Actual Points:
Parent ID: Points:
Reviewer: atagar Sponsor:

Description

The scope of this code review includes literally the code in stem.client, but also any supporting code or consumers of it, such as stem.descriptor.remote.

Child Tickets

Change History (18)

comment:1 Changed 4 months ago by dmr

Comments will be provided in this ticket. Smaller changes will be provided via granular commits.

comment:2 Changed 4 months ago by dmr

Type: defecttask

Oops, this should be a task, not a defect.

comment:3 Changed 4 months ago by dmr

Reviewer: atagar
Status: assignedneeds_review

atagar: commits are ready for your review in this pull request on GitHub. (Branch head 79396f510ba5713ec5aea78dcaee66d9dd045399.)

The github master was 1 commit behind the git.tpo master, so I rebased to the github master in case github would otherwise get confused with the PR.

The commits should be pretty self-explanatory, but feel free to ask any questions or comment. If you've code code changes you'd suggest/require, let's try out the GitHub review features!

I've got more comments that didn't make it into changes, and I'll be adding those within the next day. Sorry for the delay!

comment:4 Changed 4 months ago by dmr

Replying to dmr:

[...] within the next day. Sorry for the delay!

Sorry again for the delay!

Here's comments I've collected from my code review.

Some of them are pretty straightforward and probably need no discussion. I ran out of time with the commits I pushed, but these could probably be addressed without much review. I'll note that as Straightforward; you can probably skip over reading those ;).

For others, where I note Suggestion or Question, I'd like to get some feedback on whether you think such a change is appropriate. Feel free to respond with a quick "Go ahead" or "Let's not do that" for various suggestions!

I can then make tickets or commits to the code-review branch - whichever is most appropriate!

Maybe slightly out of stem.client scope, but mentioning anyways... (lest I forget)

stem.descriptor.remote _download_from_*port() signatures

Straightforward
Instead of url, this method should take an endpoint and resource - having the same signature as _download_from_orport()
url creation should be thus done within the _download_from_dirport() method

Assume DirPort for _pick_endpoint(), or...?

Question: Should we assume DirPort for the directory authorities by default, or should it switch to ORPort, or have it be specifiable?
Right now it assumes DirPort.

(Note that the last retry for _download_descriptors() will use directory authorities (and thus DirPort) if Query.fall_back_to_authority is True.)

Suggestion, if have it be specifiable:
We could allow fall_back_to_authority in the constructor to take in an Endpoint class to make it clear which type to use, and default to DirPort if specified as True (backwards-compatible).

stem.client and consumers

New Relay connection per _download_from_orport() call

Summary: each _download_from_orport() call creates a new Relay connection; ideally these should be multiplexed
Suggestion (for future)
At a future point, we may want to refactor this to share connections and possibly circuits with a stem.client singleton.
This is discussed more in the architecture ticket (#26227)
At that point, we'd want to avoid hardcoding the stream_id.

Protocol constants: stem.client.Relay

Suggestion
Somewhere I think we want to define a SUPPORTED_LINK_PROTOCOLS constant
We likely won't have full support for any protocol listed, so that could be a bit disingenuous for a while. But in order to move forward with development, it's a chicken-and-the-egg problem ;).

Suggestion
In similar fashion to defining the SUPPORTED_LINK_PROTOCOLS constant, we should define constants for all the subprotocols. See tor-spec section 9.

Summary: The method takes a link_protocols parameter. This should be named a bit clearer for its purpose.
Suggestion: It could be named something like link_protocols_to_advertise and default to SUPPORTED_LINK_PROTOCOLs.

Rationale:
I can see that it appears to already generate confusion with its current name, because _download_from_orport() attempts to use link_protocol values associated with the endpoint, i.e. the remote relay's advertised link protocols.

Straightforward
Based on the above, refactor these

There's a few constants and a few magic numbers, e.g. [3].

Naming: stem.client.datatype.Size subclasses/attributes

Suggestion:
It might be good to switch CHAR/SHORT/etc. to UCHAR/USHORT/etc.
I don't know what the convention is here, but it may help for readability.

I'm personally used to U<size> to signify unsigned and <size> to signify signed. I think switching to U<size> would make the code potentially easier to read for newcomers from various backgrounds.

Suggestion:
Similarly, it may help to put the bits length in it, too - for the most immediate readability.
So, e.g.:

  • UCHAR8
  • USHORT16
  • etc.

For reference, on a quick glance...

RelaySocket recv() may give us partial cells

Summary: our socket recv(), by nature of network communication, may unblock at "incomplete" cell lengths.
(I believe so, at least.)

We don't account for mid-Cell reads/recv()s. We try to pop a Cell at a time, and I feel it's probably possible to get a buffer read that is between cell boundaries from our lower-level I/O, and thus error out. Variable-width cells could potentially exacerbate this.

stem.client should be designed to handle that at a higher layer

Suggestion: new exception type named* IncompleteCell, raised from pop()/unpack() instead of ValueError

That would let us catch it specifically and re-block on the recv continuously, appending buffers until we have received the complete cell.
ValueError then would be used for malformed cells in which the code believe it's seen all the content.

* exact naming TBD

RelaySocket recv() and send() use/handling

Summary: recv() is directly used by consumers at different layers. send() too

We furthermore don't have an architecture with a centralized recv(). For instance, it looks like Relay.__init__(), Relay.create_circuit(), and Circuit.send() all directly use orport recv(), instead of e.g. having a thread delegated to do the recv and sorting of cells into buckets or whatnot.

The problem with these disparate consumers doing recv() so directly is that they are dropping cells that don't pertain to what they're trying to do. (And per above, maybe dropping portions of a cell, corrupting the connection "framing".)

There may also be something similar with send(), but since underneath it uses a socket file and flushes, I think it might be ok. (Though I've read a lot of different documentation on receive/send methods - I could be mixing things up.)

Suggestion: Manage these buffers in a centralized fashion, in a thread
More discussion thereon in the architecture ticket (#26227).

Decryption/encryption of RelayCell - wrong abstraction layer?

Summary: it looks like our abstraction is breaking down a bit
Much of what's happening in `stem.client.Circuit.send()` is more directly handling cell content than it should be.

Suggestion: move much of this into RelayCell methods. The forward-key and backward-key data will need to reside in the Circuit, but much of the rest of this can be encapsulated in the Cell.

Nonuniform integrity/content/etc. checking within stem.client.cell

Summary: we do cell content checks in a variety of methods, but not uniformly

Often this is in __init__ or pack and/or unpack.
Mostly it looks like the validation only happens in a subset of these, for a given cell type. Sometimes the checks may be inconsistent.

(For example: AuthChallengeCell seems to allow for 0 methods to be specified, which would be bad (the unpacking expects at least 1 method, iiuc))

I think ideally we'd want it to happen for all of these.

That would allow a cell's contents to be changed e.g.:

  • after it's unpacked, before it's packed
  • after it's constructed, before it's packed
  • etc.

... without bypassing the checks

Suggestion: Cells define a more generic validate() method for each cell type; call this in each of __init__, pack, and unpack

Side suggestion:
To allow delayed validation and improper cell creation, we might want to provide some kwargs to the constructor and pack methods; maybe:

  • __init__(..., delay_validation=False)
  • pack(..., allow_invalid=False)

And similarly, if the parsing allows for it, we might have something similar (allow_invalid=False) for the unpack method.

Using these in non-default situations would be rare and probably mostly useful for integration tests of edge cases with tor, seeing how it responds to bad input.

Incongruency with Cell.unpack() and Field.unpack()

Summary: Cell.unpack() creates a generator whereas Field.unpack() ensures that there is no remainder content.
Suggestion: we might want to standardize these

Perhaps to...:

  • unpack() creates a generator; unpack_one() ensures there is no remainder content
  • unpack_many() creates a generator; unpack() ensures there is no remainder content

CircuitCell subclass; what about ConnectionCell subclass?

Summary: We have a CircuitCell subclass, for cells that must pertain to a circuit. Can we have a ConnectionCell subclass for cells that must not pertain to a circuit?

I believe* for Cells, they would be either CircuitCell (circuit id must be > 0) or ConnectionCell (circuit id must be 0).
Currently we enforce in _pack() that CircuitCells have a circuit id, but we don't enforce anything for the other cells.

Suggestion: subclass all other Cells into a ConnectionCell subclass; refactor a bit for circuit id checks
(Looking for your feedback here on whether you think we should do this. The how is very straightforward.)

*This division is actually not completely clear to me in the spec, but I'm pretty confident about it. I can hunt down spec wording while implementing, and if it's not really not clear, track an edit to make this division clearer.

Define a max-payload-length security parameter?

Summary: variable-length cell payloads have a length stored in a SHORT but no specified max size
(Open) Question: does little-t tor define a security parameter here?

Potentially this should have a security parameter.
A SHORT allows a payload up to 65535 bytes, which isn't that bad (just under 64KBytes), but I'd wonder if little-t tor actually defines something smaller. There certainly isn't anything defined in the spec.

Not all the ValueError exceptions have tests

Suggestion: employ a code coverage tool.
Question: Possibly coverage?

No timeout specifiable for _download_from_orport()

Summary: _download_from_dirport() takes a timeout parameter, but _download_from_orport() doesn't
Suggestion: should have a timeout argument, ideally

This in practice may be a bit harder to implement without rearchitecting a bit. But probably can be done with a different architecture (see #26227). Low priority in the interim?

splitting here because the spam filter is complaining I have too many links

comment:5 Changed 4 months ago by dmr

HTTP, potential bugs

Summary: HTTP parsing has a lot of edge cases
Suggestion: use a library for this (see architecture ticket, #26227)

There are a number of potential bugs with the HTTP handling.
HTTP is a nuanced spec. It's very possible that these bugs will never be exercised due to tor's implementation, but it's also hard to know how tor will evolve in the future.

Here's the potential bugs I saw:

So there are ways to fix all these cases, but my preference would be to not reinvent the wheel here. This is discussed more in the architecture ticket (#26227).

Phew

That's a lot. Sorry - still working on my verbosity! Hopefully it's at least well organized!

comment:6 Changed 3 months ago by atagar

Thanks Dave. Finally cobbled together some time to begin reviewing this and looks great! I really love how you broke these up - that's making it far easier to review.

I've pushed your first five commits with some tweaks, most notably that I added a LinkProtocol class that centralizes the constants which vary by version. Mind taking a peek to see what you think?

I've run out of steam for today, and tomorrow I'm hitting the Fremont Fair with my dad, so I'll continue chewing on these over the course of next week.

comment:7 in reply to:  6 Changed 3 months ago by dmr

Replying to atagar:

Thanks Dave. Finally cobbled together some time to begin reviewing this and looks great! I really love how you broke these up - that's making it far easier to review.

Thanks! I'll keep trying to do granular commits. (And hopefully push them more often!)

I've pushed your first five commits with some tweaks, most notably that I added a LinkProtocol class that centralizes the constants which vary by version. Mind taking a peek to see what you think?

Overall: Looks good!

A few comments...

LinkProtocol __eq__, future direction

w.r.t. 76c17caabc13b21dde7a5930fe30a59ffea9b2fe, would it make sense to have the LinkProtocol __eq__ method allow an integer argument to be checked against the LinkProtocol.version attribute? (Or e.g. would that be considered non-Pythonic?)

Also regarding the LinkProtocol class - do you envision we might turn it into something beyond a NamedTuple in the future, for version-specific behavior? Or do you expect we'll keep it limited to constants? (this touches a bit on #26226)

circ_id allocation

I also saw a change to the circ_id allocation algorithm. The spec says this is arbitrary ("The CircID for a CREATE cell is an arbitrarily chosen nonzero integer,"), but I think it might be useful to factor this out. It is trivially network-observable behavior to the guard, and thus might be used to distinguish stem.client connections and treat them differently.

Maybe the spec ought to specify it, at least a SHOULD behavior? I suppose it depends on the threat model the spec itself is trying to cover. Please let me know your thoughts!

Cell.unused -> Cell.padding?

I really liked 84e4e657b4785e4888e567fbc04c8ea29fd43cc4 - I was contemplating how we might do that, and having an unused attribute makes a lot of sense!
Do you think it might make more sense to call it padding, though? There was some discussion about terminology in #26228, and it's not fully resolved therein, but a few opinions did come up that padding bytes are not a payload, even in the [V]PADDING cells.

Testing literals

For the side point of testing literals you brought up in 7711050619af1a2f8ecf4aa87f774baa5f367b3b, I filed a separate ticket - see #26420. (I was already planning to file this.)

comment:8 Changed 3 months ago by atagar

LinkProtocol eq, future direction

Sure. All attributes of the LinkProtocol are derived from its integer version, so it would be perfectly fine to do equality and hash on that.

do you envision we might turn it into something beyond a NamedTuple in the future, for version-specific behavior?

Good question. I don't have any plans to extend it at present but I'd be happy to chat if you think it would be useful to do so.

I also saw a change to the circ_id allocation algorithm.

If you the LinkProtocol addition then nope, I didn't change it. Just moved it. The part that concerns us is a few paragraphs later...

With protocol version 3 or lower, a client with no public key
MAY choose any CircID it wishes, since clients never need to process a
CREATE cell.

In link protocol version 4 or higher, whichever node initiated the
connection sets its MSB to 1, and whichever node didn't initiate the
connection sets its MSB to 0.

I really liked ​84e4e657b4785e4888e567fbc04c8ea29fd43cc4 - I was contemplating how we might do that, and having an unused attribute makes a lot of sense!

Neat, glad ya like it!

Do you think it might make more sense to call it padding, though?

Actually, initially I did call it that until I realized the name conflicted with PADDING and VPADDING cells. I kinda like the name 'unused' since it makes it clear that it's bytes that have no impact.

comment:9 in reply to:  8 Changed 3 months ago by dmr

Replying to atagar:

LinkProtocol __eq__, future direction

Sure. All attributes of the LinkProtocol are derived from its integer version, so it would be perfectly fine to do equality and hash on that.

Sounds good. Filed as #26432.

I also saw a change to the circ_id allocation algorithm.

If you the LinkProtocol addition then nope, I didn't change it. Just moved it. The part that concerns us is a few paragraphs later...
[snip]

Ah, sorry, I meant that the allocation algorithm switched in how it would assign circ_ids.
Previous code was:

# __init__.py
      circ_id = 0x80000000 if self.link_protocol > 3 else 0x01

      while circ_id in self._circuits:
        circ_id += 1

and now is:

# __init__.py
      circ_id = max(self._circuits) + 1 if self._circuits else self.link_protocol.first_circ_id

# datatype.py - LinkProtocol.first_circ_id
      first_circ_id = 0x80000000 if version > 3 else 0x01

There's a subtle difference here - the former would reuse circuit ids after a circuit closed, grabbing from the lowest available.
The latter still reuses circuit ids, but only if all circuits are closed - then it resets because self._circuits is empty.

Really, the spec says this is arbitrary, especially the MAY as you pointed out for link_protocol <= 3, but it does represent network-observable behavior that could be used to distinguish different client implementations, fairly trivially.
(Hence: maybe the spec ought to specify a SHOULD here?)

That's starting to get into threat modeling, for stem.client and for the spec at large. I think this should be covered within a broader ticket. Filed one for stem.client: #26431. We can decide after resolving that whether the spec needs updates.

I really liked 84e4e657b4785e4888e567fbc04c8ea29fd43cc4 - I was contemplating how we might do that, and having an unused attribute makes a lot of sense!

Neat, glad ya like it!

Do you think it might make more sense to call it padding, though?

Actually, initially I did call it that until I realized the name conflicted with PADDING and VPADDING cells. I kinda like the name 'unused' since it makes it clear that it's bytes that have no impact.

While I agree that unused is very clear that those bytes are incidental to the cell, I guess what I was trying to say by referencing #26228 is that ... whether PADDING and VPADDING carry a payload or whether what they carry should be padding ... was discussed a bit therein; see my opening post, teor's comment, and arma's comment.

This is a bit bikeshed-y and really doesn't affect the implementation much, but I just wanted to make sure I represented those voices. I'm fine with moving forward with payload and unused, and re-assessing this later if needed.

comment:10 Changed 3 months ago by atagar

Hi Dave, sorry about the delay! Finally done chewing through these and merged.

Work looks great! Many thanks for the pull request. Opted to pass on three of the smaller commits...

  • Commit 8353d83 (Remove redundant content-length checks): Think I'd prefer to pass on this one. Those conditionals are there so we provide callers with nicer error messages. True, they might not be necessary but not spotting a need to drop 'em.
  • Commit 819c1a2 (Refactor fixed_cell_len determination into a @staticmethod): No longer necessary due to the LinkProtocol class addition.
  • Commit 79396f5 (Add TODO notes for Cell types without a unit test definition): Eh. Happy to add TODOs but these didn't add much.

Everything else has been merged. Feel free to reopen if there's anything I missed here.

comment:11 Changed 3 months ago by atagar

Resolution: implemented
Status: needs_reviewclosed

comment:12 Changed 3 months ago by dmr

Resolution: implemented
Status: closedreopened

Reopening. Can't set status in same step, so intermediate edit...

comment:13 in reply to:  10 Changed 3 months ago by dmr

Status: reopenedneeds_review

Replying to atagar:

Hi Dave, sorry about the delay! Finally done chewing through these and merged.

Work looks great! Many thanks for the pull request.

Hey Damian! Thanks for the review! Glad you got through it - sorry again that it was such a big chunk to chew on.

I'm glancing through the commits you've added in between - they look pretty self-explanatory, but I'll let you know if I have any questions, and will definitely let you know if I have any comments.

Opted to pass on three of the smaller commits...

  • Commit 8353d83 (Remove redundant content-length checks): Think I'd prefer to pass on this one. Those conditionals are there so we provide callers with nicer error messages. True, they might not be necessary but not spotting a need to drop 'em.
  • Commit 819c1a2 (Refactor fixed_cell_len determination into a @staticmethod): No longer necessary due to the LinkProtocol class addition.
  • Commit 79396f5 (Add TODO notes for Cell types without a unit test definition): Eh. Happy to add TODOs but these didn't add much.

Everything else has been merged.

Cool! At a glance, it seems like these all make sense to leave out.

For 8353d83 specifically:
I guess I didn't think about it from that perspective. I just noticed that several other places relied on the Size pop() for the exception behavior; these few seemed inconsistent.

In the case of the NetinfoCell, the code is practically unreachable, so I'm not sure the check/message really adds much.

Either way, I think a good follow-up commit would be to annotate the if check with a comment saying the purpose is for a nicer exception message. (That way it won't look like the code otherwise wouldn't check size.)

Feel free to reopen if there's anything I missed here.

Hey, in particular I've reopened this for the review comments in this ticket!
Namely:

Setting to needs_review since that seems to be the most relevant thing here.

comment:14 Changed 3 months ago by atagar

I just noticed that several other places relied on the Size pop() for the exception behavior; these few seemed inconsistent.

Ok, you persuaded me. Consistency is important - pushed.

Suggestion: use a library for this (see architecture ticket, #26227)

I'm confused. Is this ticket citing itself?

This conversation has grown way too long for me to keep track of. Please file a separate ticket to discuss the HTTP edge cases you pointed out. The approach Stem takes with libraries is...

  • Use builtins if at all possible.
  • If a library is vital then we can take a soft dependency on it. An example of this is cryptography - if unavailable Stem still works, but lacks certain features like descriptor signature validation.

On first glance maybe we can use builtins to avoid the gotchas you mentioned?

https://stackoverflow.com/questions/4685217/parse-raw-http-headers/5955949#5955949

comment 4

What in particular from comment #4 would you like for me to reply to? As you mention, it's veeeeeery long.

circ_id allocation

Yup, we subtly changed how circ_ids are allocated but unless I'm missing something it still conforms with the spec.

We are not attempting to be indistinguishable from the C tor codebase on the wire. That is a different and much, much harder project that would involve wireshark, a moving target, and tears. Rather, our goal is to make a compatible tor implementation that conforms with the spec.

Did I miss anything?

comment:15 Changed 3 months ago by dmr

Replying to dmr:

I'm glancing through the commits you've added in between - they look pretty self-explanatory, but I'll let you know if I have any questions, and will definitely let you know if I have any comments.

I'm using some "warm up" time to glance through these gradually, since review isn't a large focus of the project.
I'll also fix bugs / improve things gradually for them, too.

First such change, bug filed:
#26684

comment:16 in reply to:  15 Changed 3 months ago by dmr

Replying to dmr:

I'm using some "warm up" time to glance through these gradually, since review isn't a large focus of the project.
I'll also fix bugs / improve things gradually for them, too.

Not just "warm up" time like I had thought - a decent effort! - but here's another:
#26766

comment:17 Changed 2 months ago by atagar

Resolution: implemented
Status: needs_reviewclosed

Hi Dave, think everything's merged. Unsure there's value in keeping this open so tidying it up.

comment:18 in reply to:  4 Changed 5 weeks ago by dmr

From comment 4:

Naming: stem.client.datatype.Size subclasses/attributes

...

Filed as #27253

Note: See TracTickets for help on using tickets.