Refactor circuit_predict_and_launch_new()

changed milestone to %Tor: 0.3.0.x-final

added component::core tor/tor milestone::Tor: 0.3.0.x-final owner::chelseakomlo priority::low refactoring resolution::implemented review-group-13 reviewer::dgoulet reviewer::teor severity::normal status::closed type::defect labels

I'd particularly like to see the magic numbers replaced with const int or #define or something.

I'll be submitting a patch for this (ideally) this week- have gotten some input on this from dgoulet!

Ok, here are my changes:

First, I extracted a useful test helper function. I'm not sure if this is the right place to put it though. https://github.com/chelseakomlo/tor_patches/commit/7b46a8a45c5a67673a7ee2cd746b58344ba950c7

Second, I refactored circuit_predict_and_launch_new. This is mainly a refactor that improves readability/testability, but I did remove one piece of logic which- after writing unit tests to support this- seemed unnecessary. https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d

The piece of logic that I removed is here: https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d#diff-748f68b44784d7a77ebf42a2db48694eL1025

As later, we exit if the circuit is not general purpose: https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d#diff-748f68b44784d7a77ebf42a2db48694eL1031

Let me know if this isn't a good idea and I can add it back in.

Third, I extracted some magic numbers into named variables. I am not sure if the naming for these is the best, please let me know ideas for better names. https://github.com/chelseakomlo/tor_patches/commit/9ea57c857b3c5939ca3394e017f8de4ea1ac9d08

A few things that I did not do:

I did not add unit tests for needs_hs_client_circuits, as it would require mocking the behavior of rep_hist_get_predicted_internal. I could definitely do that as part of this patch though.
I did not deeply analyze the logic in needs_exit_circuits, needs_hs_server_circuits, needs_hs_client_circuits, or needs_circuits_for_build as I'm still a bit unfamiliar with the domain.
I did not add integration tests

One thing I was uncertain about was whether circuit_is_available_for_use should be named circuit_is_clean instead.

Trac:
Status: new to needs_review
Milestone: Tor: 0.2.??? to Tor: 0.3.0.x-final

Trac:
Reviewer: N/A to dgoulet
Cc: dgoulet, sysrqb to sysrqb

This is an incomplete review:

Replying to chelseakomlo:

Second, I refactored circuit_predict_and_launch_new. This is mainly a refactor that improves readability/testability, but I did remove one piece of logic which- after writing unit tests to support this- seemed unnecessary. https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d

The piece of logic that I removed is here: https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d#diff-748f68b44784d7a77ebf42a2db48694eL1025

As later, we exit if the circuit is not general purpose: https://github.com/chelseakomlo/tor_patches/commit/c06005d35d35339e61659cb5c5e43e8bcb60db7d#diff-748f68b44784d7a77ebf42a2db48694eL1031

Let me know if this isn't a good idea and I can add it back in.

In general, if you're going to change behaviour in a refactor, please put it in a separate commit that describes the changes. That way, you separate changes that are meant to keep the same behaviour, from changes that are meant to modify behaviour.

In this particular case, if you're convinced it's unnecessary, I would prefer it if we replaced this check with a BUG macro like so:

if (BUG(!CIRCUIT_IS_ORIGIN(circ)))
    continue;
origin_circ = TO_ORIGIN_CIRCUIT(circ);
if (origin_circ->unusable_for_new_conns)
    continue;
...

That way, we won't assert and crash in TO_ORIGIN_CIRCUIT() if the check turns out to be needed now, or if someone breaks that assumption in calling code in future.

Also, I find add_flags(1, 1, 1) much less readable than flags = (CIRCLAUNCH_NEED_CAPACITY | CIRCLAUNCH_NEED_UPTIME | CIRCLAUNCH_IS_INTERNAL).

Can we either remove the add_flags part of the refactor, or keep the flag names?

Third, I extracted some magic numbers into named variables. I am not sure if the naming for these is the best, please let me know ideas for better names. https://github.com/chelseakomlo/tor_patches/commit/9ea57c857b3c5939ca3394e017f8de4ea1ac9d08

I don't think the name SUFFICIENT_UNUSED_OPEN_CIRCUITS captures the intent here:

-    if (num < MAX_UNUSED_OPEN_CIRCUITS-2
+    if (num < MAX_UNUSED_OPEN_CIRCUITS-SUFFICIENT_UNUSED_OPEN_CIRCUITS

The comment above that explains why we subtract 2:

  /* Finally, check to see if we still need more circuits to learn
   * a good build timeout. But if we're close to our max number we
   * want, don't do another -- we want to leave a few slots open so
   * we can still build circuits preemptively as needed.

I suggest instead:

#define CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS 2
#define CBT_MAX_UNUSED_OPEN_CIRCUITS (MAX_UNUSED_OPEN_CIRCUITS-CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS)

And then putting the defines under the existing comment.

Thanks for these notes- I agree on all of these. I'll submit these changes here today.

Ok, so here are the updates from teor's review. I couldn't think of how to extract add_flags in a clean way, so I kept this as it was before the refactor. I also added the recommended defines.

See branch circuituse, git@github.com:chelseakomlo/tor_patches.git

I added a separate commit for the refactor to remove the check for CIRCUIT_IS_ORIGIN in circuit_is_available_for_use. The refactor is because we first check the purpose of the circuit to see if it is an origin circuit, but then we later reject all cicuits that do not have the purpose CIRCUIT_PURPOSE_C_GENERAL.

We shouldn't need to use the BUG macro because circuit_is_available_for_use should be able to accept both origin and non-origin circuits, and silently return if the circuit's purpose is not CIRCUIT_PURPOSE_C_GENERAL (which I believe also entails that it is an origin circuit).

#define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5

If this is the case, we shouldn't be able to crash in TO_ORIGIN_CIRCUIT() as all circuits that reach this point will have the purpose CIRCUIT_PURPOSE_C_GENERAL and therefore (I think) are origin circuits.

Let me know what you think!

Replying to chelseakomlo:

... I added a separate commit for the refactor to remove the check for CIRCUIT_IS_ORIGIN in circuit_is_available_for_use. The refactor is because we first check the purpose of the circuit to see if it is an origin circuit, but then we later reject all cicuits that do not have the purpose CIRCUIT_PURPOSE_C_GENERAL.

We shouldn't need to use the BUG macro because circuit_is_available_for_use should be able to accept both origin and non-origin circuits, and silently return if the circuit's purpose is not CIRCUIT_PURPOSE_C_GENERAL (which I believe also entails that it is an origin circuit).

#define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5

If this is the case, we shouldn't be able to crash in TO_ORIGIN_CIRCUIT() as all circuits that reach this point will have the purpose CIRCUIT_PURPOSE_C_GENERAL and therefore (I think) are origin circuits.

I think you are confusing CIRCUIT_PURPOSE_IS_ORIGIN() and CIRCUIT_IS_ORIGIN().

And while I agree that is the current logic, I have seen too many cases like this where calling code breaks an assumption, and then the called code crashes.

Replying to teor:

Replying to chelseakomlo:

... I added a separate commit for the refactor to remove the check for CIRCUIT_IS_ORIGIN in circuit_is_available_for_use. The refactor is because we first check the purpose of the circuit to see if it is an origin circuit, but then we later reject all cicuits that do not have the purpose CIRCUIT_PURPOSE_C_GENERAL.

We shouldn't need to use the BUG macro because circuit_is_available_for_use should be able to accept both origin and non-origin circuits, and silently return if the circuit's purpose is not CIRCUIT_PURPOSE_C_GENERAL (which I believe also entails that it is an origin circuit).

#define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5

If this is the case, we shouldn't be able to crash in TO_ORIGIN_CIRCUIT() as all circuits that reach this point will have the purpose CIRCUIT_PURPOSE_C_GENERAL and therefore (I think) are origin circuits.

I think you are confusing CIRCUIT_PURPOSE_IS_ORIGIN() and CIRCUIT_IS_ORIGIN().

Oops, no, I am confusing them:

CIRCUIT_PURPOSE_IS_ORIGIN() and CIRCUIT_IS_ORIGIN() are equivalent - one tests the purpose itself, the other tests the circuit.

But TO_ORIGIN_CIRCUIT does tor_assert(x->magic == ORIGIN_CIRCUIT_MAGIC);, and does not check the purpose. So it is not equivalent to either of those other macros.

And while I agree that is the current logic, I have seen too many cases like this where calling code breaks an assumption, and then the called code crashes.

In particular, the assumption that if a circuit satisfies purpose == CIRCUIT_PURPOSE_C_GENERAL, it will also have the right magic number.

But in the end, I think you're right, because assert_circuit_ok() checks that invariant. So we can skip that check.

Ok, great, that is good everything checks out for that refactor, thanks!

Let me know if you have further recommendations for better naming or anything else that can be extracted from circuit_predict_and_launch_new. I can look at adding more unit tests, to needs_hs_client_circuits and circuit_predict_and_launch_new as well.

I can help with investigating how the behavior could be improved, but it would be good to get some ideas on how to start. I.e, what constitutes better/worse behavior.

Replying to chelseakomlo:

Ok, great, that is good everything checks out for that refactor, thanks!

Let me know if you have further recommendations for better naming or anything else that can be extracted from circuit_predict_and_launch_new. I can look at adding more unit tests, to needs_hs_client_circuits and circuit_predict_and_launch_new as well.

I think this looks good, and can be merged as-is. But I'd like someone else to review it too.

I can help with investigating how the behavior could be improved, but it would be good to get some ideas on how to start. I.e, what constitutes better/worse behavior.

Given the length of this ticket, I'd put behaviour changes in another ticket. For example, there are existing tickets relating to this function: an enhancement in #17360 (moved), and a bug in #17359 (moved).

Replying to teor:

Ok, great, that is good everything checks out for that refactor, thanks!

I think this looks good, and can be merged as-is. But I'd like someone else to review it too.

Sounds great! My hesitation around adding more unit tests is the level of mocking that is necessary to test certain functions. Maybe that topic can be a larger conversation.

Syntax nitpick. This has an extra line as it should be MOCK_IMPL(int,. See basically how we do it across the code.

+MOCK_IMPL(
+int,
+circuit_all_predicted_ports_handled, (time_t now, int *need_uptime,
+                                      int *need_capacity))

This comment confuses me as it doesn't appear to indicate what the function does or return. As I understand it, the function returns true iff the circuit matches some criteria that made it available for use. Documenting those criteria would be useful here. Also, what kind of circuit can be passed. Any kind? Or origin only or? as I see that it must be a GENERAL purpose for instance.

+/* Figure out how many circuits we have open that we can use. */
+STATIC int
+circuit_is_available_for_use(circuit_t *circ)

I would const circuit_t *circ here and then use CONST_TO_ORIGIN_CIRCUIT().

circuit_is_available_for_use(circuit_t *circ)

Syntax nitpicK: all the needs_* have indentation issues. They should be aligned to each other like such:

  return (!circuit_all_predicted_ports_handled(now, needs_uptime,
                                               needs_capacity) &&
          router_have_consensus_path() == CONSENSUS_PATH_EXIT);

In the case of this one above, it goes beyond 79 chars on the first line thus the newline. Note that ideally we like && to be at the end of the line instead of beginning.

Further more on this function. This one is tricky as needs_uptime and needs_capacity are only set if we do need more Exit circuits so I would mention it in the comment as there is only one return value that sets them.

needs_exit_circuits(time_t now, int *needs_uptime, int *needs_capacity)

I would honestly break this one down inside the function. This has quite the complicated conditions and the clearer the better as right now it's not that easy to get and also very easy to misunderstand:

needs_hs_client_circuits(...)

Syntax:

#define CBT_MAX_UNUSED_OPEN_CIRCUITS (MAX_UNUSED_OPEN_CIRCUITS- \
    CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS)

to

#define CBT_MAX_UNUSED_OPEN_CIRCUITS (MAX_UNUSED_OPEN_CIRCUITS - \
                                     CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS)
}}}

* This condition has disappeared which is not good as `TO_ORIGIN_CIRCUIT()` in `circuit_is_available_for_use()` will explode if it's not an origin circuit.
{{{
-    if (!CIRCUIT_IS_ORIGIN(circ))
-      continue;

and furthermore, this also could explode: {{{ cpath_build_state_t *build_state = TO_ORIGIN_CIRCUIT(circ)->build_state; }}}

That's it for now. Note to myself, next time use Gitlab for code review as it would have been easier for you, sorry about that :).

Trac:
Status: needs_review to needs_revision

Ok, see changes at git@github.com:chelseakomlo/tor_patches.git, branch circuituse. Thanks for all the great feedback!

This comment confuses me as it doesn't appear to indicate what the function does or return. As I understand it, the function returns true iff the circuit matches some criteria that made it available for use. Documenting those criteria would be useful here. Also, what kind of circuit can be passed. Any kind? Or origin only or? as I see that it must be a GENERAL purpose for instance.
+/* Figure out how many circuits we have open that we can use. */
+STATIC int
+circuit_is_available_for_use(circuit_t *circ)
}}}

Great, let me know if it is more clear now.

I would honestly break this one down inside the function. This has quite the complicated conditions and the clearer the better as right now it's not that easy to get and also very easy to misunderstand: {{{ needs_hs_client_circuits(...) }}}

Ok, I pulled some of the logic out into separate variables. Let me know if this makes it more readable/easy to understand.

This condition has disappeared which is not good as TO_ORIGIN_CIRCUIT() in circuit_is_available_for_use() will explode if it's not an origin circuit. {{{
if (!CIRCUIT_IS_ORIGIN(circ))
 continue;
}}}

and furthermore, this also could explode: {{{ cpath_build_state_t *build_state = TO_ORIGIN_CIRCUIT(circ)->build_state; }}}

Hey! Ok, here is my understanding:

A circuit with purpose CIRCUIT_PURPOSE_C_GENERAL will always be an origin circuit, because:

in or.h {{{ #define CIRCUIT_IS_ORIGIN(c) (CIRCUIT_PURPOSE_IS_ORIGIN((c)->purpose)) #define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5


so in circuit_is_available_for_use, circuituse.c

`origin_circ = TO_ORIGIN_CIRCUIT(circ);`

should not explode, because of the previous check:

if (circ->purpose != CIRCUIT_PURPOSE_C_GENERAL) return 0;/* only pay attention to general purpose circs */


So that is the reason we should be able to safely remove

if (!CIRCUIT_IS_ORIGIN(circ))
  continue;


Let me know if you have a different understanding of this.

Trac:
Status: needs_revision to needs_review

Replying to chelseakomlo: [snip]

This condition has disappeared which is not good as TO_ORIGIN_CIRCUIT() in circuit_is_available_for_use() will explode if it's not an origin circuit.
-    if (!CIRCUIT_IS_ORIGIN(circ))
-      continue;
}}}

 and furthermore, this also could explode:
 {{{
cpath_build_state_t *build_state = TO_ORIGIN_CIRCUIT(circ)->build_state;
 }}}
Hey! Ok, here is my understanding:

A circuit with purpose CIRCUIT_PURPOSE_C_GENERAL will always be an origin circuit, because:

in or.h {{{ #define CIRCUIT_IS_ORIGIN(c) (CIRCUIT_PURPOSE_IS_ORIGIN((c)->purpose)) #define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5 }}}

I see that CIRCUIT_IS_ORIGIN(c) translates to if c > CIRCUIT_PURPOSE_OR_MAX_ then circ is origin. We have lots of purposes that are above CIRCUIT_PURPOSE_C_GENERAL = 5 in values like a client doing an HS introduction to a relay CIRCUIT_PURPOSE_C_INTRODUCING = 6. Am I missing something here?

Origin circuits means that we (tor) initiated them and just with the HS, there are lots of purposes.

I think you missed this syntax fix as well:

{{{ #define CBT_MAX_UNUSED_OPEN_CIRCUITS (MAX_UNUSED_OPEN_CIRCUITS-
CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS)


**Trac**:  
**Status**: needs_review **to** needs_revision

Replying to dgoulet:

Replying to chelseakomlo: [snip]

Hey! Ok, here is my understanding:

A circuit with purpose CIRCUIT_PURPOSE_C_GENERAL will always be an origin circuit, because:

in or.h {{{ #define CIRCUIT_IS_ORIGIN(c) (CIRCUIT_PURPOSE_IS_ORIGIN((c)->purpose)) #define CIRCUIT_PURPOSE_IS_ORIGIN(p) ((p)>CIRCUIT_PURPOSE_OR_MAX_) #define CIRCUIT_PURPOSE_OR_MAX_ 4 #define CIRCUIT_PURPOSE_C_GENERAL 5 }}}

I see that CIRCUIT_IS_ORIGIN(c) translates to if c > CIRCUIT_PURPOSE_OR_MAX_ then circ is origin. We have lots of purposes that are above CIRCUIT_PURPOSE_C_GENERAL = 5 in values like a client doing an HS introduction to a relay CIRCUIT_PURPOSE_C_INTRODUCING = 6. Am I missing something here?

Origin circuits means that we (tor) initiated them and just with the HS, there are lots of purposes.

In circuit_is_available_for_use, we exclude all circuits that don't have purpose CIRCUIT_PURPOSE_C_GENERAL, which is an origin circuit. See line 1023.

I can add a comment in the code explaining this if it isn't clear in the code.

I think you missed this syntax fix as well:

{{{ #define CBT_MAX_UNUSED_OPEN_CIRCUITS (MAX_UNUSED_OPEN_CIRCUITS-
CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS) }}}

That change is in commit extract magic numbers in circuituse.c

Oh I see right... only general purpose so it's safe. Ok I think it's fine indeed and yes adding a comment saying "Well GENERAL purpose has to be an origin circuit" would be good idea so we are clear that after those conditions downcasting to an origin circuit is safe :).

The rest lgtm;

See changes at git@github.com:chelseakomlo/tor_patches.git, branch circuituse.

Changes added:

Syntax fix for CBT_MIN_REMAINING_PREEMPTIVE_CIRCUITS
Comment to explain general purpose circuits are always origin circuits.

Trac:
Status: needs_revision to needs_review

Thanks chelsea! This lgtm; Builds properly and test passes.

Trac:
Status: needs_review to merge_ready
Reviewer: dgoulet to dgoulet, teor

Okay, cool; this looks like a straightforward function-extraction refactoring pattern.

I'd request documentation on the extracted function in the unit tests, and on the new constants, but that isn't a showstopper, since nothing is getting less documented in this patch.

The indentation inside the "needs_circuits_for_build(num)" block needs to get cleaned up. Also, the spacing "needs_circuits_for_build" function itself violates the K&R C style we use elsewhere.

For 0be9da8b6e016a00e234ad023513f44f7ba76843, instead of removing the check, I'd prefer to keep it, and add a BUG() wrapper around it.

All of these are pretty trivial; I'll do them when I merge unless somebody else wants to do them first. (Please use separate commits if you decide to do them yourself.)

Hey Nick! Thanks for the review. I added these changes as new commits here: git@github.com:chelseakomlo/tor_patches.git, branch circuituse.

I'd request documentation on the extracted function in the unit tests, and on the new constants, but that isn't a showstopper, since nothing is getting less documented in this patch.

Great, I added documentation for these.

The indentation inside the "needs_circuits_for_build(num)" block needs to get cleaned up. Also, the spacing "needs_circuits_for_build" function itself violates the K&R C style we use elsewhere.

I wasn't able to see the indentation issue here. I did fix the K&R style issue although i kept the inner brace (dgoulet style) but indented correctly.

For 0be9da8b6e016a00e234ad023513f44f7ba76843, instead of removing the check, I'd prefer to keep it, and add a BUG() wrapper around it.

The BUG wrapper actually won't work here as this check isn't a bug, it just is redundant. I added this case back in the possibility that it is needed without being immediately obvious...

One final thing I was thinking about is naming. In several places in the code, I use the term "hs servers" , but that probably isn't right. I've heard the term "hidden service routers" or just "hidden services." If you have consistent naming for these that you prefer, let me know and I can fix those up as well. For example, needs_hs_server_circuits and SUFFICIENT_UPTIME_INTERNAL_HS_SERVERS

Thanks for all the help!

Trac:
Status: merge_ready to needs_review

Trac:
Keywords: N/A deleted, review-group-13 added

Trac:
Status: needs_review to assigned
Owner: N/A to chelseakomlo

One final thing I was thinking about is naming. In several places in the code, I use the term "hs servers" , but that probably isn't right. I've heard the term "hidden service routers" or just "hidden services." If you have consistent naming for these that you prefer, let me know and I can fix those up as well. For example, needs_hs_server_circuits and SUFFICIENT_UPTIME_INTERNAL_HS_SERVERS

Ok, I chatted with asn, and hs_servers is fine, especially because it makes the distinction from clients in this case.

I made fixes for Nick's feedback, final code at git@github.com:chelseakomlo/tor_patches.git, branch circuituse.

Trac:
Status: assigned to merge_ready

Looks nice; merging. Thanks!

Trac:
Resolution: N/A to implemented
Status: merge_ready to closed

closed

moved to tpo/core/tor#18873 (closed)

Refactor circuit_predict_and_launch_new()

Child items 0

Activity