Opened 4 months ago

Last modified 2 days ago

#28780 needs_review defect

circpadding: Add machine flag for not closing circuit if machine is active

Reported by: asn Owned by:
Priority: Very High Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: wtf-pad, tor-relay, tor-cell, padding, 041-proposed, network-team-roadmap-2019-Q1Q2
Cc: nickm Actual Points: 6
Parent ID: #28634 Points: 5
Reviewer: Sponsor: Sponsor2

Description

Child Tickets

Change History (26)

comment:1 Changed 4 months ago by asn

I attempted to implement this feature in my branch bug28780: https://github.com/torproject/tor/pull/630

The branch is based on adaptive_padding-final from #28142 and contains tests.

Let me know if you like the approach :)

comment:2 Changed 4 months ago by asn

Status: newneeds_review

comment:3 Changed 3 months ago by asn

Potential issue with branch above: What happens if one side closes the circuit anyhow, while the other side (with the machine enabled) keeps it alive even if the other side is closed?

comment:4 Changed 3 months ago by dgoulet

Reviewer: mikeperry

comment:5 Changed 3 months ago by mikeperry

Status: needs_reviewneeds_revision

Needs revision wrt to the one-side close issue and my comments in the PR.

comment:6 Changed 3 months ago by asn

Status: needs_revisionmerge_ready

I pushed Mike's latest branch here so that CI runs: https://github.com/torproject/tor/pull/644

I'm OK with the purpose-based approach, but now I see that we are using both my "special flag" and also the "special purpose" so we moved from one special thing to two. In this case, can you please motivate the new purpose further in the docs of CIRCUIT_PURPOSE_C_CIRCUIT_PADDING. IIUC, the new purpose is so that Tor does not attach
new streams to the circuit, or are there more reasons?

If that's the only reason, perhaps it would be wise to ask feedback from Nick to see if a new purpose is the best way to achieve that goal.

comment:7 Changed 3 months ago by asn

Status: merge_readyneeds_revision

Oops, marked as merge_ready before, switching to needs_rev.

Last edited 3 months ago by asn (previous) (diff)

comment:8 Changed 3 months ago by mikeperry

Status: needs_revisionneeds_review

Ok I switched to checking the purpose instead of the flag, and removed the flag.

While thinking about #28634, I started to realize we may need more here. We may want to flip this back to C_GENERAL isntead of closing it. Or at least think about how we would make this circuit behave more like it was real. Not sure about hidden service circuits, but general circuits are marked dirty and allowed to sit around for 10 minutes after use before being closed. I am not sure the best way to do this, though. Maybe we do need some kind of timer that does not send a packet but just causes a state transition.. Bleh.

I'm also getting more and more pessimistic about doing this or #28634 by Tuesday.. I think it is more important that what we deploy is sane and good rather than just does something. I think if we act early enough in 0.4.1 to make more reasonable machines, we'll have plenty of testing.

comment:9 in reply to:  8 Changed 3 months ago by asn

Replying to mikeperry:

I'm also getting more and more pessimistic about doing this or #28634 by Tuesday.. I think it is more important that what we deploy is sane and good rather than just does something. I think if we act early enough in 0.4.1 to make more reasonable machines, we'll have plenty of testing.

Agreed. Let's relax and focus on #28142 till then.

comment:10 Changed 3 months ago by asn

Milestone: Tor: 0.4.0.x-finalTor: unspecified

comment:11 Changed 3 months ago by mikeperry

Reviewer: mikeperry
Status: needs_reviewneeds_revision

Setting this to needs_revision while we think about what to do about the ditry and other time-based teardowns.

comment:12 Changed 3 months ago by mikeperry

Keywords: 041-proposed added

comment:13 Changed 3 months ago by mikeperry

Points: 5

(We've done about 2 points on this so far.. maybe 3 more to go, depending on the time thing?)

comment:14 Changed 3 months ago by mikeperry

Priority: MediumImmediate

comment:15 Changed 3 months ago by mikeperry

Priority: ImmediateVery High

comment:16 Changed 3 months ago by nickm

Sponsor: Sponsor2

comment:17 Changed 6 weeks ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

comment:18 Changed 12 days ago by mikeperry

Status: needs_revisionneeds_review

Ok I think the best way to handle comment:8 is to not mark circuit for close upon transition to the end state. Then, circuit_expire_old_circuits_clientside() should cull them as per existing dirtyness timeouts. This is done in an extra fixup commit on https://github.com/torproject/tor/pull/644.

However, I am still going back and forth between purpose vs flag. Here's the Pro/Con:

Purpose:

  • Pros:
    1. We don't have to worry about hidden service code and circuit prediction code getting confused by outstanding circuits of a particular type, with particular state, that they thought they closed.
    2. In logs, controller handling, and everywhere else in the code, it is very clear what this circuit is doing and what it is for. Code is very unlikely to accidentally use it for something else.
  • Cons:
    1. State machines must remember to include CIRCUIT_PURPOSE_C_PADDING in their purpose masks, or they will shut down as soon as the purpose changes and cause chaos when other machines pick up this purpose.
    2. Changing purposes of things is a little scary. But we already do it; the code should support it (but: #29034 is one major outstanding instance where it does not properly do so).

Flag:

  • Pros:
    1. Our padding machine conditions are simpler. They can apply to just the purpose they are for, and not worrying about adding this extra padding purpose if they happen to use the "please keep circ open" feature.
  • Cons:
    1. Lots of wide-ranging code can be confused because it tried to close a circuit, but then that circuit stays around with the same purpose, and does not have its marked for close flag set. It may try to re-use these circuits in surprising places.

After this analysis, I am inclined to say that the purpose-based way is the way to go here. So I'm going to set this to needs_review. Please do let me know if there are other pros/cons.

comment:19 Changed 11 days ago by mikeperry

Parent ID: #28632#28634

comment:20 Changed 11 days ago by asn

Status: needs_reviewneeds_revision

ACK on the analysis. Can you please provide a clean branch rebased on master, so that it's easier to review it? That way I can also easily apply it as part of my #28634 to make sure it works fine.

Hm. It also seems like some tests are broken.

Last edited 11 days ago by asn (previous) (diff)

comment:21 Changed 10 days ago by asn

Hmm, I can't get this to work with #28634. I did a quick rebase here:
https://github.com/torproject/tor/pull/927

I also added this log to signify when we reject to close a circuit:

   if (!circpad_circuit_should_be_marked_for_close(circ, reason)) {
+    log_warn(LD_GENERAL, "Don't close this yet!!!");
    return;
   }

There are also some extra logs so that it works with https://github.com/asn-d6/padanalyzer, but I think they are also useful for debugging here.

What I see happening is that the client will send an INTRODUCE_ACK, then the above log will trigger and the circuit will stay open, but no more padding will be sent whatsoever. BTW, due to the purposes approach I also had to add the CIRCUIT_PURPOSE_C_CIRCUIT_PADDING purpose to the intro machine otherwise the machine would be shutdown when the purpose change happened.

You can test this in chutney, and if you want to test it on the real net you can use this MiddleNodes 56927E61B51E6F363FB55498150A6DDFCF7077F2 which is running an old version of the #28634 branch (commit 15e0f36). Testing this on chutney will illustrate the problem, but note that the histogram latencies are too slow for localhost so you won't see much padding.

Finally, I also get pathbias_should_count(): Bug: Circuit 3 is now being counted despite being ignored in the past. Purpose is Circuit kept open for padding, path state is already counted (on Tor 0.4.1.0-alpha-dev 2a0158f4a3873845).

Last edited 10 days ago by asn (previous) (diff)

comment:22 Changed 10 days ago by nickm

Milestone: Tor: unspecifiedTor: 0.4.1.x-final

comment:23 Changed 10 days ago by mikeperry

Status: needs_revisionneeds_review

Ok, I think I figured out some of your problems. That check for CIRCPAD_TOKEN_REMOVAL_NONE in circpad_machine_setup_tokens() is needed for the padding to work correctly. The root issue was misusing machineinfo in one of the places to inspect mi->histogram_len instead of state->histogram_len (hidden behind the CIRCPAD_INFINITY_BIN() macro).

I fixed these and a simple test works: it pads after not closing the intro circ. And I don't get pathbias errors (though I don't know why).

I do have a very strange unit test failure in your tests.. I think it's due to you switching to shutting down the machine.. But oddly if I check for null on the failing line that checks client_side->padding_info[0]->current_state, the check passes. Heisenbug.. Yay.

Anyway I didn't have to make any changes to the PR for this branch for this, just additional fixes effectively to your #28632 code. So I'm setting this back to needs_review.

Oh, my fixes to your code/circpad token removal code are here: https://github.com/mikeperry-tor/tor/tree/bug28780%2B28634%2Bfixes%2Blogs%2Bmess

Last edited 10 days ago by mikeperry (previous) (diff)

comment:24 Changed 9 days ago by asn

Status: needs_reviewneeds_revision

Thanks for the help. That was very useful.

I rebased your branch on origin/master and added a fixup commit to make it compile and add some more comments (and a useful log): https://github.com/torproject/tor/pull/933

I agree with https://github.com/torproject/tor/pull/644/commits/f31a2810b589866af3834fcb25639f10b549ea4c but instead of removing that test line, could we use circuit_expire_old_circuits_clientside() in the test to demonstrate that it closes it?

Also, a changes file is needed. Also perhaps the commit structure can be split a bit better (add purpose, add function, add tests).

comment:25 Changed 3 days ago by asn

Hm, PR#933 is completely independent from #28634. I took your branch (PR#644) and I rebased it to origin/master and added a fixup commit.

comment:26 Changed 2 days ago by mikeperry

Actual Points: 6
Status: needs_revisionneeds_review

Ok, I added a test for circuits_expire_old_circuits_clientside(), and also realized that we want to mark the circuit dirty if it wasn't already, so that we use the dirtiness timers instead of the idle ones (we want to mimic an in-use/used circuit for this).

All this and more at https://github.com/torproject/tor/pull/966

Note: See TracTickets for help on using tickets.