Opened 23 months ago

Closed 5 months ago

#17592 closed defect (fixed)

Clean up connection timeout logic

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: TorCoreTeam-postponed-201604, nickm-deferred-20160905, review-group-9, nickm-deferred-20161005, sponsor2
Cc: nickm Actual Points:
Parent ID: #16861 Points: see-parent
Reviewer: nickm Sponsor:

Description

In #6799, it was decided to keep TLS connections open for a random amount of time after they are idle, to defend against an attack that used internal Tor network connectivity information to determine Guard nodes (Slides: https://www.cryptolux.org/images/8/85/ESORICS-2012-Presentation-2.pdf Paper: https://eprint.iacr.org/2012/432.pdf).

Unfortunately, this logic (in connection_or_set_canonical()) is kind of a mess. Relays and clients are treated the same, and client connections are also kept alive for an additional hour by predictive circuit building even when otherwise idle, where as relays are not.

If we treat relays and clients separately for this timeout, we could reduce total client keep-alive time significantly (down to 30 minutes or so), and significantly increase relay connection lifetime. This should result in reduced total connection counts on relays, with better defenses against Torscan.

This is also needed in order to put reasonable bounds on padding overhead in #16861 for mobile clients. Furthermore, aside from the wieners running middle relays behind junky home routers who will whine about connection overflow, having a more well-connected Tor network is a good idea for many reasons (not just Torscan).

Child Tickets

Change History (37)

comment:1 Changed 23 months ago by mikeperry

Ok, I think I want to combine CircuitIdleTimeout and PredictedPortsRelevanceTime into a single option (call it CircuitsAvailableTimeout?) and also randomize the value by some range when it is used.

Where CircuitIdleTimeout is currently used, I would sample a random timeout value on circuit creation and store it in origin_circuit_t. Where PredictedPortsRelevantTime is used, I think the right thing to do is to sample a new value whenever the list of predicted ports is empty.

For the TLS connection timeout, I want to explicitly separate canonical relay connections from client connections and non-canonical relay connections, and make the relay connection timeout be on the order of an hour (randomized +/- 25%) and be controlled by a consensus parameter. The client TLS connection timeout can be much shorter, since client TLS lifespan will be governed primarily by circuit activity (which will be controlled via CircuitsAvailableTimeout).

With these two sets of changes, it will be much easier to control how long TLS connections live (and thus more easily control network activity and padding), for both relays and clients.

comment:2 Changed 23 months ago by mikeperry

FYI: I came to the conclusion to combine CircuitIdleTimeout and PredictedPortsRelevanceTime based on arma's comments here: https://trac.torproject.org/projects/tor/ticket/9176#comment:7.

comment:3 Changed 23 months ago by nickm

Priority: MediumHigh

comment:4 Changed 23 months ago by mikeperry

Status: newneeds_review

FYI: This commit implements the changes in comment:2 https://gitweb.torproject.org/mikeperry/tor.git/commit/?h=netflow_padding-v3-squashed&id=9ca83e5c82a9af0da18278ea64ab13e75f9dadb2.

I've tested it a bit in chutney and it has unittests. I'm posting it now for a quick review for any major refactoring/structural issues before I do more involved chutney/live network testing.

comment:5 Changed 22 months ago by mikeperry

Status: needs_reviewneeds_revision

I have a new version of this rebased to the new origin/master (with periodic events). Testing that now.

comment:6 Changed 22 months ago by mikeperry

Status: needs_revisionneeds_review

comment:7 Changed 22 months ago by teor

This patch looks good to me in general.

I like the change from port 80 as the default predicted port to port 443.

In options_validate:

  • Should we put minimum and maximum values on CircuitsAvailableTimeout, like the previous code for PredictedPortsRelevanceTime?

A nitpick:

In channelpadding_get_channel_idle_timeout:

  • I think server_mode() is the standard way of checking options->BridgeRelay || !options->ORPort_set. It does a few more checks, as far as I recall.

comment:8 Changed 22 months ago by dgoulet

Status: needs_reviewneeds_revision

comment:9 in reply to:  7 ; Changed 22 months ago by mikeperry

Status: needs_revisionneeds_review

Replying to teor:

This patch looks good to me in general.

I like the change from port 80 as the default predicted port to port 443.

In options_validate:

  • Should we put minimum and maximum values on CircuitsAvailableTimeout, like the previous code for PredictedPortsRelevanceTime?

I think the only reason why we had minimum and maximum values here was due to the concerns about discrepancy between PredictedPortsRelevanceTime and CircuitIdleTimeout. CircuitIdleTimeout never had any ranges or limits. At least, that was my read of https://trac.torproject.org/projects/tor/ticket/9176#comment:7.

A nitpick:

In channelpadding_get_channel_idle_timeout:

  • I think server_mode() is the standard way of checking options->BridgeRelay || !options->ORPort_set. It does a few more checks, as far as I recall.

Ok, I fixed this and another instance in fixup commits atop netflow_padding-v4.

comment:10 in reply to:  9 Changed 22 months ago by teor

Replying to mikeperry:

Replying to teor:

In channelpadding_get_channel_idle_timeout:

  • I think server_mode() is the standard way of checking options->BridgeRelay || !options->ORPort_set. It does a few more checks, as far as I recall.

Ok, I fixed this and another instance in fixup commits atop netflow_padding-v4.

These fixups all look good. Are we going to merge this? :-)

Last edited 22 months ago by teor (previous) (diff)

comment:11 Changed 21 months ago by nickm

I've squashed the three major commits here into a new "netflow_padding-v4_squashed" branch so I can review them one-by-one.

comment:12 Changed 20 months ago by nickm

Status: needs_reviewneeds_revision
  • channelpadding_get_channel_idle_timeout():
    • The documentation for channelpadding_get_channel_idle_timeout should describe units. It's seconds, right?
    • Should the magic numbers in the client-or-noncanonical case uses magic numbers. Should it use a network parameter instead, like the other stuff?
    • Let's document what a "circuit idle timeout" is.
    • How come this one doesn't look at any optoins like channelpadding_get_circuits_available_timeout does? Should it be looking at CircuitsAvailableTimeout?
  • channelpadding_get_circuits_available_timeout():
    • Let's also document in the function comment when this timeout controls exactly.
  • predicted_ports_prediction_time_remaining():
    • time_t differences are not guaranteed to fit into an int.

General stuff:

  • Should we be looking at monotonic time for any of this?

Otherwise looks good! And also solid!

comment:13 Changed 20 months ago by mikeperry

Status: needs_revisionneeds_review

These issues should all be fixed in e1c57775ed8a09a8767bf4d74db94d04c9fe1659 in mikeperry/netflow_padding-v4_squashed+rebased. I opted to explicitly check for time_t overflow/underflow and associated clock jumps rather than use monotonic time. I don't like the idea of waiting for the clock to catch up with itself here, either. I'd rather reset the timeout in that case.

comment:14 Changed 19 months ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

These seem like features, or like other stuff unlikely to be possible this month. Bumping them to 0.2.9

comment:15 Changed 18 months ago by nickm

Points: see-parent

comment:16 Changed 18 months ago by nickm

Keywords: TorCoreTeam201604 added

Every postponed needs_review ticket should get a review in April

comment:17 Changed 18 months ago by nickm

Reviewer: nickm

comment:18 Changed 17 months ago by nickm

Keywords: TorCoreTeam-postponed-201604 TorCoreTeam201605 added; TorCoreTeam201604 removed

I will not be getting these revised and reviewed this week. I hold out hope for May. Sorry mike. Please let me know whether you want to revise them wrt my handles/timing patches, or whether I should. I'm happy either way.

comment:19 Changed 17 months ago by nickm

Status: needs_reviewneeds_revision

My current understanding here is that mike means to revise this branch based on other merges we're doing. Moving these to needs_revision in the meantime. Please let me know if I'm incorrect.

comment:20 Changed 16 months ago by nickm

Keywords: TorCoreTeam201605 removed

Remove "TorCoreTeam201605" keyword. The time machine is broken.

comment:21 Changed 13 months ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

Deferring many tickets that are in needs_revision with no progress noted for a while, where I think they could safely wait till 0.3.0 or later.

Please feel free to move these back to 0.2.9 if you finish the revisions soon.

comment:22 Changed 13 months ago by mikeperry

Milestone: Tor: 0.2.???Tor: 0.2.9.x-final
Status: needs_revisionneeds_review

Alright. I switched the code over to using the new handle, monotonic timer, and timer wheel abstractions. All unit tests pass without leaks from this code (though the unit tests have grown new memory leaks of their own).

mikeperry/netflow_padding-v6. The commit specific to this bug is 0e709402d9fe5cb48188bf65e335dcdedadb268b.

comment:23 Changed 13 months ago by nickm

Keywords: review-group-9 added

comment:24 Changed 12 months ago by nickm

Haven't reviewed this one yet, but I will, on https://gitlab.com/nickm_tor/tor/merge_requests/8 .

comment:25 Changed 12 months ago by nickm

Status: needs_reviewneeds_revision

Okay, review done. I have probably misunderstood a few important things. Please feel free to focus on correcting my misunderstandings here, so that I can say smarter things about the code.

comment:26 Changed 12 months ago by mikeperry

Status: needs_revisionneeds_review

comment:27 Changed 12 months ago by nickm

Keywords: nickm-deferred-20161005 added
Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

Deferring big/risky-feature things (even the ones I really love!) to 0.3.0. Please argue if I'm wrong.

comment:28 Changed 11 months ago by nickm

Keywords: review-group-11 added

comment:29 Changed 10 months ago by nickm

Keywords: review-group-12 added; review-group-11 removed

comment:30 Changed 10 months ago by nickm

Keywords: review-group-13 added; review-group-12 removed

comment:31 Changed 9 months ago by nickm

Keywords: review-group-14 added; review-group-13 removed

And that's all for review-group-13.

comment:32 Changed 9 months ago by nickm

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

comment:33 Changed 9 months ago by nickm

Keywords: review-group-14 removed

comment:34 Changed 7 months ago by nickm

Keywords: review-group-16 added

comment:35 Changed 7 months ago by nickm

Keywords: sponsor2 added

comment:36 Changed 6 months ago by nickm

Keywords: review-group-16 removed

comment:37 Changed 5 months ago by nickm

Resolution: fixed
Status: needs_reviewclosed

merging parent

Note: See TracTickets for help on using tickets.