Opened 17 months ago

Last modified 4 days ago

#26294 merge_ready defect

attacker can force intro point rotation by ddos

Reported by: arma Owned by: asn
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, tor-dos, network-team-roadmap-august, security
Cc: asn Actual Points: 6
Parent ID: #29999 Points: 7
Reviewer: dgoulet Sponsor: Sponsor27-must

Description

Currently, an onion service's intro points each expire (intentionally rotate) after receiving rand(16384, 16384*2) intro requests.

Imagine an attacker who generates many introduction attempts. Since each intro attempt can take its own path to the target intro point, the bottleneck will be the introduction circuit itself. Let's say that intro circuit can sustain 500KBytes/s of traffic. That's 1000 intro requests per second coming in -- so after 24ish seconds (rand(16,32)), that intro point will expire: the onion service will pick a new one and start publishing new onion descriptors.

If the intro circuit can handle 1MByte/s, then rotation will happen after 12ish seconds.
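To make the arithmetic concrete, here is a small back-of-the-envelope sketch (not Tor code; the ~512 bytes per introduction request is an assumption based on the relay cell size, so the numbers come out slightly different from the rounder figures above):

{{{
/* Back-of-the-envelope sketch: how fast can an attacker force rotation?
 * Assumes roughly one ~512-byte relay cell per introduction request;
 * the per-request cost is an assumption, not a measured value. */
#include <stdio.h>

int main(void)
{
  const double bytes_per_intro = 512.0;          /* assumed cost per request */
  const double min_requests = 16384.0;           /* rand(16384, 16384*2) lower bound */
  const double max_requests = 16384.0 * 2;       /* upper bound */
  const double circuit_rates[] = { 500e3, 1e6 }; /* bytes/sec the intro circuit sustains */

  for (unsigned i = 0; i < sizeof(circuit_rates)/sizeof(circuit_rates[0]); i++) {
    double reqs_per_sec = circuit_rates[i] / bytes_per_intro;
    printf("%.0f KB/s -> %.0f req/s -> rotation in %.0f-%.0f seconds\n",
           circuit_rates[i] / 1e3, reqs_per_sec,
           min_requests / reqs_per_sec, max_requests / reqs_per_sec);
  }
  return 0;
}
}}}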

With three intro circuits, each receiving intro requests at a different rate, we could end up changing our descriptor even more often than this. There are at least four impacts from this attack:

(1) Onion services spend energy and bandwidth generating new intro circuits, and publishing new descriptors to list them.

(2) Clients might get the last onion descriptor, not the next one, and so they'll attempt to introduce to a circuit that's no longer listening.

(3) The intro points themselves get a surprise burst of 16k-32k incoming circuits, probably plus a lot more after that because the attacker wouldn't know when to stop. Not only that, but for v2 onion services these circuits use the slower TAP as the circuit handshake at the intro point.

(4) The HSDirs get a new descriptor every few seconds, which aside from the bandwidth and circuit load, tells them that the onion service is under attack like this.

Intro points that can handle several megabytes of traffic per second will keep up and push the intro requests back to the onion service, thus hastening the rotation. Intro points that *can't* handle that traffic will become congested and no fun to use for others during the period of the attack.

The reason we rotate after 16k-32k requests is because the intro point keeps a replay cache, to avoid ever responding to a given intro request more than once.

One direction would be to work on bumping up the size of the replay cache, or designing a different data structure like a bloom filter so we can scale the replay cache better. I think we could succeed there. The benefits would be to (1) and (2) and (4) above, i.e. onion services won't spend so much time making new descriptors, and clients will be more likely to use an onion descriptor that's still accurate. The drawback would be to (3), where the hotspots last longer, that is, the poor intro point feels the damage for a longer period of time.
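To illustrate how a bloom filter could scale the replay cache (a generic sketch, not a proposal for Tor's actual data structures; the entry count, hash-function count, and bits-per-entry figures below are assumptions), the false-positive rate for m bits, k hash functions, and n inserted entries is roughly (1 - e^(-kn/m))^k, so memory grows only linearly in n for a fixed error rate:

{{{
/* Generic bloom-filter sizing sketch (illustrative only, not Tor code). */
#include <math.h>
#include <stdio.h>

int main(void)
{
  const double n = 300000.0;   /* assumed number of intro requests to remember */
  const double k = 7.0;        /* assumed number of hash functions */
  const double bits_per_entry[] = { 8.0, 10.0, 12.0 };

  for (unsigned i = 0; i < 3; i++) {
    double m = n * bits_per_entry[i];               /* total filter size in bits */
    double fp = pow(1.0 - exp(-k * n / m), k);      /* expected false-positive rate */
    printf("%.0f bits/entry (%.1f MB total): false-positive rate ~%.4f\n",
           bits_per_entry[i], m / 8.0 / 1e6, fp);
  }
  return 0;
}
}}}

The trade-off is that a bloom filter can return false positives, i.e. occasionally treat a fresh introduction as a replay, which is the accuracy/storage trade-off discussed later in this ticket.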

Taking a step back, I think there are two directions we can go here. Option one, we can try to scale to handle the load. We would focus on load balancing better, like reacting to an attack by choosing super fast intro points, and either choosing fast middle nodes too, or some fancier approach like having multiple circuits to your intro point. Option two, we recognize that this volume of introduction requests represents a problem in itself, and try to introduce defenses at the intro point level. Here we focus on proof of work schemes or other ways to slow down the flow, or we improve the protocol to pass along hints about how to sort the intro requests by priority.

Child Tickets

Change History (44)

comment:1 Changed 17 months ago by asn

Cc: asn added

Agreed this is an interesting and useful problem to work on.

comment:2 Changed 16 months ago by dgoulet

Keywords: tor-hs tor-dos added
Milestone: Tor: unspecified

comment:3 Changed 7 months ago by pili

Sponsor: Sponsor27

comment:4 Changed 6 months ago by asn

Sponsor: Sponsor27 → Sponsor27-must

comment:5 Changed 6 months ago by asn

Points: 7

There is some easy stuff we can do here. Assigning 7 points to do the easy stuff and think about future stuff, in case we don't get to fix this completely.

comment:6 Changed 6 months ago by pili

Parent ID: #29999

comment:7 Changed 6 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword to tickets in network team's roadmap.

comment:8 Changed 6 months ago by asn

Back when that ticket was filed, I also had the chance to meet with some onion service experts and independently discuss this issue. Here are some unpublished notes:

We decided that allowing this attack because of the replay cache is a red herring. Specifically, the replay cache is not that big with only 16k-32k requests so we could indeed grow it. Furthermore, we could also clear the cache after X requests and start with a new one; that would allow the attacker to replay each introduction once, but that's fine because making new intro requests is not *that heavy* anyway, and it's definitely better than allowing them to rotate our intro points non-stop.

Also it's important to realize that the replay cache is held on the HS-side and not on the intropoint-side. I just verified this in our codebase, because I was also confused about this! The HS keeps two (!) replay caches for each INTRODUCE2 cell: one is per-intropoint (v3: replay_cache / v2: accepted_intro_rsa_parts) and the other is per-HS (v3: replay_cache_rend_cookie / v2: accepted_intro_dh_parts).
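To make the two-cache structure concrete, here is a toy sketch of the double check the service runs on each INTRODUCE2 cell. The types and helper names are illustrative only, not Tor's actual replaycache API:

{{{
/* Hypothetical sketch of the double replay check described above; the toy
 * cache and all names are illustrative, not Tor's real data structures. */
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS 16384
#define DIGEST_LEN  20

/* Toy replay cache: a flat list of digests already seen. */
typedef struct { unsigned char seen[CACHE_SLOTS][DIGEST_LEN]; unsigned n; } toy_cache_t;

/* Returns true if the digest was already present; records it otherwise. */
bool
toy_cache_test_and_add(toy_cache_t *cache, const unsigned char *digest)
{
  for (unsigned i = 0; i < cache->n; i++)
    if (!memcmp(cache->seen[i], digest, DIGEST_LEN))
      return true;
  if (cache->n < CACHE_SLOTS)
    memcpy(cache->seen[cache->n++], digest, DIGEST_LEN);
  return false;
}

/* The two checks described above, both held by the onion service itself:
 * one cache per intro point (keyed on the INTRODUCE2 body) and one per
 * service (keyed on the rendezvous material). */
bool
intro2_is_replay(toy_cache_t *per_intro_point_cache, toy_cache_t *per_service_cache,
                 const unsigned char *cell_digest, const unsigned char *rend_digest)
{
  return toy_cache_test_and_add(per_intro_point_cache, cell_digest) ||
         toy_cache_test_and_add(per_service_cache, rend_digest);
}
}}}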

I think what we should do here is:

a) Short-term: Reevaluate our replay detection strategy and see whether it's indeed too heavy. Evaluate whether we need both caches. Evaluate the size of our replay caches given X requests. Evaluate whether we can clear our replay caches after Y requests and just keep on using the same intro points.

b) Medium-term: Consider more high-level directions to handle big load, like proof of work, path selection, or other intro protocol changes.

comment:9 Changed 5 months ago by asn

Owner: set to asn
Status: new → assigned

comment:10 Changed 5 months ago by arma

I like 'a' as a short term plan.

comment:11 Changed 5 months ago by s7r

I like 'a' as a short term plan as well. Proof of work solutions are non-trivial engineering challenges, consume time, and it eventually still comes down to the simple question of how much resources/work/time/bandwidth the attacker is willing to spend to pull this off.

What if we add a time-based lifetime for each intro point, which will be a random value chosen at intro point selection between n and m hours, along with an ALLOW_RESET_CACHE parameter, which will be a random number between o and p, and also keep the intro requests lifetime rand(16384, 16384*2), which will be combined with ALLOW_RESET_CACHE? We rebuild the descriptor when the first of these two limits is reached. This way we don't have to increase the cache, only reset it.

For example:
An onion service selects Z as intro point. It also chooses these random values and remembers them for this intro point:

  • time based lifetime = 5 hours (let's pretend n = 1; m = 6)
  • ALLOW_RESET_CACHE = 1400 (let's pretend ALLOW_RESET_CACHE = rand(100, 7000))
  • intro requests lifetime = 20122 (from rand(16384, 16384*2))

Now, this intro point will be rotated either after 5 hours, if the onion service is not under attack, or after 20122 * 1400 = 28,170,800 intro requests.

If high values were chosen for ALLOW_RESET_CACHE and the intro requests lifetime, we would indeed be getting many introduction requests through the same introduction point, but we still have the time-based lifetime parameter as a safety precaution that will eventually move us away from this introduction point.

We can go even crazier about this and use the introduction point's measured bandwidth or consensus weight, so we choose parameters based on how much the intro point is actually able to support in terms of bandwidth, and we don't end up maintaining an introduction point that is hammered and can't process the requests because it's too slow. Or find another way to check whether the intro point is actually responding to intro requests. But even without these smarter computations, the presented solution should still be better than what we have now.

All 3 parameters must be randomized as described, otherwise we open the door for easier analysis and predictability for attackers, like estimating with high probability when the intro point change will occur, etc. (outside the scope of this ticket).

The numbers for time-based lifetime and ALLOW_RESET_CACHE don't have any analysis behind them; they are just off the top of my head, only to illustrate the logic we would need to code. We need to evaluate and choose good values for these parameters, if we think this is a good idea.
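A minimal sketch of the rotation rule described in this comment, using the illustrative parameter ranges above; all names here (e.g. intro_should_rotate()) are hypothetical, not an actual implementation:

{{{
/* Minimal sketch of the hybrid rotation rule from this comment; names and
 * parameter ranges are illustrative, not code from the Tor tree. */
#include <stdbool.h>
#include <time.h>

typedef struct {
  time_t expiry_time;          /* now + rand(n, m) hours, chosen at selection */
  unsigned max_cache_resets;   /* ALLOW_RESET_CACHE = rand(o, p) */
  unsigned cache_resets_done;  /* how many times we've cleared the replay cache */
  unsigned intro_limit;        /* rand(16384, 16384*2), per cache generation */
  unsigned intros_this_cache;  /* introductions since the last cache reset */
} intro_point_state_t;

/* Called on every accepted INTRODUCE2: reset (not rotate) when the per-cache
 * limit is hit, and only rotate once the reset budget or lifetime runs out. */
bool
intro_should_rotate(intro_point_state_t *ip, time_t now)
{
  if (now >= ip->expiry_time)
    return true;                          /* time-based lifetime reached */
  if (++ip->intros_this_cache >= ip->intro_limit) {
    ip->intros_this_cache = 0;            /* clear the replay cache instead */
    if (++ip->cache_resets_done >= ip->max_cache_resets)
      return true;                        /* reset budget exhausted: rotate */
  }
  return false;
}
}}}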

comment:12 Changed 4 months ago by cypherpunks

My concern about a proof of work approach is that it appears to open a back channel where a hidden service operator has influence over client behaviour. This could result in clients executing possibly rarely used/exploitable codepaths, or new correlation attacks. For example, the hidden service operator sets a requirement for a PoW that takes 1.21 KW to compute. The operator has also hacked into an energy company with high-resolution "smart" meters, and could then sit back and watch as users log in to the service.

comment:13 in reply to:  12 ; Changed 4 months ago by cypherbits

Replying to cypherpunks:

My concern about a proof of work approach is that it appears to open a back channel where a hidden service operator has influence over client behaviour. This could result in clients executing possibly rarely used/exploitable codepaths, or new correlation attacks. For example, the hidden service operator sets a requirement for a PoW that takes 1.21 KW to compute. The operator has also hacked into an energy company with high-resolution "smart" meters, and could then sit back and watch as users log in to the service.

PoW should be a fixed value in the network consensus or hardcoded; if we want the HS to be capable of configuring it, then we should limit the parameters. That's it.


On the other hand I have two questions on the implementation and replay caches:

-How does the replay cache work for INTRODUCE1 cells? The bug allowing the same circuit to send many INTRODUCE1 cells should have been closed years ago.

-Why do we actually rotate Introduction Points? And why do we do it after x INTRODUCE cells and not based on time, like every 24 hours?

comment:14 in reply to:  13 Changed 4 months ago by asn

Replying to cypherbits:

On the other hand I have two questions on the implementation and replay caches:

-How does the replay cache work for INTRODUCE1 cells? The bug allowing the same circuit to send many INTRODUCE1 cells should have been closed years ago.

-Why do we actually rotate Introduction Points? And why do we do it after x INTRODUCE cells and not based on time, like every 24 hours?

Hello, this is not a discussion forum. Please use the mailing list for such discussions. Please see comment:8 for more info on the replay cache.

And yes, the plan with this ticket is to only rotate intro points based on time, and not based on the number of introductions (see comment:8 again).

comment:15 Changed 4 months ago by asn

Actual Points: 4

comment:16 Changed 3 months ago by asn

Actual Points: 4 → 6
Status: assigned → needs_review

OK here we go: https://github.com/torproject/tor/pull/1163

The functionality was not so hard to do, but the tests were a real PITA to write since I needed to create a parseable INTRO2 cell (they actually look quite simple in the final branch but that took tons of experimentation and mocking to do).

WRT v3 code quality, I created a new periodic function called maintain_intro_point_replay_caches() which maintains the replay cache. An alternative (perhaps cleaner but definitely harder) approach would be to make this "max number of entries" a parameter of the replaycache and do the purging when we add elements, as part of the replaycache subsystem. I tried to do this, but the replaycache code is kinda messy and I opted for the easier approach.
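For readers not following the branch, here is a rough sketch of what such a periodic maintenance pass could look like, following the "clear the cache instead of rotating" plan from comment:8. The names, struct layout, and entry limit below are assumptions for illustration; the real code is in the pull request above:

{{{
/* Illustrative sketch of a periodic replay-cache maintenance pass; names
 * and the entry limit are assumptions, not the code in the PR. */
#include <stddef.h>

#define MAX_REPLAY_CACHE_ENTRIES 300000  /* assumed upper bound per cache */

struct replay_cache { size_t num_entries; /* ... digest map omitted ... */ };

/* Hypothetical helper: drop every entry from the cache. */
void replay_cache_clear(struct replay_cache *cache);

/* Periodic event: instead of rotating the intro point when the cache fills,
 * empty the cache and keep using the same intro point. This allows at most
 * one replay per cleared entry, which comment:8 argues is acceptable. */
void
maintain_intro_point_replay_caches_sketch(struct replay_cache **caches,
                                          size_t n_caches)
{
  for (size_t i = 0; i < n_caches; i++) {
    if (caches[i]->num_entries >= MAX_REPLAY_CACHE_ENTRIES)
      replay_cache_clear(caches[i]);
  }
}
}}}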

Also, I made good unittests for v3, but I never attempted to do the same for v2. It just seems like too much work, given how much work the v3 test was.

Finally, I have not tested this on chutney or the real network. This is something I need to do before putting it in merge_ready.

comment:17 Changed 3 months ago by dgoulet

Reviewer: dgoulet

comment:18 Changed 3 months ago by dgoulet

Status: needs_review → needs_revision

Two tiny comments. Else, this is solid! Put it in merge_ready once teor's and my comments have been addressed.

LGTM!

I'm currently running this on our test bed. We'll let you know if anything comes up but so far so good for upstream merge!

comment:19 Changed 3 months ago by gaba

Keywords: network-team-roadmap-august added; network-team-roadmap-2019-Q1Q2 removed

comment:20 in reply to:  18 Changed 2 months ago by asn

Status: needs_revision → needs_review

Replying to dgoulet:

Two tiny comments. Else, this is solid! Put it in merge_ready once teor's and my comments have been addressed.

LGTM!

I'm currently running this on our test bed. We'll let you know if anything comes up but so far so good for upstream merge!

OK, I pushed a branch with fixups and rebased to latest master (because of the practracker changes): https://github.com/torproject/tor/pull/1199

Apart from the PR comments, it fixes a bug on the v2 side.

comment:21 Changed 2 months ago by dgoulet

Status: needs_review → merge_ready

Travis had a weird failure over "not having Internet for apt install" ... but otherwise this is ready to go.

comment:22 Changed 2 months ago by nickm

Milestone: Tor: unspecified → Tor: 0.4.2.x-final

Assuming this is for 0.4.2, and not backport?

comment:23 in reply to:  22 Changed 2 months ago by asn

Replying to nickm:

Assuming this is for 0.4.2, and not backport?

Agreed.

comment:24 Changed 2 months ago by nickm

Status: merge_ready → needs_revision

It looks like there are assertion failures reported in the chutney test here on travis.

comment:25 in reply to:  24 Changed 2 months ago by asn

Status: needs_revision → merge_ready

Replying to nickm:

It looks like there are assertion failures reported in the chutney test here on travis.

Oops. I managed to reproduce this with chutney and fixed it. I pushed a fixup. The cause was that v2 initializes the replay cache upon receiving an INTRO2 cell, so it can be uninitialized in the beginning; I added a test against that.

I also pushed a squashed branch because there were some conflicts with autosquash:
https://github.com/torproject/tor/pull/1207

Marking as merge_ready assuming that all CI passes.

comment:26 Changed 2 months ago by nickm

I think this code looks okay but before we merge it, I think we should have a patch for tor-spec that explains the new behavior of the replay cache. We should also have a quick proposal that explains why it's safe to allow replays, since I've usually thought of them as a way to mount active traffic analysis attacks.

comment:27 in reply to:  26 Changed 2 months ago by asn

Replying to nickm:

I think this code looks okay but before we merge it, I think we should have a patch for tor-spec that explains the new behavior of the replay cache. We should also have a quick proposal that explains why it's safe to allow replays, since I've usually thought of them as a way to mount active traffic analysis attacks.

Here is a torspec patch: https://github.com/asn-d6/torspec/commit/f0fbcf3d606b8fb8ec49b1ba8f790607725dbd8b
https://github.com/asn-d6/torspec/tree/bug26294

We actually had not heard that replay caches are there to protect against traffic analysis attacks. How does the attack work? I considered that identical INTRO2 cells could be used as a signal to the HS guard, but since they are end-to-end encrypted the signal should not be visible, right?

comment:28 Changed 8 weeks ago by nickm

IIRC, the problem would be if an attacker found an introduce cell that they were very interested in, and replayed it a lot in order to see which rendezvous point got a bunch of retries.

comment:29 in reply to:  28 ; Changed 8 weeks ago by asn

Replying to nickm:

IIRC, the problem would be if an attacker found an introduce cell that they were very interested in, and replayed it a lot in order to see which rendezvous point got a bunch of retries.

Hm, I'd like some more help with understanding this attack. The replay cache refactored by this ticket is the one that protects against replays from the intro point. So assuming that a malicious intro can now do replays, how does it also have visibility on which rendezvous point gets the retries? And how does the knowledge of retry help the attacker get information about the client or the service?

comment:30 Changed 5 weeks ago by nickm

Keywords: security added

Mark defects that are missing the "security" tag.

comment:31 Changed 5 weeks ago by nickm

Mark some merge_ready tickets as 042-should

comment:32 Changed 5 weeks ago by nickm

Keywords: 042-should added

comment:33 in reply to:  29 ; Changed 5 weeks ago by arma

Replying to asn:

We actually had not heard that replay caches are there to protect against traffic analysis attacks. How does the attack work? I considered that identical INTRO2 cells could be used as a signal to the HS guard, but since they are end-to-end encrypted the signal should not be visible, right?

The encryption protects the payload, but not the communications metadata (timing and volume).

I worry about two impacts from replays by the intro point:

  • Capture an intro2 cell and later play it repeatedly, to create a pattern at the onion service's guard, at a time of our choosing. The replay cache at the onion service doesn't completely resolve this concern, because the intro point gets to send the cells before the onion service can realize they're replays. But if Mike succeeds at removing every side channel from the world, then the replayed intro2 cells are "legitimate" (i.e. expected and correctly formed) cells so without a replay cache there is no way to realize that they're not wanted.
  • Capture an intro2 cell and later play it repeatedly to create a pattern at the rendezvous point. This one is directly resolved by a replay cache at the onion service side. The impact is a bit subtle/indirect, but it would for example allow attacks where later you discover which rendezvous point a given introduction attempt used.

The generalization of that second issue is that you get to induce the onion service to interact with the Tor network, at a time and frequency of your choosing, when otherwise you shouldn't have that capability. That possibility seemed like a good building block to all sorts of traffic confirmation attacks, and that's why we put the replay cache in place.

I think the thinking has gone deeper since this original design, e.g. in the Vanguards discussion. So if we are ok with these issues, great. But at least now you know the original context. :)

Last edited 5 weeks ago by arma

comment:34 in reply to:  33 Changed 5 weeks ago by arma

Replying to arma:

The impact is a bit subtle/indirect, but it would for example allow attacks where later you discover which rendezvous point a given introduction attempt used.

For example, you could do this discovery by roving around the network looking at relays and seeing if they receive the burst of rendezvous attempts. Or you could run some fast inconsistent (i.e. not Guard) relays and get chosen sometimes as the hop before the rendezvous cell, and since our design doesn't use 'rendezvous guards', over time you become confident that the rendezvous point is the one receiving the connections more often than baseline.

If the intro point can guess what onion service it's an intro point for, it can look up the descriptor, discover the ephemeral key for its intro point, and do introductions itself. So the original goal was that if it *doesn't* know what onion service it's introducing to, it can't cause the onion service to make any circuits.

comment:35 Changed 5 weeks ago by s7r

The attacks are quite possible, but also the current replay cache behavior can be trivially gamed so the onion service will rotate intro points more often than we would normally want, and thus trigger a different sybil-type attack where eventually the onion service picks a hostile introduction point. Both the time-based limit and the number-of-introductions limit are important and mitigate different threat models. The first one lowers the potential to "rotate too often", while the second one prevents "timing and volume replays", because volume becomes something not under the control of the attacker.

Which is why I think configuring the replay cache to limit on a "hybrid" threshold (time + introductions) as described in comment:11 will not interfere with the issues and concerns described above. It's just about choosing the right variable min and max values so that introduction points are not rotated too fast but also cannot send unlimited replays (introductions) during their time-based lifetime. A "hybrid" limitation as described will simply enhance the current behavior instead of radically changing it.

Last edited 5 weeks ago by s7r

comment:36 in reply to:  33 ; Changed 4 weeks ago by asn

Replying to arma:

Replying to asn:

We actually had not heard that replay caches are there to protect against traffic analysis attacks. How does the attack work? I considered that identical INTRO2 cells could be used as a signal to the HS guard, but since they are end-to-end encrypted the signal should not be visible, right?

The encryption protects the payload, but not the communications metadata (timing and volume).

Thanks for the explanation. The attacks indeed make sense.

I worry about two impacts from replays by the intro point:

  • Capture an intro2 cell and later play it repeatedly, to create a pattern at the onion service's guard, at a time of our choosing. The replay cache at the onion service doesn't completely resolve this concern, because the intro point gets to send the cells before the onion service can realize they're replays. But if Mike succeeds at removing every side channel from the world, then the replayed intro2 cells are "legitimate" (i.e. expected and correctly formed) cells so without a replay cache there is no way to realize that they're not wanted.

I think this side-channel attack is kinda interesting, and it gives even more reason to change the current design since (as s7r mentioned) it's currently easy to make onion services rotate their introduction points, and hence place the attacker in the right position to carry out this attack.

Also, introduction points will always be in position to carry out this attack since they can send arbitrary cells to the end of the circuit (the HS). If those cells are not expected, the service will drop them (and issue a log warning) but the attack will still be carried out since the signal will reach the guard.

IMO, this attack is a reason to increase the lifetime of the introduction points, but not a reason to drop this patch.

  • Capture an intro2 cell and later play it repeatedly to create a pattern at the rendezvous point. This one is directly resolved by a replay cache at the onion service side. The impact is a bit subtle/indirect, but it would for example allow attacks where later you discover which rendezvous point a given introduction attempt used.

This is indeed an attack that becomes possible if this patch gets merged, and creates a tradeoff here that is worth discussing.

I feel like an adversary that would end up launching this attack is dealing with a super advanced (almost artificial) scenario: The adversary does not know the identity of the onion, but still they capture INTRO2 cells and then replay them to learn the rendezvous point. Now let's assume that they can instantly pair an introduction with a rendezvous point, what's next? Maybe if they later learn the actual identity of the onion service, they can learn that a given rendezvous circuit was destined to that onion? And what's next? Maybe they can do traffic correlation to identify the identity of the onion or of the client? But this ends up being a super strong attacker suddenly, and would probably have other ways to achieve the same goal.

Also, the patch of this ticket won't allow the introduction point to generate arbitrary signals to the rendezvous point, since the replay cache needs to be reset before a replay is possible. And a replay cache can only be reset by legitimate (non-replay) traffic. So this attack assumes a popular onion service, and the attacker can only replay each cell once, so they can only create a signal of one rendezvous circuit per cache reset (except if they also replay other intro2 cells, but those will be going to other rendezvous points).

So I can see how this attack can work, but it still seems pretty remote, compared to the one that is currently possible (attacker forces HS to use an evil intro, and then does the above attack, or collects statistics, or whatever), and hence I still feel like the patch in this ticket is superior. As always, it's a tradeoff.


Replying to s7r:

Which is why I think configuring the replay cache to limit on a "hybrid" threshold (time + introductions) as described in comment:11 will not interfere with the issues and concerns described above. It's just about choosing the right variable min and max values so that introduction points are not rotated too fast but also cannot send unlimited replays (introductions) during their time-based lifetime. A "hybrid" limitation as described will simply enhance the current behavior instead of radically changing it.

Hm. I still don't see how the hybrid construction can help here: IMO the future ideal scenario is that introduction points will last forever (kinda like guards) to resist Sybil attacks, and hence adding any rotation parameter apart from time will make rotation happen faster, which is no good.

In particular, adding 'introductions' as a rotation parameter opens us up to the attack of this ticket, where an adversary can force your intros to rotate because of too many introductions. The only reason that the design from comment:11 works out (on paper) is because the intro will rotate after 28 million introductions, which is a huge number and will never happen (at least in theory, and if it happens it's bad). The problem with that is that a replay cache that holds 28 million elements will be around 1.8 GB of memory (according to my back-of-the-envelope computations in https://github.com/torproject/tor/pull/1163/commits/6ef1ac5eed85d7cf3cafa1797dc1003912d1a63c) which doesn't really work out in practice....

It is the case that we might want to make the replay cache of this patch even bigger since it can currently hold 150k-300k elements for a memory overhead of 8.4MB to 16.8MB. Do you think we can afford to make them bigger? Perhaps double them or triple them or even bigger? An onion service should have about 6 of these right now (and it will become 3 when we do #22893).
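To make those memory figures concrete, here is a rough reconstruction of that back-of-the-envelope computation; the 56-64 bytes-per-entry cost (digest plus timestamp plus hashtable overhead) is inferred from the numbers quoted in this comment, not measured from the code:

{{{
/* Rough reconstruction of the memory back-of-the-envelope above;
 * the assumed per-entry cost is inferred, not measured. */
#include <stdio.h>

int main(void)
{
  const double per_entry_low = 56.0, per_entry_high = 64.0; /* assumed bytes/entry */
  const double cache_sizes[] = { 150e3, 300e3, 28.17e6 };   /* entries */

  for (unsigned i = 0; i < 3; i++) {
    printf("%.0f entries -> roughly %.1f-%.1f MB per replay cache\n",
           cache_sizes[i],
           cache_sizes[i] * per_entry_low / 1e6,
           cache_sizes[i] * per_entry_high / 1e6);
  }
  return 0;
}
}}}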

comment:37 Changed 4 weeks ago by zmer

INTRO cells used to have a timestamp that could be used to discard old ones being replayed. All things considered, the reasons given for removing it didn't make good sense. Arguably it should be brought back. And perhaps in future clients could be made to keep their own accurate time, independent of what the host clock is set to. There would be a number of benefits. But that is another ticket.

comment:38 in reply to:  36 Changed 4 weeks ago by s7r

Replying to asn:
Hm. I still don't see how the hybrid construction can help here: IMO the future ideal scenario is that introduction points will last forever (kinda like guards) to resist Sybil attacks, and hence adding any rotation parameter apart from time will make rotation happen faster, which is no good.

In particular, adding 'introductions' as a rotation parameter opens us up to the attack of this ticket, where an adversary can force your intros to rotate because of too many introductions. The only reason that the design from comment:11 works out (on paper) is because the intro will rotate after 28 million introductions, which is a huge number and will never happen (at least in theory, and if it happens it's bad). The problem with that is that a replay cache that holds 28 million elements will be around 1.8 GB of memory (according to my back-of-the-envelope computations in https://github.com/torproject/tor/pull/1163/commits/6ef1ac5eed85d7cf3cafa1797dc1003912d1a63c) which doesn't really work out in practice....

It is the case that we might want to make the replay cache of this patch even bigger since it can currently hold 150k-300k elements for a memory overhead of 8.4MB to 16.8MB. Do you think we can afford to make them bigger? Perhaps double them or triple them or even bigger? An onion service should have about 6 of these right now (and it will become 3 when we do #22893).

Of course, 1.8 GB of memory for a replay cache is unacceptable. But the way I thought of it, it was never meant to hold 28 million elements. It should hold the same number of elements as it does now, but we keep one more value in memory (ALLOW_RESET_CACHE) and choose how many resets we are willing to allow. The counter starts at 1, and after rand(16384, 16384*2) introductions the cache gets cleared and reset, and the counter is incremented to 2. When we hit the maximum number ALLOW_RESET_CACHE, we rotate the introduction point. So maybe I'm missing something, but the memory requirement should not be much bigger than what we have now, since we only need to keep two more things in memory: how many times we have already reset the cache, and how many times we are willing to reset it. Again, this is of course not perfect, but I think it's better.

Also, I don't think having the lifetime of introduction points as long as possible (based on time), like Guards have, is a good idea, because introduction points are simply nothing like the Guards. While the Guard is (theoretically) known ONLY to the server (the Tor daemon hosting the onion service in this case), the introduction points are known to every client/visitor trying to connect to the onion service. From my point of view, here we should assume that every client (visitor) of an onion service is a potential attacker. Even the ones that use authentication are still under this threat (what if a genuine client gets pressured into handing over the onion auth credentials, etc). So the longer you hold a connection to an introduction point, the more time you give an attacker to pull off a guard discovery attack.

Introduction points vs vanguards:

  • guard (first hop, layer 1 guard - not known to the onion service visitor)
  • 2nd hop (layer 2 guard - not known to the onion service visitor)
  • 3rd hop (layer 3 guard - not known to the onion service visitor; even for these we thought about making the 3rd hop always random, because it's just before the rendezvous point, which is selected by the client [a potential attacker]. Even though in vanguards we decided to keep the 3rd hop static, but for the shortest period of time, this node is still at least theoretically not directly known to the onion service visitor).

IMO introduction points don't share the same property as the Guards, and keeping them static based only on time, for a longer period, will open the door for easier attacks. In vanguards I believe we already keep the circuit path to the introduction point static for some time, but not the introduction point itself.

It's worth keeping the nodes we use static (like the Guard, maybe even the layer 2 and 3 guards) because they are theoretically not known by the remote connecting client, and we don't want to face a Sybil attack that places a hostile node in these positions, so that they go from theoretically unknown to known. But introduction points are directly known and advertised in the descriptor.

Last edited 4 weeks ago by s7r

comment:39 Changed 4 weeks ago by nickm

Keywords: 042-should removed
Milestone: Tor: 0.4.2.x-final → Tor: 0.4.3.x-final

I believe we plan to continue discussing this and aim to reach a decision for 0.4.3; please correct me if that's wrong.

comment:40 in reply to:  39 Changed 4 weeks ago by asn

Replying to nickm:

I believe we plan to continue discussing this and aim to reach a decision for 0.4.3; please correct me if that's wrong.

During the meeting we said that if we keep the same design as the branch above, we can still get it in 042. It's just that we lack the decision procedure on whether the branch above is better than the status quo.

Last edited 4 weeks ago by asn

comment:41 Changed 4 weeks ago by nickm

Milestone: Tor: 0.4.3.x-final → Tor: 0.4.2.x-final

comment:42 Changed 13 days ago by dgoulet

After reviewing this thread, I personally feel like the trade off here in favor of this patch is OK.

I'm not too worried about the INTRO2 cell being used as a side channel for the service Guard. We allow many more cells to do that; for instance, any other HS cell not meant for an origin circuit will be dropped silently with this log line:

log_info(LD_PROTOCOL, "Dropping cell (type %d) for wrong circuit type.", command);

The part that worries me more is the "make the service interact with the tor network" part, as in opening RP circuits. But this will be for N interactions where N is quite low, since it can only be done when the replay cache is reset, which requires drastically more introductions with this patch.

My two cents: All in all, less IP rotation is a better compromise overall than what we allow with regards to INTRO2 cell replay.

comment:43 Changed 12 days ago by asn

Hello team,

how should we proceed here? I hear two explicit "yes" from me and dgoulet, a few valid concerns and discussion, and zero "no" so far. In this case, should we proceed with merging this?

I can also make tickets for various future improvements here like:

  • Investigate bloom filters for the replay cache so we can grow it to much bigger sizes.
  • Investigate reinstating a timestamp of some sort in the INTRO cell to avoid replays?

comment:44 Changed 4 days ago by nickm

I think I have to lean "no" on this for 0.4.2 right now; it removes one security feature to add another, and I am worried about the implications. I'm also worried about increasing the memory load for services so much: it seems prohibitive for a service that is running on (say) a cheap android device, yeah?

On the alternatives:

  • Bloom filters have an accuracy/storage tradeoff, so if we use one, we still need to be prepared to either get false positives, or replace the filter periodically. It's still more space-efficient than the hash map though.
  • Timestamps are really scary to me; they leak information about the client's view of the time, which can be correlated to the time it sends in other places.

Here is another alternative that we could do:

  • Allow replay caches to grow without bounds; when we approach MaxMemInQueues, evict a random subset of the cache and/or close the circuit.
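A minimal sketch of that random-eviction idea, purely illustrative; the hook into the MaxMemInQueues/OOM accounting, the names, and the eviction fraction are assumptions, not a design decision from this ticket:

{{{
/* Illustrative sketch of "grow without bounds, evict a random subset under
 * memory pressure"; a real implementation would use Tor's own RNG and OOM
 * handler rather than rand(). */
#include <stdlib.h>
#include <string.h>

#define DIGEST_LEN 20

typedef struct {
  unsigned char (*digests)[DIGEST_LEN];  /* dynamically grown array of entries */
  size_t n_entries;
} growable_replay_cache_t;

/* Called when we are approaching MaxMemInQueues: keep a random half of the
 * cache and discard the rest. This re-admits a bounded number of replays
 * instead of rotating the intro point or closing circuits. */
void
replay_cache_evict_random_half(growable_replay_cache_t *cache)
{
  size_t kept = 0;
  for (size_t i = 0; i < cache->n_entries; i++) {
    if (rand() & 1) {    /* keep each entry with probability 1/2 */
      memmove(cache->digests[kept], cache->digests[i], DIGEST_LEN);
      kept++;
    }
  }
  cache->n_entries = kept;
}
}}}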

For 0.4.2, I'd be fine with increasing the limit of cells per introduction circuit, and doing a better solution for 0.4.3.

Roger suggests on IRC that this is complicated enough that we should write something up describing the goals, tradeoffs, and design rationale. That sounds like a proposal to me. :/
