Opened 4 years ago

Last modified 2 months ago

#16387 new enhancement

Improve reachability of hidden services on mobile phones

Reported by: asn
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-hs, sponsor8-maybe, 034-triage-20180328, 034-removed-20180328
Cc: n8fr8, stephan
Actual Points:
Parent ID:
Points: 10
Reviewer:
Sponsor:

Description

Mobile phones are unstable and their IP changes all the time. Hidden services don't work well on them.

Here are some things that can go wrong when the mobile phone (and hence the HS) loses network or changes its IP address:

  • The circuits to the intro points break, so the HS establishes new intro points and republishes its descriptor. Its clients are not aware of the new intro points and keep trying the old ones. This is #8239, which might be fixed soon.
  • The rendezvous circuits to current clients break, and the HS does not reestablish them. Clients then keep trying the same broken rendezvous point over and over, instead of re-introducing themselves (or fetching a new descriptor entirely). We should verify that this behavior is actually broken, and think of better behaviors here.

A related thread can be found on [tor-dev] here:
https://lists.torproject.org/pipermail/tor-dev/2015-May/008841.html

Child Tickets

Ticket | Type | Status | Owner | Summary
#19522 | defect | needs_revision | | HS intro circuit retry logic fails when network interface is down

Attachments (2)

torlog (1.5 KB) - added by timonh 3 years ago.
reestablish.log
torlog2 (2.7 KB) - added by timonh 3 years ago.


Change History (31)

comment:1 Changed 4 years ago by dgoulet

Keywords: SponsorR tor-hs added

comment:2 Changed 4 years ago by dgoulet

Milestone: Tor: 0.2.7.x-final → Tor: 0.2.???
Type: defect → enhancement

comment:3 Changed 3 years ago by nickm

Keywords: SponsorR removed
Sponsor: SponsorR

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

comment:4 Changed 3 years ago by dgoulet

Sponsor: SponsorR → SponsorR-can

Move those from SponsorR to SponsorR-can.

comment:5 Changed 3 years ago by timonh

Severity: Normal

I recently analyzed the behavior of Hidden Services (HS) when their IP address changes and identified the following problems.

  1. A client connected to a HS doesn't notice that the established circuit is broken when the HS changes its address. This happens because the circuit won't be closed until the entry guard of the HS detects a TCP timeout and sends a destroy cell down the circuit. The application itself can handle the problem by using its own acknowledgments and closing the circuit when it detects a timeout. The circuit can be closed through the Tor Control Protocol (use GETINFO stream-status to find the ID and CLOSECIRCUIT to close it).
  2. A HS has to notice that its own connections are broken after an IP address change so that it can reestablish circuits to the introduction points. On Linux this takes until TCP reports a timeout, which can be quite a long time. On Android, connections get killed by the OS when an interface changes.
  3. On Android I noticed that an HS chose new introduction points after an IP change because Tor thought they were down. The issue seems to be addressed here: #8239.
  4. Another related problem is described in #16966. It may result in a 5-minute wait when a HS reuses its introduction points after a downtime.
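
The application-level workaround from the first point can be sketched roughly as follows. This is a minimal Python sketch, not part of Tor or of any existing controller library: it assumes the application already has an authenticated control-port connection and has collected the data lines of a GETINFO stream-status reply. The helper names (parse_stream_status, close_circuit_commands) are made up for illustration. The only things taken from the control protocol are the per-line format "StreamID StreamStatus CircuitID Target" and the "CLOSECIRCUIT CircuitID" command.

```python
from typing import List, NamedTuple


class StreamEntry(NamedTuple):
    stream_id: str
    status: str
    circuit_id: str
    target: str


def parse_stream_status(reply_lines: List[str]) -> List[StreamEntry]:
    """Parse the data lines of a GETINFO stream-status reply.

    Each line is expected to look like:
        StreamID SP StreamStatus SP CircuitID SP Target
    """
    entries = []
    for line in reply_lines:
        parts = line.strip().split(" ")
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        entries.append(StreamEntry(parts[0], parts[1], parts[2], parts[3]))
    return entries


def close_circuit_commands(entries: List[StreamEntry], target: str) -> List[str]:
    """Build one CLOSECIRCUIT command per circuit carrying a stream to target."""
    circuit_ids = []
    for entry in entries:
        # Circuit ID "0" means the stream is not attached to any circuit.
        if entry.target != target or entry.circuit_id == "0":
            continue
        if entry.circuit_id not in circuit_ids:
            circuit_ids.append(entry.circuit_id)
    return ["CLOSECIRCUIT %s\r\n" % cid for cid in circuit_ids]
```

The application would call these helpers once its own acknowledgment timer expires and then write the returned commands to the control connection. How the timeout itself is chosen is left to the application.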

Any other opinions on how these problems can be solved?

I'm currently working on https://github.com/kit-tm/PTP and am therefore interested in making Hidden Services work smoothly on mobile devices (particularly Android).

comment:6 in reply to:  5 Changed 3 years ago by asn

Replying to timonh:

> I recently analyzed the behavior of Hidden Services (HS) when their IP address changes and identified the following problems.

Hello timonh! Thanks for doing this analysis!

I don't know much about mobile networking and programming, but it's definitely something we need to improve ASAP, so any pointers/feedback is welcome!

> 1. A client connected to a HS doesn't notice that the established circuit is broken when the HS changes its address. This happens because the circuit won't be closed until the entry guard of the HS detects a TCP timeout and sends a destroy cell down the circuit. The application itself can handle the problem by using its own acknowledgments and closing the circuit when it detects a timeout. The circuit can be closed through the Tor Control Protocol (use GETINFO stream-status to find the ID and CLOSECIRCUIT to close it).
> 2. A HS has to notice that its own connections are broken after an IP address change so that it can reestablish circuits to the introduction points. On Linux this takes until TCP reports a timeout, which can be quite a long time. On Android, connections get killed by the OS when an interface changes.

Hm, I feel the two points above are connected. For example, if the client realizes that the rend circuit is broken before the HS reestablishes its intro circuits, then the client will try to introduce herself to a broken intro point. That's no good.

Since the client has to reintroduce herself when a rend circuit dies (right?), it probably makes sense to have the HS reestablish intro circuits as fast as possible.

On this topic, you mentioned that in Android connections get killed when the interface changes; do you think this behavior is something we could use? Or maybe Tor already uses this behavior implicitly, since it will notice the killed connections and try to reestablish its intro circuits? Can we do better here?

> 3. On Android I noticed that an HS chose new introduction points after an IP change because Tor thought they were down. The issue seems to be addressed here: #8239.

This should be fixed now.

> 4. Another related problem is described in #16966. It may result in a 5-minute wait when a HS reuses its introduction points after a downtime.

Looking at the ticket, this seems like something we will fix as part of "next gen hidden services" (proposal 224). Does this happen frequently enough for you, that we should consider baking it into the current system as well?

> Any other opinions on how these problems can be solved?
>
> I'm currently working on https://github.com/kit-tm/PTP and am therefore interested in making Hidden Services work smoothly on mobile devices (particularly Android).

Interesting project :) Best of luck and keep in touch! You might also want to join our IRC channels on OFTC (#tor-dev).

Changed 3 years ago by timonh

Attachment: torlog added

reestablish.log

comment:7 Changed 3 years ago by timonh

Replying to asn:

> Hm, I feel the two points above are connected. For example, if the client realizes that the rend circuit is broken before the HS reestablishes its intro circuits, then the client will try to introduce herself to a broken intro point. That's no good.
>
> Since the client has to reintroduce herself when a rend circuit dies (right?), it probably makes sense to have the HS reestablish intro circuits as fast as possible.

I totally agree with you on that. If the HS reestablishes its intro circuits quickly, clients can simply reconnect after they notice a timeout, without needing to fetch a new descriptor.

> On this topic, you mentioned that in Android connections get killed when the interface changes; do you think this behavior is something we could use? Or maybe Tor already uses this behavior implicitly, since it will notice the killed connections and try to reestablish its intro circuits? Can we do better here?

The behavior of Android is good for us in the sense that we don't have to wait for the long TCP timeout as on Linux. I could confirm that Tor 0.2.8 notices that the circuits are broken after an IP change and tries to reestablish the intro circuits. However, when I switch from wifi to the mobile network, Tor seems to try to reconnect too early, before the new interface is up, and therefore concludes that the intro points are no longer reachable. As a result, Tor chooses new intro points.
I attached the interesting part of the log. Looking at the log, Tor notices that the network is unreachable but draws the wrong conclusion from it (that the intro point isn't reachable anymore).

Another problem here is that a client holding an old descriptor, for a HS that has since chosen new intro points and published a new descriptor, will keep trying the old intro points for a long time before fetching the descriptor again. The old intro points don't notice that their circuit to the HS is broken because of the long TCP timeout, so they still acknowledge the client's RELAY_COMMAND_INTRODUCE1 cells.

> Looking at the ticket, this seems like something we will fix as part of "next gen hidden services" (proposal 224). Does this happen frequently enough for you, that we should consider baking it into the current system as well?

I haven't noticed the problem during my tests yet. Is there a schedule when proposal 224 will be implemented/released?

comment:8 Changed 3 years ago by timonh

I reran the test, switching from wifi to the mobile network on Android. Looking at the log, it seems that Tor retries the intro points once but then decides to try other ones.
See attachment torlog2.
I'm using Tor 0.2.8 so #8239 should be fixed and Tor should retry each intro point three times.

Looking at the code (rend_consider_services_intro_points() in rendservice.c) the intro points to retry are determined by a call to remove_invalid_intro_points(). Then Tor will try to establish a circuit to them. But after that Tor will try other intro points if there aren't enough yet. So if the retry points fail Tor will choose other ones.
The code to retry an introduction point three times is contained in remove_invalid_intro_points() using MAX_INTRO_POINT_CIRCUIT_RETRIES.
So subsequent calls to remove_invalid_intro_points() will return an introduction point to retry up to three times.
But a single call to rend_consider_services_intro_points() will only retry each introduction point once and then try others.
Is this intended behavior?
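
The behavior described above can be modeled in a few lines. This is a deliberately simplified Python model of what this comment describes, not a transcription of rendservice.c: the function consider_intro_points, its parameters, and the data layout are hypothetical. Only the constant name MAX_INTRO_POINT_CIRCUIT_RETRIES and its value of three retries come from the comment.

```python
MAX_INTRO_POINT_CIRCUIT_RETRIES = 3  # value as described in the comment


def consider_intro_points(current, candidates, wanted, circuit_ok):
    """One pass of a simplified model of intro point maintenance.

    current:    dict mapping intro point name -> failed attempts so far
    candidates: list of unused relay names that could become intro points
    wanted:     number of working intro points the service wants
    circuit_ok: callable(name) -> bool, True if a circuit can be built

    Returns the intro points that have a working circuit after this pass.
    """
    working = []
    # Step 1: retry each existing intro point exactly once in this pass.
    for name in list(current):
        if current[name] >= MAX_INTRO_POINT_CIRCUIT_RETRIES:
            del current[name]  # retry budget exhausted, give up on it
            continue
        if circuit_ok(name):
            current[name] = 0
            working.append(name)
        else:
            current[name] += 1
    # Step 2: in the same pass, top up with new intro points if needed.
    for name in candidates:
        if len(working) >= wanted:
            break
        if circuit_ok(name):
            current[name] = 0
            working.append(name)
    return working
```

In this model, an intro point "A" that fails once is immediately supplemented by a new intro point "B" in the same pass, even though "A" still has two retries left. That matches the observation that a single pass retries each intro point once and then tries others.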

Replying to timonh:

> Another problem here is that a client holding an old descriptor, for a HS that has since chosen new intro points and published a new descriptor, will keep trying the old intro points for a long time before fetching the descriptor again. The old intro points don't notice that their circuit to the HS is broken because of the long TCP timeout, so they still acknowledge the client's RELAY_COMMAND_INTRODUCE1 cells.

Regarding this issue, the idea came up that an intro point could wait for an INTRODUCE2_ACK from the HS before sending an INTRODUCE_ACK to the client.
Then a client would notice, via a timeout, that the HS didn't receive the RELAY_COMMAND_INTRODUCE2 cell, and wouldn't wait at the rendezvous point for a long time.
I'm not sure what other implications the change would have.
Another possibility would be to close ready rendezvous points earlier using a timeout and then fetch the descriptor again.
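
The first idea can be illustrated with a toy model. This is only a Python sketch of the proposal as worded in this comment: INTRODUCE2_ACK is a cell suggested here, not an existing part of the protocol, and the names intro_point_relay, client_next_step, and the timeout parameter are hypothetical.

```python
from enum import Enum
from typing import Optional


class IntroOutcome(Enum):
    ACKED = "acked"
    TIMED_OUT = "timed_out"


def intro_point_relay(service_ack_delay: Optional[float],
                      ack_timeout: float) -> IntroOutcome:
    """Model an intro point that withholds INTRODUCE_ACK until the HS acks.

    service_ack_delay: seconds until the (hypothetical) INTRODUCE2_ACK
                       arrives, or None if the HS is unreachable.
    ack_timeout:       how long the client waits for INTRODUCE_ACK.
    """
    if service_ack_delay is not None and service_ack_delay <= ack_timeout:
        return IntroOutcome.ACKED
    return IntroOutcome.TIMED_OUT


def client_next_step(outcome: IntroOutcome) -> str:
    """Decide what the client does after the introduction attempt."""
    if outcome is IntroOutcome.ACKED:
        return "wait at rendezvous point"
    return "refetch descriptor and re-introduce"
```

With the current behavior, the intro point acknowledges immediately, so the client always ends up in the "wait at rendezvous point" branch even when the HS is gone.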

Changed 3 years ago by timonh

Attachment: torlog2 added

comment:9 Changed 3 years ago by asn

Cc: n8fr8 added

comment:10 in reply to:  8 Changed 3 years ago by asn

Replying to timonh:

> I reran the test, switching from wifi to the mobile network on Android. Looking at the log, it seems that Tor retries the intro points once but then decides to try other ones.
> See attachment torlog2.
> I'm using Tor 0.2.8 so #8239 should be fixed and Tor should retry each intro point three times.
>
> Looking at the code (rend_consider_services_intro_points() in rendservice.c) the intro points to retry are determined by a call to remove_invalid_intro_points(). Then Tor will try to establish a circuit to them. But after that Tor will try other intro points if there aren't enough yet. So if the retry points fail Tor will choose other ones.
> The code to retry an introduction point three times is contained in remove_invalid_intro_points() using MAX_INTRO_POINT_CIRCUIT_RETRIES.
> So subsequent calls to remove_invalid_intro_points() will return an introduction point to retry up to three times.
> But a single call to rend_consider_services_intro_points() will only retry each introduction point once and then try others.
> Is this intended behavior?

Hey timonh! Thanks for helping us track down this bug. I opened a ticket for it at #19522. I also CCed you in case you want to try to fix it, or maybe you want to help us test any patches.

comment:11 Changed 3 years ago by teor

It's worth noting that Tor clients also have some of these issues on mobile, whether accessing hidden services or exits. So general improvements that improve tor's response to broken circuits will also help with hidden services on mobile.

comment:12 in reply to:  11 Changed 3 years ago by timonh

Replying to teor:

> It's worth noting that Tor clients also have some of these issues on mobile, whether accessing hidden services or exits. So general improvements that improve tor's response to broken circuits will also help with hidden services on mobile.

To detect broken circuits earlier, Tor could use TCP keepalive or its own keepalive messages. If Tor used keepalive messages to detect broken connections (and, through them, broken circuits), it would need to negotiate the interval, so that the other end knows when keepalive messages should arrive and when the connection has expired. Right now keepalive messages are only used to keep firewalls from expiring connections, and the interval is set by KeepalivePeriod.

I don't know how easy it is to use TCP keepalive in a platform-independent manner. Another question is whether this might harm a user's privacy: a user with a different interval than the majority might stand out.

Also, Linux has a nice option called TCP_USER_TIMEOUT, which sets the "maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection". This would improve the situation where an intro point (IP) sends a RELAY_COMMAND_INTRODUCE2 to a HS that isn't reachable anymore: the IP would detect earlier (depending on TCP_USER_TIMEOUT) that the connection is broken.
For idle connections it would still be necessary to use a keepalive mechanism.
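
The two socket-level mechanisms mentioned above could be combined roughly like this. This is a minimal Python sketch and does not reflect how Tor configures its own sockets: the function name enable_fast_failure_detection and all default values are assumptions chosen for illustration. TCP_USER_TIMEOUT and the TCP_KEEP* options are Linux-specific, so the sketch skips any option the platform does not expose.

```python
import socket


def enable_fast_failure_detection(sock, user_timeout_ms=30000,
                                  idle_s=60, interval_s=10, probes=3):
    """Ask the kernel to detect dead peers sooner on a TCP socket.

    Options that the platform does not provide are skipped.
    Returns the names of the options that were applied.
    """
    applied = []
    # Keepalive probes cover idle connections with no data in flight.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")
    tcp_options = (
        ("TCP_KEEPIDLE", idle_s),        # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),   # time between probes
        ("TCP_KEEPCNT", probes),         # failed probes before giving up
        # Bounds how long transmitted data may remain unacknowledged.
        ("TCP_USER_TIMEOUT", user_timeout_ms),
    )
    for name, value in tcp_options:
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not available on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # option known to Python but rejected by the OS
        applied.append(name)
    return applied
```

The privacy question raised above still applies: values that differ from what most peers use could make a connection distinguishable, so any defaults would have to be chosen network-wide rather than per user.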

comment:13 Changed 3 years ago by stephan

Cc: stephan added

comment:14 Changed 3 years ago by asn

Keywords: TorCoreTeam201608 added

comment:15 Changed 3 years ago by asn

Actual Points: 3

comment:16 Changed 3 years ago by nickm

Keywords: TorCoreTeam201609 added; TorCoreTeam201608 removed

Move unassigned items in August to September.

comment:17 Changed 2 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:18 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:19 Changed 22 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:20 Changed 22 months ago by nickm

Keywords: TorCoreTeam201609 removed

comment:21 Changed 22 months ago by dgoulet

Sponsor: SponsorR-can

comment:22 Changed 21 months ago by nickm

Actual Points: 3
Keywords: sponsor8-maybe added
Points: 10

comment:23 Changed 21 months ago by nickm

Milestone: Tor: unspecified → Tor: 0.3.2.x-final
Sponsor: Sponsor8-can

comment:24 Changed 19 months ago by dgoulet

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.3.x-final

comment:25 Changed 14 months ago by nickm

Milestone: Tor: 0.3.3.x-final → Tor: 0.3.4.x-final

Deferring various "new"-status enhancement tickets to 0.3.4

comment:26 Changed 12 months ago by nickm

Keywords: 034-triage-20180328 added

comment:27 Changed 12 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:28 Changed 12 months ago by nickm

Milestone: Tor: 0.3.4.x-final → Tor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:29 Changed 2 months ago by gaba

Sponsor: Sponsor8-can