Opened 3 years ago

Last modified 19 months ago

#18620 needs_revision enhancement

HSFORGET command to clear cached client state for a HS

Reported by: str4d Owned by: str4d
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.2.7.6
Severity: Normal Keywords: tor-hs, tor-control
Cc: special Actual Points:
Parent ID: Points: 1
Reviewer: asn, special Sponsor:

Description

This is a patch used by the Android app Briar (since October 2014) that we would like to upstream. It adds an HSFORGET command to the control protocol which clears any cached client state relating to a specified hidden service. This can be used to flush state that's likely to be stale before trying to connect to a hidden service via an unstable network connection (such as a mobile data connection).

Child Tickets

Attachments (4)

tor.patch (5.6 KB) - added by str4d 3 years ago.
Patch to implement HSFORGET command in Tor 0.2.7.6
control-spec.diff (897 bytes) - added by str4d 3 years ago.
Patch to add HSFORGET command to control-spec.txt
tor.2.patch (16.6 KB) - added by str4d 3 years ago.
Patch to implement HSFORGET command in Tor 0.2.7.6
control-spec.2.diff (1.1 KB) - added by str4d 3 years ago.
Patch to add HSFORGET command to control-spec.txt

Download all attachments as: .zip

Change History (41)

comment:1 Changed 3 years ago by nickm

Milestone: Tor: 0.2.9.x-final
Status: newneeds_review

comment:2 Changed 3 years ago by cypherpunks

The explicit casting of the value returned by strmap_remove_lc is unnecessary because it returns a pointer to a void which supports implicit casting.

comment:3 Changed 3 years ago by nickm

Keywords: TorCoreTeam201604 added

Every postponed needs_review ticket should get a review in April

comment:4 Changed 3 years ago by dgoulet

Keywords: tor-hs added; TorCoreTeam201604 removed
Points: small
Sponsor: SponsorR-can
Status: needs_reviewneeds_revision

Thanks for this! That sounds useful and also until we have a way to reliably detect network changes in tor, I can see a use for that (well clearly you have one :).

(This also need a control-spec.txt change)

comment:5 Changed 3 years ago by nickm

Reviewer: dgoulet

Setting dgoulet as reviewer; please let me know if you don't want to review.

Another suggestion: if possible, it would be good to have tests for the new rendcache and rendclient functions.

Changed 3 years ago by str4d

Attachment: tor.patch added

Patch to implement HSFORGET command in Tor 0.2.7.6

Changed 3 years ago by str4d

Attachment: control-spec.diff added

Patch to add HSFORGET command to control-spec.txt

comment:6 Changed 3 years ago by dgoulet

Status: needs_revisionneeds_review

comment:7 Changed 3 years ago by isabela

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

tickets market to be removed from milestone 029

Changed 3 years ago by str4d

Attachment: tor.2.patch added

Patch to implement HSFORGET command in Tor 0.2.7.6

Changed 3 years ago by str4d

Attachment: control-spec.2.diff added

Patch to add HSFORGET command to control-spec.txt

comment:8 Changed 3 years ago by str4d

New patches uploaded that add descriptor-cookie support (as part of fixing a bug in the previous patch), as well as tests for the new rendcache and rendclient functions per comment:5.

The rendcache test currently fails, but not due to the new code - the descriptor fails to parse during test setup. I copied the descriptor from src/test/test_hs.c so it should have been fine... I am also unsure how duplicate test data should be handled in the Tor testing framework. Pointers welcome :)

comment:9 Changed 3 years ago by nickm

Keywords: 029-proposed added

comment:10 Changed 3 years ago by nickm

Keywords: 029-accepted added; 029-proposed removed
Milestone: Tor: 0.2.???Tor: 0.2.9.x-final

These have code, so I am okay with reviewing them for this release. (no objections at today's meeting)

comment:11 Changed 3 years ago by nickm

Owner: set to str4d
Status: needs_reviewassigned

reassigning in order to set the "owner" field.

comment:12 Changed 3 years ago by nickm

Status: assignedneeds_review

reassigning in order to set the "owner" field.

comment:13 Changed 3 years ago by nickm

Keywords: review-group-1 added

comment:14 Changed 3 years ago by asn

Reviewer: dgouletasn

comment:15 Changed 3 years ago by asn

Status: needs_reviewneeds_revision

Hello and sorry for the slow review! This seems like a great feature for mobile devices, and the patch seems to be almost there. The main issue is the unittest not passing.

Here are some comments:

  • Your tests don't work because your descriptor string was prepared for the control port tests. If you remove the carriage returns "\r" characters from the end of each line then the parsing should proceed.

However, that's not enough for it to go through rend_cache_store_v2_desc_as_client(). You also need to fix up the desc id which currently is not correct, and you also need a dynamic descriptor timestamp to bypass the age check. I had to uncomment the following checks to complete the test (which passes fine btw):

  /* if (parsed->timestamp < now - REND_CACHE_MAX_AGE-REND_CACHE_MAX_SKEW) { */
  /*   printf( "Service descriptor with service ID %s is too old.", */
  /*            safe_str_client(service_id)); */
  /*   goto err; */
  /* } */
+  /* if (tor_memneq(desc_id, want_desc_id, DIGEST_LEN)) { */
+  /*   printf( "Received service descriptor for %s with incorrect " */
+  /*           "descriptor ID (%s).", service_id, want_desc_id); */
+  /*   goto err; */
+  /* } */


To debug this, I turned all the log_warn()s in the parsing functions to printfs, and then I could see the offending error messages when I ran the tests.

  • I think there is no need to add the base64 padding yourself in descriptor_cookie_base64ext(). Instead you can use base64_decode_nopad(). I think this should work, but there were no tests about descriptor cookies so I'm not 100% sure.
  • Also, in rend_cache_remove_entry() there is no need to handle v0 descriptors. These have been deprecated, feel free to kill that code.
  • The patch does not compile with ./configure --enable-gcc-warnings because you have a few unused function arguments in the tests. You can solve this by doing (void) arg;.
  • Also, it would be great if you could provide a git branch based on the current master, instead of a standalone patch because I had trouble merging it. Feel free to upload it ot any repo (github, gitlab) or even just post a detached git-bundle. If that's too hard, no worries.
  • Finally, I noticed that in rend_client_note_connection_attempt_ended() we do:
      if (cache_entry != NULL) {
        SMARTLIST_FOREACH(cache_entry->parsed->intro_nodes,
                          rend_intro_point_t *, ip,
                          ip->timed_out = 0; );
      }
    
    Do you think that would be useful for this patch?

comment:16 Changed 3 years ago by str4d

Thanks for the review asn! Apologies for the issues getting the patch to work - it was based on Tor 0.2.7.6 (which is the version Briar is currently using). I have ported the patch onto master (with associated changes per your review), and uploaded it here:

https://github.com/str4d/tor/tree/18620-hsforget

Do you think that would be useful for this patch?

I don't believe so, because the new rend_cache_remove_entry() function calls rend_cache_entry_free(cache_entry), which calls rend_service_descriptor_free(cache_entry->parsed), which contains:

  if (desc->intro_nodes) {
    SMARTLIST_FOREACH(desc->intro_nodes, rend_intro_point_t *, intro,
      rend_intro_point_free(intro););
    smartlist_free(desc->intro_nodes);
  }

Thus the rend_intro_point_t objects are never reused.

comment:17 Changed 3 years ago by asn

Status: needs_revisionneeds_review

Putting this back in needs_review :)

comment:18 Changed 3 years ago by special

Cc: special added
Reviewer: asnasn, special

comment:19 Changed 3 years ago by arma

Taking a step back: is this design the right one to encourage client applications to use? Basically you are wanting to disable much of the client-side onion caching logic.

Is there a better design, like noticing when your network connection has been broken, and flushing all the client-side state right then, and otherwise letting Tor do its thing?

Or better, we could improve the client-side caching logic to be more robust to whatever network behavior you're seeing? It is silly for each client application to have to do this logic itself.

comment:20 Changed 3 years ago by special

Status: needs_reviewneeds_revision

I agree with arma. Any client application that needs this function is actually working around a bug in tor. We should make tor smarter to solve whatever issues this avoids.

That said, I'm not opposed to having this function if people have other uses for it. I could imagine this being useful for testing and scripts.

I have some control-spec comments:

We encourage applications to use this by mentioning unstable internet connections as a use case. I would rather not.

I don't think the DescCookie parameter is useful. There is no case as a client where we can use a descriptor that requires authorization without the cookie being configured (such as via HidServAuth). Instead of having a cookie parameter, you could look up the cookie using rend_client_lookup_service_authorization.

There are also a few code issues, some of which are obsolete after my suggestion above, but I'll mention them anyway:

 +  char descriptor_cookie_base64ext[REND_DESC_COOKIE_LEN_BASE64 + 2 + 1]; 
...
 +    descriptor_cookie_decoded = tor_malloc(REND_DESC_COOKIE_LEN + 2); 
...
 +    /* Add trailing zero bytes (AA) to make base64-decoding happy. */ 

You could replace all of this logic with the recently-created rend_auth_decode_cookie function.

 +    if (base64_decode(descriptor_cookie_decoded,
 +                      sizeof(descriptor_cookie_decoded), 

descriptor_cookie_decoded is char *, not a char array, so sizeof is incorrect. This can never succeed.

Also, you don't check the decoded length returned by base64_decode and descriptor_cookie_decoded was allocated with tor_malloc instead of tor_malloc_zero, so you could have ended up reading uninitialized memory later.

 +  tor_free(onion_address);
 +  if (descriptor_cookie_base64) {
 +    tor_free(descriptor_cookie_base64);
 +    tor_free(descriptor_cookie_decoded);
 +  } 

This block of code is repeated for the success and err cases. You could avoid that by having an int result to return and falling through the err: case after calling send_control_done.

 +/** Remove any cached descriptors for <b>service_id</b>. */
+void
+rend_cache_remove_entry(const char *service_id) 

This name (and documentation) is misleading. This function only removes from the client cache, not the service or directory caches. I would rename it to mention clients.

make check-spaces has a few other comments for you.

comment:21 Changed 3 years ago by isabela

Points: small1

comment:22 Changed 3 years ago by nickm

Keywords: review-group-2 added; review-group-1 removed

Everything in review-group-1 has had at least 1 review.

comment:23 in reply to:  description Changed 3 years ago by arma

Replying to str4d:

This can be used to flush state that's likely to be stale before trying to connect to a hidden service via an unstable network connection (such as a mobile data connection).

Can we get a more concrete case here, for exactly what behavior goes wrong? "What you do, what you expect, what happens instead."

Then we can try to reconstruct what was happening on each Tor side, and think about if there's something smarter Tor could do instead.

comment:24 Changed 2 years ago by nickm

Keywords: review-group-3 added; review-group-2 removed

These have all had at least one review during review-group-2.

comment:25 Changed 2 years ago by timonh

I think this is connected to #19522. If a HS changes it's ip address and choses new intro point because of #19522 then a client with a cached descriptor will retry the old intro points before fetching the descriptor again. The old intro points won't notice that their circuits to the HS are gone until the entry guard experiences a TCP timeout and sends a destroy cell. On Linux this might take up to 30 minutes.
So an application detecting that the connection to a HS broke (e.g. using ACKs) might use HSFORGET to avoid the described case.
If #19522 gets fixed and a HS therefore sticks to it's intro points then this might not be necessary anymore.

comment:26 in reply to:  19 Changed 2 years ago by akwizgran

Replying to arma:

Taking a step back: is this design the right one to encourage client applications to use? Basically you are wanting to disable much of the client-side onion caching logic.

Is there a better design, like noticing when your network connection has been broken, and flushing all the client-side state right then, and otherwise letting Tor do its thing?

Or better, we could improve the client-side caching logic to be more robust to whatever network behavior you're seeing? It is silly for each client application to have to do this logic itself.

Replying to arma

Can we get a more concrete case here, for exactly what behavior goes wrong? "What you do, what you expect, what happens instead."

Then we can try to reconstruct what was happening on each Tor side, and think about if there's something smarter Tor could do instead.

Let me give you a bit more information about what we're trying to achieve.

The scenario is a client connecting to a hidden service that's running on a mobile device. (The client may also be running on a mobile device, but that's not relevant.) The service may frequently lose network connectivity or switch between networks, and each time it does so, it picks new introduction points and publishes a new service descriptor.

If the client has a cached descriptor for the service, it's likely to be stale, and any attempt to connect using the stale descriptor will fail. Eventually Tor will discard the failing descriptor and the client's next connection attempt will fetch a fresh descriptor.

We're trying to skip the initial connection attempt using the cached descriptor because it's unlikely to succeed, and waiting for it to fail prevents us from connecting quickly. There are a few ways we could do that:

  • Client tells Tor not to use any cached descriptor that may already exist
  • Client tells Tor not to cache the descriptor after fetching it
  • Service indicates in its descriptor that the descriptor should not be cached

Of these possibilities, the first one seems to be the easiest to deploy, as it requires minimal changes to the client-side code and no changes to the descriptor format.

comment:27 in reply to:  25 ; Changed 2 years ago by akwizgran

Replying to timonh:

I think this is connected to #19522. If a HS changes it's ip address and choses new intro point because of #19522 then a client with a cached descriptor will retry the old intro points before fetching the descriptor again.

Thanks for pointing out the related issue - you're right that this is related to clients caching descriptors for services with unstable connectivity.

I'm not sure whether a solution to #19522 would necessarily fix this issue. We need to consider not just situations where the service's network interface is down, but also situations where one interface goes down and another comes up (for example, switching from mobile data to wifi). In that case the service can create a new set of introduction point circuits and publish a new descriptor via the new network interface while the old interface remains down, so returning to the old circuits when the old interface comes back up would not solve the problem.

comment:28 in reply to:  27 ; Changed 2 years ago by timonh

Replying to akwizgran:

I'm not sure whether a solution to #19522 would necessarily fix this issue. We need to consider not just situations where the service's network interface is down, but also situations where one interface goes down and another comes up (for example, switching from mobile data to wifi). In that case the service can create a new set of introduction point circuits and publish a new descriptor via the new network interface while the old interface remains down, so returning to the old circuits when the old interface comes back up would not solve the problem.

I discovered #19522 with Tor on Android switching from wifi to mobile data. See #16387#comment:7.
I noticed that Android kills all open connections when the network interface gets switched.
Without #19522 the descriptor won't change because Tor would reconnect to the intro points.
On which platform did you notice the behavior you describe?

comment:29 in reply to:  28 ; Changed 2 years ago by akwizgran

Replying to timonh:

Replying to akwizgran:

I'm not sure whether a solution to #19522 would necessarily fix this issue. We need to consider not just situations where the service's network interface is down, but also situations where one interface goes down and another comes up (for example, switching from mobile data to wifi). In that case the service can create a new set of introduction point circuits and publish a new descriptor via the new network interface while the old interface remains down, so returning to the old circuits when the old interface comes back up would not solve the problem.

I discovered #19522 with Tor on Android switching from wifi to mobile data. See #16387#comment:7.
I noticed that Android kills all open connections when the network interface gets switched.
Without #19522 the descriptor won't change because Tor would reconnect to the intro points.
On which platform did you notice the behavior you describe?

I'm also talking about Android. I think this ticket and #19522 both relate to the same underlying problem, i.e. the service loses its intro circuits when the network interface goes down and chooses new intro points when Tor reconnects. But the discussion on #19522 seems to assume that there's a single network interface, so we can wait for that interface to come back up and then build new circuits to the old intro points. I'm not sure that's a correct description of the situation on Android.

When an Android device switches between mobile data and wifi, it's my understanding that one network interface is taken down and another is brought up. I don't know if Tor can detect these changes in a cross-platform way, or whether it can tell that the old and new network interfaces are somehow equivalent, such that intro points from the old interface should be reused with the new interface. That's why I said above that I'm not sure whether a solution to #19522 would resolve this ticket - it would really depend on how the solution was implemented.

If we can find a solution to #19522 that builds new circuits to the old intro points via the new network interface then this ticket will be redundant. But if we don't have a clear path to achieving such a solution then I'd prefer to continue with this ticket, as we know it solves the problem for our use case.

comment:30 in reply to:  29 Changed 2 years ago by asn

Replying to akwizgran:

Replying to timonh:

Replying to akwizgran:

I'm not sure whether a solution to #19522 would necessarily fix this issue. We need to consider not just situations where the service's network interface is down, but also situations where one interface goes down and another comes up (for example, switching from mobile data to wifi). In that case the service can create a new set of introduction point circuits and publish a new descriptor via the new network interface while the old interface remains down, so returning to the old circuits when the old interface comes back up would not solve the problem.

I discovered #19522 with Tor on Android switching from wifi to mobile data. See #16387#comment:7.
I noticed that Android kills all open connections when the network interface gets switched.
Without #19522 the descriptor won't change because Tor would reconnect to the intro points.
On which platform did you notice the behavior you describe?

I'm also talking about Android. I think this ticket and #19522 both relate to the same underlying problem, i.e. the service loses its intro circuits when the network interface goes down and chooses new intro points when Tor reconnects. But the discussion on #19522 seems to assume that there's a single network interface, so we can wait for that interface to come back up and then build new circuits to the old intro points. I'm not sure that's a correct description of the situation on Android.

When an Android device switches between mobile data and wifi, it's my understanding that one network interface is taken down and another is brought up. I don't know if Tor can detect these changes in a cross-platform way, or whether it can tell that the old and new network interfaces are somehow equivalent, such that intro points from the old interface should be reused with the new interface. That's why I said above that I'm not sure whether a solution to #19522 would resolve this ticket - it would really depend on how the solution was implemented.

If we can find a solution to #19522 that builds new circuits to the old intro points via the new network interface then this ticket will be redundant. But if we don't have a clear path to achieving such a solution then I'd prefer to continue with this ticket, as we know it solves the problem for our use case.

The patch in #19522 does not care about network interfaces. It's just more stubborn in sticking with old intro points when there are local network issues. So it will wait till the network is back up (in whichever interface), and then will try to reestablish the same intro points.

The patch in that ticket could benefit from some more testing by people who use HSes on mobile phones, so that we get more confidence that it works well.

comment:31 Changed 2 years ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

Deferring many tickets that are in needs_revision with no progress noted for a while, where I think they could safely wait till 0.3.0 or later.

Please feel free to move these back to 0.2.9 if you finish the revisions soon.

comment:32 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:33 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:34 Changed 19 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:35 Changed 19 months ago by nickm

Keywords: nickm-deferred-20160905 removed

comment:36 Changed 19 months ago by nickm

Keywords: review-group-3 removed

comment:37 Changed 19 months ago by dgoulet

Keywords: tor-control added; 029-accepted removed
Sponsor: SponsorR-can
Note: See TracTickets for help on using tickets.