Opened 2 years ago

Closed 2 years ago

Last modified 16 months ago

#16381 closed defect (fixed)

Bad timestamp check when storing an HS descriptor on the client

Reported by: dgoulet Owned by: dgoulet
Priority: High Milestone: Tor: 0.2.7.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-hs, TorCore201508, PostFreeze027, 2016-bug-retrospective
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor: SponsorR

Description

Fortunately, this bug "should" happen rarely in the real Tor network but still could lead to reachability issues for high traffic hidden service.

Let's assume that all introduction points (IP) of an hidden service need to be changed (intro2 max count or expiring) in a single RendPostPeriod (default is 1 hour), we do the intro dance with our newly picked introduction points, close the old ones and upload the new descriptor. The usual.

Now, let's say we have a client that did connect to the service before in the same hour so the client has the descriptor in its cache. Then, the client tries to connect again, the IPs located in the cached descriptor are used and the client will eventually get a NACK from all of them since the service just cycled them all. This will eventually trigger a refetch of the descriptor since we have no more IP that we can use. So far so good.

Here is when it goes wrong, when receiving the new descriptor just published by the HS, we do a series of check in rend_cache_store_v2_desc_as_client() which all passes correctly in our case except this one:

  e = (rend_cache_entry_t*) strmap_get_lc(rend_cache, key);
  if (e && e->parsed->timestamp >= parsed->timestamp) {
    log_info(LD_REND, "We already have a new enough service descriptor for "
                      "service ID %s with the same desc ID and version.",
             safe_str_client(service_id));
    goto okay;
  }

The HS protocol speficies that the timestamp in the descriptor MUST be rounded down to the nearest hour. For instance, if it's 16h54, it will be set to 16h00. See rend_service_update_descriptor():

  d->timestamp = time(NULL);
  d->timestamp -= d->timestamp % 3600; /* Round down to nearest hour */

The side effect of this is that if an HS uploads a new descriptor with a new set of IPs, they won't be usable until the next hour because the tor client doesn't keep the new descriptor even though the intro points did actually change. Thus this means that anything that changes in a time period of one hour will not be seen by the client if the descriptor was already cached before.

Keep in mind that a client reuses a rendez-vous circuit but stops after MaxCircuitDirtiness (default: 10 minutes) so after 10 minutes it will redo the introduction and rendezvous circuits.

(I was able to trigger this in a chutney network by setting a lifetime of 30 seconds for introduction point which creates a lot of rotation.)

I think the fix here should look like this (fun fact, it's what rend_cache_store_v2_desc_as_dir() does):

- if (e && e->parsed->timestamp >= parsed->timestamp) {
+ if (e && e->parsed->timestamp > parsed->timestamp) {

Child Tickets

TicketStatusOwnerSummaryComponent
#16389closedRedesign the HS client descriptor cacheCore Tor/Tor

Change History (22)

comment:1 Changed 2 years ago by arma

Exciting. I remember writing that = not too long ago, and I remember I had a reason for it.

comment:2 Changed 2 years ago by arma

Yes, see #14219 for when this bug went in. git show 9407040c5921

I think we can't just change >= to >, because then when we get the same descriptor, we will not trigger the

+    log_info(LD_REND, "We already have a new enough service descriptor for "
                       "service ID %s with the same desc ID and version.",

part.

It looks like we want to actually see if the descriptor is different.

Also, if the timestamps really are only at the hour granularity, I think we can no longer use them for time ordering of hs descs. Now might be a great time to brainstorm totally different approaches, like putting a counter in the hs desc.

comment:3 Changed 2 years ago by arma

Priority: normalmajor

comment:4 Changed 2 years ago by arma

Another option is to redesign the logic to be cleaner, and keep an intro point failure cache. It seems like every time we encounter this "we keep state about the previous hs desc by keeping a copy of it, and then seeing if the new one is different, get it?" logic, somebody finds it confusing. Maybe now is a good time for it to die? :)

comment:5 in reply to:  2 Changed 2 years ago by arma

Replying to arma:

Yes, see #14219 for when this bug went in. git show 9407040c5921

Regression introduced in Tor 0.2.6.3-alpha I think.

comment:6 Changed 2 years ago by dgoulet

FYI, this is another way to reproduce it which makes this bug much more serious (thanks arma to set it to major)

  1. Start HS
  2. Connect to it with client
  3. Restart HS
  4. Client is unable to connect until the next hour.

Thanks to special for pointing it out:

16:24 < special> so is this reproducible with: 1) start HS; 2) access from a tor client; 3) restart HS; 4) access from the same client

comment:7 Changed 2 years ago by arma

To make it clearer, the reason we can't just strcmp the old desc with the new one is that each hidden service actually makes *two* descriptors per publish -- one per replica. That's exactly what bug #14219 was about.

comment:8 in reply to:  4 Changed 2 years ago by special

Replying to arma:

Another option is to redesign the logic to be cleaner, and keep an intro point failure cache.

+1 to this approach. This is essentially the same as how we handle descriptors via last_hid_serv_requests.

comment:9 Changed 2 years ago by arma

I think we should try to do a straight revert of 9407040c5921 in 0.2.6.x (the old behavior wasn't so bad), and then think about fixing this more thoroughly after that.

comment:10 in reply to:  9 Changed 2 years ago by dgoulet

Status: newneeds_review

Replying to arma:

I think we should try to do a straight revert of 9407040c5921 in 0.2.6.x (the old behavior wasn't so bad), and then think about fixing this more thoroughly after that.

I agree here, this regression is worst so let's go back to the original behavior that at least allows the client to connect and figure out a better solution in the short term (#16389).

Here is the revert branch: bug16381_026_01-revert

comment:11 Changed 2 years ago by nickm

Okay, merged to 0.2.6 and forward. Please change status as appropriate?

comment:12 Changed 2 years ago by dgoulet

Status: needs_reviewnew

Ok thanks nickm! Switching back to "new" so we can actually address this bug now by resolving the child ticket.

comment:13 Changed 2 years ago by Alan

2.6.9 (the most recent release) contains the regression. Will the reverted behavior be released as 2.6.10, or will a fix for the regression not appear until the new code is released in 2.7?

comment:14 in reply to:  13 ; Changed 2 years ago by dgoulet

Replying to Alan:

2.6.9 (the most recent release) contains the regression. Will the reverted behavior be released as 2.6.10, or will a fix for the regression not appear until the new code is released in 2.7?

Yes, it's already in the 0.2.6.x git branch so next 0.2.6 release, it will have it.

See commit 8acf5255c20c667f32313ee672c85f6ae00a4f87

comment:15 in reply to:  14 ; Changed 2 years ago by Alan

it's already in the 0.2.6.x git branch so next 0.2.6 release, it will have it.

That part I understood from the previous comments. My question was, will there be another 0.2.6.x release? Is there an anticipated timeframe?

Last edited 2 years ago by Alan (previous) (diff)

comment:16 in reply to:  15 ; Changed 2 years ago by nickm

Replying to Alan:

it's already in the 0.2.6.x git branch so next 0.2.6 release, it will have it.

That part I understood from the previous comments. My question was, will there be another 0.2.6.x release?

Yes.

Is there an anticipated timeframe?

Within the next week or so, I think (but that's an estimate, not a promise).

comment:17 in reply to:  16 Changed 2 years ago by Alan

Thanks

comment:18 Changed 2 years ago by dgoulet

Keywords: TorCore201508 added
Owner: set to dgoulet
Status: newaccepted

We expect the child ticket to get merged in 027 so setting this one for august and 027 final.

comment:19 Changed 2 years ago by nickm

Keywords: PostFreeze027 added

I'd merge patches for these for 0.2.7 if they come in on time. In some cases, that will require figuring out an as-yet-unsolved bugs.

comment:20 Changed 2 years ago by dgoulet

Resolution: fixed
Status: acceptedclosed

Bad commit has been reverted and child ticket resolved. Closing.

comment:21 Changed 23 months ago by dgoulet

Keywords: SponsorR removed
Sponsor: SponsorR

comment:22 Changed 16 months ago by nickm

Keywords: 2016-bug-retrospective added

Mark more tickets for bug retrospective based on hand-review of changelogs from 0.2.5 onwards.

Note: See TracTickets for help on using tickets.