Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?

changed milestone to %Tor: unspecified

added component::core tor/tor milestone::Tor: unspecified owner::dgoulet points::3 priority::medium resolution::wontfix severity::normal sponsor::R-can status::closed tor-hs type::enhancement labels

Yes.

First of all, this value is used by clients and hsdirs. Unless I've missed something, this means that a client which accesses a HS, waits two days without restarting, then tries again will reach out to all old intro points before fetching a new descriptor.

I think it's worth looking at this as two separate cases:

There is no value in serving descriptors older than the descriptor-id time period of 24 hours. A client with a clock this inaccurate can't be expected to function. These descriptors are unlikely to be queried and even less likely to be useful. The best implementation is to clean these when the descriptor-id for that service changes, with some fuzz for clock skew.

The current value is too high, considering that RendPostPeriod is usually one hour; a 4-hour-old descriptor on a HSDir is almost certainly useless. But, I don't see a very compelling reason to reduce it below 24 hours either. The value used here is effectively a maximum value for RendPostPeriod.

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.2.6.x-final

The relevant code is

  /* Is descriptor too old? */
  if (parsed->timestamp < now - REND_CACHE_MAX_AGE-REND_CACHE_MAX_SKEW) {
    log_warn(LD_REND, "Service descriptor with service ID %s is too old.",
             safe_str_client(service_id));
    goto err;
  }
  /* Is descriptor too far in the future? */
  if (parsed->timestamp > now + REND_CACHE_MAX_SKEW) {
    log_warn(LD_REND, "Service descriptor with service ID %s is too far in "
                      "the future.", safe_str_client(service_id));
    goto err;
  }

I bet if we set MAX_SKEW to 18 hours, and MAX_AGE to 6 hours, then we'd get a lot of what we want here.

That is, in normal circumstances we'll keep the descriptor for 23 or 24 hours after it's generated (it's not exactly 24 because check out the d->timestamp -= d->timestamp % 3600; /* Round down to nearest hour */ line in rend_service_update_descriptor()), and accept descriptors from hidden services whose time is up to 17 or 18 hours in the future.

If we wanted to maintain the "up to 24 hours in the future" feature, then we would set MAX_AGE to 0 hours, and MAX_SKEW to 24 hours. But that sure is unintuitive.

Actually, I think our best bet here would be to remove REND_CACHE_MAX_AGE from existence (i.e. make it 0 and take it out), and come up with a better name for REND_CACHE_MAX_SKEW to indicate that it has to do with descriptor age too. Any good names?

This one needs more testing but should be worked on soon since this affects HS reachability, a key aspect of SponsorR thus changing the milestone.

Trac:
Milestone: Tor: 0.2.6.x-final to Tor: 0.2.7.x-final

Trac:
Status: new to assigned

Marking more tickets as triaged-in for 0.2.7

Trac:
Keywords: N/A deleted, 027-triaged-1-in added

Trac:
Points: N/A to medium
Version: N/A to Tor: 0.2.7
Priority: normal to major

Trac:
Parent: N/A to #13209 (moved)

Handling clock skew is always tricky. 24 hours seems reasonable. In this case:

HSDirs should only keep descriptors which were published most 24 hours, and fewer hours if the current descriptor is overwritten with a newer version or to prevent running out of memory
It doesn't seem like a client's cache needs this 24 hour cutoff (it could be less), but it doesn't seem like it will hurt.

Assuming the client's clock is not skewed significantly, the descriptor that the client possesses is deleted from the cache around the same time the descriptor becomes invalid. If the clock is skewed by fewer than 3 hours then this probably, maybe, likely won't impact onion service reachability. Any client running with a clock positively skewed by more than 3 hours will never have a usable recent consensus, and a client whose clock is negatively skewed by more than 3 hours will never have a usable consensus.

Are there objections to setting the cutoff at 26 or 27 hours? Then we have two options:

REND_CACHE_MAX_AGE = 24 hours, REND_CACHE_MAX_SKEW = 3 hours
REND_CACHE_DESC_LIFETIME = 27 hours

There may yet be an even better name.

Interestingly, rend-spec agrees with this sentiment:

Alice should cache the descriptor locally, but should not use descriptors that are more than 24 hours older than their timestamp.

Replying to arma:

Actually, I think our best bet here would be to remove REND_CACHE_MAX_AGE from existence (i.e. make it 0 and take it out), and come up with a better name for REND_CACHE_MAX_SKEW to indicate that it has to do with descriptor age too. Any good names?

Ok. Branch bug13207 now does this, after writing it and tweaking it a few times and eventually getting rid of MAX_AGE. I don't have a better name for MAX_SKEW, so I left it.

Please review branch bug13207.

Trac:
Status: assigned to needs_revision

Replying to sysrqb:

Please review branch bug13207.

Ok, that was indeed the wrong approach. It decided that we should assume the current descriptor is valid until the end of the current period (based on the local clock) plus 24 hours, accounting for clock skew. That really is a bad idea.

New approach: HSDirs should hold descriptors which have a publication time of less than 25 hours ago. We shouldn't care if the HSDir thinks it's responsible for the descriptor ID because this overrides the the logic for keeping descriptors for clients with skewed clocks.

If we set the cutoff at 25 hours then:

If a client's clock is 24 hours behind the service's clock, and the HSDir is in between them, then the HSDir should serve the descriptor (if it was given one in the past) despite the service publishing new descriptors on a new HSDir
- Unfortunately, the intro points may be invalid
If the client's clock is 24 hours ahead of the service's clock, and the HSDir's clock where the client sends its query is in between, then the HSDir won't have the descriptor. The HSDir where the service publishes its descriptor should happily serve it, if anyone asks for it.
- The service is still publishing its descriptors on the previous period's HSDirs
  - Should clients ask HSDirs from the previous period for a descriptor? At face value, this seems like a bad idea
If the client's clock is 24 hours behind the HSDir's clock, and the service is in between them, then the HSDir should serve the descriptor and it should still be valid. At some point the service will enter the next period and publish new descriptors there, but its old descriptor should remain for some time.
- As above, this may make clients sad when the service changes its intro points or goes offline
If the client's clock is 24 hours ahead of the HSDir's clock, and the service is in between them, then the client will query the current HSDir until it enters the next period. There's an amount of time where the client queries the next set of HSDirs while the service is publishing to the current set.
If the HSDir's clock is 24 hours behind the service's clock, and the client is in between them, then the HSDir should have a descriptor which was published approximately 23 hours earlier (before the service moved to the next period)
If the HSDir's clock is 24 hours ahead of the service's clock, and the client is in between them, then the HSDir will serve the descriptor for 1 hour before it considered the descriptor expired.

Trac:
Status: needs_revision to new

Trac:
Keywords: N/A deleted, TorCore201508 added
Status: new to accepted
Owner: N/A to dgoulet

Trac:
Milestone: Tor: 0.2.7.x-final to Tor: 0.2.8.x-final

Trac:
Keywords: 027-triaged-1-in, TorCore201508 deleted, N/A added

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

Trac:
Sponsor: N/A to SponsorR
Keywords: SponsorR deleted, N/A added

Moving this to 029. IMO, there are still unanswered question here on the HSDir behavior if we want to make that change. See below.

New approach: HSDirs should hold descriptors which have a publication time of less than 25 hours ago. We shouldn't care if the HSDir thinks it's responsible for the descriptor ID because this overrides the the logic for keeping descriptors for clients with skewed clocks.

Basically, drop the "are we responsible for desc ID?" check, and keep any descriptors that matches this timestamp heuristic? This is scaring me a bit that is descriptor can be used to store arbitrary data on all HSDirs instead of 6...

I fear that without a statistics of how much clock skew on average we have in the network, the skew limit is a bit arbitrary. Tbh, 25 hours clock skew is massive for a client and even more for a service... I'm not even sure that service will be able to operate correctly... But let's assume we use that because worst case scenario, it seems there is no way we can use a crazy cutoff efficiently and still use the check on the descriptor ID, right?

Maybe it's the start of a new era where HSDir stores all the things but expires them quicker?...

Trac:
Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final
Version: Tor: 0.2.7 to N/A
Severity: N/A to Normal
Priority: High to Medium

Movement here. #18332 (moved) will remove the logic on the HSDir side of "am I responsible for that descriptor ID" so we can go back to actually considering the New approach: :)

Trac:
Sponsor: SponsorR to SponsorR-can
Type: defect to enhancement
Reviewer: N/A to N/A

tickets market to be removed from milestone 029

Trac:
Milestone: Tor: 0.2.9.x-final to Tor: 0.2.???

Milestone renamed

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.3.???

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

Trac:
Milestone: Tor: 0.3.??? to Tor: unspecified
Keywords: N/A deleted, tor-03-unspecified-201612 added

Parent ID doesn't make sense.

Trac:
Points: medium to 3
Parent: #13209 (moved) to N/A

Remove an old triaging keyword.

Trac:
Keywords: tor-03-unspecified-201612 deleted, N/A added

Ok, I'm closing this one for a couple of reasons. First, it seems unclear on what is a sane value for a cutoff and anything we change might either introduce instability or need lots of testing.

Second, prop224 work (#12424 (moved)) is fixing that by adding a expire time field in the descriptor (currently 3 hours).

Trac:
Resolution: N/A to wontfix
Status: accepted to closed

closed

changed time estimate to 24h

mentioned in issue #13214 (moved)

mentioned in issue #15729 (moved)

mentioned in issue #16574 (moved)

moved to tpo/core/tor#13207 (closed)

mentioned in issue tpo/core/tor#13214 (closed)

Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?

Child items ...

Activity