time_t cutoff = now - REND_CACHE_MAX_AGE - REND_CACHE_MAX_SKEW;
That's currently 3 days.
Yet
config.c: V(RendPostPeriod, INTERVAL, "1 hour"),
So we expect our current one to get overwritten once an hour, yet if the hidden service goes away (stops publishing) we continue to serve its old (presumably no longer working) descriptor for 71 more hours.
One downside to this poor tuning is that we waste the time of clients who try to access the hidden service -- if we instead tell them there is no descriptor, they could go to another hsdir or give up more promptly.
One upside is that if we say we don't have one, clients will go visit five other hsdirs before giving up, which could result in a lot of thrashing. Whereas providing an old one will keep the client distracted for a while. But that's not really a good reason.
Another downside is that it's easier to stuff our cache full of things, i.e. use hsdirs for arbitrary storage of blobs for three days.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
First of all, this value is used by clients and hsdirs. Unless I've missed something, this means that a client which accesses a HS, waits two days without restarting, then tries again will reach out to all old intro points before fetching a new descriptor.
I think it's worth looking at this as two separate cases:
There is no value in serving descriptors older than the descriptor-id time period of 24 hours. A client with a clock this inaccurate can't be expected to function. These descriptors are unlikely to be queried and even less likely to be useful. The best implementation is to clean these when the descriptor-id for that service changes, with some fuzz for clock skew.
The current value is too high, considering that RendPostPeriod is usually one hour; a 4-hour-old descriptor on a HSDir is almost certainly useless. But, I don't see a very compelling reason to reduce it below 24 hours either. The value used here is effectively a maximum value for RendPostPeriod.
/* Is descriptor too old? */ if (parsed->timestamp < now - REND_CACHE_MAX_AGE-REND_CACHE_MAX_SKEW) { log_warn(LD_REND, "Service descriptor with service ID %s is too old.", safe_str_client(service_id)); goto err; } /* Is descriptor too far in the future? */ if (parsed->timestamp > now + REND_CACHE_MAX_SKEW) { log_warn(LD_REND, "Service descriptor with service ID %s is too far in " "the future.", safe_str_client(service_id)); goto err; }
I bet if we set MAX_SKEW to 18 hours, and MAX_AGE to 6 hours, then we'd get a lot of what we want here.
That is, in normal circumstances we'll keep the descriptor for 23 or 24 hours after it's generated (it's not exactly 24 because check out the d->timestamp -= d->timestamp % 3600; /* Round down to nearest hour */ line in rend_service_update_descriptor()), and accept descriptors from hidden services whose time is up to 17 or 18 hours in the future.
If we wanted to maintain the "up to 24 hours in the future" feature, then we would set MAX_AGE to 0 hours, and MAX_SKEW to 24 hours. But that sure is unintuitive.
Actually, I think our best bet here would be to remove REND_CACHE_MAX_AGE from existence (i.e. make it 0 and take it out), and come up with a better name for REND_CACHE_MAX_SKEW to indicate that it has to do with descriptor age too. Any good names?
Handling clock skew is always tricky. 24 hours seems reasonable. In this case:
HSDirs should only keep descriptors which were published most 24 hours, and fewer hours if the current descriptor is overwritten with a newer version or to prevent running out of memory
It doesn't seem like a client's cache needs this 24 hour cutoff (it could be less), but it doesn't seem like it will hurt.
Assuming the client's clock is not skewed significantly, the descriptor that the client possesses is deleted from the cache around the same time the descriptor becomes invalid. If the clock is skewed by fewer than 3 hours then this probably, maybe, likely won't impact onion service reachability. Any client running with a clock positively skewed by more than 3 hours will never have a usable recent consensus, and a client whose clock is negatively skewed by more than 3 hours will never have a usable consensus.
Are there objections to setting the cutoff at 26 or 27 hours? Then we have two options:
Actually, I think our best bet here would be to remove REND_CACHE_MAX_AGE from existence (i.e. make it 0 and take it out), and come up with a better name for REND_CACHE_MAX_SKEW to indicate that it has to do with descriptor age too. Any good names?
Ok. Branch bug13207 now does this, after writing it and tweaking it a few times and eventually getting rid of MAX_AGE. I don't have a better name for MAX_SKEW, so I left it.
Ok, that was indeed the wrong approach. It decided that we should assume the current descriptor is valid until the end of the current period (based on the local clock) plus 24 hours, accounting for clock skew. That really is a bad idea.
New approach: HSDirs should hold descriptors which have a publication time of less than 25 hours ago. We shouldn't care if the HSDir thinks it's responsible for the descriptor ID because this overrides the the logic for keeping descriptors for clients with skewed clocks.
If we set the cutoff at 25 hours then:
If a client's clock is 24 hours behind the service's clock, and the HSDir is in between them, then the HSDir should serve the descriptor (if it was given one in the past) despite the service publishing new descriptors on a new HSDir
Unfortunately, the intro points may be invalid
If the client's clock is 24 hours ahead of the service's clock, and the HSDir's clock where the client sends its query is in between, then the HSDir won't have the descriptor. The HSDir where the service publishes its descriptor should happily serve it, if anyone asks for it.
The service is still publishing its descriptors on the previous period's HSDirs
Should clients ask HSDirs from the previous period for a descriptor? At face value, this seems like a bad idea
If the client's clock is 24 hours behind the HSDir's clock, and the service is in between them, then the HSDir should serve the descriptor and it should still be valid. At some point the service will enter the next period and publish new descriptors there, but its old descriptor should remain for some time.
As above, this may make clients sad when the service changes its intro points or goes offline
If the client's clock is 24 hours ahead of the HSDir's clock, and the service is in between them, then the client will query the current HSDir until it enters the next period. There's an amount of time where the client queries the next set of HSDirs while the service is publishing to the current set.
If the HSDir's clock is 24 hours behind the service's clock, and the client is in between them, then the HSDir should have a descriptor which was published approximately 23 hours earlier (before the service moved to the next period)
If the HSDir's clock is 24 hours ahead of the service's clock, and the client is in between them, then the HSDir will serve the descriptor for 1 hour before it considered the descriptor expired.
Moving this to 029. IMO, there are still unanswered question here on the HSDir behavior if we want to make that change. See below.
New approach: HSDirs should hold descriptors which have a publication time of less than 25 hours ago. We shouldn't care if the HSDir thinks it's responsible for the descriptor ID because this overrides the the logic for keeping descriptors for clients with skewed clocks.
Basically, drop the "are we responsible for desc ID?" check, and keep any descriptors that matches this timestamp heuristic? This is scaring me a bit that is descriptor can be used to store arbitrary data on all HSDirs instead of 6...
I fear that without a statistics of how much clock skew on average we have in the network, the skew limit is a bit arbitrary. Tbh, 25 hours clock skew is massive for a client and even more for a service... I'm not even sure that service will be able to operate correctly... But let's assume we use that because worst case scenario, it seems there is no way we can use a crazy cutoff efficiently and still use the check on the descriptor ID, right?
Maybe it's the start of a new era where HSDir stores all the things but expires them quicker?...
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final Version: Tor: 0.2.7 toN/A Severity: N/Ato Normal Priority: High to Medium
Movement here. #18332 (moved) will remove the logic on the HSDir side of "am I responsible for that descriptor ID" so we can go back to actually considering the New approach: :)
Trac: Sponsor: SponsorR to SponsorR-can Type: defect to enhancement Reviewer: N/AtoN/A
Ok, I'm closing this one for a couple of reasons. First, it seems unclear on what is a sane value for a cutoff and anything we change might either introduce instability or need lots of testing.
Second, prop224 work (#12424 (moved)) is fixing that by adding a expire time field in the descriptor (currently 3 hours).
Trac: Resolution: N/Ato wontfix Status: accepted to closed