Round down DNS TTL to the nearest DEFAULT_DNS_TTL (30 minutes)

changed milestone to %Tor: 0.2.9.x-final

added 029-backport 029-proposed actualpoints::.2 component::core tor/tor dns milestone::Tor: 0.2.9.x-final owner::nickm points::0.5 priority::very high resolution::fixed severity::normal status::closed type::defect labels

This would also require a change to torspec to describe the TTL rounding at: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1359

Trac:
Keywords: N/A deleted, dns, TorCoreTeam201607 added

So, clients don't do DNS caching by default any more, because of risks like this. Do you think it might make more sense to simply remove client-side DNS caching entirely?

We have ongoing research on DNS-based traffic correlation attacks (https://nymity.ch/dns-traffic-correlation/) that relates to this. While fixing #19025 (moved) will help in mitigating attacks to an extent, the most important change to consider related to DNS is to also significantly increase MIN_DNS_TTL. This is because useful domains for our attacks today have low TTLs: about 50% of Alexa top 1M have a useful domain with TTL <= 60 seconds, and 75% a TTL <= 30 min. Do you think it would be practical to have MIN_DNS_TTL set to, say, 30 min? Would too much break?

If I understand the proposal here in #19769 (moved), rounding TTLs between [0s,30m) to MIN_DNS_TTL also for exits (?), then this will actually benefit an attacker who can observe both entry traffic and DNS requests for about 25% of Alexa top 1M (but for the remaining 25% it's an improvement together with #19025 (moved) over the status quo).

Sorry if this is the wrong place for this, especially since we don't have a paper to share yet.

Trac:
Cc: N/A to phw

Trac:
Cc: phw to phw, pulls

No further code or documentation will be written in July, due to time itself. (Leaving needs_revision tickets as-is)

Trac:
Keywords: TorCoreTeam201607 deleted, TorCoreTeam201608 added

Move unassigned items in August to September.

Trac:
Keywords: TorCoreTeam201608 deleted, TorCoreTeam201609 added

It would be nice to fix this, but we need to decide what to do first.

Trac:
Status: new to needs_information

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.3.0.x-final

Throwing this and #19025 (moved) into 0.3.0 : see https://lists.torproject.org/pipermail/tor-talk/2016-October/042445.html

Trac:
Priority: Medium to Very High

Concretely

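#include <stdint.h>

#define MIN_DNS_TTL (5*60)
#define MAX_DNS_TTL (60*60)

/* Clip a TTL to one of two values; no intermediate values are honored. */
uint32_t
dns_clip_ttl(uint32_t ttl)
{
  if (ttl < MIN_DNS_TTL)
    return MIN_DNS_TTL;
  else
    return MAX_DNS_TTL;
}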

Fix #19025 (moved) (otherwise ttl above will always be MIN_DNS_TTL).
Potentially refactor the DNS caching code to support evictions (while doing this, maybe rip out all old client-side DNS caching code?).
Add some form of logging to track cache size, usage, and eviction rate.
dns_get_expiry_ttl should be the same as dns_clip_ttl above. Please note that we are not sure these are the only relevant functions.
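To make the two-valued clipping concrete, here is a minimal standalone sketch of the rule described above, with dns_get_expiry_ttl delegating to dns_clip_ttl as the last item suggests. The function names and constants come from this ticket; the bodies are an illustration under those stated values, not the code in tor's dns.c, and other functions may also need the same treatment.

```c
#include <assert.h>
#include <stdint.h>

/* Values proposed in this ticket; illustration only. */
#define MIN_DNS_TTL (5*60)
#define MAX_DNS_TTL (60*60)

static_assert(MIN_DNS_TTL < MAX_DNS_TTL, "MIN_DNS_TTL must be below MAX_DNS_TTL");

/* Map any TTL onto exactly one of two values, so that the cached
 * lifetime reveals almost nothing about the original record. */
uint32_t
dns_clip_ttl(uint32_t ttl)
{
  if (ttl < MIN_DNS_TTL)
    return MIN_DNS_TTL;
  else
    return MAX_DNS_TTL;
}

/* Assumption: the expiry TTL follows the same rule, as proposed above. */
uint32_t
dns_get_expiry_ttl(uint32_t ttl)
{
  return dns_clip_ttl(ttl);
}
```

With these constants, a TTL of 60 seconds is clipped to 300, while a TTL of 300 or of 86400 is clipped to 3600.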

Given

For popular websites, caching at exits is highly likely, and DefecTor attacks are the same as WF attacks.
For unpopular websites, caching and TTLs are moot, since the probability of a DNS record being cached is negligible. Caching these records is just an extra burden on the exit and in a sense also a risk due to leaking recent activity at the exit on compromise. DefecTor attacks will be more precise than WF attacks here, and Tor needs WF defenses to mitigate (another long-term topic).
For long TTLs, for what it is worth, we know from #19025 (moved) that the real-world impact of ignoring these long TTLs is not a serious issue.
For short TTLs, the impact of increasing it is our primary worry since we might break something.
We want to prevent fine-grained TTLs to protect against tagging attacks.
We do not want too high TTLs to have a chance to auto-magically resolve DNS cache poisoning.
The total size of the cache might be a vector for DoS.
The client-side DNS cache remains off.

Goals for DefecTor mitigation

(Read about DefecTor attacks here: https://nymity.ch/tor-dns/tor-dns.pdf).
Allow long TTLs to be long(er).
For short TTLs, go as far up as we are comfortable to without significantly risking breaking things.

Proposal

Stage changes: start with repairing the TTL bug #19025 (moved) and change clipping to 5*60 seconds for MIN_DNS_TTL and 60*60 seconds for MAX_DNS_TTL, honoring no intermediate values (see code above). Wait for feedback on unexpected breaks. If all is manageable, increase MIN_DNS_TTL to 60*60 in a future patch, effectively always caching for 60 minutes. If DefecTor attacks become a real concern short-term, encourage concerned site owners to consider longer TTLs to hit the MAX_DNS_TTL value. Make the cache size limited and eviction when full uniformly random. We choose random eviction to give an attacker less control, since the attacker can presumably cause evictions at will (LIFO is easy for an attacker to manipulate).
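The size limit with uniformly random eviction could look roughly like the sketch below. Everything in it is hypothetical: the sketch_* names, the tiny capacity, and the flat array are chosen for illustration and do not reflect tor's actual cache structures. rand() stands in for a proper random source, and a real implementation would presumably want a cryptographically strong generator so that an attacker cannot predict which slot is evicted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_CACHE_CAPACITY 4   /* hypothetical; tiny for illustration */
#define SKETCH_NAME_MAX 256

typedef struct sketch_entry_t {
  char name[SKETCH_NAME_MAX];
  uint32_t expires;               /* absolute expiry time, in seconds */
} sketch_entry_t;

typedef struct sketch_cache_t {
  sketch_entry_t entries[SKETCH_CACHE_CAPACITY];
  size_t n_used;
  uint64_t n_evictions;
} sketch_cache_t;

/* Return a uniformly distributed index in [0, n).  Uses rejection
 * sampling to avoid modulo bias.  rand() is only a placeholder. */
size_t
sketch_random_index(size_t n)
{
  const unsigned long range = (unsigned long)RAND_MAX + 1UL;
  unsigned long limit, r;
  assert(n > 0 && n <= (size_t)RAND_MAX);
  limit = range - (range % n);
  do {
    r = (unsigned long)rand();
  } while (r >= limit);
  return (size_t)(r % n);
}

/* Store an entry.  While there is room, append; once full, overwrite a
 * uniformly random slot.  Returns the slot used.  Duplicate names are
 * not detected in this sketch. */
size_t
sketch_cache_add(sketch_cache_t *cache, const char *name, uint32_t expires)
{
  size_t slot;
  assert(cache != NULL && name != NULL);
  if (cache->n_used < SKETCH_CACHE_CAPACITY) {
    slot = cache->n_used++;
  } else {
    slot = sketch_random_index(SKETCH_CACHE_CAPACITY);
    cache->n_evictions++;
  }
  snprintf(cache->entries[slot].name, SKETCH_NAME_MAX, "%s", name);
  cache->entries[slot].expires = expires;
  return slot;
}
```

Because the victim slot is independent of insertion or access order, an attacker who floods the cache can force evictions but cannot choose which record is dropped.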

Getting feedback on real-world cache size, usage, and eviction rate from exit operators would be useful, so perhaps some form of log output is reasonable?
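One possible shape for such log output is sketched below: a few aggregate counters and a single summary line, with no hostnames, so the log itself does not leak recent activity at the exit. The struct, the function names, and the message format are all hypothetical; tor's own logging calls are deliberately not used here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical aggregate counters for the exit-side DNS cache. */
typedef struct sketch_dns_stats_t {
  uint64_t n_lookups;    /* total cache lookups */
  uint64_t n_hits;       /* lookups answered from the cache */
  uint64_t n_evictions;  /* entries dropped because the cache was full */
  uint64_t n_entries;    /* current number of cached entries */
} sketch_dns_stats_t;

/* Hit rate as a whole percentage; 0 when nothing was looked up yet. */
unsigned
sketch_hit_percent(const sketch_dns_stats_t *s)
{
  if (s->n_lookups == 0)
    return 0;
  return (unsigned)((s->n_hits * 100) / s->n_lookups);
}

/* Write a one-line summary into buf.  Returns snprintf()'s result:
 * the length the full line needs, or a negative value on error. */
int
sketch_format_stats(const sketch_dns_stats_t *s, char *buf, size_t buflen)
{
  return snprintf(buf, buflen,
                  "DNS cache: %" PRIu64 " entries, %" PRIu64 " lookups, "
                  "%u%% hits, %" PRIu64 " evictions",
                  s->n_entries, s->n_lookups,
                  sketch_hit_percent(s), s->n_evictions);
}
```

An exit could emit this line periodically (for example alongside its heartbeat message) and reset or keep the counters depending on whether per-interval or cumulative rates are more useful to operators.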

Regarding MIN_DNS_TTL, Microsoft Windows used to (still does?) have a minimum TTL of 15 minutes for its client-side cache, IIRC. Given how prevalent that platform is, I guess a 1 minute MIN_DNS_TTL is very unlikely to break things.

Are there plans to allow more values than {MIN,MAX}_DNS_TTL?

Trac:
Cc: phw, pulls to phw, pulls, nicoo

Since pulls asked for feedback from exit operators, here is some based on my experience with Nos oignons.

Our configuration is publicly documented, but in French, so here is a summary:

We use Unbound as a local, DNSSEC-validating resolver on the exit nodes.
- It obviously only listens locally.
- We use its private-address feature to prevent RFC1918 addresses from figuring in results, to mitigate DNS rebinding attacks.
- We use hide-{identity,version}, mostly out of general principle: anybody reading our documentation would learn that we run Unbound; however, it's unclear to me whether those could be exploited to tie users to specific exits being used for DNS resolution (and if that's relevant).
- We use harden-short-bufsize and harden-large-queries to make Unbound return SERVFAIL on edge cases that can be exploited for DoSing the resolver.
- We forward queries for nos-oignons.{net,org,fr} directly to our authoritative resolver. This is not especially relevant for the exit, but error logs, mails, and so on will break if the domain fails to resolve.
/etc/resolv.conf always specifies search nos-oignons.net (does little-t tor honor that? that could be awkward) and 127.0.0.1 as the first nameserver. If a fallback resolver is specified, it is either operated by the network hosting the exit node or by a close-by (network-wise) organization we have friendly ties to (typically, a non-profit, associative ISP).

While writing this, I'm realising it might be useful to have “DNS resolution best-practices” for exit operators, since this is mostly something ad hoc we came up with based on what our sysadmins were doing in other places, not something we systematically researched.

My branch bug19769_029 implements this.

phw, pulls: Is this about what you had in mind? I took some liberties and might have messed things up.

Trac:
Status: needs_information to needs_review
Actualpoints: N/A to .2

When we merge this, we should also merge the patch from #19025 (moved).

This is potentially backportable to 0.2.9.

Potential enhancement: these values could be consensus parameters rather than hardcoded #defines.

Trac:
Keywords: N/A deleted, 029-backport added

My bug19769_029 branch has been updated, based on feedback and corrections from pulls. Please review?

setting owner

Trac:
Status: needs_review to accepted
Owner: N/A to nickm

Trac:
Status: accepted to needs_review

Looks good to me, sorry for the delay.

Trac:
Status: needs_review to merge_ready

Trac:
Keywords: N/A deleted, review-group-15 added

Squashed, with phw's fix for #19025 (moved), as bug19769_19025_029.

Merged bug19769_19025_029 to master; possible 0.2.9 backport.

Trac:
Milestone: Tor: 0.3.0.x-final to Tor: 0.2.9.x-final
Keywords: review-group-15 deleted, N/A added

Trac:
Keywords: TorCoreTeam201609 deleted, N/A added

This has baked long enough without problems; backporting to 0.2.9

Trac:
Resolution: N/A to fixed
Status: merge_ready to closed

closed

changed time estimate to 4h

added 1h 36m of time spent

mentioned in issue #20416 (moved)

moved to tpo/core/tor#19769 (closed)
