Opened 12 months ago

Last modified 2 months ago

#19769 merge_ready defect

Round down DNS TTL to the nearest DEFAULT_DNS_TTL (30 minutes)

Reported by: teor Owned by: nickm
Priority: Very High Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 029-proposed, dns, 029-backport
Cc: phw, pulls, nicoo Actual Points: .2
Parent ID: Points: 0.5
Reviewer: Sponsor:

Description

In #19025, we fix a bug that prevented exits sending DNS TTLs to clients for IPv4 and IPv6 addresses.

But we don't want to have too many potential values for these TTLs, to avoid tagging attacks.

So I propose

  • Exits round down (truncate) the TTL received from the DNS server, and
  • Clients round down the TTL received from the Exit,

to the nearest of:

  • MIN_DNS_TTL (1 minute), or
  • DEFAULT_DNS_TTL (30, 60, 90, 120, 150, 180 minutes)

MAX_DNS_TTL is 3 hours, so there are only 7 possible values for the TTL.
I chose to round down because that way, Tor DNS TTLs are only ever shorter than the lifetime specified by the DNS server.
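
As an illustration only (this is not code from any branch), the rounding proposed above could be sketched roughly as follows. The helper name dns_round_down_ttl is hypothetical, and the constants simply use the values quoted in this description.

```c
#include <assert.h>
#include <stdint.h>

/* Values as quoted in the ticket description, in seconds. */
#define MIN_DNS_TTL     (60)         /* 1 minute */
#define DEFAULT_DNS_TTL (30*60)      /* 30 minutes */
#define MAX_DNS_TTL     (3*60*60)    /* 3 hours */

/* Hypothetical helper: round ttl (in seconds) down to MIN_DNS_TTL or to a
 * multiple of DEFAULT_DNS_TTL, capped at MAX_DNS_TTL. This yields exactly
 * seven possible results: 1, 30, 60, 90, 120, 150 and 180 minutes. */
uint32_t
dns_round_down_ttl(uint32_t ttl)
{
  uint32_t rounded;
  if (ttl < DEFAULT_DNS_TTL)
    rounded = MIN_DNS_TTL;           /* anything under 30 minutes */
  else if (ttl >= MAX_DNS_TTL)
    rounded = MAX_DNS_TTL;           /* cap at 3 hours */
  else
    rounded = ttl - (ttl % DEFAULT_DNS_TTL);
  /* Only TTLs below the minimum are ever lengthened. */
  assert(rounded <= ttl || ttl < MIN_DNS_TTL);
  return rounded;
}
```

Under this sketch the same function would be applied twice: once at the exit to the TTL from the DNS server, and once at the client to the TTL from the exit. Applying it a second time leaves an already rounded value unchanged.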

I don't think we need to add noise to the TTL received from either the DNS server or Exit. I can't see the value in randomising it, and allowing randomisation could hide a tagging attack.

Change History (26)

comment:1 Changed 12 months ago by teor

This would also require a change to torspec to describe the TTL rounding at:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1359

comment:2 Changed 12 months ago by teor

  • Keywords dns TorCoreTeam201607 added

comment:3 Changed 12 months ago by nickm

So, clients don't do DNS caching by default any more, because of risks like this. Do you think it might make more sense to simply remove client-side DNS caching entirely?

comment:4 Changed 12 months ago by pulls

We have ongoing research on DNS-based traffic correlation attacks (https://nymity.ch/dns-traffic-correlation/) that relates to this. While fixing #19025 will help in mitigating attacks to an extent, the most important change to consider related to DNS is to also significantly increase MIN_DNS_TTL. This is because useful domains for our attacks today have low TTLs: about 50% of Alexa top 1M have a useful domain with TTL <= 60 seconds, and 75% a TTL <= 30 min. Do you think it would be practical to have MIN_DNS_TTL set to, say, 30 min? Would too much break?

If I understand the proposal here in #19769 correctly, TTLs in [0s,30m) would be rounded to MIN_DNS_TTL, also at exits (?). That would actually benefit an attacker who can observe both entry traffic and DNS requests for about 25% of Alexa top 1M (but for the remaining 25% it's an improvement over the status quo, together with #19025).

Sorry if this is the wrong place for this, especially since we don't have a paper to share yet.

comment:5 Changed 12 months ago by phw

  • Cc phw added

comment:6 Changed 12 months ago by pulls

  • Cc pulls added

comment:7 Changed 12 months ago by nickm

  • Keywords TorCoreTeam201608 added; TorCoreTeam201607 removed

No further code or documentation will be written in July, due to time itself. (Leaving needs_revision tickets as-is)

comment:8 Changed 11 months ago by nickm

  • Keywords TorCoreTeam201609 added; TorCoreTeam201608 removed

Move unassigned items in August to September.

comment:9 Changed 10 months ago by teor

  • Status changed from new to needs_information

It would be nice to fix this, but we need to decide what to do first.

comment:10 Changed 9 months ago by nickm

  • Milestone changed from Tor: 0.2.??? to Tor: 0.3.0.x-final

comment:12 Changed 7 months ago by nickm

  • Priority changed from Medium to Very High

comment:13 Changed 7 months ago by pulls

Concretely

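#include <stdint.h>

#define MIN_DNS_TTL (5*60)
#define MAX_DNS_TTL (60*60)

/* Clip ttl (in seconds) to one of two values; no intermediate values. */
uint32_t
dns_clip_ttl(uint32_t ttl)
{
  if (ttl < MIN_DNS_TTL)
    return MIN_DNS_TTL;
  else
    return MAX_DNS_TTL;
}
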
  • Fix #19025 (otherwise ttl above will always be MIN_DNS_TTL).
  • Potentially refactor the DNS caching code to support evictions (while doing this, maybe rip out all old client-side DNS caching code?).
  • Add some form of logging to track cache size, usage, and eviction rate.
  • dns_get_expiry_ttl should be the same as dns_clip_ttl above. Please note that we are not sure these are the only relevant functions.

Given

  • For popular websites, caching at exits is highly likely, and DefecTor attacks are the same as WF attacks.
  • For unpopular websites, caching and TTLs are moot, since the probability of a DNS record being cached is negligible. Caching these records is just an extra burden on the exit and, in a sense, also a risk, since it leaks recent activity at the exit on compromise. DefecTor attacks will be more precise than WF attacks here, and Tor needs WF defenses to mitigate them (another long-term topic).
  • For long TTLs, for what it is worth, we know from #19025 that the real-world impact of ignoring these long TTLs is not a serious issue.
  • For short TTLs, the impact of increasing it is our primary worry since we might break something.
  • We want to prevent fine-grained TTLs to protect against tagging attacks.
  • We do not want TTLs that are too high, so that DNS cache poisoning has a chance to resolve itself automatically.
  • The total size of the cache might be a vector for DoS.
  • The client-side DNS cache remains off.

Goals for DefecTor mitigation

  • (Read about DefecTor attacks here: https://nymity.ch/tor-dns/tor-dns.pdf).
  • Allow long TTLs to be long(er).
  • For short TTLs, go as far up as we are comfortable with, without significantly risking breaking things.

Proposal

Stage the changes: start by repairing the TTL bug #19025 and change clipping to 5*60 seconds for MIN_DNS_TTL and 60*60 seconds for MAX_DNS_TTL, honoring no intermediate values (see code above). Wait for feedback on unexpected breaks. If all is manageable, increase MIN_DNS_TTL to 60*60 in a future patch, effectively always caching for 60 minutes. If DefecTor attacks become a real concern short-term, encourage concerned site owners to consider longer TTLs to hit the MAX_DNS_TTL value. Make the cache size limited, and make eviction when the cache is full uniformly random. We choose random eviction to give an attacker less control, since the attacker can presumably cause evictions at will (LIFO is easy for an attacker to manipulate).

Getting feedback on real-world cache size, usage, and eviction rate from exit operators would be useful, so perhaps some form of log output is reasonable?
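
To make the cache part of the proposal concrete, here is a minimal sketch of a size-limited cache with uniformly random eviction and the counters an exit could log. It is not based on tor's actual DNS cache code: every name in it (dns_cache_t, cache_lookup, cache_insert, CACHE_CAPACITY) is hypothetical, the capacity is tiny on purpose, and rand() is only a placeholder for a proper random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_CAPACITY 4    /* hypothetical; tiny so evictions are easy to see */
#define NAME_MAX_LEN 64     /* hypothetical limit on stored hostname length */

typedef struct {
  char name[NAME_MAX_LEN];
  uint32_t ttl;             /* already clipped TTL, in seconds */
} cache_entry_t;

typedef struct {
  cache_entry_t entries[CACHE_CAPACITY];
  size_t n_entries;
  /* Counters that could feed periodic log output. */
  unsigned long n_hits, n_misses, n_evictions;
} dns_cache_t;

/* Return the entry for name, or NULL if absent; updates hit/miss counters. */
const cache_entry_t *
cache_lookup(dns_cache_t *cache, const char *name)
{
  for (size_t i = 0; i < cache->n_entries; ++i) {
    if (!strcmp(cache->entries[i].name, name)) {
      ++cache->n_hits;
      return &cache->entries[i];
    }
  }
  ++cache->n_misses;
  return NULL;
}

/* Store name with ttl. When the cache is full, overwrite a randomly chosen
 * slot. The caller is expected to have checked cache_lookup() first, so
 * duplicates are not handled here. */
void
cache_insert(dns_cache_t *cache, const char *name, uint32_t ttl)
{
  size_t slot;
  if (cache->n_entries < CACHE_CAPACITY) {
    slot = cache->n_entries++;
  } else {
    /* Placeholder: a real implementation would want a strong RNG and an
     * unbiased way to pick the index. */
    slot = (size_t)rand() % CACHE_CAPACITY;
    ++cache->n_evictions;
  }
  assert(slot < CACHE_CAPACITY);
  strncpy(cache->entries[slot].name, name, NAME_MAX_LEN - 1);
  cache->entries[slot].name[NAME_MAX_LEN - 1] = '\0';
  cache->entries[slot].ttl = ttl;
}
```

Expiry of entries is left out of this sketch. The eviction rate would be n_evictions divided by the number of inserts over a logging interval, and usage would follow from n_hits and n_misses.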

comment:14 Changed 7 months ago by nicoo

  • Cc nicoo added

Regarding MIN_DNS_TTL, Microsoft Windows used to have (and may still have?) a minimum TTL of 15 minutes for its client-side cache, IIRC.
Given how prevalent that platform is, I guess a 1 minute MIN_DNS_TTL is very unlikely to break things.

Are there plans to allow more values than {MIN,MAX}_DNS_TTL?

comment:15 Changed 7 months ago by nicoo

Since pulls asked for feedback from exit operators, here is some based on my experience with Nos oignons.

Our configuration is publicly documented, but in French, so here is a summary:

  • We use Unbound as a local, DNSSEC-validating resolver on the exit nodes.
    • It obviously only listens locally.
    • We use its private-address feature to prevent RFC1918 addresses from figuring in results, to mitigate DNS rebinding attacks.
    • We use hide-{identity,version}, mostly out of general principle: anybody reading our documentation would learn that we run Unbound; however, it's unclear to me whether those could be exploited to tie users to specific exits being used for DNS resolution (and if that's relevant).
    • We use harden-short-bufsize and harden-large-queries to make Unbound return SERVFAIL on edge cases that can be exploited for DoSing the resolver.
    • We forward queries for nos-oignons.{net,org,fr} directly to our authoritative resolver. This is not especially relevant for the exit, but error-log mails and so on will break if the domain fails to resolve.
  • /etc/resolv.conf always specifies search nos-oignons.net (does little-t tor honor that? that could be awkward) and 127.0.0.1 as the first nameserver. If a fallback resolver is specified, it is either operated by the network hosting the exit node or by a close-by (network-wise) organization we have friendly ties to (typically, a non-profit, associative ISP).

While writing this, I'm realising it might be useful to have “DNS resolution best-practices” for exit operators, since this is mostly something ad hoc we came up with based on what our sysadmins were doing in other places, not something we systematically researched.

comment:16 Changed 7 months ago by nickm

  • Actual Points set to .2
  • Status changed from needs_information to needs_review

My branch bug19769_029 implements this.

phw, pulls: Is this about what you had in mind? I took some liberties and might have messed things up.

comment:17 Changed 7 months ago by nickm

When we merge this, we should also merge the patch from #19025.

comment:18 Changed 7 months ago by nickm

  • Keywords 029-backport added

This is potentially backportable to 0.2.9.

Potential enhancement: these values could be consensus parameters rather than hardcoded #defines.

comment:19 Changed 6 months ago by nickm

My bug19769_029 branch has been updated, based on feedback and corrections from pulls. Please review?

comment:20 Changed 6 months ago by nickm

  • Owner set to nickm
  • Status changed from needs_review to accepted

setting owner

comment:21 Changed 6 months ago by nickm

  • Status changed from accepted to needs_review

comment:22 Changed 6 months ago by pulls

Looks good to me, sorry for the delay.

comment:23 Changed 6 months ago by nickm

  • Status changed from needs_review to merge_ready

comment:24 Changed 6 months ago by nickm

  • Keywords review-group-15 added

comment:25 Changed 6 months ago by nickm

  • Keywords review-group-15 removed
  • Milestone changed from Tor: 0.3.0.x-final to Tor: 0.2.9.x-final

Squashed, with phw's fix for #19025, as bug19769_19025_029.

Merged bug19769_19025_029 to master; possible 0.2.9 backport.

comment:26 Changed 2 months ago by nickm

  • Keywords TorCoreTeam201609 removed