Opened 8 months ago

Last modified 5 months ago

#33375 needs_information defect

Stop advertising an IPv6 exit policy when DNS is broken for IPv6

Reported by: teor Owned by: neel
Priority: Medium Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.9.14
Severity: Normal Keywords: needs-proposal, security-review-dos-risk, extra-review, no-backport, ipv6, tor-exit, tor-dns, 044-must
Cc: cypherpunks, nickm, neel Actual Points:
Parent ID: #24833 Points:
Reviewer: Sponsor:

Description

When dns_seems_to_be_broken_for_ipv6(), exits should stop advertising an IPv6 exit policy.

Here's a rough design:

  • when dns_seems_to_be_broken_for_ipv6() is first set to 1, mark the relay descriptor dirty
  • when rebuilding the descriptor, check dns_seems_to_be_broken_for_ipv6() before including an IPv6 exit policy
  • reset dns_seems_to_be_broken_for_ipv6() periodically, maybe every 1-3 days?

Child Tickets

Change History (20)

comment:1 Changed 8 months ago by neel

Owner: set to neel
Status: newassigned

comment:2 Changed 8 months ago by neel

Status: assignedneeds_review

PR: https://github.com/torproject/tor/pull/1771

There is no test, but I didn't find tests for the existing "is DNS broken" code either.

comment:3 Changed 8 months ago by teor

Status: needs_reviewneeds_revision

We have a policy that new code needs tests, even if old code doesn't have them.

comment:4 Changed 8 months ago by teor

Keywords: extra-review no-backport added
Milestone: Tor: unspecifiedTor: 0.4.4.x-final
Reviewer: teor

Historically, exit code has been risky, and had bugs that we didn't find until much later,

So I'm marking this ticket for extra review, and ai don't think we should backport.

comment:5 Changed 8 months ago by teor

Cc: nickm added
Keywords: security-review-dos-risk added; tor-client removed

(Sorry for multiple comments, I kept thinking about this ticket today.)

Can you describe the design of this new feature?

The code doesn't match the rough design in the ticket description. That's ok, but without a design, I can't tell the difference between a bug and a feature. In particular, I wonder why we need separate "was_disabled" and "is_disabled" variables.

This IPv6 DNS code is currently unused, so it has never been tested. So I want to make sure we have the design right.

Here are some issues I noticed when reading the code:

  • the code only counts DNS errors on timeout, but there are actually 11 different DNS errors. We should consider which errors we want to track, and which ones we want to ignore. See http://www.wangafu.net/~nickm/libevent-2.1/doxygen/html/dns_8h.html
  • the minimum number of queries before failure is 10. But that could happen by chance, on server startup. Let's make the minimum something more reasonable. We can make it at least 1000. But maybe we should set it to 1 when TestingTorNetwork is set. That way, broken IPv6 exits will fail quickly in chutney.

We should find out which DNS errors can be triggered by tor clients, and ignore them. Otherwise, a client that floods an exit with bad DNS queries could disable IPv6 exiting on that relay. I think Nick might be able to help here.

I think it's ok to fail thousands of client circuits, before an IPv6 exit disables IPv6. Because getting the new descriptor to clients can take an hour or two. There's also a tradeoff here: we want quiet exits to disable IPv6 eventually. But we want busy exits to survive a momentary glitch.

comment:6 Changed 8 months ago by neel

The design is as follows:

  • When dns_is_broken_for_ipv6 is set to 1, we launch a timer for 24 hours to reset this to 0
  • While dns_is_broken_for_ipv6 = 1, the relay will not advertise an exit policy.
  • After 24 hours, the timer will reset to 0

While my PR is not ready for review yet (need tests), I have increased the minimum number of queries to 1000, and on TestingTorNetwork decreased it to 1.

comment:7 Changed 8 months ago by neel

Status: needs_revisionneeds_review

I added unit tests. However, this still only counts DNS errors on timeout.

Setting as needs review, feel free to put as needs revision if more DNS errors are necessary.

comment:8 Changed 8 months ago by neel

Status: needs_reviewneeds_revision

comment:9 Changed 8 months ago by neel

There is one problem with unit tests: The code this adds relies on code that calls the relay module, and some compilations don't include this, so that's why the tests fail.

comment:10 Changed 8 months ago by neel

Owner: neel deleted
Status: needs_revisionassigned

comment:11 Changed 8 months ago by neel

Status: assignedneeds_revision

comment:12 Changed 8 months ago by nickm

The relay module should always be built when running with the unit tests, though. Are you sure that's the issue?

comment:13 Changed 7 months ago by neel

Owner: set to neel
Status: needs_revisionassigned

comment:14 Changed 7 months ago by neel

Cc: neel added
Status: assignedneeds_revision

comment:15 Changed 7 months ago by neel

Status: needs_revisionneeds_review

comment:16 in reply to:  5 Changed 7 months ago by teor

Reviewer: teor

I don't have time to keep on reviewing this patch right now. I'm really busy with google summer of code and outreachy. So I'm going to pass it to another reviewer.

Here are some things for the reviewer to check:

Replying to teor:

This IPv6 DNS code is currently unused, so it has never been tested. So I want to make sure we have the design right.

Here are some issues I noticed when reading the code:

Which errors should we turn off IPv6 DNS for? All of them? Only the ones that clients can't trigger?

  • the minimum number of queries before failure is 10. But that could happen by chance, on server startup. Let's make the minimum something more reasonable. We can make it at least 1000. But maybe we should set it to 1 when TestingTorNetwork is set. That way, broken IPv6 exits will fail quickly in chutney.

The last version of the PR I reviewed changed the wrong "10". Please check that the new PR changes this code:
https://github.com/torproject/tor/pull/1771/files#diff-ed2a85a7ec36e73dc681fe94a7dcf524L1556

We should find out which DNS errors can be triggered by tor clients, and ignore them. Otherwise, a client that floods an exit with bad DNS queries could disable IPv6 exiting on that relay. I think Nick might be able to help here.

We also need to think about the risk of DNS-based attacks.

I think it's ok to fail thousands of client circuits, before an IPv6 exit disables IPv6. Because getting the new descriptor to clients can take an hour or two. There's also a tradeoff here: we want quiet exits to disable IPv6 eventually. But we want busy exits to survive a momentary glitch.

Overall, I wonder if this patch is the best way to solve this issue. Perhaps we should manually apply the BadExit flag through the network health team. Perhaps we should set the limits much, much higher.

Do we know how many queries a busy exit processes? And how many timeouts they have?
It's really hard to make a good design without good data.

comment:17 Changed 7 months ago by dgoulet

Status: needs_reviewneeds_revision

comment:18 Changed 7 months ago by teor

Keywords: needs-proposal added
Status: needs_revisionneeds_information

So there's a bunch of missing data here, a lot of design questions, and a lot of risks.

I think we need a proposal here, so that other developers can review the design.
The proposal should cover the questions I've asked in this ticket.
And talk about how we can test these changes.

And then we can update the code, and merge it.

comment:19 Changed 7 months ago by teor

(It's not anyone's fault. Sometimes code changes are more complicated than we expect.)

comment:20 Changed 5 months ago by nickm

Keywords: 044-must added

Add 044-must to all security tickets in 0.4.4

Note: See TracTickets for help on using tickets.