Opened 4 years ago

Last modified 20 months ago

#17605 needs_revision defect

Tell caches to remove X-Your-Address-Is from Tor Directory documents

Reported by: teor Owned by: jryans
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-dirauth, 034-triage-20180328, 034-removed-20180328
Cc: jryans@… Actual Points:
Parent ID: Points: 2
Reviewer: nickm Sponsor:

Description

Some web caches (such as Farahavar's previous cache), pass on the X-Your-IP-Address-Is header from one directory document to multiple clients. This causes the clients to guess the wrong IP address as their address.

I think we should add one or more of the following headers to every directory response:

Pragma: no-cache tells HTTP 1.0 compliant caches to disable caching entirely. (This will also disable caching for HTTP 1.1 caches unless we provide a more generous Cache-Control header, like the one below.)

Connection: close X-Your-IP-Address-Is tells HTTP 1.1 caches to never send out the X-Your-IP-Address-Is header, even to the first client requesting the document.

Cache-Control: no-cache="X-Your-IP-Address-Is" tells HTTP 1.1 caches to not cache the header at all. Alternately, if the cache doesn't support the no-cache="<header-name>" feature, it tells the cache not to cache the entire document. (This also causes the cache to attempt to revalidate the header, which might not be what we want, as Tor doesn't support cache revalidation.)

I don't know enough about how caches typically behave to recommend which ones.

See:

Child Tickets

Change History (37)

comment:1 Changed 4 years ago by yawning

I would vote for: Pragma: no-cache *and* Cache-Control: no-store (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2) in responses.

comment:2 Changed 4 years ago by nickm

Priority: MediumHigh

IMO cacheing should probably happen on URLs that are cacheable (ie, consensuses).

comment:3 in reply to:  2 Changed 4 years ago by teor

Replying to nickm:

IMO cacheing should probably happen on URLs that are cacheable (ie, consensuses).

But when a cache stores a consensus, it also stores the X-Your-IP-Address header from either the cache's machine, or the original client that asked for the consensus.

Neither of these values of X-Your-IP-Address-Is are valid for any other clients, but caches will typically send them out with the cached response.

comment:4 Changed 4 years ago by arma

What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.

The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.

We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.

Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?

The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.

comment:5 in reply to:  4 ; Changed 4 years ago by teor

Replying to arma:

What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.

I think this is an excellent idea. As the HTTP headers of a naked dirport response are unauthenticated, they can be modified in transit, and we can't know either way.

The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.

A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.

We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.

With the introduction of fallback directory mirrors in 0.2.8 (#15775), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.

After bootstrap, with the introduction of "dir servers for all" (#12538) in 0.2.8, it will be shared among almost all relays.

So I think we can do begindirs for all directory fetches. We might want to fix #17848 at the same time, otherwise clients and relays won't know if they have an existing connection to a directory server, and load balancing will suffer.

Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?

It has the relay's IPv4 address.

(Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding a HTTP header, rather than changing the format of a NETINFO cell. See #5940.)

The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.

Let's fix it then!

comment:6 Changed 4 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

It is impossible that we will fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.

comment:7 Changed 4 years ago by nickm

Points: small/medium?

comment:8 Changed 3 years ago by nickm

Points: small/medium?2

small/medium => 2.

comment:9 Changed 3 years ago by isabela

Keywords: isaremoved added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

comment:10 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:11 in reply to:  5 Changed 3 years ago by teor

Replying to teor:

Replying to arma:

What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.

I think this is an excellent idea. As the HTTP headers of a naked dirport response are unauthenticated, they can be modified in transit, and we can't know either way.

The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.

A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.

We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.

With the introduction of fallback directory mirrors in 0.2.8 (#15775), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.

After bootstrap, with the introduction of "dir servers for all" (#12538) in 0.2.8, it will be shared among almost all relays.

So I think we can do begindirs for all directory fetches.

We made clients always use begindir in 0.2.8 in #18483.

We might want to fix #17848 at the same time, otherwise clients and relays won't know if they have an existing connection to a directory server, and load balancing will suffer.

Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?

It has the relay's IPv4 address.

(Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding a HTTP header, rather than changing the format of a NETINFO cell. See #5940.)

Actually, the NETINFO format supports IPv6. If it doesn't work when you connect to a relay's IPv6 ORPort, that's a bug.

The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.

Let's fix it then!

comment:12 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:13 Changed 3 years ago by jryans

Cc: jryans@… added
Summary: Tell caches to remove X-Your-IP-Address-Is from Tor Directory documentsTell caches to remove X-Your-Address-Is from Tor Directory documents

Correcting summary to match current spelling of the header in code: X-Your-Address-Is.

comment:14 Changed 3 years ago by jryans

Owner: set to jryans
Status: newassigned

comment:15 Changed 3 years ago by jryans

Status: assignedneeds_review

My understanding from the discussion above is that we can:

  1. Remove the X-Your-Address-Is hint entirely
  2. Use NETINFO cells from remote nodes to detect IP address changes

Here's a branch with those changes:

https://github.com/torproject/tor/compare/master...jryans:rm-address-header

I've used local testing with Chutney to verify that it's working correctly.

We do appear to receive both IPv4 and IPv6 addresses as expected via NETINFO cells. We should also update the control spec to note a new method for IP address changes, but I'll wait until we discuss the approach here first.

comment:16 Changed 3 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.0.x-final

comment:17 Changed 3 years ago by nickm

Keywords: review-group-14 added

comment:18 Changed 3 years ago by nickm

Reviewer: nickm

comment:19 Changed 3 years ago by nickm

This patch looks correct to me: it removes support for giving or taking the X-Your-Address-Is hint, and replaces it with looking at the netinfo hint instead.

I have two questions about this patch -- arma or teor, could you give some feedback?

  • Should it ignore IPv6 addresses, because we can't publish them as a relay's primary address? (I think "yes".)
  • Should it continue to give the X-Your-Address-Is answer for now, in case some relays actually need it? (Here I'm not sure.)

comment:20 in reply to:  19 Changed 3 years ago by teor

Replying to nickm:

This patch looks correct to me: it removes support for giving or taking the X-Your-Address-Is hint, and replaces it with looking at the netinfo hint instead.

I have two questions about this patch -- arma or teor, could you give some feedback?

  • Should it ignore IPv6 addresses, because we can't publish them as a relay's primary address? (I think "yes".)

At some point, we would like relays to be able to learn their IPv6 addresses (#5940). But since this is a new feature, with its own ticket, there's no need to implement it in this ticket.

  • Should it continue to give the X-Your-Address-Is answer for now, in case some relays actually need it? (Here I'm not sure.)

Yes, but only over an encrypted begindir connection.
Or, we can cut out all the X-Your-Address-Is code, and just rely on NETINFO cells.

Here's what I'd like to do:
A) directory caches only send X-Your-Address-Is on begindir connections

  • modify if (!is_local_addr(&conn->base_.addr)) { in write_http_response_header_impl() to check for begindir connections

B) relays only believe X-Your-Address-Is from begindir connections

  • modify if (conn->dirconn_direct) { in connection_dir_client_reached_eof() to check for begindir connections

C) relays which don't know their own IP address make a begindir connection to an authority to discover that IP address

  • modify if (!directory_must_use_begindir(options)) { in directory_command_should_use_begindir() to also check if we know our own address using router_pick_published_address(options, &addr, 1) == 0

What will likely happen is:

  • the relay initiates a begindir connection and receives its own IP address in the NETINFO cell

If this fails, then:

  • the relay believes the X-Your-IP-Address-Is header on the encrypted begindir request

We can't just implement A) as a transitional measure for older relays talking to newer caches, because those older relays will never make begindir connections - in directory_command_should_use_begindir(), relays never make begindir connections (well, until #20711, and then only in the case when the mirror only has an ORPort).

comment:21 Changed 3 years ago by nickm

Status: needs_reviewneeds_revision

comment:22 Changed 3 years ago by nickm

Keywords: review-group-15 added; review-group-14 removed

comment:23 Changed 3 years ago by arma

If you stop sending x-your-address-is headers, then all of the relays that haven't upgraded yet will be sol. So we shouldn't stop sending them for a good while, right?

Teor suggests "relays which don't know their own IP address make a begindir connection to an authority to discover that IP address", but one of the goals here is to detect if the IP address has *changed*, not just to learn it the first time. So if relays only do begindir when they don't know any IP address for themselves, and they stop believing the naked http header, then we lose the functionality to learn when the address changed.

...Unless there is some periodic relay handshake that the relay does, which would let it learn about a new address from the netinfo cell. And I've got just the one -- the periodic reachability tests by the directory authorities -- *except*, if the IP address changes, the relays will stop being reachable anymore, so they're going to have to notice on their own that something changed, in order to generate a new descriptor with the new address in it, and only then will the authorities try reaching them on the new address. Bummer.

Ok. To summarize:

  • Directory servers shouldn't stop giving out the header yet, or it'll break existing relays.
  • We can teach new relays to listen to the address they find in the netinfo cell. Probably we should only believe it when we're interacting with a directory authority. But that change by itself won't be enough, because we also need to make relays do periodic outbound connection handshakes with directory authorities, or they won't reliably get the netinfo cells they need. That step probably requires at least some design, and makes an 030 target less likely.
  • Right now if anything caches directory objects at the middlebox, I fear they cache the http headers too. So I think Nick's statement that "cacheing should probably happen on URLs that are cacheable (ie, consensuses)" is not true yet, because *every* url will have the x-your-address-is header in it, and we shouldn't take that header out yet. Does that mean we should add the no-cache / no-store lines for now, while we still serve the x-your-address-is header?

comment:24 in reply to:  23 Changed 3 years ago by teor

Replying to arma:

If you stop sending x-your-address-is headers, then all of the relays that haven't upgraded yet will be sol. So we shouldn't stop sending them for a good while, right?

Yes. Even on unencrypted connections, sadly.

Teor suggests "relays which don't know their own IP address make a begindir connection to an authority to discover that IP address", but one of the goals here is to detect if the IP address has *changed*, not just to learn it the first time. So if relays only do begindir when they don't know any IP address for themselves, and they stop believing the naked http header, then we lose the functionality to learn when the address changed.

...Unless there is some periodic relay handshake that the relay does, which would let it learn about a new address from the netinfo cell. And I've got just the one -- the periodic reachability tests by the directory authorities -- *except*, if the IP address changes, the relays will stop being reachable anymore, so they're going to have to notice on their own that something changed, in order to generate a new descriptor with the new address in it, and only then will the authorities try reaching them on the new address. Bummer.

The preemptive (client) circuits that each relay makes elicit netinfo cells in response, as do bandwidth self-testing and ORPort and DirPort reachability checks.

Ok. To summarize:

  • Directory servers shouldn't stop giving out the header yet, or it'll break existing relays.
  • We can teach new relays to listen to the address they find in the netinfo cell. Probably we should only believe it when we're interacting with a directory authority. But that change by itself won't be enough, because we also need to make relays do periodic outbound connection handshakes with directory authorities, or they won't reliably get the netinfo cells they need. That step probably requires at least some design, and makes an 030 target less likely.

How about a design where the relay uses the netinfo cells from any circuit to detect if the address *might* have changed, and then contacts a directory authority to confirm?

  • Right now if anything caches directory objects at the middlebox, I fear they cache the http headers too. So I think Nick's statement that "cacheing should probably happen on URLs that are cacheable (ie, consensuses)" is not true yet, because *every* url will have the x-your-address-is header in it, and we shouldn't take that header out yet. Does that mean we should add the no-cache / no-store lines for now, while we still serve the x-your-address-is header?

Yes, we should add the cache lines, there's no way of escaping that.
(Well, 0.2.8 and later clients don't need them, because they all use begindir.)

comment:25 Changed 3 years ago by nickm

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

comment:26 in reply to:  1 Changed 3 years ago by arma

Replying to yawning:

I would vote for: Pragma: no-cache *and* Cache-Control: no-store (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2) in responses.

Maybe somebody wants to put together a short branch that adds those headers to directory responses when they come over an unencrypted channel? It seems everybody here agrees with that component (or is that 'nobody disagrees').

comment:27 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:28 Changed 3 years ago by dgoulet

Keywords: tor-dirauth added; tor-auth removed

Turns out that tor-auth is for directory authority so make it clearer with tor-dirauth

comment:29 Changed 3 years ago by nickm

Keywords: review-group-15 removed

comment:30 Changed 3 years ago by nickm

Keywords: isaremoved removed

comment:31 Changed 2 years ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

comment:32 Changed 2 years ago by nickm

Defer more needs_revision items to 0.3.3

comment:33 Changed 2 years ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

comment:34 Changed 22 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final

Mark a lot of assigned/needs_revision tickets as 0.3.4. If you think this should happen in 0.3.3 instead, just let me know?

comment:35 Changed 21 months ago by nickm

Keywords: 034-triage-20180328 added

comment:36 Changed 21 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:37 Changed 20 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These needs_revision, tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if somebody does the necessary revision.

Note: See TracTickets for help on using tickets.