Some web caches (such as Farahavar's previous cache) pass on the X-Your-IP-Address-Is header from one directory document to multiple clients. This causes those clients to guess the wrong IP address as their own.
I think we should add one or more of the following headers to every directory response:
Pragma: no-cache tells HTTP/1.0-compliant caches to disable caching entirely. (This will also disable caching for HTTP/1.1 caches unless we provide a more generous Cache-Control header, like the one below.)
Connection: close, X-Your-IP-Address-Is marks X-Your-IP-Address-Is as a hop-by-hop header, which tells HTTP/1.1 caches and proxies never to forward it, even to the first client requesting the document.
Cache-Control: no-cache="X-Your-IP-Address-Is" tells HTTP/1.1 caches not to send the cached header to later clients. Alternatively, if the cache doesn't support field names in no-cache="...", it tells the cache not to cache the entire document. (This also causes the cache to attempt to revalidate the header, which might not be what we want, as Tor doesn't support cache revalidation.)
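To make the intended cache behaviour concrete, here is a minimal Python sketch of what a standards-following HTTP/1.1 intermediary would do with the headers above. It assumes the rules in RFC 7230 section 6.1 (a proxy removes every field named in Connection before forwarding) and RFC 7234 section 5.2.2.2 (fields named in no-cache="..." must not be sent from cache without revalidation). The function names and the short list of always-hop-by-hop fields are illustrative choices, not Tor or proxy code, and Pragma is not modelled.

```python
import re
from typing import Dict, Set

# Fields intermediaries commonly treat as hop-by-hop regardless of Connection
# (illustrative list; for this ticket the Connection header is what matters).
ALWAYS_HOP_BY_HOP = {"connection", "keep-alive", "te", "trailer", "transfer-encoding", "upgrade"}

# A Cache-Control directive: a token, optionally "=" then a quoted-string or a bare token.
_DIRECTIVE = re.compile(r'([!#$%&\'*+.^_`|~0-9A-Za-z-]+)(?:\s*=\s*(?:"([^"]*)"|([^,\s]*)))?')


def get_header(headers: Dict[str, str], name: str) -> str:
    """Case-insensitive header lookup; returns "" when the field is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return ""


def no_cache_fields(cache_control: str) -> Set[str]:
    """Lower-cased field names listed in no-cache="..." (a bare no-cache lists none)."""
    fields: Set[str] = set()
    for m in _DIRECTIVE.finditer(cache_control):
        name, quoted, bare = m.group(1), m.group(2), m.group(3)
        argument = quoted if quoted is not None else bare
        if name.lower() == "no-cache" and argument:
            fields.update(f.strip().lower() for f in argument.split(",") if f.strip())
    return fields


def strip_for_forwarding(headers: Dict[str, str]) -> Dict[str, str]:
    """Remove hop-by-hop fields, including every field named in Connection."""
    named = {t.strip().lower() for t in get_header(headers, "Connection").split(",") if t.strip()}
    drop = ALWAYS_HOP_BY_HOP | named
    return {k: v for k, v in headers.items() if k.lower() not in drop}


def strip_for_cache_hit(headers: Dict[str, str]) -> Dict[str, str]:
    """Remove fields that no-cache="..." forbids sending from cache without revalidation."""
    drop = no_cache_fields(get_header(headers, "Cache-Control"))
    return {k: v for k, v in headers.items() if k.lower() not in drop}
```

With all three proposed headers present, strip_for_forwarding drops both Connection and X-Your-IP-Address-Is, while strip_for_cache_hit drops X-Your-IP-Address-Is and leaves the rest of the stored response usable.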
I don't know enough about how caches typically behave to recommend which of these to use.
See:
#16205 - bogus IP address / clock change from authority server
IMO caching should probably happen on URLs that are cacheable (i.e., consensuses).
But when a cache stores a consensus, it also stores the X-Your-IP-Address-Is header, which contains the address of either the cache's machine or the original client that asked for the consensus.
Neither value of X-Your-IP-Address-Is is valid for any other client, but caches will typically send it out with the cached response.
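A toy model of the leak may help. The Python sketch below assumes a shared cache that stores the first response for a URL, headers and all, and replays it for every later request. The `transparent` flag is an assumption standing in for an intercepting proxy whose upstream connection appears to come from the client that caused the cache miss. The class and function names are hypothetical, and the addresses are documentation ranges.

```python
from typing import Callable, Dict

Headers = Dict[str, str]
# An origin takes (url, address the TCP connection came from) and returns response headers.
Origin = Callable[[str, str], Headers]


def directory_server(url: str, peer_address: str) -> Headers:
    """Hypothetical directory server: reports the address it saw the request come from."""
    return {"Content-Type": "text/plain", "X-Your-IP-Address-Is": peer_address}


class NaiveSharedCache:
    """Stores the first response for each URL and replays it, headers included."""

    def __init__(self, own_address: str, origin: Origin, transparent: bool = False) -> None:
        self.own_address = own_address
        self.origin = origin
        self.transparent = transparent
        self.stored: Dict[str, Headers] = {}

    def fetch(self, url: str, client_address: str) -> Headers:
        if url not in self.stored:
            # On a miss the origin sees either the cache's own address or, for a
            # transparent proxy, the address of the client that caused the miss.
            peer = client_address if self.transparent else self.own_address
            self.stored[url] = self.origin(url, peer)
        # On a hit, client_address is ignored: everyone gets the stored headers.
        return dict(self.stored[url])


consensus = "/tor/status-vote/current/consensus"
cache = NaiveSharedCache("192.0.2.1", directory_server)
for client in ("198.51.100.7", "203.0.113.9"):
    print(client, "is told", cache.fetch(consensus, client)["X-Your-IP-Address-Is"])
```

Both clients are told 192.0.2.1, the cache's address. With transparent=True, the second client is told the first client's address instead, which is the user-to-user leak discussed below.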
What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
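A rough sketch of the proposal, assuming the directory server can tell whether a request arrived in a BEGIN_DIR stream or on the plain DirPort. The enum, the function names, and the use of the X-Your-Address-Is spelling (the one used in the code) are illustrative; this is not how Tor's directory code is structured. The client-side check is an extra assumption of this sketch rather than part of the proposal: it ignores the header on naked DirPort answers even if a cache or on-path attacker inserted one.

```python
from enum import Enum
from typing import Dict, Optional

Headers = Dict[str, str]
ADDRESS_HEADER = "X-Your-Address-Is"


class Transport(Enum):
    NAKED_DIRPORT = "plain HTTP to the DirPort"
    BEGIN_DIR = "HTTP inside a BEGIN_DIR stream on a TLS-protected OR connection"


def build_response_headers(transport: Transport, peer_address: str) -> Headers:
    """Server side: only reveal the requester's address inside a BEGIN_DIR stream."""
    headers: Headers = {"Content-Type": "text/plain"}
    if transport is Transport.BEGIN_DIR:
        headers[ADDRESS_HEADER] = peer_address
    return headers


def address_suggestion(transport: Transport, headers: Headers) -> Optional[str]:
    """Client side (hypothetical): ignore the header on naked DirPort responses,
    which could have been inserted, modified, or replayed in transit."""
    if transport is not Transport.BEGIN_DIR:
        return None
    return headers.get(ADDRESS_HEADER)
```

Keying both sides on the transport keeps relays able to learn their address over begin_dir, while naked DirPort answers, which web caches can store, never carry it.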
Actually, wait a minute, don't NETINFO cells have your address in them now too? Does that mean X-Your-Address-Is on naked dirport answers is redundant? And if so, should we try to phase it out in favor of the encrypted, authenticated mechanism that we built?
The reason I want to get rid of the caching situation is that it is an information leak from one user to another. For now, it's mostly relays that suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
> What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
I think this is an excellent idea. The HTTP headers of a naked dirport response are unauthenticated, so they can be modified in transit, and the client has no way to tell.
> The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.
> We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
With the introduction of fallback directory mirrors in 0.2.8 (#15775), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.
After bootstrap, with the introduction of "dir servers for all" (#12538) in 0.2.8, it will be shared among almost all relays.
So I think we can do begindirs for all directory fetches. We might want to fix #17848 at the same time; otherwise clients and relays won't know whether they have an existing connection to a directory server, and load balancing will suffer.
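Back-of-the-envelope numbers for the load argument, assuming bootstrap fetches are spread evenly across whichever set of servers clients use. Real selection is weighted, so this is only indicative, and the function name is hypothetical.

```python
def share_per_server(num_servers: int) -> float:
    """Fraction of total bootstrap load each server carries under an even split."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    return 1.0 / num_servers


AUTHORITIES = 9  # "~9 authorities" from the comment above
for fallbacks in (100, 250):  # the 100-250 fallback directory mirrors expected in 0.2.8
    ratio = share_per_server(AUTHORITIES) / share_per_server(fallbacks)
    print(f"{fallbacks} fallbacks: {share_per_server(fallbacks):.2%} each "
          f"vs {share_per_server(AUTHORITIES):.2%} per authority "
          f"({ratio:.1f} times less per server)")
```

Under that assumption each fallback carries roughly 1% to 0.4% of bootstrap load, versus about 11% per authority, a per-server reduction of roughly 11 to 28 times.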
> Actually, wait a minute, don't NETINFO cells have your address in them now too? Does that mean X-Your-Address-Is on naked dirport answers is redundant? And if so, should we try to phase it out in favor of the encrypted, authenticated mechanism that we built?
It has the relay's IPv4 address.
(Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding an HTTP header than by changing the format of a NETINFO cell. See #5940.)
> The reason I want to get rid of the caching situation is that it is an information leak from one user to another. For now, it's mostly relays that suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
We cannot possibly fix all 226 currently open 0.2.8 tickets before 0.2.8 is released. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final
>> What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
> I think this is an excellent idea. The HTTP headers of a naked dirport response are unauthenticated, so they can be modified in transit, and the client has no way to tell.
>> The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
> A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.
>> We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
> With the introduction of fallback directory mirrors in 0.2.8 (#15775), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.
> After bootstrap, with the introduction of "dir servers for all" (#12538) in 0.2.8, it will be shared among almost all relays.
> So I think we can do begindirs for all directory fetches.
We made clients always use begindir in 0.2.8 in #18483.
> We might want to fix #17848 at the same time; otherwise clients and relays won't know whether they have an existing connection to a directory server, and load balancing will suffer.
>> Actually, wait a minute, don't NETINFO cells have your address in them now too? Does that mean X-Your-Address-Is on naked dirport answers is redundant? And if so, should we try to phase it out in favor of the encrypted, authenticated mechanism that we built?
> It has the relay's IPv4 address.
> (Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding an HTTP header than by changing the format of a NETINFO cell. See #5940.)
Actually, the NETINFO format supports IPv6. If it doesn't work when you connect to a relay's IPv6 ORPort, that's a bug.
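For reference, a minimal Python sketch of reading the address fields out of a NETINFO cell payload, following the layout in tor-spec: a 4-byte TIME, then OTHERADDR, then NMYADDR and that many MYADDR entries, where each address is ATYPE, ALEN, AVAL with type 0x04 for IPv4 and 0x06 for IPv6. OTHERADDR is the sender's view of the recipient's address, which is how a relay can learn its own address from a peer. The sketch parses only the payload (no cell header), and the names are hypothetical; it is not Tor's parser.

```python
import ipaddress
import struct
from typing import List, NamedTuple, Optional, Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

ATYPE_IPV4 = 0x04
ATYPE_IPV6 = 0x06


class Netinfo(NamedTuple):
    timestamp: int                   # sender's clock, seconds since the epoch
    other_addr: Optional[IPAddress]  # the recipient's address as seen by the sender
    my_addrs: List[IPAddress]        # the sender's own advertised addresses


def _read_address(payload: bytes, offset: int) -> Tuple[Optional[IPAddress], int]:
    """Read one ATYPE/ALEN/AVAL address; return (address or None, next offset)."""
    if offset + 2 > len(payload):
        raise ValueError("truncated address header")
    atype, alen = payload[offset], payload[offset + 1]
    start, end = offset + 2, offset + 2 + alen
    if end > len(payload):
        raise ValueError("truncated address value")
    value = payload[start:end]
    if atype == ATYPE_IPV4 and alen == 4:
        return ipaddress.IPv4Address(value), end
    if atype == ATYPE_IPV6 and alen == 16:
        return ipaddress.IPv6Address(value), end
    return None, end  # unrecognised type or length: skip the entry


def parse_netinfo_payload(payload: bytes) -> Netinfo:
    """Parse a NETINFO payload; trailing padding after the last address is ignored."""
    if len(payload) < 4:
        raise ValueError("truncated timestamp")
    (timestamp,) = struct.unpack_from("!I", payload, 0)
    other_addr, offset = _read_address(payload, 4)
    if offset >= len(payload):
        raise ValueError("missing NMYADDR")
    count = payload[offset]
    offset += 1
    my_addrs: List[IPAddress] = []
    for _ in range(count):
        addr, offset = _read_address(payload, offset)
        if addr is not None:
            my_addrs.append(addr)
    return Netinfo(timestamp, other_addr, my_addrs)
```

Because the address format already has an IPv6 type, the same OTHERADDR field can report a relay's IPv6 address when the connection is made to an IPv6 ORPort.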
>> The reason I want to get rid of the caching situation is that it is an information leak from one user to another. For now, it's mostly relays that suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
Correcting the summary to match the current spelling of the header in the code: X-Your-Address-Is.
Trac: Username: jryans Summary: Tell caches to remove X-Your-IP-Address-Is from Tor Directory documents to Tell caches to remove X-Your-Address-Is from Tor Directory documents Cc: N/A to jryans@gmail.com
I've used local testing with Chutney to verify that it's working correctly.
We do appear to receive both IPv4 and IPv6 addresses as expected via NETINFO cells. We should also update the control spec to note a new method for IP address changes, but I'll wait until we discuss the approach here first.
Trac: Username: jryans Status: assigned to needs_review