Opened 4 years ago

Last modified 7 months ago

#12377 needs_revision defect

get_interface_address6() behaviour iff all interface addresses are internal

Reported by: rl1987 Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay, address-detection, review-group-34, 034-triage-20180328, 034-removed-20180328
Cc: rl1987@…, catalyst Actual Points:
Parent ID: Points: 1
Reviewer: dgoulet Sponsor:

Description

First, let us assume that all network interfaces for IP host that runs Tor instance are internal as judged by tor_addr_is_internal() function.

There is the following code in get_interface_address6() function.

  /* Try to do this the smart way if possible. */
  if ((addrs = get_interface_addresses_raw(severity))) {
    int rv = -1;
    SMARTLIST_FOREACH_BEGIN(addrs, tor_addr_t *, a) {
      if (family != AF_UNSPEC && family != tor_addr_family(a))
        continue;
      if (tor_addr_is_loopback(a) ||
          tor_addr_is_multicast(a))
        continue;

      tor_addr_copy(addr, a);
      rv = 0;

      /* If we found a non-internal address, declare success.  Otherwise,
       * keep looking. */
      if (!tor_addr_is_internal(a, 0))
        break;
    } SMARTLIST_FOREACH_END(a);

    SMARTLIST_FOREACH(addrs, tor_addr_t *, a, tor_free(a));
    smartlist_free(addrs);
    return rv;
  }

Caller will get the last entry from a interface address smartlist. Is this okay?

Child Tickets

Change History (43)

comment:1 Changed 4 years ago by nickm

Milestone: Tor: 0.2.6.x-final

comment:2 Changed 4 years ago by rl1987

Cc: rl1987@… added

comment:3 Changed 4 years ago by nickm

Keywords: tor-relay 026-triaged-1 026-deferrable added

comment:4 Changed 4 years ago by dgoulet

According to the comment, it's doing it's job but I'm starting to wonder if it's actually used right.

/** Set *<b>addr</b> to the IP address (if any) of whatever interface
 * connects to the Internet. 

In connection.c, function client_check_address_changed(), it's used to get the "last interface ip" (which get_interface_address6() provides) and see if we've seen it before by looking into outgoing_addrs.

Let's assume we have eth0 with one public IPv6 address, 1::cafe

get_interface_address6() will return that address since it fits the criteria and is the first valid public IP we see. So outgoing_addrs is updated with it and we live happily after. Lets add a second public usable IPv6, 2::cafe.

Now calling again get_interface_address6() should return the first valid one which is 1::cafe (considering here alphabetical order) thus disregarding the new IP and making client_check_address_changed() failing to notice.

Lots of pieces here, I might be missing a check thus need review :). If it's true, maybe the solution is to keep a list of all valid IPs of any usable interface and validate our state that way.

comment:5 Changed 4 years ago by nickm

Parent ID: #12376

comment:6 Changed 4 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final
Status: newneeds_review

comment:7 Changed 4 years ago by nickm

Status: needs_reviewnew

comment:8 in reply to:  4 Changed 4 years ago by yawning

Replying to dgoulet:

Lots of pieces here, I might be missing a check thus need review :). If it's true, maybe the solution is to keep a list of all valid IPs of any usable interface and validate our state that way.

So if the purpose of the routine is "get the address used to reach the internet", then keeping a list like that isn't a great solution. The "correct" way to do this is non-portable, but there's "correct" ways for all the platforms we care about (and can fall back to the hack as needed...).

Under Linux, you want a rtnetlink(7) socket.
Under *BSD, this information is available from sysctl (net.route.0.0.dump)
Under Windows, you want GetIpForwardTable() (https://msdn.microsoft.com/en-us/library/aa365953%28v=vs.85%29.aspx).

The high level "right" way is to use platform specific calls to dump the routing table, find the default route and the associated interface, and return the interface IP address.

A brief description of how the linux solution would work:

  1. Open a rtnetlink socket (AF_NETLINK, PF_ROUTE).
  2. Query the entire routing table with RTM_GETROUTE, and find the default route and it's associated interface index (RTA_IIF, RTA_OIF).
  3. Query the list of interfaces to get the address RTM_GETADDR. You may be able to get away with looking at the RTA_SRC attribute in the response from 2, it's been a while so I don't remember off the top of my head.
  4. Profit!

It's a decent chunk of code, especially if you do all platforms, but it should be fun to write.

Last edited 4 years ago by yawning (previous) (diff)

comment:9 Changed 4 years ago by nickm

Status: newassigned

comment:10 Changed 4 years ago by nickm

Keywords: 027-triaged-1-out added

Marking triaged-out items from first round of 0.2.7 triage.

comment:11 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.???

Make all non-needs_review, non-needs_revision, 027-triaged-1-out items belong to 0.2.???

comment:12 Changed 4 years ago by nickm

Milestone: Tor: 0.2.???Tor: 0.2.7.x-final

Pulling this back into 0.2.7 by request.

comment:13 Changed 4 years ago by nickm

If this is a matter of the code behaving not-too-badly but the tests being not-quite-right, then can we arrange to have the tests detect this situation, and tt_skip() rather than failing?

comment:14 in reply to:  13 Changed 4 years ago by yawning

Replying to nickm:

If this is a matter of the code behaving not-too-badly but the tests being not-quite-right, then can we arrange to have the tests detect this situation, and tt_skip() rather than failing?

It's a matter of the tests being correct, the code being flat out wrong, with the tests being new and always failing because the code is wrong and has always been wrong.

The 2 values the tests are comparing are the "smart way" of getting the external interface address which isn't routing table aware (which breaks triggering test failure), and the kludgy way (open a UDP socket destined for an external host, which the kernel assigns to the correct interface based on the internal routing table) which is correct.

The correct thing to do here would be to make the "determine our address with external connectivity based on enumerating interfaces" be routing table aware, but that requires writing 3 different implementations (at a minimum).

comment:15 Changed 4 years ago by yawning

Till this is fixed (correctly) I suggest merging something like:

https://github.com/Yawning/tor/compare/bug12377_workaround

This disables tests that fail due to this bug. The tests are new, the bug is old. My comment #8 has the correct way to fix get_interface_address6().

comment:16 Changed 4 years ago by yawning

This is an example of how to rewrite all of this code, to do the mostly right thing on Linux (For anything more recent than kernel 2.2 definitions of "more recent"). This supercedes the getifaddr code (which is broken), and is superior to the UDP trick since it doesn't make a random UDP socket.

Adapting it to fit into tor is left as an exercise for the student:
https://gist.github.com/Yawning/c70d804d4b8ae78cc698

The same socket type can also be used to have the kernel tell us when the IP address/routing table changed, which may be useful to some people.

Edit: I lied about leaving it to the student.

NB: Yes the code is ugly, I sort of threw it together. No I will not do Darwin or Windows.

Last edited 4 years ago by yawning (previous) (diff)

comment:17 Changed 4 years ago by yawning

First pass at fixing this, at least for some systems:
https://github.com/Yawning/tor/compare/bug12377_linux

Quirks:

  • Because RTM_GETADDR on Linux sux, it has to dump every single address, of every single interface.
  • It will always return the primary address associated with the interface, not any aliases or non-prefered IPv6 addresses.
  • It uses the presence of a default route to determine which interface's address to return.
  • It only works on Linux for obvious reasons.
  • It carves out a new AF_NETLINK socket per request, and does blocking I/O. Not ideal, but probably ok, since this isn't anything critical path, and something is obviously horrifically broken if the kernel doesn't respond to the inquiries immediately.

This can be improved one step further, by ripping out the poll driven address determination code and just having the kernel send events on an AFN_NETLINK socket, but that's obviously more involved.

TODO: Needs more comments, though how it works should be obvious.

comment:18 Changed 4 years ago by nickm

merged the workaround for now; but let's also get the additional stuff reviewed

comment:19 Changed 3 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:20 Changed 3 years ago by teor

My code in #17027 that blocks all interfaces' publicly routable IPv4 and IPv6 addresses on multihomed exits, uses get_interface_address6_list(), a new function based on refactoring get_interface_address6().

Previous patches will likely need to be manually merged into get_interface_address6_list or get_interface_address6. I am happy to help with this.

comment:21 Changed 3 years ago by nickm

Points: small

comment:22 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final
Status: assignednew

Turn most 0.2.8 "assigned" tickets with no owner into "new" tickets for 0.2.9. Disagree? Find somebody who can do it (maybe you?) and get them to take it on for 0.2.8. :)

comment:23 Changed 2 years ago by isabela

Points: small1

comment:24 Changed 2 years ago by isabela

Keywords: isaremoved added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

comment:25 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:26 Changed 22 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:27 Changed 17 months ago by catalyst

Cc: catalyst added
Severity: Normal

comment:28 Changed 17 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:29 Changed 17 months ago by nickm

Keywords: 027-triaged-in added

comment:30 Changed 17 months ago by nickm

Keywords: 027-triaged-in removed

comment:31 Changed 17 months ago by nickm

Keywords: 027-triaged-1-out removed

comment:32 Changed 17 months ago by nickm

Keywords: 026-triaged-1 removed

comment:33 Changed 17 months ago by nickm

Keywords: isaremoved removed

comment:34 Changed 17 months ago by nickm

Keywords: 026-deferrable removed

comment:35 Changed 17 months ago by nickm

Keywords: address-detection added
Status: newneeds_review

comment:36 Changed 8 months ago by teor

Milestone: Tor: unspecifiedTor: 0.3.4.x-final

Move all needs_review tickets without a release to 0.3.4

comment:37 Changed 8 months ago by nickm

Keywords: review-group-34 added

comment:38 Changed 7 months ago by dgoulet

Reviewer: dgoulet

Reviewer week 03/16th

comment:39 Changed 7 months ago by dgoulet

Status: needs_reviewmerge_ready

I got this branch to merge in latest master. I modified several things to make it merge and also to our coding standards. Make check is happy now.

I'm happy with this branch. Linux only and I'm also curious to see it pass on Jenkins with BSD/OS X/Windows of this world.

@nickm, I think it would be nice if you go over it at least in terms of code structure/compat layer, if it is satisfying to you.

Branch: ticket12377_034_01
PR: https://github.com/torproject/tor/pull/28

comment:40 Changed 7 months ago by teor

Status: merge_readyneeds_revision
Version: Tor: unspecified

Functional review:

This code breaks the calls to get_interface_address() and get_interface_address6() in resolve_my_address(), because they rely on its promise to "Prefer public IP addresses to internal IP addresses." But this code will return a private IP addres if it is the default route.

I'm also not sure how this code fixes the actual bug we are trying to fix. We already keep a list of interface addresses in client_check_address_changed(), as suggested by dgoulet in comment 4.

One simple way to make the existing address check more reliable is to make get_interface_address6() return the *first* internal address in the list, not the *last* internal address. We should make this change anyway, because it will benefit macOS, BSD, and Windows.

Design review:

I don't understand the overall design of this API.

If netlink works, we should use it to produce an ordered list of IP addresses, which has the default gateway as the first entry. Then we should use the first entry when a single address is requested, and all entries when a list of addresses is requested. The existing code already implements this logic for the other address resolution methods. To use the existing code, we need to make the netlink code return a list, and use that list as the first alternative in get_interface_addresses_raw().

Also, this code only deals with the Linux case. Please open tickets for BSD/macOS and Windows functionality if you want to close this ticket.

Merging this code may break some relay address configs, because it changes address resolution order to use netlink rather than the first address returned by getifaddrs(). So we need to write a note to relay operators at the top of the 0.3.4 release notes.

Code review:

Why check the __linux__ macro, when you can just check for the relevant headers?

Some of the functions are missing comments.

This code never changes the value of ret, the logic is probably inverted. Please check for other instances of this bug.

if (ret == 0 && result.found)
    ret = 0;

comment:41 Changed 7 months ago by nickm

Keywords: 034-triage-20180328 added

comment:42 Changed 7 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:43 Changed 7 months ago by dgoulet

Milestone: Tor: 0.3.4.x-finalTor: unspecified

I'm moving this to Unspecified. Yawning is the original author of this patch so unless he picks it up or someone else, we have no actionable items for now :S ... I also don't think this is super critical to get in.

Note: See TracTickets for help on using tickets.