Opened 7 months ago

Closed 5 months ago

#25055 closed defect (implemented)

string_is_valid_hostname() returns true for IPv4 addresses

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version: Tor: 0.3.2.1-alpha
Severity: Normal Keywords: IPv6, IPv4, tor-dns, 032-backport, 033-must, review-group-34, 033-triage-20180320, 033-included-20180320
Cc: Actual Points:
Parent ID: #25036 Points: 1
Reviewer: mikeperry Sponsor:

Description

If it is meant to return true for all IP addresses, it should return true for IPv6 as well.
If it's meant to return true just for hostnames, it should reject IPv4 addresses.

a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic

https://tools.ietf.org/html/rfc1123#page-13

Then we should check every use of this function, to make sure it is being used the right way.

Child Tickets

Change History (33)

comment:1 Changed 7 months ago by arma

It is meant to return true for everything that exits will let you (try to) connect to.

Maybe that means changing the word "hostname" to a word that doesn't make some people think it can't be an IP address.

It also sounds like allowing ipv6 addresses is the right thing to do.

comment:2 in reply to:  1 Changed 7 months ago by arma

Replying to arma:

Maybe that means changing the word "hostname" to a word that doesn't make some people think it can't be an IP address.

string_is_valid_destination_name()?

comment:3 in reply to:  1 Changed 7 months ago by dgoulet

Replying to arma:

It is meant to return true for everything that exits will let you (try to) connect to.

Maybe that means changing the word "hostname" to a word that doesn't make some people think it can't be an IP address.

Hmmm the way I understand it on how it is used in the code, it can *not* take an IP address. It has to be a DNS name. Also, every unit tests for this function is testing for a name, never an address.

It also sounds like allowing ipv6 addresses is the right thing to do.

So this function is doing what it should imo except returning true for an IPv4. One simple test would be to pass the value through inet_pton and if it returns -1, we know we do *not* have an IP address?

We have specific functions to check for an IPv4/IPv6, we should use those.

So maybe what we need is new function that does what arma proposes that is anything valid that you can connect to (DNS name + IP) ?

Last edited 7 months ago by dgoulet (previous) (diff)

comment:5 Changed 6 months ago by teor

Status: needs_reviewneeds_revision

Please update the function comment for string_is_valid_hostname(), to say that it only accepts hostnames that end in an alphabetical TLD.
Comment the two blocks within the inner loop, and explain why they're different.

Add unit tests for alphabetical and non-alphabetical TLDs.
Add a unit test containing a punycode TLD, and modify the function to accept it, and modify the function comment. (Hint: the TLD can contain dashes.)

Explain why it's ok to use ctype.h, because the previous code didn't.
Do we use it elsewhere in Tor?
Is it available and consistent on all supported platforms?
Is this code called when parsing votes or creating the consensus?

comment:6 Changed 6 months ago by teor

Please don't use ctype.h, use TOR_ISDIGIT() and friends instead.

comment:7 Changed 6 months ago by rl1987

Status: needs_revisionneeds_review

comment:8 Changed 6 months ago by rl1987

I believe the above is addressed now.

comment:9 Changed 6 months ago by teor

I just read RFC 1123 Section 2.1.
https://tools.ietf.org/html/rfc1123#page-13

I was wrong, it doesn't say that the TLD must be alphabetical:

   // Last label of a hostname is required to be alphabetic according to
   // RFC 1123 Section 2.1.

Instead, it says:

The syntax of a legal Internet host name was specified in RFC-952
      [DNS:4].  One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.

...

      Whenever a user inputs the identity of an Internet host, it SHOULD
      be possible to enter either (1) a host domain name or (2) an IP
      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
      the string syntactically for a dotted-decimal number before
      looking it up in the Domain Name System.

Therefore, rather than trying to detect punycode or alphabetical endings, let's use this logic:

  1. If it is a valid IPv4 or IPv6 address, it is not a hostname
  2. Otherwise, if it has alphanumeric, -, or _, in the right format, it is a hostname.

Being lax makes Tor more future proof to domain name format changes.
And it confirms to the RFC.

comment:10 Changed 6 months ago by teor

Status: needs_reviewneeds_revision

comment:11 in reply to:  9 Changed 6 months ago by arma

Replying to teor:

  1. If it is a valid IPv4 or IPv6 address, it is not a hostname

I don't understand why we're making this complicated.

This string_is_valid_hostname() thing is only used in our socks handshake, right?

If we define hostname to be "anything that is a valid destination", then everything becomes simple, right?

comment:12 Changed 6 months ago by teor

As long as the code works, I don't mind how it's done.

But let's not break as soon as IETF slightly changes the IDN or other DNS formats, please.

comment:13 in reply to:  9 Changed 6 months ago by rl1987

Therefore, rather than trying to detect punycode or alphabetical endings, let's use this logic:

  1. If it is a valid IPv4 or IPv6 address, it is not a hostname
  2. Otherwise, if it has alphanumeric, -, or _, in the right format, it is a hostname.

Being lax makes Tor more future proof to domain name format changes.
And it confirms to the RFC.

Technically IP address strings are not hostnames, but we do allow them to be used in SOCKS requests. That's why I introduced string_is_valid_dest that checks destination validity by branching out to tor_inet_pton and string_is_valid_hostname. We want string_is_valid_hostname to do exactly what the function name says - DNS hostname validation.

I think supporting punycode TLDs is a good idea. Currently all entries in of valid TLDs [0] are either alphabetic or punycode.

[0] http://data.iana.org/TLD/tlds-alpha-by-domain.txt

comment:14 Changed 6 months ago by rl1987

What am I missing? What are the action points to get this finished?

comment:15 in reply to:  14 Changed 6 months ago by teor

Replying to rl1987:

What am I missing? What are the action points to get this finished?

Replying to teor:

let's not break as soon as IETF slightly changes the IDN or other DNS formats, please.

Please make the hostname validation check for alphanumeric, hyphen, and underscore in every component.
Do not check for the precise punycode format as of 2018, that's fragile, and will break if the IDN format is changed, or a new DNS format is added.

If you wish, you can exclude IPv4 addresses, which would otherwise match the new hostname check.

comment:16 Changed 6 months ago by rl1987

Status: needs_revisionneeds_review

Please take one more look. If it's good now, I'll create new branch and cleanup git history.

comment:17 Changed 6 months ago by teor

Keywords: 032-backport must-033 added
Milestone: Tor: 0.3.4.x-finalTor: 0.3.3.x-final
Status: needs_reviewneeds_revision
Version: Tor: 0.3.2.1-alpha

The code looks good.

Please keep the unit tests from the two commits you reverted.
Punycode should pass, as should partially numeric TLDs.
You can add a test for a fully numeric TLD if you'd like, too.

Once you've cleaned up the branch and added the unit tests, please flip this ticket into merge_ready.

comment:18 Changed 6 months ago by teor

Parent ID: #25036

comment:19 Changed 6 months ago by rl1987

Status: needs_revisionmerge_ready

comment:20 Changed 6 months ago by nickm

Status: merge_readyneeds_revision

The tor_assert(len > 0) check in string_is_valid_dest() makes me quite nervous. There's nothing in the documentation for that function that says you can't call it with "" as an argument. It would make much more sense to me if the function just returns 0 when len == 0.

I don't understand why you changed the check do {} while loop from result to result > 0.

comment:21 Changed 6 months ago by nickm

Keywords: 033-must added; must-033 removed

s/must-033/033-must/

comment:22 Changed 6 months ago by rl1987

Status: needs_revisionneeds_review

Fixed the above two issues.

comment:23 Changed 6 months ago by nickm

   size_t len = strlen(string);
 
-  tor_assert(string);
-  tor_assert(len > 0);
+  if (string == NULL)
+    return 0;
+
+  len = strlen(string);

Looks like you're doing strlen(string) before you check it for NULL, and after.

comment:24 Changed 6 months ago by rl1987

Fixed.

comment:25 Changed 6 months ago by nickm

Keywords: review-group-34 added

comment:26 Changed 5 months ago by dgoulet

Reviewer: mikeperry

Reviewer week 03/19th

comment:27 Changed 5 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:28 Changed 5 months ago by nickm

Keywords: 033-included-20180320 added

Mark 033-must tickets as triaged-in for 0.3.3

comment:29 Changed 5 months ago by mikeperry

Status: needs_reviewmerge_ready

Hrmm. Downside of tons of reviewers: I have a weak preference that the hostname be treated strictly. I think that permitting more things on the socksport itself is fine, but that our function names should reflect current RFC notions, and not leave wiggle room for future potential changes, since in other cases uses of this function may end up causing bugs. This would mean that string_is_valid_hostname() would enforce full RFC hostname strictness, but then string_is_valid_dest() would or that together with string_is_valid_ipv4, string_is_valid_ipv6, and another string_is_valid_extra()...

But my preference is only a weak one. Everything else about this code looks good to me now. I'm going to mark this merge_ready. If someone else decides to agree with me, feel free to change to needs_revision.

comment:30 Changed 5 months ago by nickm

mikeperry -- how would you feel if I merged this, but renamed string_is_valid_hostname to string_is_valid_socks_argument?

comment:31 Changed 5 months ago by nickm

or possibly string_is_valid_hostname_or_ip

comment:32 Changed 5 months ago by mikeperry

nickm: Naming this appropriately is a good idea. I like string_is_valid_nonrfc_hostname() and a warning comment. Technically the function allows things like luck.y13 (which is explicitly tested in the unit tests), but that is not any kind of valid DNS name or IP.

comment:33 Changed 5 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

Rebased onto maint-0.3.3, made the change (in b504c854d34a938943f68c3036840f10a84fcea4) and merged forward.

I also added a redundant check in d4bf1f6c8eb08c39def69c839515afe475bf0a6b because I am afraid of C. ;)

Note: See TracTickets for help on using tickets.