Opened 10 years ago

Closed 7 years ago

Last modified 7 years ago

#933 closed enhancement (fixed)

MapAddress for domains?

Reported by: Aleph0 Owned by: mwenge
Priority: Medium Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version: Tor: 0.2.3.16-alpha
Severity: Keywords:
Cc: Aleph0, nickm, mwenge, grarpamp Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

Would it be non-trivial to have the MapAddress option behave like TrackHostExits? That is, if an entry is like
MapAddress .rapidshare.com .rapidshare.com.exitpoint.exit
have it applied to rs1xx.rapidshare.com, rs223dt.rapidshare.com, rs28tl2.rapidshare.com...

I already have such a line in my TORRC file; on start Tor doesn't complain, but apparently that line doesn't have any
effect.

My goal is to have all connections to rapidshare.com exit through the same exit point (that I've already defined with
a separate MapAddress line for rapidshare.com), regardless if I'm redirected.

The alternative would be to add 5000-odd MapAddress lines specifying every possible hostname given Rapidshare's
current naming convention (rs[0-9]+[a-z]+[1-9]?.rapidshare.com) , but I don't know how Tor would handle such a big
config file; my connections drop (flaky company proxy) often enough already...

I already have TrackHostExits enabled for both rapidshare.com and .rapidshare.com, so when rapidshare.com redirects
me to whatever.rapidshare.com Tor will reuse the existing circuit.

If the connection drops halfway, my download manager will remember having been redirected and will retry on
whatever.rapidshare.com. However, if it takes more than TrackHostExitsExpire seconds, Tor will assign this stream to
a new circuit with a different exit point (and Rapidshare will claim I've shared my account and lock it... T_T ).

Thanks for your consideration.

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (31)

comment:1 Changed 10 years ago by nickm

Sounds like a neat idea; if anybody wants to do this in 0.2.2.x, it would be a useful thing.

comment:2 Changed 9 years ago by mwenge

Cc: mwenge added
Owner: set to mwenge
Status: newassigned

comment:3 Changed 9 years ago by mwenge

Status: assignedneeds_review

comment:4 Changed 9 years ago by nickm

Description: modified (diff)

Looks neat! Will we ever want to match anything in a form other than "Replace a suffix starting with a dot with this other suffix starting with a dot?" I don't think so.

The doxygen comment on addressmap_match_superdomains() doesn't actually say what it does with the match if it finds one. It should take a const char *, not a char *. Also, it should probably return 0 rather than 1.

The check for ".exit" in addressmap_rewrite seems off. There is no real reason to disallow "MapAddress .com .com.big-web-cache.com", even if "big-web-cache.com" doesn't end in exit. Also, Why not allow "MapAddress . .everythinggoesoverthis.exit" ?

IOW, I am proposing that the option syntaxes should be more like:

MapAddress a.b.c d.e.f # This is what we have now
MapAddress .a.b.c d.e.f # Replaces any address ending with .a.b.c with d.e.f
MapAddress .a.b.c .d.e.f # Replaces the .a.b.c at the end of any addr with .d.e.f

So in the first case, "x.a.b.c" is unaffected, in the second case "x.a.b.c" turns into "d.e.f", and in the third case "x.a.b.c" turns into "x.d.e.f". No special handling for .exit is needed.

Parenthetically, do you think this "prefix dot" business is too subtle? Maybe the syntax should be "*.a.b.c" instead of ".a.b.c". Do you think that would confuse people less?

Throughout the code, it seems fragile to have the strings in addressmap_entry_t have a special meaning when the key/value starts with ".". Maybe it would be better to strip the "." when generating them and instead have a flag field added to addressmap_entry_t to indicate whether it should be a suffix match or an exact match?

comment:5 Changed 9 years ago by nickm

Milestone: post 0.2.1.xTor: 0.2.3.x-final

Marking this for 0.2.3.x-final. There's code; we should fix it up and merge it.

comment:6 Changed 8 years ago by mwenge

I will address nick's comments over the next week or two. Would enjoy finishing this patch off.

comment:7 in reply to:  4 Changed 8 years ago by mwenge

Replying to nickm:

Looks neat! Will we ever want to match anything in a form other than "Replace a suffix starting with a dot with this other suffix starting with a dot?" I don't think so.

The doxygen comment on addressmap_match_superdomains() doesn't actually say what it does with the match if it finds one. It should take a const char *, not a char *. Also, it should probably return 0 rather than 1.

I fixed this. But am not passing address as const - since I now need to alter it if the from and to addresses are wildcarded.

The check for ".exit" in addressmap_rewrite seems off. There is no real reason to disallow "MapAddress .com .com.big-web-cache.com", even if "big-web-cache.com" doesn't end in exit.


Yes, got rid of this.

Also, Why not allow "MapAddress . .everythinggoesoverthis.exit" ?

Mm, missed this. My feeling was that this configuration is more likely to be a typo than anything else.

IOW, I am proposing that the option syntaxes should be more like:

MapAddress a.b.c d.e.f # This is what we have now
MapAddress .a.b.c d.e.f # Replaces any address ending with .a.b.c with d.e.f
MapAddress .a.b.c .d.e.f # Replaces the .a.b.c at the end of any addr with .d.e.f

Implemented this.

So in the first case, "x.a.b.c" is unaffected, in the second case "x.a.b.c" turns into "d.e.f", and in the third case "x.a.b.c" turns into "x.d.e.f". No special handling for .exit is needed.

Parenthetically, do you think this "prefix dot" business is too subtle? Maybe the syntax should be "*.a.b.c" instead of ".a.b.c". Do you think that would confuse people less?

Syntax now allows a wildcard.

Throughout the code, it seems fragile to have the strings in addressmap_entry_t have a special meaning when the key/value starts with ".". Maybe it would be better to strip the "." when generating them and instead have a flag field added to addressmap_entry_t to indicate whether it should be a suffix match or an exact match?

I kind of did this. I haven't stripped the leading '.' from the 'to' mapping but do use an is_wildcard flag to avoid repeated byte-compares.

comment:8 Changed 8 years ago by nickm

Almost there!

The code to handle the * case of config_register_addressmaps seems to allow too much: It skips over the * starting "*example.com", not just the * starting "*.example.com".

I think that the current syntax is still too hard to read: Having .mit.edu be distinct from mit.edu will IMO be too easy for people to screw up. Can't we just have the * be mandatory, so that "a.b.c" stays the same, but "." becomes "*" and ".example.com" becomes "*.example.com"?

Now that we can have more than one rule apply to the same address, we should be really explicit in the documentation about what we do when that happens. I think the rule is, "If any rule applies, we apply whichever rule was listed first, then check again and apply the one listed first, and so on." Is that right? (Do we have a unit test for this case?)

Minor style:

  • We initialize null pointers to NULL, not 0. ( addressmap_match_superdomains)

comment:9 Changed 8 years ago by Sebastian

Status: needs_reviewassigned

Moving out of needs review, wants an updated patch.

comment:10 in reply to:  8 Changed 8 years ago by mwenge

Replying to nickm:

Almost there!

The code to handle the * case of config_register_addressmaps seems to allow too much: It skips over the * starting "*example.com", not just the * starting "*.example.com".

Yes it does, fixed.

I think that the current syntax is still too hard to read: Having .mit.edu be distinct from mit.edu will IMO be too easy for people to screw up. Can't we just have the * be mandatory, so that "a.b.c" stays the same, but "." becomes "*" and ".example.com" becomes "*.example.com"?

OK - expressions with '.example.com' are now ignored.

Now that we can have more than one rule apply to the same address, we should be really explicit in the documentation about what we do when that happens. I think the rule is, "If any rule applies, we apply whichever rule was listed first, then check again and apply the one listed first, and so on." Is that right? (Do we have a unit test for this case?)

Added a few notes to the man page and made the test for the recursion more explicit.

Minor style:

  • We initialize null pointers to NULL, not 0. ( addressmap_match_superdomains)

Fixed.

comment:12 Changed 8 years ago by grarpamp

It would also be useful to be able to map FQDN's, CIDR's and IP
lists to "A list of identity fingerprints, nicknames, country
codes and address patterns of nodes" as in ExitNodes. Such as for
sites or services that only allow access from say address space in
the US, ie: {us}. Round robin within the resulting set of nodes
would be fine, and with NEWNYM doing its thing within the set.

comment:13 Changed 8 years ago by nickm

Status: assignedneeds_review

Throwing this into needs_review where it belongs. whoops

comment:14 Changed 8 years ago by nickm

Okay, here's what needs to get done IMO:

  • The manpage needs to be made to build right.
  • The lookup algorithm needs to stop being O(N) in the number of addressmap entries: if we're scanning over the whole strmap, there's no point in having a strmap, after all.
  • I think the *. prefix needs to be mandatory.

I'm rebasing the branch onto master to get conflicts out of the way early, then hacking those things up now.

comment:15 Changed 8 years ago by nickm

Oh; also the strstr in addressmap_match_superdomains was broken: It would find the first matching string, not the last.

comment:16 Changed 8 years ago by nickm

And it looks like the *. prefix _was_ mandatory; no need for changes there.

I have a version of this now in bug933_nm_rebased ; another person should probably have a look at it before I merge it.

comment:17 Changed 8 years ago by grarpamp

Who would I have to bribe (heh) to get at least the simple
CIDR notation (as described in the links above) added to this?
1.2.3.4/18 1.2.3.4/18.fingerprint.exit

comment:18 Changed 8 years ago by nickm

grarpamp -- fist off, I'd suggest opening another ticket. This one is for domains, honest. We've had really bad luck in the past with keeping a ticket open but changing what it means after the original issue is resolved. Then you'd need to figure out (or somebody else would figure out) how this would interact with DNS resolution. :)

comment:19 Changed 8 years ago by grarpamp

Oh, I though someone from Tor suggested I put CIDR here a year or so ago, ok, thx, will do.

comment:20 Changed 8 years ago by grarpamp

Forked to: MAPADDRESS for IP ranges (CIDR, etc)
https://trac.torproject.org/projects/tor/ticket/3982

comment:21 Changed 8 years ago by arma

The changes file talks about ".torproject.org" rather than "*.torproject.org"

man page line says "transform to" (double space)

style issue, we prefer spaces between args, e.g. "ADDRMAPSRC_TRACKEXIT,0,0" has no room to breathe.

comment:22 Changed 8 years ago by arma

Does "mapaddress *.torproject.org torproject.org" cause addressmap_register() to receive address and new_address both as "torproject.org" and thus trigger this clause:

 * If <b>new_address</b> is NULL, or equal to <b>address</b>, remove
 * any mappings that exist from <b>address</b>.

?

comment:23 Changed 8 years ago by nickm

Just pushed fixes for known issues above. The branch needs another autosquash before merge. Review if you have time?

comment:24 Changed 7 years ago by nickm

Resolution: Noneimplemented
Status: needs_reviewclosed

Tested more, reviewed again, merged at last!

comment:25 Changed 7 years ago by mr-4

Having just been made aware of this feature (thanks Nick!), does this match cover only the beginning portion of the address, or is this sort of standard for the whole address?

In other words, does "node*.*.com" matches "node12.domain.com", "nodeA.another-domain.com" etc?

Another question related to that: is this a "standard" matching where I could use "?" to match a single character/symbol as well or is this not allowed/implemented?

One last question: could this be expanded to cover RegEx-specified address?

Thanks!

comment:26 in reply to:  25 Changed 7 years ago by nickm

Replying to mr-4:

Having just been made aware of this feature (thanks Nick!), does this match cover only the beginning portion of the address, or is this sort of standard for the whole address?

In other words, does "node*.*.com" matches "node12.domain.com", "nodeA.another-domain.com" etc?

No; it only supports matching whole names. *.domain.com matches domain.com, xyz.domain.com, and w.xyz.domain.com.

Another question related to that: is this a "standard" matching where I could use "?" to match a single character/symbol as well or is this not allowed/implemented?

No.

One last question: could this be expanded to cover RegEx-specified address?

Probably, though that would be another ticket, and we'd probably want to see a compelling use case.

comment:27 Changed 7 years ago by grarpamp

Priority: minornormal
Resolution: implemented
Status: closedreopened
Version: 0.2.0.33Tor: 0.2.3.16-alpha

MapAddress wildcarding does not seem to be working right.
Note the following three issues via the controller...

1) When using the man documented format to enter the map:

mapaddress *.example.com=*.example.com.<fingerprint>.exit
512 syntax error: invalid address '*.example.com.<fingerprint>.exit'

With no map, I did not check the streams.

2) When chancing upon an alternate format that returns 250:

mapaddress *.example.com=<fingerprint>.exit
250 *.example.com=<fingerprint>.exit

This results in these streams. Which appear to leak badly in that
the first stream to a destination does not honor the exit.

172506 NEW 0 www.example.com:443 PURPOSE=USER
172507 NEW 0 www.example.com:443 PURPOSE=USER
172506 SUCCEEDED 121217 64.34.110.174:443
172507 SUCCEEDED 121219 64.34.110.174.<fingerprint>.exit:443

Note that entering the following related map syntax does not
SUCCEED, which is weird given the above streams:

mapaddress foo.bar.com=<fingerprint>.exit

This one does, which is documented expectation:

mapaddress foo.bar.com=foo.bar.com.<fingerprint>.exit

3) The map in (2) is not removable by the syntax of setting the
source (left) equal to (=) the destination (right):

mapaddress *.example.com=*.example.com
512 syntax error: invalid address '*.example.com'


Seems the solution may/should be...

  • Adopt the syntax documented in (1), removing that acting in (2).
  • Fix the stream issue shown in (2).
  • Allow removal by syntax in (3).
  • Document the map removal syntax in the control spec page.
  • Check that removal from torrc and sighup works (as that is the

form of the various options in the man page).

  • Lint the inputted map to reject '=<fingerprint>.exit' form.

comment:28 Changed 7 years ago by grarpamp

Adding applicable environment...

trackhostexits .
trackhostexitsexpire 604800
allowdotexit 1
signal NEWNYM

comment:29 Changed 7 years ago by nickm_mobile

Could you see how much of this you can reproduce on the current Tor maint-0.2.3 branch?

In 0.2.3.17-beta, we fixed #3490, which was related to MaPAddress and .exit. That introduced/exposed another bug in that code, #6211, which should be fixed in git. It's quite possible there are still more bugs, but 0.2.3.16-alpha is not a great place to try to find them.

Also, when you find a bug in a feature, it's easier on my workflow if you open a new ticket rather than reopening the "implement the feature" ticket. Otherwise it's hard to use the bugtracker to, well, track bugs.

comment:30 Changed 7 years ago by grarpamp

Resolution: fixed
Status: reopenedclosed

comment:31 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.