If any of these third-party libraries happen to work, then rewriting tor-fw-helper itself is an easy job. We could do #5213 (closed) and make flash proxy immediately more useful (flashproxy-client already knows how to use tor-fw-helper thanks to #9033 (closed)).
Trac: Summary: go versions of the upnp / pmp libraries? to Rewrite tor-fw-helper in Go (or another memory-safe language) Parent: N/Ato#5213 (closed) Keywords: N/Adeleted, flashproxy added
This is something I've been meaning to do, so I'll take the ticket.
As a side note NAT-PMP is "tricky" due to the spec not having a way to discover the router. This isn't "hard", but Go doesn't have ways to do it by default and pulling the default route from the routing table isn't something that's portable, so it would probably require either fork/exec or native code since the required platform dependent magic requried here isn't possible to write in pure Go.
uPNP has a discovery mechanism as part of the protocol, so the initial version will target that.
Trac: Status: new to assigned Owner: N/Ato yawning
I wrote my own UPnP client, because of licensing, code quality/auditability concerns, and not-invented-here reasons.
Works as far as I can tell, but flashproxy will need code changes because it assumes leases are permanet. I could change the lease time to "1 week" (max allowed in UPnP 1.1, 1.0 allows permanent), but certain routers exhibit really broken behavior when the UPnP port mapping table gets filled up (in some cases, requiring a factory reset). The lease time was chosen somewhat arbitrarily based around how tor invokes tor-fw-helper.
NAT-PMP is not supported yet. It turns out that the Go runtime supports netlink sockets, and I have Go code to query the default route's gateway on Linux (that works), but I don't have a router that supports NAT-PMP yet. Windows will more than likely require calling into native code, and I haven't looked at what Darwin will require here yet.
I had to disable ufw on my local box to test it. For the UDP multicast based discovery process I currently bind to a random port, but maybe this should be fixed to make it easier for users to deal with "local firewall blocks the M-SEARCH responses".
Next steps from me would be going and buying an AirPort base station so I can test NAT-PMP, and getting it working at least on Linux. I might be able to also do Darwin as long as "do what you would do on FreeBSD" is how you query the routing table. Not sure how much of a deployment blocker NAT-PMP support is, since our current alternative is "scary library code".
One of the reasons that we wrote tor-fw-helper was to have NAT-PMP support. It is a very simple protocol. It would really be a shame if after all these years, we simply returned to what we had with Vidalia many years ago.
One of the reasons that we wrote tor-fw-helper was to have NAT-PMP support. It is a very simple protocol. It would really be a shame if after all these years, we simply returned to what we had with Vidalia many years ago.
Indeed, the protocol is simple. Which is why it says "not supported yet". I could have blindly plugged in the random third party library and said "well, this should work on Linux, but I haven't tested it", but I didn't because that would be a shitty thing to do. What would be even more of a shame is if "Yawning hasn't bought a base station, installed a FreeBSD VM, written/tested the PF_ROUTE code, blindly written the Windows GetIpForwardTable() code and found a victim to test it, yet" stalled things indefinitely or kept me from posting about progress.
Ok, I have a router that supports NAT-PMP, so I'm writing the client code now. I also found out some interesting things about miniupnpd, and fixed "compatibility with broken non-RFC compliant HTTP servers in existing routers" in the uPNP code[0].
The NAT-PMP RFC makes threatening noises about uPNP implementations that crash if the lease duration isn't set to 0, so I'll probably go change the code to always issue "indefinite" leases when talking uPNP 1.0, even though using it like how flashproxy wants to (randomized port) will clog up the uPNP lease table and cause catastrophic failure modes in other uPNP implementations[1]. I figure eventual failure is better than immediate, but there is no good answer here apart from (only use one port, pray) given the current architecture.
I'm starting to think that the whole "call a helper once in a while to extend the lease" isn't that great of a design in view of how broken all the various uPNP routers out there are (especially given that the original tor-fw-helper has no method of cleaning up existing leases), and that this would be much better off as a daemon, but going to such an architecture makes the code considerably more complex (as it needs to be resilient to router reboots), but fixing that is a longer term thing that can wait till after feature parity has been reached.
Ok, I just pushed NAT-PMP support, since the protocol is trivial. It only works on Linux at the moment because:
Clients always send their NAT-PMP requests to their default gateway,as learned via DHCP [RFC2131], or similar means.
Dumping the routing table is not portable, and currently only the Linux code to pull the default gateway from the routing table is written. This does not mean I don't plan on doing it, just that I haven't made it work on other platforms YET.
I went and changed the UPnP lease duration to "0" which will has the potential to make UPnP unusable after a while due to the table getting full, but it beats crashing routers as soon as the request is sent. NAT-PMP uses 7200 second (2 hour) leases as recommended in the RFC.
TODO:
Add verbose logging to the NAT-PMP implementation, and handle when the router changes the external port (on conflict).
Write getGateway() for *BSD and Windows.
Add command line options for purging entries, so people can attempt to clean up the mess that using this with UPnP makes (the original has the same problem).
General code cleanups, though it should be legible and at least appears to work.
I went and wrote code to deal with NAT-PMP routers returning a different external port than requested since it's allowed by the spec and happens in the wild.
Turns out miniupnpd crashes when you try to remove mappings via NAT-PMP (Removing them via UPnP works fine). I'm not sure if it's a quirk with the version on my test device or a freak accident, but this doesn't look good for enhancing our helper to allow for any sort of graceful cleanup, at least when NAT-PMP is in the picture (Since I assume crashing NAT-PMP stacks running on routers in the wild falls under "Unacceptable Behavior").
At least stale NAT-PMP mappings only persist for 2 hours...
(Do we really want to deal with angry users who have shitty router firmware that misbehaves in interesting ways from a support perspective?)
Turns out miniupnpd crashes when you try to remove mappings via NAT-PMP (Removing them via UPnP works fine). I'm not sure if it's a quirk with the version on my test device or a freak accident, but this doesn't look good for enhancing our helper to allow for any sort of graceful cleanup, at least when NAT-PMP is in the picture (Since I assume crashing NAT-PMP stacks running on routers in the wild falls under "Unacceptable Behavior").
Looked into it further, still not sure if crashing is going to be a consistent thing, but deleting mappings is broken. What the RFC says:
A client MAY also send an explicit packet to request deletion of a mapping that is no longer needed. A client requests explicit deletion of a mapping by sending a message to the NAT gateway requesting the mapping, with the Requested Lifetime in Seconds set to zero. The Suggested External Port MUST be set to zero by the client on sending, and MUST be ignored by the gateway on reception.
So a correct "delete this specific mapping yo" request has "Internal Port" set, "Suggested External Port" set to 0, and "Requested Port Mapping Lifetime in Seconds" set to 0. Simple enough, the router is supposed to have a table of mappings based on the client address/ports.
if(eport==0) eport = iport; /* TODO: accept port mapping if iport ok but eport not ok * (and set eport correctly) */ if(lifetime == 0) { /* remove the mapping */ if(iport == 0) { /* Yawning: Not taken */ } else { /* To improve the interworking between nat-pmp and * UPnP, we should check that we remove only NAT-PMP * mappings */ r = _upnp_delete_redir(eport, proto); /*syslog(LOG_DEBUG, "%hu %d r=%d", eport, proto, r);*/ if(r<0) { syslog(LOG_ERR, "Failed to remove NAT-PMP mapping eport %hu, protocol %s", eport, (proto==IPPROTO_TCP)?"TCP":"UDP"); resp[3] = 2; /* Not Authorized/Refused */ } } eport = 0; /* to indicate correct removing of port mapping */
So, sending RFC compliant removal requests means that miniupnpd will misbehave at best the moment the interal port is not the external port (as likely in the use case that caused me to look into this). I assume what is happening is that _upnp_delete_redir() with a bogus external port (since the daemon is just assuming it's identical to the internal port) is blowing up, but even if the stack didn't crash, the mapping won't get removed.
Can't reproduce the crash on my test device, oh well.
The code's still incorrect, and I'm not going to take "the RFC is new" as an excuse here since looking at the first IETF draft (dating back to 2005), the correct behavior there is identical to what is in the RFC.
Yay, miniupnp's master has the bug fixed. Too bad we can't rely on that being what our end users have.
Pushed a few more changes to go-fw-helper, to support:
Verbose logging when using NAT-PMP.
Dumping the list of port forwarding entries (UPnP only).
Removing port forwarding entries (UPnP only, because the bug in "old" miniupnpd makes removal unsafe).
TODO:
BSD/Windows support for getGateway().
Code cleanups.
What is in master now should be useable by most people though, because NAT-PMP is an Apple-ism. Unless someone tells me otherwise, I will at least make NAT-PMP work on *BSD, and leave writing the one function to allow it to work on Windows to someone who actually has a Windows box.
Pushed more changes. Most notable is that NAT-PMP should work on all of the BSDs assuming that it's possible to dump the routing table via sysctl. I tested the code on FreeBSD 9.3-p3, so I'm fairly confident it works on Darwin, which is the main driving factor behind even supporting that OS in the first place.
TODO:
Windows implementation of getGateway().
Shed a single tear as no one does the deployment side, this codebase gets forgotten and we are still without a firewall helper.
I'll do the Windows part today sometime. Go figure the Windows box I acquired out of a bin for $100 is a bit on the slow side, and updating the OS has taken all night so far. It should take longer to get a Go compiler and MinGW on the thing than it will to write and debug the code.
flashproxy needs a code change to invoke the helper at least once every 2 hours since that's the lowest lease time go-fw-helper can end up requesting (when NAT-PMP is used).
The standard deterministic build integration needs to be done in the Tor Browser descriptors, though this is easy-ish since we already ship go code and there are no additional external dependencies.
The default torrc shipped with Tor Browser needs to change to tell flashproxy about the helper.
For people running relays:
No idea. I guess packages for distributions? Expert bundles?
I was kind of being facetious when I made the comment since one of dcf and myself will make it work for PTs. There's a few nasty-ish caveats that go-fw-helper inherited from tor-fw-helper related to not having a application code side supported mechanism for cleaning up leases that we need to think about as well.
On a positive note, I debugged/tested the Windows getGateway() implementation I drycoded while waiting for the craptop to update itself, and it works, though the error that GetBestRoute() displays when there's no network connection isn't useful (But it does error out properly so, meh).
I also caught a "does not compile on 32 bit systems" issue since I tested on 32 bit Win 7.
Tenatively setting as needs-review since NAT-PMP works/should work on all the platforms we ship bundles for now (I say should because I only tested on FreeBSD and not Darwin).
Here are my comments and questions from reading the code. Aside, it's stupendous what you've singlehandedly accomplished with this code.
Typo: "failed to external address"
$ ./go-fw-helper -v -l...V: NAT-PMP: failed to external address: connection timed out
It's nice if natclient.New's comment describes legal values for protocol; i.e. "UPnP" and "NAT-PMP", and that "unspecified" means an empty string. On that note, main.go has
protocol := "auto"
But "auto" appears not to have any special meaning, and anyway it is overriden with an empty string by flag.StringVar.
opRespOffset = 128 ... if h.op != opExternalAddress+opRespOffset {
I think it would be more clear that you are checking the most-significant bit if it were
opRespFlag = 0x80 ... if h.op != opExternalAddress|opRespFlag {
Why does issueRequest return interface{} instead of *requestMappingResp? It looks like it can return some raw byte contents if an unexpected response is received, is that weird?
What is this stuff at the bottom of some files?
var _ packetReq = (*externalAddressReq)(nil)var _ packetReq = (*requestMappingReq)(nil)
What happens in natclient/natpmp/getgateway_windows.go if the primary route is IPv6? It looks like it at least won't crash; does it still do anything meaningful?
Technically, it seems like httpu should do RoundTrip rather than Do, because RoundTrip is for a single transaction and Do "follows policy (e.g. redirects, cookies, auth)." That is, you might have to do less HTTP if you introduce the UDP stuff at a lower layer (RoundTripper rather than Client). But if it's at all awkward to do, then it doesn't matter; I'm sure what's there will work perfectly well.
I was confused by this in natclient/natpmp/packet.go:
{{{
opRespOffset = 128
...
if h.op != opExternalAddress+opRespOffset {
}}}
I think it would be more clear that you are checking the most-significant bit if it were
{{{
opRespFlag = 0x80
...
if h.op != opExternalAddress|opRespFlag {
}}}
The NAT-PMP RFC defines responses the way it's laid out in the code in their packet diagrams (Eg: "OP = 128 + 0"), though I may do what you suggested here.
What is this stuff at the bottom of some files?
{{{
var _ packetReq = (*externalAddressReq)(nil)
var _ packetReq = (*requestMappingReq)(nil)
}}}
It makes it immediately obvious that the concrete types do not implement the base interface, as errors in the file where the concrete types are defined. It's a development/debugging convenience for when the base interface changes for any reason to make it immediately obvious that I forgot to change the concrete implementations.
What happens in natclient/natpmp/getgateway_windows.go if the primary route is IPv6? It looks like it at least won't crash; does it still do anything meaningful?
I'm not sure here as MSDN doesn't say what happens. I assume an error is returned since the function definition and data structures involved all specify IP addresses as DWORDs. NAT-PMP as a protocol explicitly does not support IPv6 (PCP does, but the protocol is more complex), so any behavior here including (unlikely) getting a garbage return address is ok since the actual discovery will fail.
Technically, it seems like httpu should do RoundTrip rather than Do, because RoundTrip is for a single transaction and Do "follows policy (e.g. redirects, cookies, auth)." That is, you might have to do less HTTP if you introduce the UDP stuff at a lower layer (RoundTripper rather than Client). But if it's at all awkward to do, then it doesn't matter; I'm sure what's there will work perfectly well.
httpu's Do returns a slice of responses while the runtime's doesn't as well, with the idea that the caller knows how to deal with getting multiple responses, so the choice of Do here is more of a "trying to make it easy to understand what is going on" rather than "implementing an interface". I think things will be awkward regardless of what I do here, so I'm inclined to leave it as is.
Here are my comments and questions from reading the code. Aside, it's stupendous what you've singlehandedly accomplished with this code.
Typo: "failed to external address"
{{{
$ ./go-fw-helper -v -l
...
V: NAT-PMP: failed to external address: connection timed out
}}}
Fixed.
It's nice if natclient.New's comment describes legal values for protocol; i.e. "UPnP" and "NAT-PMP", and that "unspecified" means an empty string. On that note, main.go has
{{{
protocol := "auto"
}}}
But "auto" appears not to have any special meaning, and anyway it is overriden with an empty string by flag.StringVar.
What's the return value of ListOfPortMappings? What does each individual returned string look like?
Comment added. I could/should provide a helper that formats the response in the way that is expected, but since UPnP is the only protocol that will support this interface, having the actual formatting live there is probably fine for now.
Why does issueRequest return interface{} instead of *requestMappingResp? It looks like it can return some raw byte contents if an unexpected response is received, is that weird?
There were 3 things that can be returned from that function:
*requestMappingResp
*externalAddressResp
In all cases the opcode in the common header is validated to correspond to the request that was issued, and the calling code uses a type assertion then does what it needs to with the actual response. I could change this to return a response interface, but I'll still need the type assertion(s), so I don't see it as buying much.
I changed the return the raw response packet to return an error, since it was there as a catchall that I used during development.
Posting this from a go-fw-helper–enabled flashproxy Tor Browser. I suppose it would stop working after a while because flashproxy-client is not renewing the mappings, but for now it's working great.
What I did:
1.
```
cp ~/go-fw-helper/go-fw-helper Browser/TorBrowser/Tor/tor-fw-helper
Allowed UDP from the gateway router and TCP port 9000 in my local firewall.
So, let's get this into Tor Browser. BTW I still think you should name your version tor-fw-helper and just replace the current program of that name. Like (py)obfsproxy, the implementation language isn't a useful thing to have in a program's name.
Posting this from a go-fw-helper–enabled flashproxy Tor Browser. I suppose it would stop working after a while because flashproxy-client is not renewing the mappings, but for now it's working great.
Depends on which protocol it happened to use. If it's UPnP it should work indefinitely, as long as the router doesn't reboot/misbehave. I could crank up the default lease time for NAT-PMP as a short term workaround (the current default value is from the RFC though) as well.
What I did:
1.
{{{
cp ~/go-fw-helper/go-fw-helper Browser/TorBrowser/Tor/tor-fw-helper
}}}
2. Edit Browser/TorBrowser/Data/Tor/torrc-defaults:
{{{
ClientTransportPlugin flashproxy exec ./TorBrowser/Tor/PluggableTransports/flashproxy-client --register --port-forwarding-helper TorBrowser/Tor/tor-fw-helper --log flashproxy-client.log --unsafe-logging :0 :9000
}}}
3. Allowed UDP from the gateway router and TCP port 9000 in my local firewall.
Ooof. As something I should document somewhere, the UPnP documentation mentions that certain routers do not support the internal port and the external port being different. I'm not sure what the failure mode would be like in this case (given how awful some of the UPnP implementations in the wild are, I would assume the worst).
So, let's get this into Tor Browser. BTW I still think you should name your version tor-fw-helper and just replace the current program of that name. Like (py)obfsproxy, the implementation language isn't a useful thing to have in a program's name.
Hmm. Ok. I can rename the repo sometime soon, I still need to figure out where in the tp.o git repository tree this codebase should live in as well (and if we have plans to do anything with the old codebase).
Posting this from a go-fw-helper–enabled flashproxy Tor Browser. I suppose it would stop working after a while because flashproxy-client is not renewing the mappings, but for now it's working great.
Depends on which protocol it happened to use. If it's UPnP it should work indefinitely, as long as the router doesn't reboot/misbehave. I could crank up the default lease time for NAT-PMP as a short term workaround (the current default value is from the RFC though) as well.
It's UPnP on Netgear WNR1000v3. I left the browser closed all night and the mapping is still there in the morning, so I guess you are right. Leave the NAT-PMP lease time alone; we have to solve that problem in flashproxy-client anyway.
What I did:
1.
{{{
cp ~/go-fw-helper/go-fw-helper Browser/TorBrowser/Tor/tor-fw-helper
}}}
2. Edit Browser/TorBrowser/Data/Tor/torrc-defaults:
{{{
ClientTransportPlugin flashproxy exec ./TorBrowser/Tor/PluggableTransports/flashproxy-client --register --port-forwarding-helper TorBrowser/Tor/tor-fw-helper --log flashproxy-client.log --unsafe-logging :0 :9000
}}}
3. Allowed UDP from the gateway router and TCP port 9000 in my local firewall.
Ooof. As something I should document somewhere, the UPnP documentation mentions that certain routers do not support the internal port and the external port being different. I'm not sure what the failure mode would be like in this case (given how awful some of the UPnP implementations in the wild are, I would assume the worst).
Ah indeed. I should probably still document that caveat somewhere, even though strictly speaking it's a "the router firmware sucks" problem.
On a related note, how well does flashproxy tolerate the external port reported as success being different from the one that's requested? (Eg: You request a port mapping between 192.0.2.10:9000 <-> 0.0.0.0:9000, but the router establishes 192.0.2.10:9000 <-> 0.0.0.0:10000)
Right now the helper assumes "it doesn't" even though this can happen with NAT-PMP, and treats that as a failure and removes the mapping that was created. If it will do the right thing based on the stdout chatter, then (after checking if tor handles it), I'll adjust the code as appropriate (The internal port returned over stdout on success will always be what is expected).
There's a way to get UPnP to have this kind of behavior as well, but I'm not sure how widely deployed support for WANIPConnection version 2 is, or how reliable it would be.
If tor doesn't handle it, then I probably won't bother.
On a related note, how well does flashproxy tolerate the external port reported as success being different from the one that's requested? (Eg: You request a port mapping between 192.0.2.10:9000 <-> 0.0.0.0:9000, but the router establishes 192.0.2.10:9000 <-> 0.0.0.0:10000)
It won't work at all. I guess it could, by watching the output of tor-fw-helper, but it doesn't.
On a related note, how well does flashproxy tolerate the external port reported as success being different from the one that's requested? (Eg: You request a port mapping between 192.0.2.10:9000 <-> 0.0.0.0:9000, but the router establishes 192.0.2.10:9000 <-> 0.0.0.0:10000)
It won't work at all. I guess it could, by watching the output of tor-fw-helper, but it doesn't.
Ok I shall leave things as is then for now at least.
I went and renamed the executable/package to tor-fw-helper and bumped the version number up to 0.3 to be higher than the existing codebase. I moved the git repo in github as well, sorry for the inconvenience (https://github.com/Yawning/tor-fw-helper).
The big question in my opinion is to see whether anybody actually actually wants this functionality. It won't work for relays (since consumer NAT sucks), so we need to evaluate if there are a significant number of cases where we should have Tor launch it.
I've added a patch to remove the old tor-fw-helper and tell people where to look for one not tied to our old code. Let's see if we can find a way to tell whether anyone uses the new one?
Acking the branch. Not sure if pointing people at my rewrite vs "setup port forwarding by hand". Flashproxy probably still wants something like this, but consumer grade router issues make deployment a bad idea.
Before merging maybe e-mail some lists to see if people still use it?