It would be nice to do this so the value passed to the ExtORPort was correct for better metrics. A few ways this could be done, off the top of my head:
Set X-Forwarded-For. The "standard" layout of this field doesn't include the port, but since it's unofficial, there's nothing stopping us from adding it. This would require us to secure the link between the reflector and the meek-server instance separately, which means TLS.
Set a custom header (e.g. Meek-Forwarded-For) with an encrypted/encoded IP/port pair. Less overhead than bringing TLS into the picture. I would use something like a Base64-encoded NaCl crypto_secretbox. Key management here may be an issue, though it depends on who runs the bridge and reflector (the other method has cert management to deal with, so this isn't a strict minus IMO).
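A rough sketch of the second option, for illustration only. To stay within the Go standard library it swaps NaCl crypto_secretbox for AES-256-GCM, which is a different construction with the same general shape (shared secret key, random nonce, authenticated ciphertext). The function names sealAddr and openAddr, the sample address, and the nonce-prepended layout are all assumptions, not anything from meek.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newAEAD builds an AES-GCM AEAD from a 32-byte shared key.
// (Stand-in for NaCl crypto_secretbox; hypothetical helper.)
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealAddr encrypts an "ip:port" string and returns a Base64 value
// suitable for an HTTP header. The random nonce is prepended.
func sealAddr(key []byte, addr string) (string, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := aead.Seal(nonce, nonce, []byte(addr), nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// openAddr reverses sealAddr. It fails if the value was not produced
// with the same key or was modified in transit.
func openAddr(key []byte, value string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(value)
	if err != nil {
		return "", err
	}
	aead, err := newAEAD(key)
	if err != nil {
		return "", err
	}
	n := aead.NonceSize()
	if len(raw) < n {
		return "", errors.New("header value too short")
	}
	plain, err := aead.Open(nil, raw[:n], raw[n:], nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	key := make([]byte, 32) // shared between reflector and bridge
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	header, err := sealAddr(key, "203.0.113.7:4711")
	if err != nil {
		panic(err)
	}
	addr, err := openAddr(key, header)
	fmt.Println(addr, err)
}
```

The reflector would call sealAddr and the bridge would call openAddr, so both sides need the same key; that is the key management cost mentioned above.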
Can you be more specific about the metrics you're looking for? The one I can think of is user graphs that are broken down by pluggable transport and geoip, but there aren't graphs like that and they aren't planned: #10218 (moved).
I'm a little opposed to adding a custom header, for a few reasons:
It can be considered a feature that the bridge doesn't know client IPs. From the client point of view, "screw your metrics, I want my anonymity."
We don't actually control the reflector code on any platform other than Google. When we use a CDN, the CDN just adds whatever headers it wants. Actually it seems that CloudFront already adds X-Forwarded-For.
As I understand it, Tor could be modified to have its clients send a NETINFO cell containing the address. That's what we would want to do if metrics are really important, because it would generalize across other transports.
Can you be more specific about the metrics you're looking for? The one I can think of is user graphs that are broken down by pluggable transport and geoip, but there aren't graphs like that and they aren't planned: #10218 (moved).
That was what I wanted, though karsten's concerns are still valid (See the PT component of Sponsor S for why I care about this all of a sudden).
Hmm, all valid reasons for not using a custom header. I would guess that most CDN platforms would set X-Forwarded-For, and if we wanted to use that information in meek-server, adding the header in the GAE go code would be trivial. I'll think more about #10218 (moved).
You're probably right that all CDNs make the information available somehow. You don't want to use the client port, even if it is available, because a stream is made of multiple HTTP requests and the port is changing all the time. You would want to derive the port from the session-ID somehow.
If you dig through Psiphon's history on meek-client,
This is starting to look more important, now that meek users are a larger fraction of total bridge users. According to my eyeballing of this graph, it's around 20% now. (meek/(Any PT + Default OR protocol) = 5000/(17500+5000) = 22%.)
That means up to 20% of bridge users are having their country recorded incorrectly, which will affect https://metrics.torproject.org/userstats-bridge-country.html.
I have had this branch running on the meek-azure bridge for about a day. Here are the bridge stats. By both dirreq-v3-reqs and bridge-ips, us comes first, followed by ru and cn. It's surprising that there are so many meekers from the U.S.
Here are stats from before the patch. Note here the discrepancy between dirreq-v3-req and bridge-ips. The reason that bridge-ips is so low is that it deduplicates client IPs, and before the patch it appears that all connections come from a small number of CDN IP addresses.
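A toy sketch of why deduplication produces that discrepancy. This is not tor's actual accounting code; countStats and the sample addresses are made up for illustration. It only shows that when many clients arrive through a handful of CDN addresses, the request count stays high while the unique-address count collapses.

```go
package main

import "fmt"

// countStats returns the total number of connections seen and the
// number of distinct source addresses among them.
func countStats(addrs []string) (requests, uniqueIPs int) {
	seen := make(map[string]struct{})
	for _, a := range addrs {
		requests++
		seen[a] = struct{}{}
	}
	return requests, len(seen)
}

func main() {
	// Before the patch: every client appears to come from a CDN address.
	before := []string{"198.51.100.1", "198.51.100.1", "198.51.100.2", "198.51.100.1"}
	// After the patch: the forwarded client addresses are used instead.
	after := []string{"203.0.113.5", "203.0.113.9", "192.0.2.44", "192.0.2.17"}

	r, u := countStats(before)
	fmt.Printf("before: %d requests, %d unique\n", r, u)
	r, u = countStats(after)
	fmt.Printf("after:  %d requests, %d unique\n", r, u)
}
```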
For comparison, here are the stats for the other default bridges. meek-google requests seem to come from a very small number of IP addresses, fewer than 8.
I was curious as to what fraction of clients send X-Forwarded-For. On the meek-azure bridge, it is about 99.5%.
I temporarily applied this patch (0001-Count-fraction-of-hosts-with-X-Forwarded-For.patch), which counts the number of meek sessions that started with an X-Forwarded-For request. It prints stats after every 1000 new sessions. I let it run for 5000 sessions (it took 20 minutes to get there).
useraddr: total 5000 X-Forwarded-For 4978 good 0 bad RemoteAddr 22 good 0 bad parse 5000 good 0 bad
This means that out of 5000 sessions, 4978 started with a request containing X-Forwarded-For or Meek-IP, and in all of those we were able to extract the client IP address from the header. For the remaining 22, we fell back to the old behavior of using the source IP address (RemoteAddr). We were able to parse the resulting IP address in all 5000 cases.
Should we also set X-Forwarded-For/Meek-IP in the nginx, php, and wsgi reflectors? These are the ones that people are meant to set up on their own.
I think hardly anyone is using these at the moment. But if they do, we'll get inaccurate stats. On the other hand, I suppose it is more likely that people will make mistakes setting them up (such as not using an HTTPS origin) and in that case we would want not to forward the IP address.
Here's a graph showing how the counts of meek users and US users begin to diverge after the merging of this patch.
The vertical lines are:
2015-12-14: ticket merged into meek-azure
2015-12-20: ticket merged into meek-google and meek-amazon.
We recently found out that meek-amazon hadn't actually upgraded to a version including this patch, so it was still counting essentially all users as being from the U.S. Since yesterday it has been upgraded.
Here is an updated graph, this time including cn in addition to us. The three vertical lines show the dates when the azure, google, and amazon bridges merged this patch in order.
I'm confused about why the cn line has been pretty much flat since the merging of this patch. When I look at the dirreq stats for these bridges, about 12% of requests come from cn. Not sure why it's not reflected in the graph.
Here is a picture that graphs directory requests, as suggested in #18167 (moved). This graph makes more sense to me. Here, the countries are broken down per bridge. You can easily see the dates when each bridge merged this ticket, #13171 (moved), and switched from nearly 100% us to a mixture of countries. The top 5 countries are us, ru, de, gb, and cn.