In #5040 (moved), Karsten suggested to add yet another line for measuring obfsbridge stats.
He wants a dirreq-v3-transport line with the exact same format as bridge-ip-transports, that counts consensus fetches instead of direct connections. This will improve the granularity of bridge statistics, and it will help us count users accurately in scenarios like flashproxy (where each client is actually a flashproxy bridge).
This means that we should be considering the GEOIP_CLIENT_NETWORKSTATUS_V2 and GEOIP_CLIENT_NETWORKSTATUS events in this case, instead of GEOIP_CLIENT_CONNECT.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Trac: Type: defect to enhancement Priority: normal to minor Keywords: pt tor-bridge flashproxy deleted, pt tor-bridge flashproxy 026-triaged-1 026-deferrable added
I implemented this feature (and the tightly related feature to track the number of consensus downloads by IP version) in branch task-8786 in my public repository. It's yet untested except for unit tests (and we should only test it with PublishServerDescriptor 0 in the public network).
Review much appreciated. I'll happily review another branch (of comparable size) in exchange.
Trac: Severity: N/Ato Normal Sponsor: N/AtoN/A Status: assigned to needs_review
I don't know the code, but I read through the patch.
The patch uses the "%d" format specifier for the output of round_to_next_multiple_of in geoip_get_requests_by_transport_history, but uses the "%u" format specifier in geoip_get_requests_by_version_history.
It surprised me that geoip_get_requests_by_version_history returns NULL when both the v4 and v6 counts are 0, instead of returning "v4=0,v6=0". But I guess it matches what geoip_get_requests_by_transport_history does when all counts are 0.
I don't know the code, but I read through the patch.
Thanks! Much appreciated! (Please let me know if you have a branch for me to review, and I'll take a look.)
The patch uses the "%d" format specifier for the output of round_to_next_multiple_of in geoip_get_requests_by_transport_history, but uses the "%u" format specifier in geoip_get_requests_by_version_history.
Agreed. Changed to "%u" in both cases.
It surprised me that geoip_get_requests_by_version_history returns NULL when both the v4 and v6 counts are 0, instead of returning "v4=0,v6=0". But I guess it matches what geoip_get_requests_by_transport_history does when all counts are 0.
Right, I think returning a count of 0 has little value. But I went even further and changed "v4=8,v6=0" to just "v4=8", because that "v6=0" doesn't provide much value, either.
Changes pushed as separate commit to the same branch. Thanks again!
Hello. The patch looks reasonable. I see it hasn't got any real world testing yet. Would you like me to run this on my bridge for 2-3 days and see what kind of results we get?
Quick update: asn was so kind to run this branch on LeifEricson, one of the bridges shipped with Tor Browser. After 24 hours that bridge reported the following lines:
Obviously, all directory requests are being attributed to connections and coming in via IPv4. While the latter is at least plausible, the former is clearly wrong. This is caused by passing NULL as third parameter to geoip_note_client_seen in the following part of directory.c:
if (1) { struct in_addr in; tor_addr_t addr; if (tor_inet_aton((TO_CONN(conn))->address, &in)) { tor_addr_from_ipv4h(&addr, ntohl(in.s_addr)); geoip_note_client_seen(GEOIP_CLIENT_NETWORKSTATUS, &addr, NULL, time(NULL));
I'm not sure yet how we can learn the transport name at this point. IIRC, conn is a dir_connection_t, but in case of tunneled directory requests there are also an edge_connection_t and an or_connection_t involved. And in order to learn the transport, we'll have to check the channel_t to invoke its get_transport_name function.
Here's a wild idea: we use conn->dirreq_id to find the circuit_t in circuit_get_globel_list() with matching dirreq_id, and then we can learn the transport via TO_OR_CIRCUIT(circ)->p_chan. Totally untested, of course.
Speaking of tunneled vs. direct directory requests, there's also the possibility that a directory request is made to the DirPort, and in that case we couldn't say it's using transport <OR>. Maybe we'll have to introduce a new transport <DIR> to count those directory requests in dirreq-v3-transport. I don't expect there to be many such requests, but if it's easy to be correct here, let's try that.
Regarding dirreq-v3-version, when looking at the code above it's unclear whether the distinction between IPv4 and IPv6 actually works. Without testing, I'd say that we should try passing (TO_CONN(conn))->address to geoip_note_client_seen rather than messing with it first.
All the above is a mix of note-to-self and request-for-help. If something of the above seems obviously wrong or right to somebody here, please do let me know. Otherwise, I'll hack on this in a few days.
There, I implemented my idea above and gave it some testing using Chutney. Not sure if it's the most elegant approach, but I think it works. See the updated branch here.
asn, mind reviewing the change and giving this branch another chance on your bridge, ideally with PublishServerDescriptor 0 being set and you and I meeting in #tor-dev 24.000000000 hours later?
There, I implemented my idea above and gave it some testing using Chutney. Not sure if it's the most elegant approach, but I think it works. See the updated branch here.
asn, mind reviewing the change and giving this branch another chance on your bridge, ideally with PublishServerDescriptor 0 being set and you and I meeting in #tor-dev 24.000000000 hours later?
Patch looks like it could work. I'm now running it on my bridge.
Ideally we would want to functionify all the extra code and put it on a separate function, to not pollute directory_handle_command_get() further.
Also, I'm still wondering if there is an O(1) way to get from dir_connection_t to channel_t by following links (maybe through connection_t). Because the current approach is O(n) where n is the number of open circuits. Not sure if this is premature optimization or not. Nick or Andrea would know.
Not premature optimization, but I do have a plan to make our lookups of this kind much faster without so many of the silly tricks we've done in the past.
nickm, thanks a lot for the review! I worked on those suggestion and pushed them to the same branch. But let me start with another fix that I started working on before reading your review:
tl;wr: My branch (commit cda77b4) is broken, though only in a way that it logs something inaccurate, not in a harmful way. I believe that neither the bridges' configurations nor the extended OR port implementation are to blame. I wrote a fix for this.
After adding extensive logging to all code having to do with dirreq_id and letting it run for hours, I found the issue with my branch (commit cda77b4). Here's what happened:
The client sends a BEGIN_DIR cell on its circuit to the bridge.
The bridge assigns dirreq_id 9779 to the circuit and the circuit's p_chan, opens an edge connection and a dir connection, and assigns the same dirreq_id 9779 to both new connections. It then sends back a cell (CONNECTED?) to the client and waits for the client to send its GET request via that circuit.
The client sends another BEGIN_DIR cell on the same circuit.
The bridge assigns dirreq_id 9780 to the circuit and its p_chan, overwriting previous value 9779. It also creates a new edge connection and a new dir connection and sets their dirreq_id to 9780, but that's irrelevant here.
The client sends the GET request to the dir connection with dirreq_id 9779, which is a request for a consensus.
The bridge attempts to find circuit with dirreq_id 9779 to find out the transport name of its p_chan, but there is no circuit with that dirreq_id anymore, because it was overwritten with 9780 above. The bridge falls back to counting this request as coming in via the transport, which is wrong. It should probably count it as ?>, but really we should fix this bug and count it as obfs3 in this case.
The client sends a second GET request for the bridge's server descriptor to the dir connection with dirreq_id 9780.
I'm also pasting the logs below:
Feb 22 14:34:48.463 [notice] [...]Feb 22 14:34:48.673 [notice] XXX8786 Assigning new dirreq_id 9779 to circuit with p_circ_id 3181807694 and non-zero dirreq_id 9776 and to p_chan with global_identifier 5 and previous dirreq_id 9776.Feb 22 14:34:48.673 [notice] XXX8786 Assigning dirreq_id 9779 to new edge connection.Feb 22 14:34:48.674 [notice] XXX8786 Assigning dirreq_id 9779 to new dir connection.Feb 22 14:34:48.674 [notice] XXX8786 Assigning new dirreq_id 9780 to circuit with p_circ_id 3181807694 and non-zero dirreq_id 9779 and to p_chan with global_identifier 5 and previous dirreq_id 9779.Feb 22 14:34:48.674 [notice] XXX8786 Assigning dirreq_id 9780 to new edge connection.Feb 22 14:34:48.674 [notice] XXX8786 Assigning dirreq_id 9780 to new dir connection.Feb 22 14:34:48.675 [notice] XXX8786 Request for /tor/status-vote/current/consensus-microdesc/1AFF35+2C2591+E34739.Feb 22 14:34:48.675 [notice] XXX8786 Counting directory request for /tor/status-vote/current/consensus-microdesc/1AFF35+2C2591+E34739 with dirreq_id 9779 as <OR>, because we didn't find a circuit with matching dirreq_id.Feb 22 14:34:48.676 [notice] XXX8786 Request for /tor/server/authority.Feb 22 14:34:53.461 [notice] [...]
The bug here is that the dirreq_id-assigning code does not take concurrent directory requests over the same circuit into account. We can expect that the stats based on dirreq_id, in particular "dirreq-v3-tunneled-dl" lines, are partially broken, too. Fortunately, we're not using those lines for anything critical. That's the bad news.
The good news is that I wrote a fix for counting requests by transport. Please see commit a743f3a in the updated branch task-8786 in my public repository. The commit message says what was fixed. Pasting here, though it overlaps with the description above:
The previous approach to look up the transport name of a directory request by going through the list of circuits and finding the circuit with same dirreq_id was broken. That dirreq_id may already have been overwritten by a newly arriving BEGIN_DIR cell before the HTTP GET request that we're about to include in the stats arrives and gets parsed. This issue also affects "dirreq-v3-tunneled-dl" lines, but fixing those is independent of implementing this new feature. The new approach is to store the transport name in the locally created dir connection when processing the BEGIN_DIR cell. That way it cannot be changed by a subsequent BEGIN_DIR cell on the same circuit. The downside is a new char* in dir_connection_t, the upside is that we can avoid iterating over all circuits.
nickm, I also tried to address your suggestions in subsequent commit 1508787.
I would like to get this branch tested on asn's and mrphs' bridges and on an IPv6 bridge before getting it merged. It also contains a temp commit for testing that must be reverted before merging! Leaving this ticket at needs_revision until that is all done. Deferring until 0.2.9 is fine by me.
Trac: Summary: Add extra-info line that tracks the number of consensus downloads of each pluggable transports to Add extra-info line that tracks the number of consensus downloads over each pluggable transport Cc: athena toN/A Keywords: flashproxy deleted, metrics privcount needs-design added
I don't think I'll find the time to build this in the foreseeable future. And maybe we'll be using PrivCount for these statistics by whenever I'd find the time again. I'm unassigning this ticket from myself, to have a more accurate view on what tickets I'm actually working on.
Trac: Status: needs_revision to assigned Owner: karsten toN/A