Opened 6 years ago

Closed 4 years ago

#5040 closed enhancement (implemented)

Make public bridges add obfsproxy stats to their extra-info descriptors

Reported by: karsten Owned by: asn
Priority: Medium Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: pt tor-bridge flashproxy SponsorL SponsorF20131101 wants-mocking
Cc: arma, asn, nickm, dcf@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Roger is planning to install obfsproxy on moria, probably pointing to a separate Tor process being a private bridge, and publish its address to get some testers for obfsproxy.

We discussed adding obfsproxy stats to Tor, so that we can keep track of obfsproxy usage over time. However, this requires the extended OR port to be implemented, or Tor wouldn't know much about clients connecting to obfsproxy.

The new obfsproxy stats code would go into Tor 0.2.4.x anyway, but setting up an external proxy is already possible with 0.2.2.x. So, we wouldn't get all statistics anyway.

We could make obfsproxy write some stats to disk and ask people we know to give us those files, but that's probably not worth the effort.

We should discuss adding obfsproxy statistics to 0.2.4.x together with the extended OR port. Once we have that we should distribute images with public bridges running 0.2.4.x, with obfsproxy, and with obfsproxy stats.

Child Tickets

TicketTypeStatusOwnerSummary
#7751defectclosedtorspec: Announce name of active pluggable transport through the Extended ORPort
#8045defectclosedtorspec: Make public bridges add obfsproxy stats to their extra-info descriptors

Change History (31)

comment:1 Changed 6 years ago by asn

Do we care about bridges giving out separate statistics for "normal" usage, and for pluggable transport usage? Can this harm anonymity? If we like doing this, we should spec. it.

The implementation plan for providing client geoip statistics to bridge authorities could be something like: When using Extended ORPort, everytime we receive a client address through USERADDR, pass it to geoip_note_client_seen() like we are currently doing in connection_or_set_state_open().

comment:2 in reply to:  1 Changed 6 years ago by karsten

Replying to asn:

Do we care about bridges giving out separate statistics for "normal" usage, and for pluggable transport usage? Can this harm anonymity? If we like doing this, we should spec. it.

A fine question. I'd say let's do this similar to how we're planning to extend IPv6 bridge usage statistics: 1) we count all connecting bridge users by country, regardless of IPv4 or IPv6, and 2) we add a new statistics how many users are IPv4 users and how many are IPv6 users. See #5055.

In the context of obfsproxy statistics, 1) is what you describe:

The implementation plan for providing client geoip statistics to bridge authorities could be something like: When using Extended ORPort, everytime we receive a client address through USERADDR, pass it to geoip_note_client_seen() like we are currently doing in connection_or_set_state_open().

For 2), I could imagine a stats line like this:

bridge-transport or=NN,obfs2=NN,...

The NN parts would be unique addresses connecting via a given transport. These sets could overlap if a user connects via multiple transports, but that would be fine. The stats would tell us when a given transport sees increasing/decreasing use.

I think 2) is somewhat harder to implement in geoip.c, because we'd have to add transport information to the data structure holding unique IP addresses.

What these stats would not tell us is when a given transport stops working in a given country. That's similar how the IPvX stats won't tell us when a given country blocks a given IP version. I think that's okay. Other thoughts?

comment:3 Changed 6 years ago by asn

Component: ObfsproxyTor Bridge
Keywords: pt added

(This is a Tor Bridge ticket.)

comment:4 Changed 6 years ago by nickm

Milestone: Tor: 0.2.4.x-final

comment:5 Changed 5 years ago by nickm

Keywords: tor-bridge added

comment:6 Changed 5 years ago by nickm

Component: Tor BridgeTor

comment:7 Changed 5 years ago by phobos

Keywords: SponsorL added

comment:8 Changed 5 years ago by arma

Keywords: SponsorF20131101 added

comment:9 Changed 5 years ago by asn

Note to self: We will probably need to pass the transport name through the Extended ORPort to provide transport-specific statistics.

comment:10 Changed 5 years ago by asn

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

comment:11 Changed 5 years ago by asn

Status: newneeds_review

Please see branch bug5040 in https://git.torproject.org/user/asn/tor.git.

BTW, in the geoip.c code, I only consider direct connections as pluggable transport clients (I ignore directory requests).

comment:12 Changed 5 years ago by asn

Karsten, should I be counting GEOIP_CLIENT_NETWORKSTATUS or GEOIP_CLIENT_NETWORKSTATUS_V2 geoip actions, instead of GEOIP_CLIENT_CONNECT ? That is, should I be counting the number of consensus fetches, or the number of direct connections? In your last email, you seem to be implying the former.

comment:13 Changed 5 years ago by karsten

13:36:31 < karsten> asn: can we learn the number of requested 
                    consensuses by transport?
13:37:07 < karsten> (I don't know very much about the extended OR 
                    port spec.)
13:57:01 < asn> karsten: I think so. I will have to look into this.

comment:14 Changed 5 years ago by dcf

Cc: dcf@… added
Keywords: flashproxy added

comment:15 Changed 5 years ago by asn

In the dev meeting, Karsten suggested to add yet another line for measuring obfsbridge stats.
He wants a dirreq-v3-transport line with the exact same format as bridge-ip-transports, that counts consensus fetches instead of direct connections. This will improve the granularity of bridge statistics, and it will help us count users accurately in scenarios like flashproxy (where each client is actually a flashproxy bridge).

This means that we should be considering the GEOIP_CLIENT_NETWORKSTATUS_V2 and GEOIP_CLIENT_NETWORKSTATUS events in this case, instead of GEOIP_CLIENT_CONNECT.

comment:16 Changed 5 years ago by asn

To get this branch merged sooner, I made a new trac ticket (#8786) for the dirreq-v3-transport line. Let's keep this ticket for bridge-ip-transports.

comment:17 Changed 5 years ago by nickm

It looks like this is based on some version of #4773 ? What commits should I be looking at here?

comment:18 Changed 5 years ago by asn

Yes, this is based on bug4773_rebase since the Extended ORPort codebase was required.

You are interested in the commits after bug4773_rebase. That is, from (and including) 26e5db3118e7b46deb287c4ff9460d28c83c01b1 to d355e8372602d4f2826e9df2afa5a1b536f436d0.

comment:19 Changed 4 years ago by nickm

Status: needs_reviewneeds_revision

Okay. This review is based on "git log -p --reverse 26e5db3118e7b46deb28..d355e8372602d4f2826e9df2afa5a1b536f436d0"

General:

  • This needs an accompanying torspec patch.

26e5db3118e7b46deb287c4ff9460d28c83c01b1

  • Use tor_memdup_nul_terminate() or whatever it's called for connection_or_handle_cmd_transport.
  • The ext_or_transport field needs to document what the format is for the "name of the pt obfuscating this connection".
  • What happens if an IP connects twice with different transports? What *should* happen? (Does the unit test check that?)

bca5b2ce343d1a34ac7168fb4d4a74334e367244:

  • The smartlist_string_isin() check will make the algorithm O(n)2. Why not use "if (val == 1)"?
  • Does transport_counts hold an signed intptr_t, or an unsigned uintptr_t, or a value in the range of 'unsigned'? You're using it both ways.
  • You don't need to use i as an index; just use "transport_name_sl_idx", the implicit index that you get for free.
  • Actually, why not forget the business with "," vs "" in the loop, and instead pass "," as the joining string to smartlist_join_strings?

Looks okay otherwise.

comment:20 in reply to:  19 Changed 4 years ago by asn

Status: needs_revisionneeds_review

Replying to nickm:

Okay. This review is based on "git log -p --reverse 26e5db3118e7b46deb28..d355e8372602d4f2826e9df2afa5a1b536f436d0"

General:

  • This needs an accompanying torspec patch.

For the torspec branch see tickets #7751 and #8045.

26e5db3118e7b46deb287c4ff9460d28c83c01b1

  • Use tor_memdup_nul_terminate() or whatever it's called for connection_or_handle_cmd_transport.

I couldn't find tor_memdup_nul_terminate() or any other related function. Do you maybe remember the price name?

  • The ext_or_transport field needs to document what the format is for the "name of the pt obfuscating this connection".
  • What happens if an IP connects twice with different transports? What *should* happen? (Does the unit test check that?)

bca5b2ce343d1a34ac7168fb4d4a74334e367244:

  • The smartlist_string_isin() check will make the algorithm O(n)2. Why not use "if (val == 1)"?
  • Does transport_counts hold an signed intptr_t, or an unsigned uintptr_t, or a value in the range of 'unsigned'? You're using it both ways.
  • You don't need to use i as an index; just use "transport_name_sl_idx", the implicit index that you get for free.
  • Actually, why not forget the business with "," vs "" in the loop, and instead pass "," as the joining string to smartlist_join_strings?

Looks okay otherwise.

Refetch my bug5040 branch for fixes on the rest of your comments.

BTW, do you think that clientmap_entry_hash() can be more elegant? My new changes look a bit bulky there.

comment:21 Changed 4 years ago by nickm

Keywords: wants-mocking added
Status: needs_reviewneeds_revision

Notes for Nick: I'm happy to make these changes myself before merging, once #8949 is merged. I'm just copying these here so I don't lose them.

  • HEY NICK! Remember that this is based on bug4773_rebase!
  • check test coverage on this.
  • connection_ext_or_handle_cmd_transport should use tor_memdup_nulterm.
  • clientmap_entries_eq should use strcmp_opt.
  • The hash function also needs to look at transport_name if that's now part of the key
  • There should be a clientmap_entry_free() function.
  • geoip_get_transport_history -- I'm not too happy about using intptr_t when accumulating results but unsigned int when formatting them.

Notes for George:

  • What's up with the "<?\?>" string? Shouldn't that be "<?
    ?>" ? But why ?\? in the first place?

comment:22 Changed 4 years ago by nickm

What's up with the "<?\?>" string? Shouldn't that be "<?
?>" ? But why ?\? in the first place?

George informs me that it's supposed to be "<??>", but that ??> is a trigraph.

To hell with trigraphs.

comment:23 Changed 4 years ago by nickm

The branch "bug5040_4773_rebase" in my public repository has all the changes above I want to make in this, and in 4773. It's rebased to master. The big remaining issue is writing unit tests to exercise the 119 new lines that aren't tested.

comment:24 Changed 4 years ago by nickm

I rebased again, to get more testing machinery. Now it's "bug5040_4773_rebase_2".

comment:25 Changed 4 years ago by asn

New tests in bug5040_4773_rebase_2 look good.

Also check out my nickm-bug5040_4773_rebase branch in https://git.torproject.org/user/asn/tor.git, for some easy DOCDOC fixes.

comment:26 Changed 4 years ago by nickm

cherrypicked your nickm-bug5040_4773_rebase onto bug5040_4773_rebase_2; refactored a little there.

What I most want tests for here is the code in ext_orport.c

comment:27 Changed 4 years ago by nickm

I'm up to ~59% test coverage for ext_orport.c in bug5040_4773_rebase_2, and aiming higher.

I've found my first bug already: when authentication fails and we write [00], we don't hold the connection open until that's flushed.

comment:28 Changed 4 years ago by nickm

Found a few more iffy things:

  • When we call connection_or_change_state when transitioning to being an OR connection, the current state is a junky EXT_OR_CONN_STATE value.
  • We don't appear to adjust conn->address when we set conn->addr.
  • We never send the CONTROL value.

Now it's at 93% coverage for ext_or_port.c. We can and should get the remaining failing cases tested, but this is a good start. The branch is still "bug5040_4773_rebase_2"

comment:29 Changed 4 years ago by asn

Status: needs_revisionneeds_review

Read your unit tests. Great job!

I made a small branch that fixes the errors you found. You can see it in bug5040_fix_test_bugs at https://git.torproject.org/user/asn/tor.git.

I set the connection state to 0 when we transition. The other solution would be to create a new state value for this occasion (EXT_ORPORT_TRANSITIONING or something) and set it to that.

We don't send the CONTROL command, because the TransportControlPort is not implemented yet. Also, no pluggable transports support the CONTROL command at the moment.

comment:30 Changed 4 years ago by asn

Hey Nick,

it seems that when you merged the branches of #5040 and #4773 together, you merged the wrong #4773 branch. I had a branch named bug4773_rebase which fixed some of your comments and it was not merged in your bug5040_4773_rebase_2.

In any case, I merged those fixes in your latest branch and resolved all conflicts. I also wrote a few more tests. Please see my branch bug5040_4773_rebase_2 in https://git.torproject.org/user/asn/tor.git.

comment:31 Changed 4 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Aaaaand, merged!

Note: See TracTickets for help on using tickets.