Preserving hashed IP addresses in sanitized bridge descriptors

added component::metrics/collector owner::karsten priority::medium resolution::implemented status::closed type::enhancement labels

Christian and I discussed this approach some more. Christian is concerned that someone might brute force the secret. The attacker could set up a few bridges, remember their IP addresses and bridge identities, look up the sanitized descriptors in our archives, and try out which secret leads to the same 10.x.x.x address in our descriptors. This attack could be performed offline. He suggests using a much longer secret and changing it regularly.
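To make the concern concrete, here is a minimal sketch of the offline search. It assumes, purely for illustration, that the sanitized address is `10.` followed by the first three bytes of a hash over IP address, bridge identity, and secret; the function names, the use of SHA-256, and the toy 2-byte secret are all hypothetical and not taken from the metrics-db code.

```python
import hashlib
import ipaddress
from itertools import product


def sanitize_ip(ip: str, fingerprint: bytes, secret: bytes) -> str:
    """Hypothetical sanitizer: 10.x.x.x built from the first 3 hash bytes."""
    digest = hashlib.sha256(
        ipaddress.IPv4Address(ip).packed + fingerprint + secret
    ).digest()
    return "10." + ".".join(str(b) for b in digest[:3])


def brute_force(observations, secret_len):
    """Return every candidate secret consistent with all observations.

    observations: list of (real_ip, fingerprint, sanitized_ip) tuples that
    the attacker collected for bridges under their own control.
    """
    survivors = []
    for candidate in product(range(256), repeat=secret_len):
        secret = bytes(candidate)
        if all(sanitize_ip(ip, fp, secret) == seen
               for ip, fp, seen in observations):
            survivors.append(secret)
    return survivors


# Toy run with a deliberately tiny 2-byte secret (65,536 candidates).
true_secret = b"\x13\x37"
bridges = [
    ("198.51.100.7", bytes(range(20))),
    ("203.0.113.42", bytes(range(20, 40))),
    ("192.0.2.99", bytes(range(40, 60))),
]
observations = [(ip, fp, sanitize_ip(ip, fp, true_secret)) for ip, fp in bridges]
recovered = brute_force(observations, secret_len=2)
```

Each additional secret byte multiplies the search space by 256, which is why a much longer secret makes this search impractical.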

I somewhat dislike the idea of changing the secret regularly, because it means we cannot compare the sanitized IP addresses of multiple intervals easily. But we're probably safer by changing it, e.g., monthly. Using a longer secret, say, 40 or 60 bytes (or even longer?), is a fine idea, too.
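A rough sketch of what monthly rotation could look like follows. The in-memory `store` dictionary, the `secret_for_month` helper, and the 40-byte default are assumptions made for illustration; a real setup would have to persist the secrets somewhere safe.

```python
import secrets


def secret_for_month(store: dict, month: str, length: int = 40) -> bytes:
    """Return the secret for `month` ("YYYY-MM"), creating it on first use."""
    if month not in store:
        store[month] = secrets.token_bytes(length)
    return store[month]


store = {}
nov = secret_for_month(store, "2008-11")
dec = secret_for_month(store, "2008-12")
```

Because `nov` and `dec` are unrelated, a bridge that keeps its IP address across the month boundary would still get unrelated sanitized addresses in the two months, which is exactly the comparability cost mentioned above.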

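Ian suggests on or-dev to use a 31-byte secret here. The idea is to fit the IP address, bridge identity, and secret into a single SHA block. A block is 512 bits, of which 447 bits are available for input once the mandatory padding (a single 1 bit plus the 64-bit message length) is accounted for. The IP address is 32 bits and the bridge identity is 160 bits, which leaves 255 bits for the secret, or 31 bytes because we're byte-aligned.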
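The arithmetic behind the 31-byte figure can be checked in a few lines. The constant names are made up for this sketch; the block and padding sizes are those of SHA-1 and SHA-256.

```python
BLOCK_BITS = 512        # SHA-1 and SHA-256 both process 512-bit blocks
PADDING_BITS = 1 + 64   # mandatory '1' bit plus the 64-bit message length
IP_BITS = 32            # IPv4 address
IDENTITY_BITS = 160     # bridge identity

# Largest message that still fits into a single block after padding.
max_input_bits = BLOCK_BITS - PADDING_BITS

# What is left for the secret, rounded down to whole bytes.
secret_bits = max_input_bits - IP_BITS - IDENTITY_BITS
secret_bytes = secret_bits // 8
```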

Ian also suggests using SHA-256 instead of SHA-1, mostly because SHA-1 shouldn't be used for anything new at this point.
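Putting both suggestions together, the hashing step might look roughly like the sketch below. The input order (IP address, then identity, then secret), the truncation to three bytes, and the name `sanitize_ip` are assumptions for illustration only; the technical report remains the authoritative description.

```python
import hashlib
import ipaddress

SECRET_LEN = 31  # bytes, so that the whole input fits into one SHA block


def sanitize_ip(ip: str, fingerprint: bytes, secret: bytes) -> str:
    """Map a real IPv4 address to a 10.x.x.x address via SHA-256."""
    if len(fingerprint) != 20:
        raise ValueError("bridge identity must be 20 bytes")
    if len(secret) != SECRET_LEN:
        raise ValueError("secret must be 31 bytes")
    # 4 + 20 + 31 = 55 bytes = 440 bits, below the 447-bit single-block limit.
    message = ipaddress.IPv4Address(ip).packed + fingerprint + secret
    assert len(message) * 8 <= 447
    digest = hashlib.sha256(message).digest()
    return "10." + ".".join(str(b) for b in digest[:3])


# Example call with placeholder values (not real bridge data).
example = sanitize_ip("198.51.100.7", bytes(20), bytes(SECRET_LEN))
```

The same inputs always produce the same sanitized address, so a bridge can still be tracked across descriptors as long as the secret stays the same.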

Yesterday I finished the implementation of hashed IP addresses in metrics-db (#2505 (moved)). I also sanitized some old bridge descriptors from 2008 with the new algorithm last night.

Here's an early analysis of sanitized bridge descriptors containing IP address hashes. The idea of the analysis is to compare the number of unique IP addresses of a bridge with the number of statuses that contain this bridge.

There are two graphs in the attachment. The first graph shows a scatter plot of unique IP addresses and days of operation. Only bridges with at least 24 hours of operation are shown. There is an accumulation of points at the lower left of the graph which are bridges with only a few days of bridge operation. These bridges are probably not as useful for bridge users, because they are unavailable most of the time. In contrast, the accumulation of points with almost 30 days of operation and only very few unique IP addresses indicates stable bridges on static IP addresses that are probably most useful for bridge users. Points close to the dashed line indicate bridges that change their IP address once a day. Points above the dashed line are probably not as useful for clients either, because they change their IP address more than once per day. These bridges are only useful if bridge users download new bridge descriptors for known bridges from the bridge authority.
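The per-bridge numbers behind such a scatter plot could be derived along these lines. This is only a sketch: the input format, the `statuses_per_day` parameter (the publication interval is not stated in this ticket), the reading of the dashed line as "unique addresses equal days of operation", and the toy entries are all assumptions.

```python
from collections import defaultdict


def bridge_stats(entries, statuses_per_day):
    """Return {fingerprint: (days_of_operation, unique_ip_count)}.

    entries: iterable of (fingerprint, sanitized_ip), one element for each
    status that lists the bridge. Bridges with less than one day of
    operation are dropped, as in the graph.
    """
    status_counts = defaultdict(int)
    addresses = defaultdict(set)
    for fingerprint, sanitized_ip in entries:
        status_counts[fingerprint] += 1
        addresses[fingerprint].add(sanitized_ip)

    stats = {}
    for fingerprint, count in status_counts.items():
        days = count / statuses_per_day
        if days >= 1:
            stats[fingerprint] = (days, len(addresses[fingerprint]))
    return stats


def changes_more_than_daily(days, unique_ips):
    """True for points above the (assumed) dashed line unique_ips == days."""
    return unique_ips > days


# Made-up entries, assuming one status per hour.
entries = (
    [("A", "10.1.2.3")] * 48                          # stable, 2 days
    + [("B", f"10.9.9.{i % 3}") for i in range(24)]   # 3 addresses in 1 day
    + [("C", "10.7.7.7")] * 5                         # under 24 hours
)
stats = bridge_stats(entries, statuses_per_day=24)
```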

The second graph shows the cumulative fraction of bridges having a given number of unique IP addresses per day. Again, the dashed line indicates bridges on dynamic IP addresses that change their IP address once a day. Two thirds of the bridges either have static IP addresses or change their address at most once a day. This leaves us with one third of bridges changing their IP address more often than that.
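A single point on such a cumulative distribution is just the share of bridges at or below a threshold. The helper below is a hypothetical sketch, and the input list contains made-up values, not the November 2008 measurements.

```python
def cumulative_fraction(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v <= threshold) / len(values)


# Made-up unique-IP-addresses-per-day ratios for five bridges.
ips_per_day = [0.05, 0.5, 1.0, 2.5, 6.0]
at_most_daily = cumulative_fraction(ips_per_day, 1.0)
```

Evaluating the function at a threshold of 1.0 corresponds to reading the curve at the dashed line.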

The next steps are:

Update the specification-like description of the sanitizing process here.
Post the sanitized descriptors from November 2008 to or-dev for others to look at.
Sanitize the 2.5 years of descriptors that we have once again and make them available on the metrics website.

I'm planning to do the first two items today and publish the sanitized descriptors next Tuesday (assuming the sanitizing process finishes by then).

Trac:
Pointsdone: N/A to N/A
Actualpointsdone: N/A to N/A
Status: new to assigned
Actualpoints: N/A to N/A

Trac:

Scatter plot: Unique IP addresses of bridges running at least 24 hours in Nov 2008

Trac:

Cumulative distribution: Unique IP addresses per day of bridges running at least 24 hours in Nov 2008

The technical report is updated, and or-dev has another mail from me on the topic. Starting to sanitize descriptors using the new algorithm...

New tarballs are available and announced on tor-dev. Closing.

Trac:
Resolution: N/A to implemented
Status: assigned to closed

mentioned in issue #2505 (moved)
