Opened 3 years ago

Last modified 4 weeks ago

#15469 new enhancement

Remove data structure containing unique IP address sets

Reported by: karsten
Owned by:
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-relay, privacy, research
Cc: nickm, Sebastian, atagar, beastr0@…
Actual Points:
Parent ID: #7532
Points:
Reviewer:
Sponsor:

Description

Relays keep a data structure of unique connecting IP addresses for statistics and for informational purposes.

We should consider removing that data structure. There's a privacy risk in gathering unique IP address sets in memory and in reporting aggregate statistics based on them. If we don't need these statistics, we should stop reporting them and stop gathering the underlying data.

The main (and only?) data structure containing unique IP address sets is clientmap in src/or/geoip.c. If we remove that data structure, we would also have to remove:

  1. the dirreq-v3-ips line from extra-info descriptors,
  2. all "bridge statistics" including bridge-stats-end, bridge-ips, bridge-ip-versions, and bridge-ip-transports lines from extra-info descriptors,
  3. all "entry node statistics" including entry-stats-end and entry-ips from extra-info descriptors,
  4. the log line "Heartbeat: In the last %d hours, I have seen %d unique clients.", and
  5. the CLIENTS_SEEN controller event.

Items 1 and 3 are not used. Item 2 is used by Metrics to estimate the number of daily bridge users, so we'd need to implement #8786 before removing bridge statistics. atagar thinks that item 4 was added by Sebastian a few years back so that relay operators with certain simple use cases don't need to open a control port and run something like arm. Item 5 is used by arm for one of its dialogs, and atagar thinks losing it would not be the end of the world.

Thoughts?

Change History (14)

comment:1 Changed 3 years ago by nickm

Keywords: tor-relay privacy SponsorR added
Milestone: Tor: 0.2.7.x-final
Parent ID: #7532
Priority: normal → major

We can do much better here, in fact. We can retain an estimate of unique IPs without keeping a map of client IPs.

(I am guessing that SponsorR might be interested here since they are interested in safe stats for hidden services, and a general safe stats infrastructure could be quite useful.)

Oluwakemi Hambolu and Richard Brooks (both of Clemson) have drawn my attention to a couple of estimation techniques that could help a lot here. Have a look at these papers:

http://www.mathcs.emory.edu/~cheung/papers/StreamDB/Probab/1985-Flajolet-Probabilistic-counting.pdf
http://arxiv.org/pdf/math/0608176.pdf
http://agl.cs.unm.edu/~forrest/publications/final-2007.pdf

and this MS thesis:

http://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=2987&context=all_theses

And also see "Attack Tolerant Privacy Preserving Statistics using Probabilistic Counting" (forthcoming; I have permission to share a copy).

See also ticket #7532
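
To make the idea above concrete, here is a minimal sketch of probabilistic counting with stochastic averaging, in the style of the 1985 Flajolet paper linked above. It is not the code from any Tor branch. The class name `PCSACounter`, the use of a keyed HMAC-SHA256 hash, and the choice of 64 bitmaps are all assumptions made for illustration. The point is that the relay only keeps a few small bitmaps, not a map of client IPs.

```python
import hashlib
import hmac
import ipaddress

PHI = 0.77351  # correction constant from Flajolet and Martin (1985)


class PCSACounter:
    """Hypothetical unique-IP estimator; stores bitmaps, never addresses."""

    def __init__(self, key: bytes, num_bitmaps: int = 64):
        self.key = key            # secret key for the hash (an assumption)
        self.m = num_bitmaps
        self.bitmaps = [0] * num_bitmaps

    def _hash(self, ip: str) -> int:
        # Hash the packed address to a 64-bit integer.
        packed = ipaddress.ip_address(ip).packed
        digest = hmac.new(self.key, packed, hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, ip: str) -> None:
        h = self._hash(ip)
        bucket = h % self.m       # which bitmap this address updates
        rest = h // self.m
        # rho = index of the lowest set bit in the remaining hash bits.
        rho = (rest & -rest).bit_length() - 1 if rest else 63
        self.bitmaps[bucket] |= 1 << rho

    def estimate(self) -> float:
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while (bitmap >> r) & 1:   # r = index of the lowest zero bit
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)
```

Adding the same address twice leaves the bitmaps unchanged, so the estimate counts distinct addresses. With 64 bitmaps the expected standard error is roughly 0.78/sqrt(64), or about 10%, and the estimate is only meaningful once the number of distinct addresses is well above the number of bitmaps.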

comment:3 Changed 3 years ago by nickm

I've started implementing something here in an "approx_counting" branch. No tests yet, probably many bugs, not yet used in the IP-counting code.

comment:5 Changed 3 years ago by nickm

Milestone: Tor: 0.2.7.x-final → Tor: 0.2.8.x-final

comment:6 Changed 2 years ago by nickm

Keywords: SponsorR removed
Sponsor: SponsorR

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

comment:7 Changed 2 years ago by dgoulet

Keywords: research added
Milestone: Tor: 0.2.8.x-final → Tor: 0.2.???

comment:8 Changed 2 years ago by nickm

Severity: Normal

Also see "HyperLogLogs", which are apparently another name for one of these techniques (thanks to Moritz for bringing this to our attention a couple of years back, and to the person who reminded me tonight).
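
For reference, a minimal HyperLogLog sketch might look like the following. This is an illustration under stated assumptions, not code from Tor: the class name `HyperLogLog`, the keyed HMAC-SHA256 hash, and the default of 2^10 registers are all hypothetical choices. Each register stores only the largest "rank" seen for its bucket, so no client address is retained.

```python
import hashlib
import hmac
import ipaddress
import math


class HyperLogLog:
    """Hypothetical unique-IP estimator using 2**precision one-byte registers."""

    def __init__(self, key: bytes, precision: int = 10):
        if not 7 <= precision <= 16:
            raise ValueError("precision must be between 7 and 16")
        self.key = key                  # secret hash key (an assumption)
        self.p = precision
        self.m = 1 << precision
        self.registers = bytearray(self.m)

    def add(self, ip: str) -> None:
        packed = ipaddress.ip_address(ip).packed
        digest = hmac.new(self.key, packed, hashlib.sha256).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        width = 64 - self.p
        idx = h >> width                            # top p bits pick a register
        rest = h & ((1 << width) - 1)
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw
```

With 1024 registers the sketch occupies 1 KiB and has an expected standard error of about 1.04/sqrt(1024), or roughly 3%. Whether that accuracy is enough for the statistics listed in the description is a separate question that this sketch does not answer.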

comment:9 Changed 2 years ago by dgoulet

Sponsor: SponsorR → SponsorR-can

Move those from SponsorR to SponsorR-can.

comment:10 Changed 16 months ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:11 Changed 15 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:12 Changed 9 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:13 Changed 9 months ago by dgoulet

Sponsor: SponsorR-can

comment:14 Changed 4 weeks ago by beastr0

Cc: beastr0@… added