Opened 9 years ago

Closed 3 years ago

#2866 closed task (wontfix)

Analyze bridges in the "reserved" bucket

Reported by: karsten
Owned by:
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I'm interested in learning whether keeping a certain fraction of bridges unassigned, that is, not distributing them via email or HTTP, is a good idea. AIUI, the idea was to have a small set of fresh bridges in case we come up with a new distribution channel or want to give out fresh bridges manually. This idea might fail if people who run a bridge that ends up in the unallocated pool decide that their bridge is not useful. They might turn off their bridge or delete their keys in order to get a new fingerprint and end up in another pool. If many people do so, we might be better off allocating all bridges to pools directly and starting a new pool whenever there's a new distribution channel. Given the high churn of bridges, we might have a sufficient set of fresh bridges in that pool very soon. Also, if we want to give out bridges manually, we might give out bridges from the other pools, which may have higher uptime than bridges in the unallocated pool. Allocating all bridges also means we don't have to explain to bridge operators why their bridge is useful even if it doesn't have any users right now.

(This description comes from #2372 where we started with the same question, but then focused on making the pool assignments publicly available. Now that we have the assignments we can focus on this question again.)

Attachments (1)

pool_counts.png (16.8 KB) - added by peer 7 years ago.

Change History (13)

comment:1 Changed 8 years ago by arma

Component: Metrics → Analysis

comment:2 Changed 8 years ago by arma

This is now an even more interesting question, since we started mailing the reserved bridges out to our two contacts. (I haven't asked the two contacts how much use they're making of the bridges.)

comment:3 Changed 8 years ago by arma

Cc: chiiph added

Tomas: this is a great ticket for somebody to work on.

See the later paragraphs in https://trac.torproject.org/projects/tor/ticket/2372#comment:1 for other hints about good questions to answer here.

comment:4 Changed 8 years ago by karsten

Owner: karsten deleted
Status: new → assigned

I'm not really working on this ticket. Re-assigning to the None guy for now. Tomás, if you're working on it, please grab it.

comment:5 Changed 8 years ago by chiiph

Cc: chiiph removed
Owner: set to chiiph

Yes, I'm slowly working on this one.

comment:6 Changed 7 years ago by chiiph

Owner: chiiph deleted

Sadly, I don't have time for this anymore, so I'll leave it unassigned.

comment:7 Changed 7 years ago by peer

Status: assigned → needs_review

Based on the bridge pool assignments for December 2012, there do not appear to be large differences in approximate uptimes based on category (email/https/unallocated).

December 2012 has 1477 (expected: 1488) assignment files, with timestamps roughly 30 minutes apart. There were no empty assignment files.
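The expected count follows from one file every 30 minutes over the 31 days of December. A minimal sketch of that arithmetic, assuming a fixed interval (the function name is illustrative and not part of the analysis code):

```python
from datetime import datetime, timedelta


def expected_file_count(year: int, month: int, interval_minutes: int = 30) -> int:
    """Number of snapshots expected in a month at a fixed interval."""
    start = datetime(year, month, 1)
    # First day of the following month (rolls over to January after December).
    end = datetime(year + (month == 12), month % 12 + 1, 1)
    return (end - start) // timedelta(minutes=interval_minutes)


expected = expected_file_count(2012, 12)
observed = 1477  # count reported in this comment
print(expected, expected - observed)  # 1488 11
```

By this count, 11 files are missing for the month.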

Uptime was approximated by the continuous presence of the bridge's hashed fingerprint. Each time unit (in the summary below) would be about 30 minutes. If a bridge identifier disappeared and reappeared, two uptime entries would be generated. If multiple category assignments for an identifier exist, only the first entry would be taken into account.
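The session approximation described above could be sketched roughly as follows. This is not the code from the branch; it assumes each assignment file has already been parsed into a dict mapping hashed fingerprint to pool name, and that the list is ordered by timestamp. The names `uptime_sessions` and `snapshots` are hypothetical.

```python
from collections import defaultdict


def uptime_sessions(snapshots):
    """Return {pool: [session_length, ...]}, lengths in snapshot units.

    snapshots: list of {hashed_fingerprint: pool} dicts ordered by time
    (an assumed, pre-parsed representation of the assignment files).
    """
    first_pool = {}  # fingerprint -> first pool seen (later assignments ignored)
    running = {}     # fingerprint -> length of the currently open session
    sessions = defaultdict(list)
    for snap in snapshots:
        for fp, pool in snap.items():
            first_pool.setdefault(fp, pool)
            running[fp] = running.get(fp, 0) + 1
        # Close the session of every fingerprint absent from this snapshot.
        for fp in [fp for fp in running if fp not in snap]:
            sessions[first_pool[fp]].append(running.pop(fp))
    # Sessions still open at the end of the period are closed as-is.
    for fp, length in running.items():
        sessions[first_pool[fp]].append(length)
    return dict(sessions)


snaps = [
    {"a": "email", "b": "unallocated"},
    {"a": "email"},
    {"b": "unallocated"},
    {"a": "email", "b": "unallocated"},
]
print(uptime_sessions(snaps))
```

Note that this sketch treats list neighbours as consecutive, so a missing assignment file does not split a session.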

Based on uptime entry counts, bridges did not appear to change categories between consecutive assignment files. However, there do appear to be slightly fewer uptime entries for the unallocated pool relative to its fraction (0.13 versus > 0.15).

              Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
all           1.00    2.00    5.00   35.57   14.00 1470.00 
email         1.00    2.00    5.00   35.22   14.00 1470.00 
https         1.00    2.00    5.00   34.94   13.00 1453.00 
unallocated   1.00    2.00    5.00   38.84   13.00 1469.00 

If this analysis is on the right track, the code can be added to the repository.

comment:8 in reply to:  7 ; Changed 7 years ago by karsten

Status: needs_review → needs_revision

Replying to peer:

Based on the bridge pool assignments for December 2012, there do not appear to be large differences in approximate uptimes based on category (email/https/unallocated).

December 2012 has 1477 (expected: 1488)

Sounds like we missed a few bridge pool assignment files when collecting them using metrics-db. If this turns out to be problematic for your analysis, I can look into why that happened.

assignment files, with timestamps roughly 30 minutes apart. There were no empty assignment files.

Uptime was approximated by the continuous presence of the bridge's hashed fingerprint. Each time unit (in the summary below) would be about 30 minutes. If a bridge identifier disappeared and reappeared, two uptime entries would be generated.

I'm unclear why you count uptime sessions of the same bridge as distinct entries. I'd think that bridges in the email or https buckets have higher overall uptime in the considered months than bridges in the unallocated bucket.

If multiple category assignments for an identifier exist, only the first entry would be taken into account.

In theory, bridges are not re-assigned to other buckets.

Based on uptime entry counts, bridges did not appear to change categories between consecutive assignment files. However, there do appear to be slightly fewer uptime entries for the unallocated pool relative to its fraction (0.13 versus > 0.15).

The fact that there are fewer entries for the unallocated pool may indicate that these bridges don't come back as often as bridges in the other pools. See my comment above about counting uptime sessions of the same bridge more than once.

If this analysis is on the right track, the code can be added to the repository.

Sure, please post a metrics-tasks branch here, and I'll merge it. Thanks!

Changed 7 years ago by peer

Attachment: pool_counts.png added

comment:9 in reply to:  8 Changed 7 years ago by peer

Replying to karsten:

Sounds like we missed a few bridge pool assignment files when collecting them using metrics-db. If this turns out to be problematic for your analysis, I can look into why that happened.

December 2012 was examined because the data was relatively clean. Aside from the missing files, no files during that month had intact header information but zero entries.

Regarding data, there might have been corruption in April 2012: in the files with timestamps 2012-04-17 07:00:24 and 2012-04-29 01:00:13, the assignment pool of one entry in each appears to be truncated to one character.
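A check for this kind of truncation might look like the sketch below. It assumes each file starts with a `bridge-pool-assignment <timestamp>` header followed by lines of the form `<hashed fingerprint> <pool> [parameters]`, and that the only valid pool names are the three categories discussed in this ticket. The sample fingerprints are made up for illustration.

```python
KNOWN_POOLS = {"email", "https", "unallocated"}  # assumed complete list


def suspicious_entries(text):
    """Return (line_number, line) pairs whose pool field is not a known pool."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and the assumed header line.
        if not line.strip() or line.startswith("bridge-pool-assignment"):
            continue
        fields = line.split()
        if len(fields) < 2 or fields[1] not in KNOWN_POOLS:
            bad.append((lineno, line))
    return bad


sample = (
    "bridge-pool-assignment 2012-04-17 07:00:24\n"
    "00aa00aa00aa00aa00aa00aa00aa00aa00aa00aa email ip=4\n"
    "11bb11bb11bb11bb11bb11bb11bb11bb11bb11bb u\n"
)
print(suspicious_entries(sample))
```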

I'm unclear why you count uptime sessions of the same bridge as distinct entries. I'd think that bridges in the email or https buckets have higher overall uptime in the considered months than bridges in the unallocated bucket.

As mentioned and as illustrated in the pool counts plot, there is some noise and there are files with zero entries. Using sessions seemed to be a reasonable proxy as the same measure was used across the pools.

In theory, bridges are not re-assigned to other buckets.

Good to know. The proportion of bridges in the unallocated pool appears to decrease over time. Have the proportions changed?

The fact that there are fewer entries for the unallocated pool may indicate that these bridges don't come back as often as bridges in the other pools. See my comment above about counting uptime sessions of the same bridge more than once.

Will consider cumulative uptime along with dropping the files with zero entries. The session counts are fairly similar across the pools, diverging for mean, Q3, and max.

comment:10 in reply to:  8 Changed 7 years ago by peer

Replying to karsten:

Sure, please post a metrics-tasks branch here, and I'll merge it. Thanks!

The initial version of the code is at https://bitbucket.org/peer_zero/metrics-tasks/commits/985797b7585edf217fb84f15a772dfee .

comment:11 Changed 7 years ago by peer

Status: needs_revision → needs_review

Based on cumulative uptime for 2012 (again, approximated by the number of pool assignment files that contain the hashed fingerprint), the uptime distributions are similar, with higher median, Q3, and max values for the unallocated pool. However, the unallocated pool does have fewer entries (5275 distinct fingerprints) than the email (10585) or https (10491) pools. The entry count might be affected by the decreased number of bridges in the unallocated pool after mid-2012.

              Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
email          1.0     2.0    11.0   607.9   101.0 17300.0 
https          1.0     2.0    12.0   620.8   106.5 17310.0 
unallocated    1.0     2.0    14.0   831.8   149.5 17310.0
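For reference, the cumulative measure can be sketched as a count of files per fingerprint, grouped by pool. This is an illustration and not the code linked below; it assumes the files are pre-parsed into dicts of hashed fingerprint to pool name, and `cumulative_uptime` and `summarize` are hypothetical names.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def cumulative_uptime(snapshots):
    """Return {pool: [file_count_per_fingerprint, ...]}.

    snapshots: list of {hashed_fingerprint: pool} dicts (assumed input shape).
    A fingerprint is attributed to the first pool it was seen in.
    """
    counts = Counter()
    first_pool = {}
    for snap in snapshots:
        for fp, pool in snap.items():
            counts[fp] += 1
            first_pool.setdefault(fp, pool)
    by_pool = defaultdict(list)
    for fp, n in counts.items():
        by_pool[first_pool[fp]].append(n)
    return dict(by_pool)


def summarize(values):
    """Entry count plus a few of the statistics shown in the table."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


snaps = [
    {"a": "email", "b": "unallocated"},
    {"a": "email"},
    {"a": "email", "c": "unallocated"},
]
for pool, values in cumulative_uptime(snaps).items():
    print(pool, summarize(values))
```

The `n` field corresponds to the number of distinct fingerprints per pool, which is the entry count compared above.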

Code: https://bitbucket.org/peer_zero/metrics-tasks/commits/a08ca1c2a671f87f12793482ba61bd30752cfa18

comment:12 Changed 3 years ago by karsten

Resolution: wontfix
Severity: Normal
Status: needs_review → closed

I just stumbled across this ticket while looking for something else. We stopped collecting bridge pool assignments in January 2015, so there's no point in doing this analysis anymore. Closing.