Opened 7 months ago

Closed 5 months ago

Last modified 5 months ago

#33008 closed enhancement (implemented)

Display a bridge's distribution bucket

Reported by: phw
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Relay Search
Version:
Severity: Normal
Keywords: s30-o24a1, anti-censorship-roadmap-2020Q1 metrics-team-roadmap-2020Q1
Cc: arma, cohosh, gaba
Actual Points:
Parent ID: #31281
Points: 2
Reviewer: cohosh
Sponsor: Sponsor30-can

Description

Bridge operators often want to know which distribution bucket their bridge fell into. Since #29480, one can find out by inspecting our archived bridge pool assignments, but that's cumbersome and not user-friendly. We should instead show the bucket on the bridge's Relay Search page. How can we get this done?

Attachments (3)

mockup.jpeg (247.8 KB) - added by phw 7 months ago.
Relay Search mockup
bridgedb-info-page.png (176.1 KB) - added by phw 5 months ago.
BridgeDB info page
bridgedb-info-page-1.png (152.5 KB) - added by phw 5 months ago.
Revised BridgeDB info page

Change History (36)

comment:1 Changed 7 months ago by arma

I am excited about this one, because it's the culmination of all the back-end work: just having the data in a data set is a good step zero, but giving it to users (bridge operators) in a usable way is where it all needs to lead.

comment:2 Changed 7 months ago by karsten

Here are the steps for getting this done (with very rough estimates with 1 point == 1 workday):

  1. Decide where to add bridge distribution information on Relay Search. For example, would we simply want to display something like "https ip=4,6 ring=2 transport=websocket,fte,obfs3,scramblesuit,obfs4", or would we want to structure that information somehow for the user or leave out less relevant parts? (0.5 points)
  2. Specify one or more fields to be added to Onionoo's bridge details documents, following requirements from the first step above. (0.25 points)
  3. Extend Onionoo to fetch bridge pool assignments from CollecTor, store the latest bridge pool assignment for each bridge in its details status document, and write this information to the bridge's details document. (1 point)
  4. Add bridge distribution information to Relay Search as specified in the first step, using the data from the updated Onionoo protocol version as specified in the second step. (0.25 points)

I can do steps 2 and 3, and irl should do (or review) steps 1 and 4.
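
To make step 1 more concrete, here is a minimal Python sketch of splitting an assignment string like the one quoted above into its distribution mechanism and its parameters. The function name `parse_assignment` is hypothetical, and the sketch assumes the string looks exactly like the example; real bridge pool assignment lines from CollecTor may carry additional fields that are not handled here.

```python
def parse_assignment(assignment: str) -> dict:
    """Split an assignment string into its mechanism and key=value parameters.

    Assumes the first whitespace-separated token is the distribution
    mechanism and every following token has the form key=v1,v2,...
    """
    mechanism, *params = assignment.split()
    parsed = {"mechanism": mechanism}
    for param in params:
        key, _, value = param.partition("=")
        # Comma-separated values become a list, e.g. "4,6" -> ["4", "6"].
        parsed[key] = value.split(",")
    return parsed


example = "https ip=4,6 ring=2 transport=websocket,fte,obfs3,scramblesuit,obfs4"
result = parse_assignment(example)
print(result["mechanism"])  # the part a bridge operator most likely cares about
```

Displaying only `result["mechanism"]` would correspond to the "leave out less relevant parts" option.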

comment:3 Changed 7 months ago by phw

Once this is done, we shouldn't forget to update our documentation. We want to tell people how they can learn their bridge's distribution bucket, and what they can expect for a given bucket.

comment:4 Changed 7 months ago by irl

This sounds like lots of things coming together nicely, we should close this loop.

On step 1, could an anti-censorship person come up with an MS paint quality mock up of what data should go where? I'm not familiar with how this works exactly or what bridge operators may have for context.

Happy to do step 1 with that input, and step 4. I can also review 2 and 3, some of the input on step 1 should feed into step 2.

Changed 7 months ago by phw

Attachment: mockup.jpeg added

Relay Search mockup

comment:5 Changed 7 months ago by arma

I agree, this mockup is what I was going to suggest too.

And then the "https" should be a link to an anchor in an external document that the anti-censorship team maintains, which has anchors for each distribution strategy.

(Actually phw, do you want the word 'bucket' or should we go with something like 'strategy' so we won't be explaining to everybody why it's a bucket?)

comment:6 in reply to:  2 Changed 7 months ago by phw

Replying to karsten:

  1. Decide where to add bridge distribution information on Relay Search. For example, would we simply want to display something like "https ip=4,6 ring=2 transport=websocket,fte,obfs3,scramblesuit,obfs4", or would we want to structure that information somehow for the user or leave out less relevant parts? (0.5 points)


I suggest only displaying the bucket, which would be https in your example. We're already displaying the supported transport protocols in a separate field and I don't think it's useful to expose the BridgeDB ring. That said, here's a simple mockup:

Relay Search mockup

comment:7 Changed 6 months ago by karsten

Status: new → needs_review

Okay, if it's just the distributor bucket/strategy that we want to display, the Onionoo side is relatively simple. Please review commit 633aa3e in my metrics-web task-33008 branch for step 2 above and commit ff2db94 in my Onionoo task-33008 branch for step 3 above.

comment:8 in reply to:  5 ; Changed 6 months ago by phw

Replying to arma:

And then the "https" should be a link to an anchor in an external document that the anti-censorship team maintains, which has anchors for each distribution strategy.


That's a good idea. This could be a new page on BridgeDB, e.g., bridges.torproject.org/info.

(Actually phw, do you want the word 'bucket' or should we go with something like 'strategy' so we won't be explaining to everybody why it's a bucket?)


I would suggest "bridge distribution mechanism".

comment:9 in reply to:  8 Changed 6 months ago by arma

Replying to phw:

(Actually phw, do you want the word 'bucket' or should we go with something like 'strategy' so we won't be explaining to everybody why it's a bucket?)


I would suggest "bridge distribution mechanism".

Sounds great to me.

comment:10 Changed 6 months ago by gaba

Keywords: anti-censorship-roadmap-2020Q1 added

comment:11 in reply to:  7 Changed 6 months ago by irl

Status: needs_review → merge_ready

Replying to karsten:

Okay, if it's just the distributor bucket/strategy that we want to display, the Onionoo side is relatively simple. Please review commit 633aa3e in my metrics-web task-33008 branch for step 2 above and commit ff2db94 in my Onionoo task-33008 branch for step 3 above.

LGTM.

The bridge model will need to be extended in relay search, and an extra row on the table. If you want to have a go at that it should be an easy change.

comment:12 Changed 6 months ago by gaba

Keywords: metrics-team-roadmap-2020Q1 added

comment:13 Changed 6 months ago by karsten

Points: 3 → 2

Onionoo patch merged, released, and deployed. metrics-web patch rebased, extended with that easy Relay Search change, and deployed. Please give it a try!

What remains is that "https" and the other distribution mechanisms link to an anchor in an external document. As far as I can see, that page would need anchors for "email", "https", "moat", and "unallocated". Ideally, the anchors would be exactly the same strings as these distribution mechanisms, like https://bridges.torproject.org/info#unallocated.

Should that be a new ticket, or is creating that page a quick task on your side?

comment:14 in reply to:  13 Changed 5 months ago by phw

Replying to karsten:

Should that be a new ticket, or is creating that page a quick task on your side?


It's a relatively quick task, so let's use this ticket. I'll try to create a page over the next few days.

Changed 5 months ago by phw

Attachment: bridgedb-info-page.png added

BridgeDB info page

comment:15 Changed 5 months ago by phw

Reviewer: cohosh
Status: merge_ready → needs_review

This commit adds an info page to BridgeDB. Cecylia, can you please review the patch?

Below is a screenshot of what the page looks like. Note that the target audience for this page is primarily bridge operators.

BridgeDB info page

comment:16 Changed 5 months ago by cohosh

This is some cool work!

Here are my comments on the website:

  • Is this the main https://bridges.torproject.org page? If so, the steps for adding bridges are gone and it's unclear to me what the process is to actually get bridges from this page. I'd suggest keeping the steps and adding this extra info at the very bottom of the page.
  • "Unallocated" isn't a very simple or descriptive word to describe that bucket. Can we use "private" instead? Perhaps this is too late in the game to change it, but it seems a bit contradictory since these bridges are allocated to the unallocated bucket.
  • This corresponds a bit to the point above, but we could change the description of the HTTPS bucket to be more clear and include a link to the page where you actually submit your request.
  • There's repeated information on this page between the description of the Email bucket and the section on "I need an alternative way of getting bridges!" below it. Can we condense these into the same section? And it would be great if the resulting section had a mailto: link.
  • This is a nit, but there is some mixing of second and third person between the old and new content on this page. I think this is fine, but should be done intentionally.

comment:17 Changed 5 months ago by cohosh

Status: needs_review → needs_revision

comment:18 in reply to:  16 Changed 5 months ago by computer_freak

Replying to cohosh:

  • "Unallocated" isn't a very simple or descriptive word to describe that bucket. Can we use "private" instead? Perhaps this is too late in the game to change it, but it seems a bit contradictory since these bridges are allocated to the unallocated bucket.

How about "unreleased" or "unpublished" or "reserve" ?

comment:19 in reply to:  16 ; Changed 5 months ago by phw

Status: needs_revision → needs_review

Replying to cohosh:

  • Is this the main https://bridges.torproject.org page? If so, the steps for adding bridges are gone and it's unclear to me what the process is to actually get bridges from this page. I'd suggest keeping the steps and adding this extra info at the very bottom of the page.


No, this page will live at bridges.torproject.org/info. For now, only Relay Search will link to it, so BridgeDB users won't see it. In the future, we can use the new /info page to add additional documentation.

  • "Unallocated" isn't a very simple or descriptive word to describe that bucket. Can we use "private" instead? Perhaps this is too late in the game to change it, but it seems a bit contradictory since these bridges are allocated to the unallocated bucket.


Yes, I see your point. I don't like "private" because we already use that term for bridges that don't report themselves to the authority. I like computer_freak's suggestion of "reserved" but I actually prefer keeping "unallocated" because the cost of changing this term seems to outweigh the benefit of using a somewhat more descriptive term.

I wonder what Karsten thinks?

  • This corresponds a bit to the point above, but we could change the description of the HTTPS bucket to be more clear and include a link to the page where you actually submit your request.


Good idea, done.

  • There's repeated information on this page between the description of the Email bucket and the section on "I need an alternative way of getting bridges!" below it. Can we condense these into the same section? And it would be great if the resulting section had a mailto: link.


Right, that's because BridgeDB includes a short FAQ section at the bottom of each page. I agree that we don't want that here, so I made the embedding of the FAQ conditional. I also added a mailto: link.

  • This is a nit, but there is some mixing of second and third person between the old and new content on this page. I think this is fine, but should be done intentionally.


I believe this is fixed, now that we removed the FAQ?

To make the review easier, I addressed your feedback in a separate patch, which I will later squash:
https://github.com/NullHypothesis/bridgedb/commit/b39e576eff8ac5ea9436fa5239a53f5edac11911

comment:20 in reply to:  19 ; Changed 5 months ago by karsten

Replying to phw:

Replying to cohosh:

  • "Unallocated" isn't a very simple or descriptive word to describe that bucket. Can we use "private" instead? Perhaps this is too late in the game to change it, but it seems a bit contradictory since these bridges are allocated to the unallocated bucket.


Yes, I see your point. I don't like "private" because we already use that term for bridges that don't report themselves to the authority. I like computer_freak's suggestion of "reserved" but I actually prefer keeping "unallocated" because the cost of changing this term seems to outweigh the benefit of using a somewhat more descriptive term.

I wonder what Karsten thinks?

My initial thought was that we shouldn't change the term, because bridge pool assignment files contain it and because Onionoo includes it in its response.

But I think we need to consider something else here. Bridge operators can request in their torrc file how their bridge is going to be distributed. Recognized methods are: "none", "any", "https", "email", "moat".

Maybe we'll have to say "None" here rather than "Unallocated"?

Note that case doesn't matter in case of configuring this in the torrc file. "HTTPS" is accepted just like "https" or "hTtPs" are. So it's fine to write "HTTPS".
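
As a rough illustration of that case-insensitive matching, here is a short Python sketch. It is not tor's actual option parsing; the names `RECOGNIZED_METHODS` and `normalize_distribution` are made up for this example, and only the five methods listed above are taken from this ticket.

```python
# The five methods named above; everything else in this sketch is hypothetical.
RECOGNIZED_METHODS = {"none", "any", "https", "email", "moat"}


def normalize_distribution(value: str) -> str:
    """Return the lowercase method name, or raise if it is not recognized."""
    method = value.strip().lower()
    if method not in RECOGNIZED_METHODS:
        raise ValueError(f"unrecognized distribution method: {value!r}")
    return method


# "HTTPS", "https", and "hTtPs" all normalize to the same method.
print({normalize_distribution(v) for v in ("HTTPS", "https", "hTtPs")})
```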

To make this even more complicated, it turns out that a non-zero number of bridges does not have BridgeDB distribution information:

  • 553 moat
  • 505 https
  • 191 email
  • 76 none
  • 37 unallocated

The 37 "unallocated" bridges are the ones we're talking about above.

But I'm not yet sure why those 76 bridges are not included in any distributor, not even the "unallocated" distributor. It could be that they're too new (bridge pool assignment files are only synced once per day at UTC midnight). It could have other reasons like older tor versions.

In any case it seems possible that a bridge will show up with "none" in Relay Search, and we might have to provide information on BridgeDB's information page what that means. In a way these bridges are truly unallocated.

Hmm. Hmm.

comment:21 in reply to:  19 Changed 5 months ago by cohosh

Replying to phw:

To make the review easier, I addressed your feedback in a separate patch, which I will later squash:
https://github.com/NullHypothesis/bridgedb/commit/b39e576eff8ac5ea9436fa5239a53f5edac11911

Thanks, these changes look good to me. As a further suggestion, I'd change the HTTPS text to something like:
"... hands out bridges over this website. To get bridges, go to <link>, enter your preferences, and solve the CAPTCHA."

This provides some more detail for the HTTPS instructions, and I find "this website" to be less ambiguous than "the site you're looking at". Maybe we could ask antonela for her thoughts on this.

comment:22 in reply to:  20 Changed 5 months ago by phw

Replying to karsten:

But I think we need to consider something else here. Bridge operators can request in their torrc file how their bridge is going to be distributed. Recognized methods are: "none", "any", "https", "email", "moat".


If a bridge sets BridgeDistribution none in its config file, BridgeDB will discard the bridge's descriptor. Bridges may end up in the "unallocated" bucket if they set BridgeDistribution any (which is the default), in which case BridgeDB may toss them into "unallocated".

But I'm not yet sure why those 76 bridges are not included in any distributor, not even the "unallocated" distributor. It could be that they're too new (bridge pool assignment files are only synced once per day at UTC midnight). It could have other reasons like older tor versions.


We encourage people to set BridgeDistribution none if they want their bridge to show up on Relay Search, but don't want BridgeDB to distribute it. Most of our default bridges fall into that category.

In any case it seems possible that a bridge will show up with "none" in Relay Search, and we might have to provide information on BridgeDB's information page what that means. In a way these bridges are truly unallocated.


Oh, you are right. That's a great point that I had not considered. Now that we have both "unallocated" and "none", it seems more important to rename "unallocated" to "reserved". It doesn't seem too difficult to change every occurrence of "unallocated" in BridgeDB. How is the Metrics side looking?

comment:23 Changed 5 months ago by karsten

Okay, I agree that we should distinguish five bridge distribution mechanisms in Relay Search with links to BridgeDB's information page:

  • "HTTPS", "Email", and "Moat";
  • "Reserved": also known as "unallocated" in bridge pool assignment files which most bridge operators will never hear about; and
  • "None": either not distributed by BridgeDB as requested by the bridge operator, or distributed via one of the four other mechanisms but too new for Relay Search to know. (The info page should probably mention both possibilities.)

If this makes sense, we can tell Relay Search to display these terms (using this capitalization) rather than the raw strings it receives from bridge pool assignment files.

Regarding a possible change to BridgeDB to actually rename these strings in bridge pool assignment files, I'd rather avoid that. There's not really a spec for bridge pool assignment files where we could write down when we changed "unallocated" to "reserved" and why. Soon we'd forget why we renamed this string and whether "unallocated" and "reserved" are actually the same thing or not. It's a bit like onion service directories still using relay flag "HSDir" rather than "OSDir". Historically, "unallocated" was the correct term when the only alternatives were to allocate a bridge to the HTTPS or Email distributor. It's just a bit less correct since there's now another alternative to really not assign a bridge to any distributor and instead drop it.

comment:24 in reply to:  23 ; Changed 5 months ago by phw

Replying to karsten:

Okay, I agree that we should distinguish five bridge distribution mechanisms in Relay Search with links to BridgeDB's information page:

  • "HTTPS", "Email", and "Moat";
  • "Reserved": also known as "unallocated" in bridge pool assignment files which most bridge operators will never hear about; and
  • "None": either not distributed by BridgeDB as requested by the bridge operator, or distributed via one of the four other mechanisms but too new for Relay Search to know. (The info page should probably mention both possibilities.)

If this makes sense, we can tell Relay Search to display these terms (using this capitalization) rather than the raw strings it receives from bridge pool assignment files.


Yes, this sounds good to me. BridgeDB's new info page will have anchors for all (https, moat, email, reserved, none), for example: bridges.torproject.org/info#https. Does this work for you?

Regarding a possible change to BridgeDB to actually rename these strings in bridge pool assignment files, I'd rather avoid that. There's not really a spec for bridge pool assignment files where we could write down when we changed "unallocated" to "reserved" and why. Soon we'd forget why we renamed this string and whether "unallocated" and "reserved" are actually the same thing or not. It's a bit like onion service directories still using relay flag "HSDir" rather than "OSDir". Historically, "unallocated" was the correct term when the only alternatives were to allocate a bridge to the HTTPS or Email distributor. It's just a bit less correct since there's now another alternative to really not assign a bridge to any distributor and instead drop it.


Ok, then I would suggest calling it "Reserved" on both Relay Search and on BridgeDB's info page but leaving it as "unallocated" in the bridge pool assignment. I'll explain this discrepancy on the info page to minimise confusion.
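
The naming scheme agreed on here can be summarised in a small Python sketch: raw strings as they appear in bridge pool assignment files on one side, display labels and info page anchors on the other. This is an illustration only, not the code deployed in Relay Search; `DISPLAY` and `describe_mechanism` are hypothetical names, and treating a missing value as "none" is an assumption based on the discussion above.

```python
from typing import Optional, Tuple

INFO_PAGE = "https://bridges.torproject.org/info"

# raw string -> (label shown to operators, anchor on the info page)
DISPLAY = {
    "https": ("HTTPS", "https"),
    "email": ("Email", "email"),
    "moat": ("Moat", "moat"),
    "unallocated": ("Reserved", "reserved"),  # raw string stays "unallocated"
    "none": ("None", "none"),
}


def describe_mechanism(raw: Optional[str]) -> Tuple[str, str]:
    """Return the display label and the info page link for a raw string."""
    # Assumption: a bridge without assignment information is shown as "None".
    label, anchor = DISPLAY[(raw or "none").lower()]
    return label, f"{INFO_PAGE}#{anchor}"


print(describe_mechanism("unallocated"))
```

Keeping the mapping on the display side leaves the bridge pool assignment files untouched, which is the point of not renaming "unallocated" at the source.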

Changed 5 months ago by phw

Attachment: bridgedb-info-page-1.png added

Revised BridgeDB info page

comment:25 Changed 5 months ago by phw

I addressed Karsten's and Cecylia's feedback in a new commit. Here's what the info page now looks like:

Revised BridgeDB info page

comment:26 in reply to:  24 ; Changed 5 months ago by karsten

Replying to phw:

Replying to karsten:

  • "None": either not distributed by BridgeDB as requested by the bridge operator, or distributed via one of the four other mechanisms but too new for Relay Search to know. (The info page should probably mention both possibilities.)

Your latest screenshot doesn't say anything about that second possibility of assignment information not being propagated between services yet. I could imagine that impatient new bridge operators will ask why their bridge ended up in the None bucket. If you left this note out on purpose, maybe in order to keep things short, that's fine by me.

If this makes sense, we can tell Relay Search to display these terms (using this capitalization) rather than the raw strings it receives from bridge pool assignment files.


Yes, this sounds good to me. BridgeDB's new info page will have anchors for all (https, moat, email, reserved, none), for example: bridges.torproject.org/info#https. Does this work for you?

Yes, this works. I have a patch here that I can deploy when that page exists. Please let me know when that is the case, and I'll deploy it.

comment:27 in reply to:  26 Changed 5 months ago by phw

Replying to karsten:

Replying to phw:

Replying to karsten:

  • "None": either not distributed by BridgeDB as requested by the bridge operator, or distributed via one of the four other mechanisms but too new for Relay Search to know. (The info page should probably mention both possibilities.)

Your latest screenshot doesn't say anything about that second possibility of assignment information not being propagated between services yet. I could imagine that impatient new bridge operators will ask why their bridge ended up in the None bucket. If you left this note out on purpose, maybe in order to keep things short, that's fine by me.


Yes, that's an oversight. How about this:

Bridges whose distribution mechanism is "None" are not distributed by BridgeDB. It is the bridge operator's responsibility to distribute their bridges to users. Note that on Relay Search, a freshly set up bridge's distribution mechanism says "None" for a while. Be a bit patient, and it will then change to the bridge's actual distribution mechanism.


Do we have an approximate time frame within which Relay Search should go from "None" to the bridge's actual distribution mechanism?

comment:28 Changed 5 months ago by karsten

The new text looks good to me!

The delay between BridgeDB assigning a new bridge to a distributor and Relay Search learning about it is roughly linearly distributed from 1 to 25 hours. For example, the bridge pool assignments file written by BridgeDB at 2020-03-16T00:01:45Z was archived by CollecTor at 2020-03-17T00:09:00Z and would be processed by Onionoo at about 2020-03-17T00:45:00Z. That's the worst case scenario, though. How about you write something vague like "usually within one day" and keep the "be patient" part? :)
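
For reference, the worst-case arithmetic with the three timestamps above, as a small Python sketch. The helper `parse_utc` and the variable names are only for illustration.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2020-03-16T00:01:45Z (treated as UTC)."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


written = parse_utc("2020-03-16T00:01:45Z")    # file written by BridgeDB
archived = parse_utc("2020-03-17T00:09:00Z")   # archived by CollecTor
processed = parse_utc("2020-03-17T00:45:00Z")  # processed by Onionoo (approx.)

collector_delay = archived - written
total_delay = processed - written

print(collector_delay)  # 1 day, 0:07:15
print(total_delay)      # 1 day, 0:43:15
```

That total is a little under 25 hours, which matches the upper end of the range given above.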

comment:29 in reply to:  28 Changed 5 months ago by phw

Replying to karsten:

The delay between BridgeDB assigning a new bridge to a distributor and Relay Search learning about it is roughly linearly distributed from 1 to 25 hours. For example, the bridge pool assignments file written by BridgeDB at 2020-03-16T00:01:45Z was archived by CollecTor at 2020-03-17T00:09:00Z and would be processed by Onionoo at about 2020-03-17T00:45:00Z. That's the worst case scenario, though. How about you write something vague like "usually within one day" and keep the "be patient" part? :)


Perfect, thanks Karsten.

Cecylia, can you please review the following three commits?
https://github.com/NullHypothesis/bridgedb/compare/b39e576...enhancement/33008

comment:30 Changed 5 months ago by cohosh

Status: needs_review → merge_ready

The changes look good, phw! I like the new phrasing.

My only comment is a super unimportant nit that a few lines have broken weirdly.

comment:31 in reply to:  30 Changed 5 months ago by phw

Replying to cohosh:

The changes look good, phw! I like the new phrasing.


Thanks! Merged in f23c021 and now live here.

My only comment is a super unimportant nit that a few lines have broken weirdly.


Ah, right. The file bridgedb.pot is automatically generated by running python setup.py extract_messages. We can ignore that because we never edit bridgedb.pot manually.

Karsten, I'm leaving this ticket open for the remaining work on your side, ok?

comment:32 Changed 5 months ago by karsten

Resolution: implemented
Status: merge_ready → closed

Deployed on metrics.tp.o, too. Closing. Thanks!

comment:33 Changed 5 months ago by computer_freak

I'm very pleased that my morning brainfart to call it "reserved" is already a thing now!

Thank you all for being such an open community:)