Opened 22 months ago

Closed 4 weeks ago

#18329 closed enhancement (implemented)

Let bridges indicate when they don't want BridgeDB to distribute their address

Reported by: karsten Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-bridge, stem
Cc: asn, dcf, isis, mrphs Actual Points:
Parent ID: Points: .2 remaining
Reviewer: isis Sponsor: SponsorM

Description

Right now, bridges can decide whether they want to be a public bridge that gets distributed via BridgeDB or a private bridge that only gets used by clients who learn its address via some other, private channel. The default is that a bridge is a public bridges, unless it sets PublishServerDescriptor 0 in its torrc file. This works fine with respect to BridgeDB not distributing private bridges. But a lesser known problem is that a bridge that doesn't publish its descriptor also does not contribute to bridge usage statistics on Metrics that are based on bridge extra-info descriptors.

The major use case that comes to mind is a bundled bridge whose address is shipped together with Tor Browser or another application. In the past we tried to remind operators of these bridges to also publish descriptors, so that their statistics are included on Metrics. But it turns out that some censors, who carefully scrape bridge addresses from BridgeDB, do not extract bridge addresses from the various bundles. Still, bundled bridges see a large number of bridge users and we should really include them in the statistics.

Another use case could be private bridges that somebody sets up for themselves and their friends. Maybe these operators would be fine contributing to the statistics if that doesn't automatically mean they need to share their bridge with other users.

I think this feature is relatively easy to build. We would need:

  • a new descriptor line "bridgedb off", or something even more intuitive and extensible, that tells BridgeDB that this bridge's address should not be distributed,
  • a new torrc option or extension of an existing option, maybe "PublishServerDescriptor bridge-auth" or, again, something more intuitive, to include the line above in the descriptor, and
  • an extension of BridgeDB to ignore bridges with this line when parsing descriptors.

Child Tickets

Change History (57)

comment:1 Changed 22 months ago by isis

Definitely related to, and possibly a duplicate of #13727 and #13504.

comment:2 Changed 22 months ago by dcf

I completely forgot that we had a discussion on this topic back in 2014.

https://lists.torproject.org/pipermail/tor-dev/2014-October/007610.html

I'm trying to remember what caused me to bring it up. At the time, we were in the middle of active probing research, and we definitely didn't want the public connecting to our honeypot bridges, so we set PublishServerDescriptor 0. I don't know why I would have cared about metrics for that experiment, though. Maybe it was some other reason.

Somewhat related are #7349 and https://lists.torproject.org/pipermail/tor-dev/2014-December/008028.html. A bridge might only want at PT port exposed through BridgeDB, because the presence of an easily detectable ORPort connection could blow the cover of, say, obfs4 on the same IP address.

The use case for this time is pretty narrow: testing whether censors scrape default bridges from the bundles, and how long it takes. The hypothesis that if a censor can scrape BridgeDB, then it can scrape bundles (therefore there is no reason to keep bundle bridges out of BridgeDB) is a reasonable one, but we can't assume it's true while running an experiment to test it. (It also would not surprise me if it is false: think of the times you unnecessarily elaborate engineering because you didn't adequately understand the problem; censors are not all world-class and they have these constraints too.) For this use case, I think the workaround of setting PublishServerDescriptor 1 while firewalling closed the ORPort is good enough.

My position is: all the Tor Browser default bridges should publish statistics, or else we are greatly undercounting users (the same goes for any bridge with a lot of users). Whether the Tor Browser default bridges also appear in BridgeDB is mostly immaterial, except in special cases like we're doing now. There are "default bridges" other than those in Tor Browser, for example those in Orbot; they don't have to be all the same or get burned at the same time. If we require all default bridges to be present in BridgeDB, and a censor is capable of scraping BridgeDB, then BridgeDB is a common weak link that will eventually get all the bridges blocked; whereas if they are not in BridgeDB, then it's as if we have different pools of bridges, the same way BridgeDB has smtp, http, etc. pools where the compromise of one doesn't lead to the compromise of another.

comment:3 Changed 22 months ago by mrphs

Cc: mrphs added

comment:4 Changed 16 months ago by nickm

Milestone: Tor: 0.3.0.x-final

comment:5 Changed 13 months ago by arma

Status: newneeds_review

My feature18329 branch implements this feature.

There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.

I've left it deliberately opaque what the syntax is for the string -- the man page suggests that "none" is a good way to tell bridgedb you don't want it to give out your bridge address.

I've tested it on a running bridge and examined the resulting bridge descriptor, and it seems to work as intended.

comment:6 Changed 13 months ago by nickm

Keywords: review-group-12 added

comment:7 Changed 13 months ago by nickm

Status: needs_reviewneeds_revision

Looks fine; needs a spec patch though.

Also, I think that promising some the semantics for "none" would be smart.

And we shouldn't actually merge this IMO until we get BridgeDB to respect it. Otherwise we'll be telling people that they can do something that won't actually work yet.

comment:8 Changed 13 months ago by nickm

Reviewer: nickm

comment:9 Changed 13 months ago by nickm

Keywords: review-group-12 removed

comment:10 Changed 12 months ago by nickm

Points: .2 remaining

comment:11 Changed 11 months ago by isis

Keywords: stem added

This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.

comment:12 in reply to:  11 Changed 11 months ago by isis

Replying to isis:

This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.


I've created #21177 for tracking this.

comment:13 Changed 11 months ago by dgoulet

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final
Status: needs_revisionneeds_review

We are about to freeze 030 and the patch from arma looks good to me. I agree with Nick that once BridgeDB has that capability (and documented!), we should merge this.

In the meantime, I'm moving this to 031 because I *doubt* very much that all the things will lined up by tomorrow. Also, here is an attempt at the spec patch: ticket18329_01

comment:14 Changed 11 months ago by atagar

Hi David. For what it's worth a link is a lot handier than just saying it. Here's the patch...

https://gitweb.torproject.org/user/dgoulet/torspec.git/commit/?h=ticket18329_01

First thought is that this needs to formally define what 'info' looks like. See the other fields for examples.

comment:15 Changed 11 months ago by dgoulet

Status: needs_reviewneeds_revision

comment:16 Changed 9 months ago by isis

Hey, minor feedback: can we have the "info" be a uint_8 or a uint_16 instead of an arbitrary string? We're not really expecting there to be more than three or four distribution methods, plus "none", so "none" could just be "BridgeDistribution 0" and we can assign arbitrary integers to the others, like "1" means "distribute through however you want", 2" means "distribute through email", "3" means "distribute through https", and etc. for future distributors. Then the default is just "BridgeDistribution 1" and we tell people who want to turn it off to set it to 0, and all the other wingnuttier uses can be found if people RTM.

Last edited 9 months ago by isis (previous) (diff)

comment:17 Changed 9 months ago by nickm

Hm. We do other stuff by number (consensus methods, protocol versions), so I think there's a precedent for that. What's the rationale?

comment:18 Changed 9 months ago by nickm

Hm, you meant in the torrc too though? For that, usability does matter, though I may be forgetting the lesson of the bikeshed.

comment:19 in reply to:  18 Changed 9 months ago by isis

Replying to nickm:

Hm, you meant in the torrc too though? For that, usability does matter, though I may be forgetting the lesson of the bikeshed.

Yes, I meant in the torrc too, mostly under the assumption that most people that most bridge operators won't need to touch it. I'm no UX expert though, so I can default to whatever the results of the bikeshed were.

comment:20 Changed 9 months ago by nickm

In my role as bikeshed factotum, I'd suggest that we just go with names here, with Tor restricted to only accept recognized names if they appear in torrc.

comment:21 Changed 7 months ago by arma

Looks like we're trying to make this ticket more complicated than it is.

The feature18329 branch is almost entirely about the new torrc option, so yes, it is all user-facing. I think we should stick with words, like "none", "email", "https", which makes it easily extensible to more values, without messing with Tor at all, if the future brings us new distribution strategies.

(If we used numbers instead, we would need to let the user configure a number that this Tor version doesn't know about, so people can choose other distribution approaches later. But then you'd need to keep everybody synchronized on what the numbers mean, and people could still list numbers that you haven't picked a definition for yet. tl;dr, I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.)

The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.

I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.

Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?

comment:22 Changed 7 months ago by atagar

Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.

Hi Roger, the contact and platform fields would be particularly bad ones to emulate. They're the only fields in the spec that allow arbitrary bytes (they don't even need to be any form of recognizable text... and due to that in practice sometimes aren't).

It was a mistake to allow the contact and platform fields to be so nebulous. Hopefully we won't do that with any new ones. ;)

comment:23 in reply to:  22 Changed 7 months ago by arma

Replying to atagar:

It was a mistake to allow the contact and platform fields to be so nebulous. Hopefully we won't do that with any new ones. ;)

What was the mistake?

comment:24 Changed 7 months ago by atagar

The mistake was letting these fields be so ill defined that they can be anything. Even garbage that is completely nonsensical and unparseable.

comment:25 Changed 7 months ago by Sebastian

I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.

comment:26 Changed 7 months ago by atagar

I'd love that. Definitely has my vote! :P

comment:27 in reply to:  25 Changed 7 months ago by isis

Replying to Sebastian:

I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.


+1

comment:28 Changed 7 months ago by catalyst

Cc: catalyst added

The character set stuff should probably move to #18938?

comment:29 in reply to:  21 Changed 7 months ago by isis

Cc: catalyst removed

Replying to arma:

[…]

I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.


That's fair; you've got me convinced that the numbers thing would be bad usability-wise.

However, I'm still against "any bytes, any length, yolo."

The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.


Why would we waste bandwidth passing around descriptors with junk in them? Why not just filter the junk during descriptor creation?

Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?


Not to speak for Nick, but I think he meant just as you described: tor will say, "If you try to set a value which isn't in the list of things I recognise, I'm going to not include it in your descriptor."

To make this simpler moving forward, the whitelist should currently include the following recognised values: none, any, https, email, moat, and hyphae. The default is any.

This covers all current distributors, and the two which are planned for this year. (moat is the new distributor which sends Tor Launcher a CAPTCHA via a meek channel, and essentially acts exactly like the https distributor, but in XUL rather than HTML, in Tor Launcher. hyphae is the social distributor mentioned in #7520 and detailed here.) If we need to add a new distributor in the future — which I don't expect — you're correct that we'll need to get it added to tor's distributor whitelist. This is solvable by opening a ticket to get the new name added to the list, before starting work to build the distributor; this way, the name is likely recognised by the time the distributor is deployed.

I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.


Here's draft text describing the torrc option:

"bridge-distribution-request" SP method NL

   [At most once]

   Each "method" describes how a Bridge address is distributed by BridgeDB. 
   Recognized methods are: "none", "any", "https", "email", "moat", "hyphae".
   If set to "none", BridgeDB will avoid distributing your bridge address.
   If set to "any", BridgeDB will choose how to distribute your bridge
   address. Choosing any of the other methods will tell BridgeDB to distribute
   your bridge via a specific method:

      * "https" specifies distribution via the web interface at
         https://bridges.torproject.org;
      * "email" specifies distribution via the email autoresponder at
        bridges@torproject.org;
      * "moat" specifies distribution via an interactive menu inside Tor
        Browser; and
      * "hyphae" specifies distribution via a cryptographically-secure,
        invitation-based system.

   (Default: "any")

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

comment:30 Changed 7 months ago by atagar

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?

Each "method" describes how...

Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.

comment:31 Changed 7 months ago by isis

Cc: catalyst added

(Oops, I accidentally removed catalyst! Sorry!)

comment:32 in reply to:  5 Changed 7 months ago by dcf

Cc: catalyst removed

Replying to arma:

My feature18329 branch implements this feature.

There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.

https://gitweb.torproject.org/arma/tor.git/commit/?h=feature18329&id=da46e73142bd522f8ac7dfc9d1f113c6281aea85

Do we care about potentially having different distribution settings for the potentially multiple transports that may be defined in torrc? If someone has both ServerTransportPlugin obfs3 and ServerTransportPlugin obfs4, say, there would be no way to give them different BridgeDistribution settings.

Currently some torrc options are keyed on transport name (e.g. ServerTransportPlugin, ServerTransportListenAddr, ServerTransportOptions). Ought BridgeDistribution to be like that too, or is it too niche a use case?

By transport name would be the finest possible granularity. It would be cool if you could run multiple instances of obfs4 with different BridgeDistribution options, but you can't actually express multiple instances of the same transport name in torrc (see #11211). So if someone wanted to do that, they would have to run multiple instances of tor with multiple fingerprints anyway.

comment:33 in reply to:  30 Changed 7 months ago by isis

Replying to atagar:

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?


Ah, good point. I'll specify printable ascii, no whitespace, plus - and _. So (KeywordChar | _)+ (and we'll optionally lowercase the letters input from the torrc). Does that sound okay?

Each "method" describes how...

Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.


You're right, that should be taken out. (I first started writing something where the user could specify multiple comma-separated methods, e.g. email,https would choose email or https, but the logic for this is complicated and confusing, so I revised to allow just one "method".)

Here's the new draft:

    "bridge-distribution-request" SP Method NL                                  
                                                                                
        [At most once]                                                          
                                                                                
        The "Method" describes how a Bridge address is distributed by           
        BridgeDB. Recognized methods are: "none", "any", "https", "email",      
        "moat", "hyphae". If set to "none", BridgeDB will avoid distributing    
        your bridge address. If set to "any", BridgeDB will choose how to       
        distribute your bridge address. Choosing any of the other methods will  
        tell BridgeDB to distribute your bridge via a specific method:          
                                                                                
          - "https" specifies distribution via the web interface at             
             https://bridges.torproject.org;                                    
          - "email" specifies distribution via the email autoresponder at       
            bridges@torproject.org;                                             
          - "moat" specifies distribution via an interactive menu inside Tor    
            Browser; and                                                        
          - "hyphae" specifies distribution via a cryptographically-secure,     
            invitation-based system.                                            
                                                                                
        Potential future "Method" specifiers must be as follows:                
            Method = (KeywordChar | "_") +                                      
                                                                                
        (Default: "any")                                                        

comment:34 Changed 7 months ago by atagar

Thanks Isis, looks good to me.

comment:35 in reply to:  34 ; Changed 7 months ago by isis

Replying to atagar:

Thanks Isis, looks good to me.


Okay, thanks for reviewing! The torspec patch is in my ticket18329_02 branch (based on David's branch from above).

So the next step is to go over Roger's patch again and see what needs tweaking.

comment:36 Changed 6 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

comment:37 Changed 3 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

Defer all needs_revision non-spec enhancements to 0.3.3.

comment:38 in reply to:  35 Changed 2 months ago by isis

Replying to isis:

Replying to atagar:

Thanks Isis, looks good to me.


Okay, thanks for reviewing! The torspec patch is in my ticket18329_02 branch (based on David's branch from above).


Hey nickm, do you think you could merge the spec patch so that atagar (or someone) can start on #21177? Thanks!

comment:39 Changed 2 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final
Status: needs_revisionneeds_review

comment:40 Changed 2 months ago by nickm

Merged the spec patch, with an addendum about implementation status. Please let me know if it needs changes.

comment:41 Changed 7 weeks ago by nickm

Keywords: review-group-24 added

review-group-24 is now open.

comment:42 Changed 7 weeks ago by nickm

Owner: set to nickm
Status: needs_reviewaccepted

In my branch feature18329_029, I've rebased Roger's branch onto 0.2.9, and added some additional checks and documentation.

Most notably, I've made it so a bridge will always insert this kind of entry in its descriptor, whether or not the user has changed it from the default.

comment:43 Changed 7 weeks ago by nickm

Status: acceptedneeds_review

comment:44 Changed 7 weeks ago by nickm

Sponsor: SponsorM

comment:45 Changed 7 weeks ago by nickm

Reviewer: nickm

comment:46 Changed 7 weeks ago by isis

Reviewer: isis

Oh shoot. I also wrote a patch for this last week, but didn't update the ticket.

I'll gladly review and then see if there's anything left from my patch that's useful (maybe unittests are still a useful addition). If so, I'll pick those parts out and put them in a new commit on top.

comment:47 Changed 7 weeks ago by isis

BTW, this also makes it trivial to fix #16564, since DirAuths can check for a "bridge-distribution" line in descriptors submitted to them and not include in the consensus those which do have it.

comment:48 Changed 7 weeks ago by nickm

Great! Also feel free to declare that your patch is just plain better, if you think that's the right call.

comment:49 Changed 7 weeks ago by isis

Okay, I reviewed your code, nickm, and it LGTM. I made two new commits on top of it, in my feature18329_029 branch, one to steal a docstring from my patch, and another to steal the unittests I made. If you want to merge just your commit or also the tests/docs, either's fine by me.

comment:50 Changed 7 weeks ago by nickm

Looks good! One question: Are you sure about the commit "config: Only allow specified characters in BridgeDistribution lines."? The way I read dir-spec, I thought that "-" was included as an allowable KeywordChar.

Right now, after we resolve that issue, here's what I'm planning: I'll squash the fixup commit and we'll have a new branch. We can take the whole branch in 0.3.2 immediately, and if it works out, we can backport to 029 and earlier. But we probably won't make (and shouldn't try to make) the releases planned for tomorrow with this patch. Sound ok with you?

comment:51 Changed 7 weeks ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final
Status: needs_reviewmerge_ready

Squashed, with commit removed, as feature18329_029_squashed.

Merging that to 0.2.8 and forward; marking for possible backport.

comment:52 Changed 7 weeks ago by nickm

Oops. I meant "merging that to 0.3.2" and forward.

comment:53 in reply to:  50 Changed 7 weeks ago by isis

Replying to nickm:

Looks good! One question: Are you sure about the commit "config: Only allow specified characters in BridgeDistribution lines."? The way I read dir-spec, I thought that "-" was included as an allowable KeywordChar.

Right now, after we resolve that issue, here's what I'm planning: I'll squash the fixup commit and we'll have a new branch. We can take the whole branch in 0.3.2 immediately, and if it works out, we can backport to 029 and earlier. But we probably won't make (and shouldn't try to make) the releases planned for tomorrow with this patch. Sound ok with you?


Yep, sounds good! Thanks!

comment:54 Changed 7 weeks ago by nickm

Keywords: review-group-24 removed

comment:55 Changed 4 weeks ago by nickm

Also here is an even smaller version we can backport to the very oldest supported versions: ticket18329_minimal_025

comment:56 in reply to:  55 Changed 4 weeks ago by isis

Replying to nickm:

Also here is an even smaller version we can backport to the very oldest supported versions: ticket18329_minimal_025


LGTM!

comment:57 Changed 4 weeks ago by nickm

Resolution: implemented
Status: merge_readyclosed

Merged the minimal version to 0.2.5 and forward to 0.3.1.

Note: See TracTickets for help on using tickets.