Opened 8 years ago

Closed 7 years ago

#4297 closed enhancement (fixed)

Teach bridgedb how to handle descriptors with IPv6 addresses

Reported by: ln5
Owned by: aagbsn
Priority: Medium
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity:
Keywords: ipv6
Cc: aagbsn, arma
Actual Points:
Parent ID: #4563
Points:
Reviewer:
Sponsor:

Description

Proposal 186 defines a new descriptor syntax.
Bridgedb needs to understand that.
Related to #3563.

Child Tickets

Ticket  Type    Status  Owner   Summary
#5947   defect  closed  aagbsn  BridgeDB should parse "a lines" from networkstatus-bridges
#5948   defect  closed  aagbsn  BridgeDB should respond with the same address:port to each request

Change History (19)

comment:1 Changed 8 years ago by karsten

Type: task → enhancement

I just looked at Bridges.py and ran a quick test to see what BridgeDB does when giving it a bridge descriptor or bridge network status as described in proposal 186. The result is that BridgeDB simply ignores the "or-address" and "a" lines as described in proposal 186. BridgeDB continues to give out the bridge's IPv4 address.

That's good news in that we can deploy the proposal 186 changes without worrying about BridgeDB.

But of course, if we want BridgeDB to give out IPv6 addresses, we'll have to do what the ticket subject says.

comment:2 Changed 7 years ago by ln5

Owner: set to ln5
Status: new → accepted

comment:3 Changed 7 years ago by ln5

Parent ID: #3563

comment:4 Changed 7 years ago by aagbsn

Status: accepted → needs_review

progress update:

code:

https://github.com/aagbsn/bridgedb/tree/4297-ipv6-bridges

live examples (populated with fake bridges):

https://tor.extc.org
https://tor.extc.org/ipv6

note that address/port combinations returned by bridgedb are selected from the or-address lines of the bridge. The randomly generated examples are fully populated (8 or-address lines, 16 portspec entries), so the bridge lines appear random but actually aren't. Perhaps this is the wrong approach. Comments?

remaining:

ipv6 FILE_BUCKETS, ipv6 client connections (untested), better tests for class PortList

blocking:

sanity check/code review.
Can we allocate a VM for soft-launch/testing?

comment:5 in reply to:  4 Changed 7 years ago by aagbsn

Can we allocate a VM for soft-launch/testing?

ponticum.tpo (thanks Peter!)

comment:6 Changed 7 years ago by ln5

Parent ID: #3563 → #4563

<ln5> karsten, aagbsn: do you know the current status of #4297 [12:57]
<karsten> ln5: when I last tested it, it gave out ipv6 bridges via https, but not via email. [12:58]
<karsten> also, it took the ipv6 addresses from or-address lines, not from a lines (which are not contained in statuses), [12:59]
<ln5> karsten: can i quote you in the ticket?
<karsten> and I think bridgedb didn't make ipv6 addresses persistent in its database. I might be wrong about the latter.
<karsten> sure

comment:7 Changed 7 years ago by ln5

Cc: aagbsn added

Aaron, I think you should own this ticket rather than me.
Please grab it if you agree.

comment:8 Changed 7 years ago by aagbsn

Owner: changed from ln5 to aagbsn
Status: needs_review → accepted

comment:9 Changed 7 years ago by ln5

Is branch 4297-ipv6-bridges-rebased-2 of user/aagbsn/bridgedb.git the right thing to test?

comment:10 Changed 7 years ago by aagbsn

Yes, that's the most recent work.

However, during the course of development for #5027 (continuing from #4097, and not in parallel) several bugs were found and fixed in the #5027 branch.

e.g.

  master
        \__4097-ipv6-bridges
                            \__5027-allocate-bridges-by-country

What needs to happen:

  1. Cleanup/backport of fixes will need to occur if 4097 is to be deployed in advance of 5027. This was started; those -rebased* branches are work-in-progress.
  2. Read ipv6 addresses from "a" lines, rather than or-address lines. I don't think there are any such 'a' lines in networkstatus-bridges or bridge-descriptors yet. Is that right?
  3. Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?
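
To make the "new table for or-addresses" idea in item 3 concrete, here is a minimal sketch. It assumes SQLite via Python's standard `sqlite3` module; the table names, column names, and helper functions are all hypothetical and are not BridgeDB's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per bridge, plus one row per additional
# or-address, so a bridge can carry any number of extra (address, port) pairs.
SCHEMA = """
CREATE TABLE bridges (
    fingerprint TEXT PRIMARY KEY,
    address     TEXT NOT NULL,
    or_port     INTEGER NOT NULL
);
CREATE TABLE or_addresses (
    fingerprint TEXT NOT NULL REFERENCES bridges(fingerprint),
    address     TEXT NOT NULL,
    port        INTEGER NOT NULL,
    UNIQUE (fingerprint, address, port)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_bridge(conn, fingerprint, address, or_port, or_addresses=()):
    """Store a bridge and any additional (address, port) pairs."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO bridges VALUES (?, ?, ?)",
            (fingerprint, address, or_port),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO or_addresses VALUES (?, ?, ?)",
            [(fingerprint, a, p) for a, p in or_addresses],
        )


def get_or_addresses(conn, fingerprint):
    """Return the extra (address, port) pairs in insertion order."""
    rows = conn.execute(
        "SELECT address, port FROM or_addresses WHERE fingerprint = ? "
        "ORDER BY rowid",
        (fingerprint,),
    )
    return rows.fetchall()
```

Keeping the extra addresses in their own table means the bridge row itself does not need to change shape when a bridge advertises more or fewer or-addresses.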

comment:11 in reply to:  10 Changed 7 years ago by aagbsn

Replying to aagbsn:

However, during the course of development for #5027 (continuing from #4097, and not in parallel) several bugs were found and fixed in the #5027 branch.

Whoops, that should be #4297, not #4097 throughout

comment:12 in reply to:  10 Changed 7 years ago by karsten

Cc: arma added

Replying to aagbsn:

  1. Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?

Storing IPv6 addresses in the database probably makes sense.

With respect to file buckets, hmm. We don't use file buckets right now, do we? That means we'll have to speculate what a hypothetical user would want the files to look like. I could imagine that a single line per bridge with "<ipv4 address:port>[ <ipv6 address:port>]*" would be a useful format, but I'm not even a hypothetical user. The single-line-per-bridge format also makes sense for #5482 where we're going to add stability and reachability information to file buckets, and those are per-bridge, too. What does arma think about this all?
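
As an illustration only, the single-line-per-bridge format suggested above could be produced roughly like this. `format_bucket_line` is a hypothetical helper, not part of Bucket.py, and the bracketed IPv6 form is an assumption borrowed from the or-address syntax.

```python
import ipaddress


def format_bucket_line(ipv4, extra=()):
    """Build '<ipv4 address:port>[ <ipv6 address:port>]*' for one bridge.

    ipv4 and each entry of extra are (address, port) tuples.
    """
    def fmt(addr, port):
        ip = ipaddress.ip_address(addr)
        # IPv6 addresses are bracketed so the port separator stays unambiguous.
        host = "[%s]" % ip if ip.version == 6 else str(ip)
        return "%s:%d" % (host, port)

    return " ".join([fmt(*ipv4)] + [fmt(a, p) for a, p in extra])
```

For example, `format_bucket_line(("192.0.2.1", 443), [("2001:db8::1", 9001)])` yields one line holding both addresses of the same bridge.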

comment:13 in reply to:  12 Changed 7 years ago by aagbsn

Replying to karsten:

Replying to aagbsn:

  1. Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?

Storing IPv6 addresses in the database probably makes sense.

With respect to file buckets, hmm. We don't use file buckets right now, do we? That means we'll have to speculate what a hypothetical user would want the files to look like. I could imagine that a single line per bridge with "<ipv4 address:port>[ <ipv6 address:port>]*" would be a useful format, but I'm not even a hypothetical user. The single-line-per-bridge format also makes sense for #5482 where we're going to add stability and reachability information to file buckets, and those are per-bridge, too. What does arma think about this all?

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information. Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 )
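
One possible way to give the same requester the same address:port each time is to derive the choice from a keyed hash of the requester and the bridge. The sketch below assumes Python 3; `pick_address`, `secret_key`, and `client_id` are hypothetical names, and this is not necessarily how #5948 was or should be resolved.

```python
import hashlib
import hmac


def pick_address(secret_key, client_id, fingerprint, addresses):
    """Deterministically pick one (address, port) for this client and bridge.

    secret_key is bytes; client_id is the requester's identifier (for
    example an IP address or email address) as a string.
    """
    if not addresses:
        raise ValueError("bridge has no addresses")
    # Sort first so the result does not depend on descriptor line order.
    ordered = sorted(addresses)
    msg = ("%s|%s" % (client_id, fingerprint)).encode("utf-8")
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

The bridge still counts as a single bridge for distribution purposes; only the address:port shown to a given requester is fixed. Note that the choice changes if the bridge's address list changes.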

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949

comment:14 in reply to:  13 Changed 7 years ago by karsten

Replying to aagbsn:

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We're talking about buckets here, right? That means we export bridges in the unallocated ring to a file to be mailed to people distributing them somehow. I don't know if these people would prefer a single line per bridge with all addresses/ports or one line per address/port.

Note that the number of addresses/ports per bridge is limited. Proposal 186 says there can be at most 8 additional addresses times 16 ports. Linus' implementation only allows for 1 additional address with 1 port, AFAIK.

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information.

Without knowing how bucket files will be used, I could imagine that selecting 1 IPv4 and 1 IPv6 address per bridge would be sufficient for most use cases.

Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 )

So, staying in the bucket case, two subsequent runs shouldn't include different addresses for the same bridge in the file. We could simply pick the first address for any given IP version or transport.
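
A minimal sketch of "pick the first address for any given IP version", assuming the bridge's addresses are available as an ordered list of (address, port) tuples. `first_per_version` is a hypothetical helper, and per-transport selection is left out.

```python
import ipaddress


def first_per_version(addresses):
    """Return {4: (addr, port), 6: (addr, port)} using the first entry
    seen for each IP version, so repeated runs give the same result."""
    chosen = {}
    for addr, port in addresses:
        version = ipaddress.ip_address(addr).version
        # setdefault keeps the earliest entry and ignores later ones.
        chosen.setdefault(version, (addr, port))
    return chosen
```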

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949

I wouldn't worry too much about database size here. But you're right that blocking information should be at bridge:address:port detail. If that makes things too complex, BridgeDB could only look at the bridge or bridge:address part.

comment:15 in reply to:  14 Changed 7 years ago by aagbsn

Replying to karsten:

Replying to aagbsn:

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We're talking about buckets here, right? That means we export bridges in the unallocated ring to a file to be mailed to people distributing them somehow. I don't know if these people would prefer a single line per bridge with all addresses/ports or one line per address/port.

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Note that the number of addresses/ports per bridge is limited. Proposal 186 says there can be at most 8 additional addresses times 16 ports. Linus' implementation only allows for 1 additional address with 1 port, AFAIK.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

BridgeDB #4297 supports port ranges, at any rate.
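
For illustration, a parser for the grammar quoted above might look like the following. This is a sketch assuming Python 3; `parse_or_address` and `parse_portlist` are hypothetical names rather than the code in the #4297 branch, and the proposal's limits on the number of lines and PORTSPEC entries are not enforced.

```python
import ipaddress
import re

# ADDRESS is either a bracketed IPv6 address or a dotted-quad IPv4 address.
OR_ADDRESS_RE = re.compile(
    r"^or-address (\[[0-9A-Fa-f:.]+\]|[0-9.]+):(\S+)$"
)


def parse_portlist(portlist):
    """Parse PORTLIST into a list of inclusive (first, last) port ranges."""
    ports = []
    for spec in portlist.split(","):
        low, sep, high = spec.partition("-")
        first = int(low)
        last = int(high) if sep else first
        if not (1 <= first <= last <= 65535):
            raise ValueError("bad PORTSPEC: %r" % spec)
        ports.append((first, last))
    return ports


def parse_or_address(line):
    """Parse one or-address line into (ip_address, [(first, last), ...])."""
    match = OR_ADDRESS_RE.match(line.strip())
    if match is None:
        raise ValueError("not an or-address line: %r" % line)
    host, portlist = match.groups()
    # ip_address() rejects malformed addresses with ValueError.
    address = ipaddress.ip_address(host.strip("[]"))
    return address, parse_portlist(portlist)
```

A single port is represented as a range whose two ends are equal, which keeps the return type uniform for `PORT` and `PORT "-" PORT` entries.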

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information.

Without knowing how bucket files will be used, I could imagine that selecting 1 IPv4 and 1 IPv6 address per bridge would be sufficient for most use cases.

Yes. Most bridges probably won't listen on multiple ports at first -- although it would be handy if the tor cloud images support multiple listening ports and/or addresses -- especially considering that more and more cloud providers offer ipv6 connectivity.

Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 )

So, staying in the bucket case, two subsequent runs shouldn't include different addresses for the same bridge in the file. We could simply pick the first address for any given IP version or transport.

That means that a bridge that gets (un)assigned to the bucket distributor will not utilize any of the additional addresses. Although, if the first address in the list gets blocked, it could be marked 'as blocked' and the next address in the list selected (if available).
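
The fallback described here could be sketched as follows, assuming blocked entries are tracked as a set of (address, port) tuples. `first_unblocked` and that representation are hypothetical; BridgeDB presently tracks blocking per bridge only.

```python
def first_unblocked(addresses, blocked):
    """Return the first (address, port) not marked as blocked, or None
    if every address of the bridge is blocked."""
    for entry in addresses:
        if entry not in blocked:
            return entry
    return None
```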

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949

I wouldn't worry too much about database size here. But you're right that blocking information should be at bridge:address:port detail. If that makes things too complex, BridgeDB could only look at the bridge or bridge:address part.

BridgeDB presently only looks at the bridge, because bridges only had one address:port.

I don't think it's too complex, but this enhancement shouldn't block deployment of #4297 if it turns out to be harder than anticipated.

comment:16 in reply to:  15 Changed 7 years ago by karsten

Replying to aagbsn:

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Makes sense.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

comment:17 in reply to:  16 Changed 7 years ago by aagbsn

Replying to karsten:

Replying to aagbsn:

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Makes sense.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

Oh dear. I thought that was a nice feature and fun to implement. Is it likely to come back in the future?

comment:18 in reply to:  17 Changed 7 years ago by karsten

Replying to aagbsn:

Replying to karsten:

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

Oh dear. I thought that was a nice feature and fun to implement. Is it likely to come back in the future?

Probably not. See Nick's commit where he took port ranges out.

comment:19 Changed 7 years ago by aagbsn

Resolution: fixed
Status: accepted → closed