Opened 7 years ago

Closed 7 years ago

#4957 closed task (implemented)

Decide how to sanitize pluggable transport lines in bridge descriptors

Reported by: karsten
Owned by: karsten
Priority: Medium
Component: Metrics/CollecTor
Cc: asn, nickm, aagbsn

Description

We're providing sanitized versions of bridge descriptors in near real time on the metrics website and via rsync.

Once we enable bridges to include pluggable transport information in their server and/or extra-info descriptors, we need to come up with a way to sanitize the sensitive parts. We'll certainly want to remove any keys contained in pluggable transport lines. But maybe even the fact that a bridge offers a specific pluggable transport is sensitive? Maybe the fact that it offers any pluggable transport at all is sensitive?

This problem came up in #3589. Bridge clients aren't supposed to learn about the pluggable transports listed in a bridge's extra-info descriptor, and neither the bridge authority nor the bridge gives out extra-info descriptors. But once a client knows the bridge's server descriptor, it can easily look up the sanitized extra-info descriptor in the metrics archives. If we don't want the client to learn about the bridge's transports, we need to take that into account. If it helps, we can define separate sanitizing rules for each pluggable transport.
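A minimal Python sketch of why the server descriptor is enough, assuming (as described for the metrics sanitizing process) that sanitized bridge descriptors replace the bridge's identity fingerprint with the hex-encoded SHA-1 hash of its 20 binary fingerprint bytes. A client that has the fingerprint from the server descriptor can compute the same value locally and search the archives for it. The function name and the example fingerprint are hypothetical.

```python
import hashlib


def hashed_fingerprint(fingerprint: str) -> str:
    """Return the upper-case hex SHA-1 of a bridge's binary identity fingerprint.

    Assumption: sanitized bridge descriptors publish this value in place of
    the original 40-hex-digit fingerprint.
    """
    # Server descriptors print the fingerprint in space-separated groups of
    # four hex digits; drop all whitespace before decoding.
    raw = bytes.fromhex("".join(fingerprint.split()))
    if len(raw) != 20:
        raise ValueError("expected a 20-byte (40 hex digit) fingerprint")
    return hashlib.sha1(raw).hexdigest().upper()


# Hypothetical fingerprint, for illustration only.
print(hashed_fingerprint("0123 4567 89AB CDEF 0123 4567 89AB CDEF 0123 4567"))
```

Because the hash is unsalted, anyone who already knows the fingerprint can reproduce it; the hashing only hides the fingerprint from people who don't.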

So there's a trade-off between revealing too much information and being able to analyze pluggable transport deployment, which we'll probably want to do. We can only run such analyses if the information is contained in the sanitized versions of bridge descriptors, because we don't use the original descriptors for analysis at all.


Change History (6)

comment:1 Changed 7 years ago by karsten

Status: new → needs_information

asn, do you have any more information on what transport lines will look like and how we might want to sanitize them? Ideally, we'd deploy the sanitizing code before the first bridge includes a transport line. Thanks!

comment:2 Changed 7 years ago by asn

Cc: aagbsn added
Status: needs_information → new

Transport lines will look like this:

transport SP <methodname> SP <address:port> [SP arglist] NL

and there is also an optional field for supplemental data:

transport-info SP <methodname> [SP arglist] NL

I think you can ignore transport-info for now since it's not implemented and there are no transports that need it yet.
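A minimal Python sketch of parsing the transport line format above. It treats the optional arglist as an opaque string, since its contents aren't specified here, and assumes the address part ends in a colon followed by a decimal port (so a bracketed IPv6 address survives intact). `TransportLine` and `parse_transport_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class TransportLine(NamedTuple):
    methodname: str
    address: str
    port: int
    arglist: Optional[str]  # kept opaque; its format isn't specified yet


def parse_transport_line(line: str) -> TransportLine:
    """Parse 'transport SP <methodname> SP <address:port> [SP arglist] NL'."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3 or parts[0] != "transport":
        raise ValueError(f"not a transport line: {line!r}")
    methodname, addrport = parts[1], parts[2]
    arglist = parts[3] if len(parts) == 4 else None
    # Split on the last colon so a bracketed IPv6 address stays intact.
    address, sep, port = addrport.rpartition(":")
    if not sep or not address or not port.isdigit() or not (0 < int(port) < 65536):
        raise ValueError(f"bad address:port: {addrport!r}")
    return TransportLine(methodname, address, int(port), arglist)


print(parse_transport_line("transport obfs2 198.51.100.7:40872\n"))
```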

As far as sanitization is concerned, I'm not sure which approach is better. I'm also not completely sure how bridge descriptors are used; I assume they are used when analyzing bridge stats, and when a user wants to look at the descriptor of her bridge in Atlas. Are there other use cases?

Some sanitization approaches:

a) No sanitization. Pluggable transports and their ports are disclosed to people who know a bridge.

b) Sanitization. Only display whether the bridge supports pluggable transports or not. Or maybe the number of transports it supports. Or maybe something else.

c) Paranoia. Don't display any pluggable transport-related information.
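The three approaches differ only in what survives from a descriptor's transport lines. A minimal Python sketch, assuming the descriptor is handled as a list of lines; for b) it picks one of the listed options (the number of transports), and the output keyword `transport-count` is purely hypothetical, not part of any descriptor format.

```python
from typing import List


def sanitize(lines: List[str], policy: str) -> List[str]:
    """Apply approach a), b) or c) to the transport lines of a descriptor.

    Other lines pass through unchanged; sanitizing them is out of scope here.
    transport-info lines don't match 'transport ' and also pass through,
    since they can be ignored for now.
    """
    transports = [line for line in lines if line.startswith("transport ")]
    others = [line for line in lines if not line.startswith("transport ")]
    if policy == "a":  # No sanitization: publish transport lines as they are.
        return list(lines)
    if policy == "b":  # Only reveal how many transports the bridge supports.
        # 'transport-count' is a hypothetical keyword, not part of any format.
        return others + [f"transport-count {len(transports)}"]
    if policy == "c":  # Paranoia: drop all transport information.
        return others
    raise ValueError(f"unknown policy: {policy!r}")
```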

If I were to select one I would probably go with a). It's good both for analysis and for users who want to know more about their bridges.

I'm also not sold on the use case of a bridge operator who supports multiple transports, has a public bridge, and wants to hide some of her transports from her users. However, Tor users have many different use cases and I only know of a few, so if others think that b) or c) (or a d)) is more reasonable (or supports a larger range of use cases), I'm OK with it.

comment:3 in reply to: 2 Changed 7 years ago by karsten

Replying to asn:

> Transport lines will look like this:
>
> transport SP <methodname> SP <address:port> [SP arglist] NL
>
> and there is also an optional field for supplemental data:
>
> transport-info SP <methodname> [SP arglist] NL
>
> I think you can ignore transport-info for now since it's not implemented and there are no transports that need it yet.

So it looks like the contents of transport-info lines will be no more sensitive than the [SP arglist] part of transport lines, right? If we want to keep [SP arglist] in transport lines, we might as well keep transport-info lines, even if they're not in use yet.

> As far as sanitization is concerned, I'm not sure which approach is better. I'm also not completely sure how bridge descriptors are used; I assume they are used when analyzing bridge stats, and when a user wants to look at the descriptor of her bridge in Atlas. Are there other use cases?

Those are the two major use cases. I'm mainly interested in the bridge stats part, though. It would be good to see how widely the different transports are deployed and maybe be able to infer which of them are blocked or not.

> Some sanitization approaches:
>
> a) No sanitization. Pluggable transports and their ports are disclosed to people who know a bridge.

Note that everyone can learn the contents of sanitized bridge descriptors by downloading the tarballs or rsync'ing them from metrics. It's not just people who know a bridge who'll receive the sanitized descriptors.

If a) includes leaving the address part in, I disagree. We should sanitize the address part the same way we sanitize bridge IP addresses. We can probably leave the port part in, because it might give us hints about whether a specific port works better than other ports for a given transport.
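A minimal sketch of sanitizing the address while keeping the port. It assumes a keyed-hash scheme for illustration: each IPv4 address maps to 10.x.x.x, where x.x.x are the first three bytes of SHA-256 over the packed address, the bridge fingerprint, and a secret that never leaves the sanitizer. This is an assumption, not necessarily the exact metrics-db procedure; `SECRET`, `sanitize_ipv4` and `sanitize_addrport` are hypothetical names, and IPv6 is not handled.

```python
import hashlib
import ipaddress

# Hypothetical secret; a real sanitizer would keep this private and stable so
# the same bridge address always maps to the same sanitized address.
SECRET = b"\x00" * 32


def sanitize_ipv4(address: str, fingerprint: str, secret: bytes = SECRET) -> str:
    """Map an IPv4 address to 10.x.x.x using a keyed SHA-256 hash."""
    ip_bytes = ipaddress.IPv4Address(address).packed
    digest = hashlib.sha256(ip_bytes + bytes.fromhex(fingerprint) + secret).digest()
    # Indexing bytes yields ints, so the first three bytes become octets.
    return "10.{}.{}.{}".format(*digest[:3])


def sanitize_addrport(addrport: str, fingerprint: str) -> str:
    """Replace the address in 'address:port' but keep the port unchanged."""
    address, _, port = addrport.rpartition(":")
    return f"{sanitize_ipv4(address, fingerprint)}:{port}"
```

Without the secret, someone could hash every IPv4 address for a known fingerprint and invert the mapping; the secret is what makes the 10.x.x.x value unlinkable to the real address.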

What does the arglist tell us that would be useful for statistical analysis? There are no shared secrets in that line, are there? If we take out the arglist part, I think we're effectively deciding against keeping transport-info lines in the future, because their only purpose seems to be adding another arglist to an existing transport.

> b) Sanitization. Only display whether the bridge supports pluggable transports or not. Or maybe the number of transports it supports. Or maybe something else.

The simple fact that a bridge supports pluggable transports or the number of supported transports seems hardly useful for statistical analysis. What we could do is only keep transport SP <methodname> for each transport that a bridge supports. But I don't see yet how the sanitized address and (non-sanitized) port are sensitive information that we'd have to remove.

> c) Paranoia. Don't display any pluggable transport-related information.

That's bad, because we should come up with some stats to show how successful pluggable transports are, if we can.

> If I were to select one I would probably go with a). It's good both for analysis and for users who want to know more about their bridges.

I agree.

> I'm also not sold on the use case of a bridge operator who supports multiple transports, has a public bridge, and wants to hide some of her transports from her users. However, Tor users have many different use cases and I only know of a few, so if others think that b) or c) (or a d)) is more reasonable (or supports a larger range of use cases), I'm OK with it.

Okay. Here's what I'm going to do, unless you or somebody else tells me it's a bad idea:

  • Sanitize transport lines by sanitizing the address part similar to how we sanitize other addresses and keeping the rest of the line unchanged.
  • Leave in transport-info lines without changing them at all.
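The plan above, sketched in Python under the assumption that a descriptor is handled as a list of lines. The address mapping is passed in as a function (for example a keyed hash like the one used for bridge IP addresses), so the sketch doesn't commit to a particular scheme; the function name and the stand-in mapping in the usage line are hypothetical.

```python
from typing import Callable, List


def sanitize_addresses_only(lines: List[str],
                            sanitize_address: Callable[[str], str]) -> List[str]:
    """Sanitize transport lines by replacing only the address.

    transport lines: replace the address, keep methodname, port and arglist.
    transport-info lines and all other lines: pass through unchanged.
    """
    out = []
    for line in lines:
        if line.startswith("transport "):
            parts = line.split(" ", 3)  # keyword, methodname, addr:port, [arglist]
            if len(parts) < 3:
                raise ValueError(f"malformed transport line: {line!r}")
            address, _, port = parts[2].rpartition(":")
            parts[2] = f"{sanitize_address(address)}:{port}"
            out.append(" ".join(parts))
        else:
            out.append(line)
    return out


# Stand-in address mapping, for illustration only.
print(sanitize_addresses_only(["transport obfs2 198.51.100.7:40872"],
                              lambda address: "10.0.0.1"))
```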

Does that make sense? (Thanks!)

comment:4 Changed 7 years ago by karsten

After talking more to asn on IRC, we came up with a slightly more paranoid variant:

  • Sanitize transport lines by only keeping the transport SP <methodname> part.
  • Remove transport-info lines entirely.

The reason is that it's still unclear what the arglist part will contain in future transports. For the same reason, we also drop transport-info lines completely, since their only purpose is to carry another arglist. asn also had concerns about leaving the port unsanitized, so we left out <address:port> as well.
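The final rule, as a Python sketch that assumes the descriptor is handled as a list of lines; the function name is hypothetical. Splitting on whitespace and keeping only the second field means nothing after <methodname> can leak, whatever future transports put into their arglists.

```python
from typing import List


def keep_method_names_only(lines: List[str]) -> List[str]:
    """Sanitize transport lines by keeping only 'transport <methodname>'.

    transport lines: drop <address:port> and any arglist.
    transport-info lines: drop entirely.
    All other lines: pass through unchanged (sanitized elsewhere).
    """
    out = []
    for line in lines:
        fields = line.split()
        keyword = fields[0] if fields else ""
        if keyword == "transport-info":
            continue  # its arglist content isn't known yet, so drop the line
        if keyword == "transport" and len(fields) >= 2:
            out.append(f"transport {fields[1]}")
        else:
            out.append(line)
    return out
```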

comment:5 Changed 7 years ago by karsten

Changes are implemented in metrics-db, but not yet deployed. Will deploy once stem can handle the new format (#6257).

comment:6 Changed 7 years ago by karsten

Resolution: implemented
Status: new → closed

Deployed and confirmed to be working. Closing. Yay! :)
