wiki:doc/TorLauncherUX2016

A Usability Evaluation of Tor Launcher

Tor is an anonymity network that routes Internet traffic through a series of relays that make it difficult to observe the source and destination. Tor Browser is a modified Firefox browser with a built-in Tor client that is the recommended way to use Tor. Tor Launcher is a Tor Browser component that starts, stops, and otherwise controls the underlying Tor processes. Tor Launcher’s graphical user interface asks the user to configure bridges, pluggable transports, and proxies to make a connection to Tor. This is the object of our study.

This project evaluates how easy it is for a user to connect to Tor with qualitative (behavioral and attitudinal) methods, makes design changes according to Tor-specific considerations, and verifies that the changes helped with quantitative (measurement-oriented) methods.

Users take more than 10 minutes to connect to Tor if the public relays are censored, and 50% of users cannot connect to Tor at all if the interface's hardcoded bridges are censored. The suggested design changes save users a lot of time--over 20 minutes for users in environments that censor the hardcoded bridges.

Designing changes was especially challenging because of atypical, Tor-specific design considerations. You might think that it would be easiest for the user if the process of setting up the connection to Tor were completely automated, and you'd be right. But that doesn't account for the fact that automating the process can put some users at risk, or that the relays that work most reliably are orders of magnitude more expensive and limited in capacity (to name a few reasons). Anyone interested in doing UX work for Tor would especially benefit from reading about these considerations.

People:

  • Linda Lee (@linda): user experience researcher
  • David Fifield (@dcf): computer security researcher
  • Nathan Malkin: user experience researcher
  • Ganesh Iyer: user interface designer
  • Serge Egelman: project adviser
  • David Wagner: project adviser

Timeline:

  • Oct - Dec 2015: user research
  • Jan - Dec 2016: interface redesigns, experiments, paper writing
  • Jan - Mar 2017: paper acceptance, paper polishing
  • Jul 2017: paper presentation at PETS

Goals:

  • find out why and where users struggle connecting to Tor
  • make changes to the Tor Launcher interface to address those problems
  • measure the impact of those changes to evaluate if we should implement the changes

Design considerations

If you're only going to read one section, read this one: these considerations don't require an understanding of our experimental methodologies.

User Consent

Since using Tor comes with some risk, it's critical to give users agency over the situation. Letting them configure the components makes them realize, at a minimum, that they are attempting to connect to Tor.

With the current interface (before the proposed changes), over half of the people make a mistake connecting to Tor. While it's easy to say that we would make fewer mistakes and could help users by choosing for them, taking on risk for the user without proper consent is not ethical.

Active adversaries

It would be easy to ask the user something like, "What country are you in?," and give them advice. Or to ask "Are you at risk?" or "Do you know what you are doing?" to offer manual configuration while defaulting to automatic configuration. But there are benign and malicious third-party versions of Tor Browser on the internet, and not all users download Tor Browser from the official site or an official mirror. Since these copies replicate the official version, collecting user input in ours would result in impersonations collecting user input as well--and they might steal that information and use it to profile users.

Passive adversaries

Assuming a sufficiently powerful network adversary, a failed, non-obfuscated attempt to connect to Tor tells a passive eavesdropper that the user is trying to connect to Tor. Depending on who the user is, this may or may not be dangerous; most state-level governments are capable of this level of surveillance. Allowing users to stay in control and leverage what they know about their specific situation can reduce their risk.

Maintenance constraints

Network environments change all the time--a bridge that was reachable in a country an hour ago might not be reachable tomorrow. Keeping up with how and when each country censors Tor would take multiple people working on it as their full-time job. Even if we could keep up, we would need to push changes before any of the affected users try to connect to Tor, which is an unreasonable turnaround time.

Financial constraints

There is a set of bridges that works almost all the time (meek bridges)--they are hard to block because blocking them would also block many other sites, and governments don't want to blanket-block all of them. We could leverage these to ensure that people connect to Tor safely the first time, then assign them something that would work afterward. But meek bridges cost thousands of dollars a month to run at their current capacity (easily the most expensive bridges to run), and relying on them more would cost even more money.


User pain points

We found out why and where users struggled with the interface with 1) an interface audit by user experience researchers, 2) observations of real people using the interface in a controlled setting, and 3) users' answers to interview questions following the experiment.

Nonexistent mental model

The average internet user does not have an accurate model of how the internet works (LAN, WAN, ASes, ISPs, etc.), nor should they be required to. These people can use browsers such as Firefox or Chrome without this information, and we should aim for this too.

There are many valid configuration settings to connect to Tor. A user who does not need a bridge or proxy can nevertheless connect with a bridge, a proxy, or both, provided that they are configured correctly.

Users rarely need proxies, and need bridges if they are circumventing censorship. The user must first successfully connect to the Internet through their proxy before trying to connect to a Tor relay or bridge.

Currently, we ask users about internet concepts that they do not know (censorship, ISPs) and terms specific to Tor (bridges, transports, etc.). We do try to explain these concepts, but the experiments showed that it didn't work well, and people were still confused. In general, users:

  • didn't know whether to connect directly or configure a bridge or proxy
  • if they determined that they needed to configure something, didn't know what they needed to configure
  • when a connection failed, couldn't diagnose what had been misconfigured and how to fix it
  • followed the recommendations in the interface (we suggest that they try a direct connection to Tor, then if that doesn't work, to use an obfs4 bridge), but didn't know what to do if recommendations didn't work (which they don't in heavily censored countries)

Missing information

Some transports work in some countries, while others do not. It depends on whether the country has figured out how the transport works and has decided to block it.

None of this is indicated in the interface. We would argue that these are details that should be abstracted away, so that users don't need to know them. But currently, users are required to know them to make the correct decision, which is not good.

Blanket user targeting

Currently, we don't differentiate between high- and low-risk users, users who do or don't know what they are doing, or users who want Tor for nice-to-have security, anonymity, or censorship circumvention. While we discussed above that leveraging this information would be risky, it doesn't change the fact that users feel frustrated during the process. Users who aren't at risk may think, "Why do I need to do this?," "Why can't this be automated for me?," etc.


Design changes and results

With the Tor-specific design considerations in mind, we made changes according to usability heuristics, such as increasing visibility of system status, leveraging recognition rather than recall, and matching the system with the real world. We also reduced text, made the interface more aesthetically pleasing, and experimented with graphics.

Proposed UI changes

See the before and after if you are curious. (Warning, it's just a screenshot of my paper and it's blurry; I was lazy about it for now. Will update later.)

To require less work from users, we eliminated the bridge and proxy questions and simulated auto-detection for proxies by telling users that they did not need a proxy (this is feasible to implement with a local scan that does not leak information).
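As an illustration of why such a scan need not leak information, here is a minimal sketch of local proxy detection. Everything in it is an assumption for illustration (the port list, the helper names); the real implementation in Tor Launcher may differ. It only reads environment variables and probes 127.0.0.1, so nothing is sent over the network:

```python
import os
import socket

# Ports commonly used by local proxies (hypothetical list for illustration):
# 8118 (Privoxy), 3128 (Squid), 1080 (SOCKS).
COMMON_PROXY_PORTS = [8118, 3128, 1080]

def proxy_from_env(env):
    """Return a proxy URL from conventional environment variables, if any."""
    for var in ("http_proxy", "https_proxy", "all_proxy"):
        value = env.get(var) or env.get(var.upper())
        if value:
            return value
    return None

def local_proxy_ports(ports=COMMON_PROXY_PORTS, timeout=0.2):
    """Probe loopback-only ports for a listening proxy. Connections go to
    127.0.0.1, so no information leaves the machine."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # nothing listening on this port
    return open_ports
```

If `proxy_from_env(os.environ)` returns nothing and no known local proxy port is open, the interface can safely tell the user they don't need a proxy.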

To help users decide what to do, we labeled the configure button as an option for heavily censored environments (before, it didn't say so), told users to configure a bridge if Tor is censored, and gave additional advice on choosing transports if the recommended one did not work (we recommended a meek bridge).

To help users understand what is going on, we added a summary screen that displays the current bridge and proxy settings for system visibility, switched proxy and bridge options to put them in a topologically sequential order, and added indicators into the progress bar that show if various components connected successfully.

Resulting Improvements

We used these industry-standard metrics to measure users' ability to complete tasks:

  • Completion rate: percentage of users that connect to Tor in an experimental condition.
  • Connection time: time from Tor Browser startup to a successful connection to Tor.
  • Configuration time: time users spent in Tor Launcher configuring bridges and proxies, not counting time spent waiting for the connection to be made.
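The three metrics above are straightforward to compute from per-participant logs. The sketch below uses hypothetical records (the field names and numbers are made up, not the study's data) just to pin down the definitions:

```python
from statistics import median

# Hypothetical per-participant trial records (not the study's actual data).
trials = [
    {"connected": True,  "connection_s": 320,  "config_s": 110},
    {"connected": True,  "connection_s": 540,  "config_s": 200},
    {"connected": False, "connection_s": None, "config_s": 480},
]

def completion_rate(trials):
    """Fraction of participants who connected to Tor in this condition."""
    return sum(t["connected"] for t in trials) / len(trials)

def median_connection_time(trials):
    """Median startup-to-connection time, over successful trials only."""
    return median(t["connection_s"] for t in trials if t["connected"])
```

With these records, `completion_rate(trials)` is 2/3 and `median_connection_time(trials)` is 430 seconds; configuration time would be summarized the same way from the `config_s` field.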

We simulated three environments of increasing censorship severity. We chose to simulate environments for the stability and reproducibility of the experiment, as real censored networks are volatile and complex. These environments are not intended to imitate any particular country's censorship, but they are sufficient for testing the configuration interface, since they require users to take all the interface paths we wanted to test.

"OLD" refers to the interface before our design changes and "NEW" refers to the interface with them. The changes caused a statistically significant reduction in time, but did not show a statistically significant increase in completion rates (the observed increases could be due to random chance; our sample size was not big enough to tell).

Notice how much time these simple UI changes saved users. Also, people spent most of their time waiting for connections to work rather than trying different configurations (the difference between configuration time and connection time). That's something we want to fix in the future.

We would argue that our changes would significantly increase completion rates outside of a laboratory setting. Our users were paid $35 to try to connect to Tor for 40 minutes, but users in the wild would give up much sooner. If we assume that users give up after two, five, or 20 minutes, you can see how the changes may increase the number of people who successfully connect to Tor.
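The effect of a give-up threshold can be sketched as a simple re-scoring of connection times: a participant counts as successful only if they connected before the cutoff. The times below are hypothetical, not the study's measurements:

```python
def completion_under_cutoff(connection_times_s, cutoff_s):
    """Fraction of participants who would still succeed if everyone
    gave up after cutoff_s seconds. None means the participant never
    connected at all."""
    succeeded = [t for t in connection_times_s
                 if t is not None and t <= cutoff_s]
    return len(succeeded) / len(connection_times_s)

# Hypothetical connection times in seconds; None = never connected.
times = [90, 150, 400, 700, 1500, None]
for minutes in (2, 5, 20):
    print(minutes, completion_under_cutoff(times, minutes * 60))
```

For these hypothetical times, the completion rate climbs from 1/6 at a two-minute cutoff to 2/3 at twenty minutes, which is the shape of the argument: interfaces that connect users faster also connect more of the users who would otherwise have given up.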

Future UI changes

We believe that these changes will help, but we don't prescribe them exactly. We did not test our design changes independently, so we cannot (and do not) claim an effect for each individual change. Instead of specific changes, we suggest these ways of alleviating user pain points:

  • Hiding the proxy screen and less-used transports: 1/200+ participants configured a proxy correctly and 0/200+ participants chose flashproxy, fte, fte-ipv6, or scramblesuit transports.
  • Discourage users from configuring optional components: many participants configured proxies and bridges that they did not need.
  • Redirect to relevant screens on error: i.e., redirect users to the bridge screen if the bridge is unreachable, instead of requiring users to diagnose the source of the problem and navigate manually.
  • Take advantage of the progress screen: we can use the minutes users spend here to educate users about what Tor does, or introduce Tor Browser.

We plan to work on making changes to Tor Launcher to improve its usability in 2017!


Appendix

The PETS paper (to come in July 2017!) documents the usability testing with academic rigor, and explains this project more in depth.

I've summarized the most relevant supplementary information in these sections below.

Scope and context

We restricted ourselves to making design changes that would not ignore any of the design considerations and not require any changes to existing infrastructure.

Although Tor Browser was originally designed for Internet anonymity, many now use Tor Browser to circumvent Internet censorship. Tor Launcher mainly exists so that people who face Internet censorship can configure bridges, transports, and proxies to connect to Tor.

Methodology

We performed a cognitive walkthrough, working through tasks from the user's perspective and assessing the interface's learnability for new or infrequent users. We then conducted qualitative and quantitative usability tests of the interface. We believed that both were necessary for a complete evaluation: qualitative testing tells you why or how an interface is or is not usable, while quantitative testing tells you how usable it is.

We worked with Xlab, a laboratory at UC Berkeley, to run these studies. ~50% of our participants were from their participant pool (mostly students, staff, and some outsiders) and ~50% were recruited from craigslist (specifically to not just have students).

Funding and collaboration

This work was Linda's master's thesis while she was getting her master's degree in computer science at UC Berkeley. This work was funded by the National Science Foundation (only because she was an NSF fellow) and Intel (because her academic advisers were funded by Intel). Neither funding source requested criteria or changes to this research, but both did make sure that Linda had money to buy food and pay rent. This work was done in collaboration with the SCRUB (secure computing research for users' benefit) Laboratory and the BLUES (Berkeley laboratory for usable and experimental security) Laboratory, because that's where she worked while studying at Berkeley. Neither research institution requested criteria or changes to this research, but both supported it with their testing resources (computers, participants, etc.).

Last modified on Feb 23, 2017, 5:38:35 PM
