Rust in Tor

What & why

We are currently investigating integrating Rust as a first-class language in Tor. We decided upon Rust due to the benefits of memory safety and the ability to directly integrate Rust and C. To read more about how and why this started, see our meeting notes from the 2017 meeting in Amsterdam.

Current status

We are working to get basic structures in place in order to easily build Tor with Rust and add more Rust modules. This includes deciding how we will do dependency management, linking Rust and C modules, and type translation across the Rust/C FFI boundary. We recently merged #22106 which we will use to test platform support across distributions.

Please note that we're not taking implementations of new features in Rust at this point in time.

Future steps

What we are currently working on

  1. Understand alignment between Rust and Tor supported platforms. There is a list of the platforms we aim to support; it would be helpful to understand how it intersects with the platforms Rust supports. (#22771)
  2. Add automated code quality tooling. (#22156)
  3. Build Tor with Rust for Windows. (#22839)
  4. Investigate the reproducibility of Rust binaries. (#22769)
  5. Implement existing submodules in Rust as a proof of concept. Two currently in progress are consdiff (#24609) and protover (#22840).
  6. Add Rust-enabled build to the Tor CI. (#22636 and #22768)

All current, non-closed, Rust in Tor tickets

Ticket Summary
#24265 Fuzz all rust functions that are used by authorities to make sure they match C

We could break consensus if some authorities are running the Rust version of the code and some are running the C version, and their outputs differ on any input.

This is like #24029, but with arbitrary inputs that may or may not be UTF-8.
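A minimal sketch of the kind of differential check this ticket describes. The two `count_digits_*` functions are hypothetical stand-ins for a C implementation and its Rust port; a real harness would call the C function over FFI and would probably use a proper fuzzer rather than the small xorshift generator used here to keep the sketch dependency-free.

```rust
/// Hypothetical stand-in for the C implementation: counts ASCII digits.
fn count_digits_reference(input: &[u8]) -> usize {
    let mut n = 0;
    for &b in input {
        if b >= b'0' && b <= b'9' {
            n += 1;
        }
    }
    n
}

/// Hypothetical stand-in for the Rust port of the same function.
fn count_digits_candidate(input: &[u8]) -> usize {
    input.iter().filter(|b| b.is_ascii_digit()).count()
}

/// Tiny xorshift64 generator (the seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Produce `len` arbitrary bytes; many outputs will not be valid UTF-8.
    fn bytes(&mut self, len: usize) -> Vec<u8> {
        (0..len).map(|_| (self.next_u64() >> 32) as u8).collect()
    }
}

/// Feed `rounds` random inputs to both implementations and return the
/// first input on which their outputs differ, if any.
fn find_mismatch(seed: u64, rounds: usize) -> Option<Vec<u8>> {
    let mut rng = XorShift(seed);
    for _ in 0..rounds {
        let len = (rng.next_u64() % 64) as usize;
        let input = rng.bytes(len);
        if count_digits_reference(&input) != count_digits_candidate(&input) {
            return Some(input);
        }
    }
    None
}
```

Returning the offending input, rather than just a boolean, makes it easy to turn any mismatch into a regression test.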

#22776 Implement the remaining cryptographic protocols for Hyphae

We'll need:

1) Back-Maxwell Rangeproofs (requires Borromean Ring Signatures)
2) A ZKP compiler
3) Test vectors for Ristretto (a.k.a. Decaf for curve25519)

#24029 Test all rust functions' behavior when called from C with bad UTF8

We should make sure that the various Rust implementations of our protover functions correctly detect and reject strings that aren't UTF-8.
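As a rough illustration of the behaviour to test, the sketch below converts a NUL-terminated byte buffer, as it might arrive from C, into a `&str` and rejects anything that is not valid UTF-8. The helper name `c_bytes_to_str` is hypothetical and is not taken from the tor source.

```rust
use std::ffi::CStr;

/// Hypothetical helper: interpret a NUL-terminated buffer and return it
/// as &str only if it is well formed and valid UTF-8.
fn c_bytes_to_str(bytes_with_nul: &[u8]) -> Option<&str> {
    // Fails if the terminator is missing or a NUL appears mid-string.
    let c_str = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    // Fails if the bytes before the terminator are not valid UTF-8.
    c_str.to_str().ok()
}
```

A test for a real FFI entry point would pass the same kinds of buffers through the `extern "C"` function and check that it reports an error instead of panicking.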

#24030 Wrap types in

Our Rust protover implementation throws around HashSet and HashMap with wild abandon. We should probably wrap those types in struct declarations, to make the intent clearer.
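One possible shape for such wrappers, as a sketch only: the type names `Versions` and `SupportedProtocols` and their methods are hypothetical and not taken from the protover code.

```rust
use std::collections::{HashMap, HashSet};

/// The set of supported version numbers for one protocol (hypothetical name).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct Versions(HashSet<u32>);

/// Map from protocol name to its supported versions (hypothetical name).
#[derive(Debug, Default)]
struct SupportedProtocols(HashMap<String, Versions>);

impl Versions {
    fn insert(&mut self, version: u32) {
        self.0.insert(version);
    }

    fn contains(&self, version: u32) -> bool {
        self.0.contains(&version)
    }
}

impl SupportedProtocols {
    /// Record that `proto` supports `version`.
    fn add(&mut self, proto: &str, version: u32) {
        self.0.entry(proto.to_string()).or_default().insert(version);
    }

    /// Check whether `proto` is known and supports `version`.
    fn supports(&self, proto: &str, version: u32) -> bool {
        self.0.get(proto).map_or(false, |v| v.contains(version))
    }
}
```

Function signatures then say `SupportedProtocols` instead of `HashMap<String, HashSet<u32>>`, and the underlying collection can change without touching callers.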

#24031 could use a better algorithm

This probably doesn't matter in practice, but it would be cool if it used a smarter representation for sets of protocol versions than HashSet. Maybe a BTreeSet of (low, high) tuples?
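A sketch of what the BTreeSet idea could look like. The type `VersionRanges` is hypothetical; it stores non-overlapping inclusive ranges and merges overlapping or adjacent ones on insert. The insert scans the whole set for simplicity, which is probably acceptable for the handful of ranges a protocol list contains.

```rust
use std::collections::BTreeSet;

/// Non-overlapping, inclusive (low, high) version ranges (hypothetical type).
#[derive(Debug, Default)]
struct VersionRanges(BTreeSet<(u32, u32)>);

impl VersionRanges {
    /// Insert an inclusive range, merging it with any range it overlaps
    /// or directly touches.
    fn insert(&mut self, mut low: u32, mut high: u32) {
        assert!(low <= high);
        let touching: Vec<(u32, u32)> = self
            .0
            .iter()
            .copied()
            .filter(|&(l, h)| l <= high.saturating_add(1) && low <= h.saturating_add(1))
            .collect();
        for r in touching {
            low = low.min(r.0);
            high = high.max(r.1);
            self.0.remove(&r);
        }
        self.0.insert((low, high));
    }

    /// Membership test: find the last range starting at or before `v`
    /// and check whether it extends far enough to include `v`.
    fn contains(&self, v: u32) -> bool {
        self.0
            .range(..=(v, u32::MAX))
            .next_back()
            .map_or(false, |&(_, h)| v <= h)
    }
}
```

Compared with a HashSet of individual versions, a range such as 1-1000 costs one entry instead of a thousand.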

#24249 Create automated mechanism for C/Rust types to stay in sync

In transitioning parts of tor to Rust, some parts of the code will either need to temporarily exist in both C and Rust (such as protover), or will be highly coupled (such as enums that are passed across the FFI boundary).

It would be good to automatically verify these areas of the code don't get out of sync. This could either be a post-hoc verifier, or a generator that takes a higher-level specification and generates both C and Rust types.

Ideally, the coupling between C and Rust will be as minimal as possible, so this probably does not need to be a heavyweight solution.
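A lightweight post-hoc verifier could be as small as the sketch below. Everything in it is hypothetical: the enum, the `EXAMPLE_*` names, and the idea that some tool extracts (name, value) pairs from the C header into the `C_SIDE` table.

```rust
/// Hypothetical enum that is passed across the FFI boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExampleProto {
    Link = 0,
    Relay = 1,
    DirCache = 2,
}

/// (name, value) pairs as a hypothetical tool might extract them from
/// the C header. Hard-coded here so the sketch is self-contained.
static C_SIDE: [(&str, i32); 3] = [
    ("EXAMPLE_LINK", 0),
    ("EXAMPLE_RELAY", 1),
    ("EXAMPLE_DIRCACHE", 2),
];

/// The same table built from the Rust enum itself.
fn rust_side() -> [(&'static str, i32); 3] {
    [
        ("EXAMPLE_LINK", ExampleProto::Link as i32),
        ("EXAMPLE_RELAY", ExampleProto::Relay as i32),
        ("EXAMPLE_DIRCACHE", ExampleProto::DirCache as i32),
    ]
}

/// Return the C names whose entry differs from the Rust side.
fn out_of_sync() -> Vec<&'static str> {
    C_SIDE
        .iter()
        .zip(rust_side().iter())
        .filter(|(c, r)| c != r)
        .map(|(c, _)| c.0)
        .collect()
}
```

Run as a unit test, a check like this fails the build as soon as one side changes without the other.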

#24609 consdiff implementation in Rust

In my public repo in branch rust4, there's a pretty much complete consdiff implementation in Rust (only missing some logging and testing from the C side iirc). I won't have time to pick it up anytime soon I'm afraid, but I hope someone finds it useful. Note it looks a bit different compared to the C code, as we were trying very hard to come up with something without any unsafe code and no external dependencies, as this was some of the first Rust code ever written for tor. It should be straightforward, though.

#22156 Add Rust linting/formatting tools

We need this as another initial step to support Rust development in tor.

Work will involve adding rustfmt, Clippy, and determining rules we want/don't want.

See conversation in #22106

#23351 Create a rustfmt.toml defining our whitespace/formatting standards

We currently have no style consensus for Rust code. It would be good to agree on something! We could agree on whatever the Rust people like (still a WIP last I checked) or we could modify that by creating a `rustfmt.toml`.

We should also probably add a pre-commit hook for running rustfmt, since we have a pretty clean slate and we should keep it clean. :)

#22816 Run tests for single Rust module

In Tor, we currently have the ability to run tests for a single C module (or even a single unit test). As specified in doc/HACKING/WritingTests, running tests for the cell format module (for example) can be done via ./src/test/test cellfmt/..

Rust modules should have a similar option. Currently 'cargo test' can be run within a single Rust module, but this will not link against C modules. It would be good to be able to do this and retain the ability to test a single Rust module. Also, it would be nice to make this similar to running single C module tests, to minimize developer confusion.

#22769 Investigate the reproducibility of Rust binaries

If we are going to start writing more Tor things in Rust, it would be nice to understand the reproducibility of binaries created with rustc. I suspect the Tor Browser Team would also be interested in having these results, since parts of Firefox are now written in Rust, and soon (ESR 58?) it will no longer be optional to use them.

Note: this ticket is not about the reproducibility of rustc itself. That is an extremely deep rabbit hole (trust me, I have a rustc chained back to the OCaml days). Someday we may need to explore that, but that time is not now.

My approach for this task would probably be to create a Docker instance which builds some trivial Rust program, then run the Docker instance on different machines and compare the hashes of the binaries (then optionally investigate the differences, starting with tools like strings and moving up to IDA or whatever).

#24608 Update our Cargo.lock file to remove the deprecated and removed [root] section

This is causing build errors on Travis, which picked up the newest cargo nightly a week ago. As pointed out by Sebastian, the error appears to be due to cargo issue #4563, which completely removed the [root] section of Cargo.lock files. Often, historically, the [root] section was used with an arbitrary non-existent crate, before cargo workspaces were implemented. However, our Cargo.lock file contains not only a [root], but one which points to a non-arbitrary crate, tor_util. IIUC, we'll just need to remove that section.

(While we're at it, we may want to update to the newest libc dependency.)

I think we'll need to backport this to 0.3.0.x, since no newer cargos will build something with a [root] section.

#23880 Build tor with --enable-rust in Orbot and OnionBrowser

Hello! During our Rust discussions at the Montréal meeting, we discussed that it would be extremely useful to know — before we enable Rust by default — if doing so will cause issues for our packagers and downstreams, particularly on mobile. Would it be possible, please, for someone to create an experimental build of Orbot (and OnionBrowser!) building with ./configure --enable-rust [--enable-cargo-online-mode], and let us know any issues you encounter here?

#23881 Implement a way to utilise tor's logging system from Rust code

We really need a way to use tor's logging subsystem from Rust code. I haven't ever really looked at our logging code because it always Just Works™, but it seems possible that we could construct/format Strings to log in Rust code, choose a logging level, and throw the String across the FFI boundary, have the C code log it, then have the Rust code free it?
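The round trip described above could look roughly like this. `example_c_log` is a hypothetical stand-in, written in Rust so the sketch is self-contained; tor's real logging functions and their signatures are not shown in this ticket and are not assumed here.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

/// Hypothetical stand-in for a C logging entry point. It only reads the
/// string and never frees it. Returns the number of bytes it logged.
extern "C" fn example_c_log(severity: c_int, msg: *const c_char) -> usize {
    // Safety: the caller passes a valid NUL-terminated string.
    let text = unsafe { CStr::from_ptr(msg) };
    eprintln!("[{}] {}", severity, text.to_string_lossy());
    text.to_bytes().len()
}

/// Format a message in Rust, lend it to the C side, then free it in Rust.
/// Returns None if the message contains an interior NUL byte.
fn log_from_rust(severity: c_int, message: &str) -> Option<usize> {
    let owned = CString::new(message).ok()?;
    // Give up ownership, as we would when crossing the FFI boundary.
    let raw = owned.into_raw();
    let logged = example_c_log(severity, raw);
    // Reclaim ownership so the string is freed by Rust's allocator,
    // the same allocator that created it.
    drop(unsafe { CString::from_raw(raw) });
    Some(logged)
}
```

The important design point is that the C side borrows the pointer only for the duration of the call, so allocation and deallocation both stay on the Rust side.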

I'm not sure what we'll do about logging in general moving forward, once (and if) more and more of tor is rewritten in Rust.

#23882 Investigate implementing a Rust allocator wrapping tor_malloc

We should look into implementing the Rust alloc::allocator::Alloc trait as a wrapper around tor_malloc as a way to have a cleaner allocator interface in Rust moving forward (which still works with our current legacy C code).

This is what the Rust code in Firefox has done, and the alloc crate is supposed to be stabilised "soon" (as in, within the next six months) because FF is using it.
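For orientation, here is a sketch of an allocator wrapper using `std::alloc::GlobalAlloc`, the allocator trait that is stable in current Rust, rather than the unstable `Alloc` trait named above. `std::alloc::System` stands in for tor_malloc and its matching free function; a real wrapper would call those over FFI and would have to respect the alignment requested in `Layout`. The struct name and its counter are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper type. `System` is a stand-in for tor's C allocator.
struct TorAllocSketch {
    /// Number of allocations not yet freed, kept only to make the
    /// wrapper's behaviour observable.
    live_allocations: AtomicUsize,
}

unsafe impl GlobalAlloc for TorAllocSketch {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // A real version would call into the C allocator here.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.live_allocations.fetch_add(1, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.live_allocations.fetch_sub(1, Ordering::Relaxed);
        // A real version would call the matching C free function here.
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

Such a type would be installed for the whole program by declaring a static of it with the `#[global_allocator]` attribute.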

#23886 Write FFI bindings and function pointers for ed25519-dalek

As part of our efforts to get a few modules in Tor written in Rust for 0.3.3, an exceptionally easy candidate is our ed25519 code, given that the current code is already highly modularised, taking function pointers to implement an interface. I wrote ed25519-dalek, and I recently revised the API to be a very close match to what tor expects, so I believe this task should be extremely easy, and a prime candidate for someone newer to Rust who wishes to learn about writing FFI. (I'm happy to pair program on this too! Also on anything else, but this too.)

#22907 Investigate using cargo-vendor for offline dependencies

People on the cargo team recommended we look into using it to facilitate our offline builds. (See also #22830)

#23878 Attempt rewriting buffers.c in Rust

In buffers.c, we define buf_t, which is essentially a doubly-linked list comprised of chunks of contiguously-allocated memory. During the Montréal meeting, we identified buf_t as a potentially good candidate datatype for reimplementation in Rust.

My understanding of possibly the ideal way to do this (after talking with Alex Crichton, without boats, nickm, and Nika Layzell) would be to entirely rethink the implementation in terms of a VecDeque<Bytes>, using VecDeque from the stdlib and Bytes or another buffer type from the bytes crate. If this is something which works out, we could then (hopefully!) expose an API similar to the C interface. (If that doesn't work out, there are only a couple of points in the code which appear to rely on the current implementation of buf_t.)
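To make the idea concrete, here is a small sketch of a chunked buffer built on VecDeque. It uses `Vec<u8>` as a stand-in for `Bytes` so that it needs only the standard library, and the type and method names (`ChunkedBuf`, `add`, `drain`) are hypothetical rather than a mirror of the C API.

```rust
use std::collections::VecDeque;

/// Hypothetical chunked buffer; Vec<u8> stands in for `Bytes`.
#[derive(Debug, Default)]
struct ChunkedBuf {
    chunks: VecDeque<Vec<u8>>,
    len: usize,
}

impl ChunkedBuf {
    /// Append data as a new chunk at the tail.
    fn add(&mut self, data: &[u8]) {
        if !data.is_empty() {
            self.len += data.len();
            self.chunks.push_back(data.to_vec());
        }
    }

    /// Remove and return up to `n` bytes from the head.
    fn drain(&mut self, n: usize) -> Vec<u8> {
        let mut out = Vec::with_capacity(n.min(self.len));
        while out.len() < n {
            let mut chunk = match self.chunks.pop_front() {
                Some(c) => c,
                None => break,
            };
            let want = n - out.len();
            if chunk.len() > want {
                // Take only the front of this chunk; put the rest back.
                let rest = chunk.split_off(want);
                self.chunks.push_front(rest);
            }
            out.extend_from_slice(&chunk);
        }
        self.len -= out.len();
        out
    }

    /// Total number of buffered bytes across all chunks.
    fn len(&self) -> usize {
        self.len
    }
}
```

With `Bytes` in place of `Vec<u8>`, splitting a chunk would not need to copy, which is presumably part of the appeal of the bytes crate here.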

#24033 Require all directory documents to be utf-8

There are only a few places that directory documents can have arbitrary bytes today, and almost nobody is using them to encode anything besides UTF-8. Let's standardize on UTF-8 while we still can.

Step one will be for the authorities to start rejecting these documents. Once they're rejecting them, everybody else can begin rejecting them too.

#24116 Torsocks deadlocks every Rust program

Any Rust program that is run with torsocks will deadlock. This has nothing to do with networking: even the program 'fn main() { }' compiled with a recent rustc will deadlock when run as 'torsocks ./rust_torsocks'.

This is a backtrace I got when attaching to the deadlocked process:

#0  0xb7713cf9 in __kernel_vsyscall ()
#1  0xb76b9d92 in __lll_lock_wait ()
  at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:144
#2  0xb76b38de in __GI___pthread_mutex_lock (mutex=0xb770d024)
  at ../nptl/pthread_mutex_lock.c:80
#3  0xb77001ed in tsocks_mutex_lock ()
  from /usr/lib/i386-linux-gnu/torsocks/
#4  0xb7700334 in tsocks_once ()
  from /usr/lib/i386-linux-gnu/torsocks/
#5  0xb76fa25e in tsocks_initialize ()
  from /usr/lib/i386-linux-gnu/torsocks/
#6  0xb76fd02d in syscall ()
  from /usr/lib/i386-linux-gnu/torsocks/
#7  0x004a5049 in os_overcommits_proc ()
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/pages.c:252
#8  je_pages_boot ()
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/pages.c:297
#9  0x004745dd in malloc_init_hard_a0_locked ()
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1366
#10 0x00474768 in malloc_init_hard ()
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1493
#11 0x00489b95 in malloc_init ()
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:317
#12 ialloc_body (slow_path=true, usize=<synthetic pointer>,
  tsdn=<synthetic pointer>, zero=true, size=20)
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1583
#13 calloc (num=1, size=20)
  at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1824
#14 0xb76d23ec in _dlerror_run (
  operate=operate@entry=0xb76d1b80 <dlopen_doit>, args=args@entry=0xbfb8cd10)
#15 0xb76d1c9e in __dlopen (file=0xb7703345 "", mode=1) at dlopen.c:87
#16 0xb76fa44f in ?? () from /usr/lib/i386-linux-gnu/torsocks/
#17 0xb7700352 in tsocks_once ()
  from /usr/lib/i386-linux-gnu/torsocks/
#18 0xb76fa25e in tsocks_initialize ()
  from /usr/lib/i386-linux-gnu/torsocks/
#19 0xb7724c65 in call_init (l=<optimized out>, argc=argc@entry=1,
  argv=argv@entry=0xbfb8ce74, env=0xbfb8ce7c) at dl-init.c:72
#20 0xb7724d8e in call_init (env=0xbfb8ce7c, argv=0xbfb8ce74, argc=1,
  l=<optimized out>) at dl-init.c:30
#21 _dl_init (main_map=<optimized out>, argc=1, argv=0xbfb8ce74,
  env=0xbfb8ce7c) at dl-init.c:120
#22 0xb7715a5f in _dl_start_user () from /lib/

It looks like tsocks_initialize() is called when libtorsocks is loaded. It calls tsocks_once(), which locks a mutex and then calls dlopen() to get the libc symbols. dlopen() tries to allocate some memory, which leads jemalloc (the default allocator for Rust programs) to call syscall() (it wants to open a proc file to see whether the system overcommits memory). That call is intercepted by libtorsocks, which leads to another call to tsocks_initialize()... and since the mutex is already locked, it deadlocks.

One way to fix this might be to just let through any syscall() calls that happen during bootstrapping, but I don't know the torsocks code well enough to know if this could cause any dangerous leaks.
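The control flow of that proposed fix can be modelled in a few lines. This is only a model written in Rust: torsocks itself is C, and `initialize`, `intercepted_syscall`, and the thread-local flag are hypothetical stand-ins. It shows how a re-entrancy flag avoids locking the same mutex twice; it says nothing about whether passing the call through is safe.

```rust
use std::cell::Cell;
use std::sync::Mutex;

/// Stand-in for the mutex taken during initialization.
static INIT_LOCK: Mutex<()> = Mutex::new(());

thread_local! {
    // True while this thread is inside initialize().
    static IN_INIT: Cell<bool> = Cell::new(false);
}

/// Hypothetical stand-in for the intercepted syscall() wrapper.
fn intercepted_syscall(log: &mut Vec<&'static str>) {
    if IN_INIT.with(|f| f.get()) {
        // Re-entrant call during bootstrap: pass straight through
        // instead of initializing again.
        log.push("passthrough");
        return;
    }
    initialize(log);
    log.push("intercepted");
}

/// Hypothetical stand-in for the library initialization routine.
fn initialize(log: &mut Vec<&'static str>) {
    let _guard = INIT_LOCK.lock().unwrap();
    IN_INIT.with(|f| f.set(true));
    // Models the symbol lookup that allocates memory, which in turn
    // makes the allocator issue a syscall.
    intercepted_syscall(log);
    IN_INIT.with(|f| f.set(false));
    log.push("initialized");
}
```

Without the flag check, the nested call would reach `INIT_LOCK.lock()` a second time on the same thread, which is the situation the backtrace shows.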

Interested in helping out?

Please see doc/HACKING/ (rendered) in the tor.git repo.

Coding Standards

Please see doc/HACKING/ (rendered) in the tor.git repo.

Last modified on Dec 13, 2017, 12:13:15 AM