This is an archived project. Repository and other project resources are read-only.

Split relay and link crypto across multiple CPU cores

Right now, Tor does nearly all of its work in one main thread. We have a basic "CPUWorker" implementation that we use for doing server-side onionskin crypto in a separate thread, but thanks to improvements long ago, server-side onionskin crypto no longer dominates. If we could split the work of relay AES-CTR crypto and SSL crypto across multiple threads, that would be pretty helpful in letting high-performance servers saturate their connections. (Blutmagie has wanted this for some time.)

Child Tickets: [[TicketQuery(parent=#1749 (moved))]]

Activity

Nick Mathewson changed milestone to %Tor: unspecified 14 years ago

Nick Mathewson added 035-roadmap-master 035-triaged-in-20180711 component::core tor/tor milestone::Tor: unspecified owner::chelseakomlo performance points::10 priority::high severity::normal status::assigned term-project-ideas threads tor-relay type::project labels 14 years ago

Nick Mathewson @nickm · 14 years ago

Author

Trac:
Status: new to accepted
Owner: N/A to nickm
Type: defect to task
Nick Mathewson @nickm · 14 years ago

Author

Trac:
Description: appended "Child Tickets: [[TicketQuery(parent=#1749 (moved))]]"
Nick Mathewson @nickm · 14 years ago

Author

At least the relay crypto part of this should happen in 0.2.3.x

Trac:
Milestone: N/A to Tor: 0.2.3.x-final
Karsten Loesing @karsten · 13 years ago

Trac:
Actualpoints: N/A to N/A
Points: N/A to N/A
Type: task to project
Summary: Project: Split relay and link crypto across multiple CPU cores to Split relay and link crypto across multiple CPU cores
Nick Mathewson @nickm · 13 years ago

Author

Trac:
Milestone: Tor: 0.2.3.x-final to Tor: unspecified
Nick Mathewson @nickm · 12 years ago

Author

Trac:
Milestone: Tor: unspecified to Tor: 0.2.4.x-final
Priority: normal to major
Nick Mathewson @nickm · 12 years ago

Author

Trac:
Keywords: N/A deleted, tor-relay added
Nick Mathewson @nickm · 12 years ago

Author

Trac:
Component: Tor Relay to Tor
Nick Mathewson @nickm · 12 years ago

Author

Added a sub-ticket for the relay component.

Trac:
Milestone: Tor: 0.2.4.x-final to Tor: unspecified
Trac @tracbot · 11 years ago

May I suggest raising this to critical priority? 21st-century crypto software can't afford not to be fully multithreaded ;) No CPU sold today is single-core anymore, and I'm sure few people want to run a dedicated Tor relay 24/7 only to see it use 1/n of its capacity.

Trac:
Username: elgo
Trac @tracbot · 11 years ago

Trac:
Username: towelenee

relay.2.c
Trac @tracbot · 11 years ago

Trac:
Username: towelenee
Trac @tracbot · 11 years ago

Trac:
Username: towelenee

relay.c
Trac @tracbot · 11 years ago

Here are my changes for relay.c. They use multiple cores via OpenMP, but they need changes to Makefile.am.

Trac:
Username: towelenee
Trac @tracbot · 11 years ago

Trac:
Username: towelenee

relay.3.c
Trac @tracbot · 11 years ago

Trac:
Username: towelenee
Trac @tracbot · 11 years ago

Trac:
Username: towelenee

relay.4.c
Trac @tracbot · 11 years ago

The latest patch doesn't use OpenMP, just pthreads.

Trac:
Username: towelenee
Nick Mathewson @nickm · 9 years ago

Author

Trac:
Keywords: N/A deleted, 6s194 added
Nick Mathewson @nickm · 9 years ago

Author

These tickets were tagged "6s194" as ideas for possible term projects for students in MIT subject 6.S194 spring 2016. I'm retagging with term-project-ideas, so that the students can use the 6s194 tag for tickets they're actually working on.

Trac:
Keywords: 6s194 deleted, term-project-ideas added
Nick Mathewson @nickm · 7 years ago

Author

Trac:
Severity: N/A to Normal
Points: N/A to 10
Reviewer: N/A to N/A
Sponsor: N/A to N/A
Keywords: N/A deleted, performance, threads added
Chelsea Komlo @chelsea · 6 years ago

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?
cypherpunks @cypherpunks · 6 years ago

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

Isis offered a glimpse of the answer: https://blog.torproject.org/comment/269723#comment-269723
Nick Mathewson @nickm · 6 years ago

Author

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

It's not so well modularized right now. The big problem here is that the code is written with the assumption that relay crypto finishes immediately, but with this change, we'd sometimes have to wait on another thread before we had cells to send on a given circuit.
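
To make that concrete: once relay crypto can finish later, and possibly out of order, the code that sends cells on a circuit can no longer assume every queued cell is ready. Below is a minimal sketch of one way to cope, assuming a hypothetical per-circuit output queue (out_queue_t, queue_push, queue_mark_done and queue_flush are illustrative names, not Tor's API): a cell is queued when its crypto is handed off, marked done when the crypto completes, and flushed strictly from the front so cells still leave in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-circuit output queue (not Tor's real cell queue). */
#define QUEUE_CAP 8

typedef struct {
  int seq;           /* position of the cell in the circuit's cell stream */
  bool crypto_done;  /* set once relay crypto for this cell has finished */
} pending_cell_t;

typedef struct {
  pending_cell_t cells[QUEUE_CAP];
  size_t head;       /* index of the oldest unsent cell */
  size_t len;        /* number of queued cells */
} out_queue_t;

/* Queue a cell whose crypto has been handed off but not yet completed. */
bool queue_push(out_queue_t *q, int seq) {
  if (q->len == QUEUE_CAP)
    return false;
  pending_cell_t *c = &q->cells[(q->head + q->len) % QUEUE_CAP];
  c->seq = seq;
  c->crypto_done = false;
  q->len++;
  return true;
}

/* Record that crypto for cell `seq` finished; completions may arrive out of order. */
void queue_mark_done(out_queue_t *q, int seq) {
  for (size_t i = 0; i < q->len; i++) {
    pending_cell_t *c = &q->cells[(q->head + i) % QUEUE_CAP];
    if (c->seq == seq) {
      c->crypto_done = true;
      return;
    }
  }
}

/* Send cells from the front only, stopping at the first cell whose crypto is
 * still pending, so cells leave in order. Sent seqs are written to sent[]. */
size_t queue_flush(out_queue_t *q, int *sent, size_t max) {
  assert(sent != NULL);
  size_t n = 0;
  while (q->len > 0 && n < max && q->cells[q->head].crypto_done) {
    sent[n++] = q->cells[q->head].seq;
    q->head = (q->head + 1) % QUEUE_CAP;
    q->len--;
  }
  return n;
}
```

For example, if cells 0, 1 and 2 are queued and cell 1's crypto finishes first, queue_flush sends nothing until cell 0 is done, and then sends cells 0 and 1 together.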
Chelsea Komlo @chelsea · 6 years ago

Replying to nickm:

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

It's not so well modularized right now. The big problem here is that the code is written with the assumption that relay crypto finishes immediately, but with this change, we'd sometimes have to wait on another thread before we had cells to send on a given circuit.

OK, understood. It seems like relay_crypt_one_payload is where this would happen: instead of blocking, it would emit an event once the relay crypto finishes.

I'll dig more into https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto, and will come up with a mini implementation plan for review.
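
A minimal sketch of that hand-off pattern, assuming POSIX threads and a single worker thread: the main thread queues crypto jobs, the worker processes each one and pushes the result onto a reply queue (the "crypto finished" event), and the main thread handles completions as they arrive instead of blocking inside the crypto call. All names (crypto_job_t, job_queue_t, run_offload_demo) are hypothetical, the XOR is a placeholder rather than relay crypto, and a condition variable stands in for whatever wakeup mechanism a real event loop would need to watch; this does not try to reproduce Tor's existing CPUWorker machinery.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_JOBS 16

/* Hypothetical job: one cell payload belonging to one circuit. */
typedef struct {
  int circ_id;
  unsigned char payload[8];
} crypto_job_t;

/* Bounded FIFO guarded by a mutex; `ready` is signalled on every push. */
typedef struct {
  crypto_job_t items[MAX_JOBS];
  size_t head, len;
  pthread_mutex_t lock;
  pthread_cond_t ready;
} job_queue_t;

static void jq_init(job_queue_t *q) {
  q->head = q->len = 0;
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->ready, NULL);
}

static void jq_destroy(job_queue_t *q) {
  pthread_mutex_destroy(&q->lock);
  pthread_cond_destroy(&q->ready);
}

static void jq_push(job_queue_t *q, const crypto_job_t *job) {
  pthread_mutex_lock(&q->lock);
  assert(q->len < MAX_JOBS);  /* the demo never queues more than MAX_JOBS */
  q->items[(q->head + q->len) % MAX_JOBS] = *job;
  q->len++;
  pthread_cond_signal(&q->ready);
  pthread_mutex_unlock(&q->lock);
}

/* Blocks until an item is available, then removes it from the front. */
static crypto_job_t jq_pop(job_queue_t *q) {
  pthread_mutex_lock(&q->lock);
  while (q->len == 0)
    pthread_cond_wait(&q->ready, &q->lock);
  crypto_job_t job = q->items[q->head];
  q->head = (q->head + 1) % MAX_JOBS;
  q->len--;
  pthread_mutex_unlock(&q->lock);
  return job;
}

typedef struct {
  job_queue_t requests, replies;
  int njobs;
} offload_t;

/* Worker thread. The XOR below is a placeholder, NOT real relay crypto. */
static void *crypto_worker(void *arg) {
  offload_t *o = arg;
  for (int n = 0; n < o->njobs; n++) {
    crypto_job_t job = jq_pop(&o->requests);
    for (size_t i = 0; i < sizeof job.payload; i++)
      job.payload[i] ^= 0x5a;
    jq_push(&o->replies, &job);  /* the "crypto finished" event */
  }
  return NULL;
}

/* Main-thread side: hand off n jobs, then handle completions as they arrive
 * instead of blocking inline on each one. Returns completions handled. */
int run_offload_demo(int n) {
  assert(n >= 0 && n <= MAX_JOBS);
  offload_t o = { .njobs = n };
  jq_init(&o.requests);
  jq_init(&o.replies);
  pthread_t tid;
  pthread_create(&tid, NULL, crypto_worker, &o);
  for (int i = 0; i < n; i++) {
    crypto_job_t job = { .circ_id = i };  /* payload starts zeroed */
    jq_push(&o.requests, &job);
  }
  int handled = 0;
  for (int i = 0; i < n; i++) {  /* stand-in for an event-loop callback */
    crypto_job_t done = jq_pop(&o.replies);
    handled += (done.payload[0] == 0x5a);
  }
  pthread_join(tid, NULL);
  jq_destroy(&o.requests);
  jq_destroy(&o.replies);
  return handled;
}
```

The main thread never waits on any particular job; it only reacts to whatever completions show up, which is the property the circuit code would need once relay_crypt_one_payload stops finishing immediately.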
Chelsea Komlo @chelsea · 6 years ago

Trac:
Owner: nickm to chelseakomlo
Status: accepted to assigned
Samdney @Samdney · 6 years ago

Add me as observer. I already spend some time with this. Maybe I can help :)

Trac:
Cc: N/A to Samdney
Chelsea Komlo @chelsea · 6 years ago

Ok, I have a start of a plan which I'm looking forward to discussing/further refining in Seattle. I took a large amount from https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto but there are some things which are out of date (circuit priority logic, for example) so any further pointers on what is different between when that wiki was written and where we are today would be helpful.

Below is a pad with a high level plan/starting implementation ideas; I've also attached a high level (pretty rough, sorry!) proposed architectural diagram to this ticket. Looking forward to further discussion, particularly around the proposal to use Rust and any Rust/C integration issues that could be particularly painful, and also any better ideas about how to cleanly register/edge trigger events.

https://pad.riseup.net/p/MultiThreadedCrypto_ImplementationPlan-keep
Chelsea Komlo @chelsea · 6 years ago

Trac:
David Goulet @dgoulet · 6 years ago

I have a few comments about the proposed design. This is something I thought about a while back but never got the cycles to implement.

Where should I discuss the plan? I would avoid using the ticket for that. I think a tor-dev@ thread would be ideal here?
Alex Xu @Hello71 · 6 years ago

one small thing: as I said on IRC, my informal profiling seems to show that a significant amount of CPU time is spent in the kernel networking stack, including TCP/IP and waiting for the network device. it's possible that that last bit is partially because of an old virtio-net though. it could potentially be easier to integrate recv multithreading at this point; or maybe not! maybe it would be easier (even in total) to just do crypto first, and then the other stuff.
Chelsea Komlo @chelsea · 6 years ago

Replying to Hello71:

one small thing: as I said on IRC, my informal profiling seems to show that a significant amount of CPU time is spent in the kernel networking stack, including TCP/IP and waiting for the network device. it's possible that that last bit is partially because of an old virtio-net though. it could potentially be easier to integrate recv multithreading at this point; or maybe not! maybe it would be easier (even in total) to just do crypto first, and then the other stuff.

That is a good point, but it is a separate piece of work from this specific task. However, it would probably be good to open an issue for "Recent profiling outcomes" so that we can take a closer look, track discoveries like this, and open other issues for them.
Nick Mathewson @nickm · 6 years ago

Author

Trac:
Milestone: Tor: unspecified to Tor: 0.3.5.x-final
Keywords: N/A deleted, 035-roadmap-master added
Nick Mathewson @nickm · 6 years ago

Author

Trac:
Keywords: N/A deleted, 035-triaged-in-20180711 added
Trac @tracbot · 6 years ago

Hello71, chelseakomlo: I've made such ticket some time ago: #23433 (moved).

Trac:
Username: Vort
Nick Mathewson @nickm · 6 years ago

Author

Trac:
Milestone: Tor: 0.3.5.x-final to Tor: unspecified
Arthur Edelstein @arthuredelstein · 6 years ago

Trac:
Cc: Samdney to Samdney, arthuredelstein
Trac @tracbot · 6 years ago

There are early plans to distribute crypto operations across multiple cores, but there might be a better way.

(I emailed before, but I just found the tiny reply link-button)

The ticket states the goal is to saturate the bandwidth available (by using all the cores as efficiently as possible).

I don't understand why a relay needs to have a "main thread". Network traffic arrives as an async operation and can be sent back out asynchronously. So a final strategy shouldn't have a central thread. The main thread might still be needed for startup, runtime adjustment, and system upkeep, but not for the core network-crypto processing; that should never need to touch the main thread.

The current proposal speaks about multi-threading crypto operations, let's call that "A) Speed - Speeding up processing of a single cell". Instead, I propose "B) Concurrency - Restructuring so multiple cells can be processed concurrently".

A cell of data should arrive via an I/O completion thread on an arbitrary CPU core, have its crypto transformation applied on that same core, and then be dispatched onward via the network. This seems like quite a simple approach, and I would think the crypto code could keep its current single-threaded implementation.

Approach [A] will have diminishing returns as the number of cores increases: you can only split up a single cell's work so far before you're encrypting one byte per CPU core. With approach [B], however, if you had millions of CPU cores (as an extreme) you could process millions of cells concurrently. Therefore, I believe approach [B] would be more scalable.
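
A rough bound on approach [A], assuming Tor's 509-byte relay cell payload and AES's 16-byte block size: AES-CTR lets each keystream block be computed independently, but a single cell only contains

```latex
\left\lceil \frac{509\ \text{bytes}}{16\ \text{bytes/block}} \right\rceil = \lceil 31.8125 \rceil = 32\ \text{blocks}
```

so splitting one cell's crypto can never keep more than 32 cores busy, however many are available. Approach [B] is instead limited by how many cells are in flight at once.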

There would still be circuit state to maintain: concurrent cells on the same circuit would need to be queued or protected by a lock. I suspect locking will be simple enough, and therefore the best approach.
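
A minimal sketch of the locking variant, assuming POSIX threads and hypothetical names (demo_circuit_t, cell_task_t, process_cell). One subtlety: per-circuit relay crypto state such as the AES-CTR counter has to be consumed in cell order, and a plain mutex only serializes access without guaranteeing that order. So the sketch also numbers each cell on arrival and makes threads wait their turn on a per-circuit condition variable. The keystream step is a placeholder, and starting one thread per cell is only a way to simulate cells landing on different cores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NCELLS 6

/* Hypothetical circuit state (not Tor's circuit_t): a keystream position that
 * must advance strictly in cell-arrival order, guarded by a per-circuit lock. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t turn;      /* broadcast whenever next_seq advances */
  unsigned next_seq;        /* arrival number of the next cell allowed to run */
  uint32_t ks_counter;      /* stand-in for the AES-CTR counter */
  unsigned used_by[NCELLS]; /* which cell consumed each keystream slot */
} demo_circuit_t;

typedef struct {
  demo_circuit_t *circ;
  unsigned seq;             /* numbered when the cell arrived on the connection */
} cell_task_t;

/* May run on any thread. Cells of one circuit take turns in arrival order;
 * cells of different circuits use different locks and never contend. */
static void *process_cell(void *arg) {
  cell_task_t *t = arg;
  demo_circuit_t *c = t->circ;
  pthread_mutex_lock(&c->lock);
  while (c->next_seq != t->seq)
    pthread_cond_wait(&c->turn, &c->lock);
  /* Placeholder for applying this cell's slice of the keystream. */
  c->used_by[c->ks_counter++] = t->seq;
  c->next_seq++;
  pthread_cond_broadcast(&c->turn);
  pthread_mutex_unlock(&c->lock);
  return NULL;
}

/* Starts one thread per cell, in reverse arrival order, and returns 1 if
 * the keystream was nevertheless consumed in arrival order. */
int run_ordered_circuit_demo(void) {
  demo_circuit_t circ = { .next_seq = 0 };
  pthread_mutex_init(&circ.lock, NULL);
  pthread_cond_init(&circ.turn, NULL);
  cell_task_t tasks[NCELLS];
  pthread_t threads[NCELLS];
  for (int i = NCELLS - 1; i >= 0; i--) {
    tasks[i] = (cell_task_t){ &circ, (unsigned)i };
    pthread_create(&threads[i], NULL, process_cell, &tasks[i]);
  }
  for (int i = 0; i < NCELLS; i++)
    pthread_join(threads[i], NULL);
  int ok = 1;
  for (unsigned i = 0; i < NCELLS; i++)
    ok = ok && circ.used_by[i] == i;
  pthread_mutex_destroy(&circ.lock);
  pthread_cond_destroy(&circ.turn);
  return ok;
}
```

Cells on different circuits would each use their own demo_circuit_t lock, so they could proceed fully in parallel. The cost is that a thread holding a later cell on a busy circuit sits idle while it waits its turn, which is the trade-off against queueing that cell instead.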

Given that this is only a problem for the biggest nodes, we should choose a design that is quick to implement and focuses on achieving those operators' goals, rather than on squeezing out every drop of performance for its own sake. I believe this is such an efficient and focused design.

What do you think?

Trac:
Username: schroeder
Alex Xu @Hello71 · 6 years ago

I've been saying this whole time that my (admittedly very informal) benchmarks show that the time spent in relay crypto is not significant, as long as AES is hardware-accelerated (e.g. AES-NI). I assume that this task will require a significant amount of effort to bang out the final design and implement it. Therefore, if someone wants to do this, it is my opinion that they should first make or find better benchmarks (I heard dgoulet was doing something...) showing that parallelization is still required with modern OpenSSL and hardware-accelerated AES.

Additionally: as I understand it, the current design is highly single-threaded. In particular, the scheduler is a key component of modern Tor and, if I follow correctly, is something of a bottleneck to full parallelism.
Neel Chauhan @neel · 5 years ago

Is any work being done on this? Do we require Rust support before work can start?

I'd love to have multicore relays as well (for my home server hosted on residential FTTH).

Trac:
Cc: Samdney, arthuredelstein to Samdney, arthuredelstein, neel
cypherpunks @cypherpunks · 5 years ago

It would also help virtualized servers with high bandwidth but low single-core CPU performance.
teor @teor · 5 years ago

Replying to neel:

Is any work being done on this?

No.

Do we require Rust support before work can start?

No, but reliable multithreaded code is hard to write in C.

Maybe start with some easier tasks first?
Trac changed time estimate to 80h 4 years ago

Nick Mathewson mentioned in issue #1760 (moved) 4 years ago

Nick Mathewson mentioned in issue #1826 (moved) 4 years ago

Roger Dingledine mentioned in issue #1948 (moved) 4 years ago

Nick Mathewson mentioned in issue #2087 (moved) 4 years ago

Nick Mathewson mentioned in issue #7572 (moved) 4 years ago
