Opened 8 years ago

Last modified 2 months ago

#1749 assigned project

Split relay and link crypto across multiple CPU cores

Reported by: nickm
Owned by: chelseakomlo
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-relay, term-project-ideas, threads, performance, 035-roadmap-master, 035-triaged-in-20180711
Cc: Samdney
Actual Points:
Parent ID:
Points: 10
Reviewer:
Sponsor:

Description (last modified by nickm)

Right now, Tor does nearly all of its work in one main thread. We have a basic "CPUWorker" implementation that we use for doing server-side onionskin crypto in a separate thread, but thanks to improvements long ago, server-side onionskin crypto no longer dominates. If we could split the work of relay AES-CTR crypto and SSL crypto across multiple threads, that would be pretty helpful in letting high-performance servers saturate their connections. (Blutmagie has wanted this for some time.)
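
Editor's sketch (not part of the original description): one property that makes splitting counter-mode work plausible is that each keystream block depends only on the key and the counter value, so separate chunks of a stream can be processed independently and in any order. The C snippet below illustrates only that property. `toy_block` and `toy_ctr_xor` are hypothetical names, the mixing function is an insecure stand-in for AES, and none of this reflects how Tor's relay crypto is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one block-cipher call such as AES. NOT secure;
 * it exists only so the example is self-contained. */
static uint64_t toy_block(uint64_t key, uint64_t counter)
{
  uint64_t x = key ^ (counter * 0x9E3779B97F4A7C15ULL);
  x ^= x >> 31;
  x *= 0xBF58476D1CE4E5B9ULL;
  x ^= x >> 29;
  return x;
}

/* XOR buf[0..len) with the keystream bytes that start at absolute
 * stream position `offset`. Each byte depends only on (key, position),
 * so calls covering disjoint ranges can run in any order, or on
 * different threads, without sharing state. */
void toy_ctr_xor(uint64_t key, uint64_t offset, uint8_t *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++) {
    uint64_t pos = offset + i;
    uint64_t block = toy_block(key, pos / 8); /* 8-byte toy blocks */
    buf[i] ^= (uint8_t)(block >> (8 * (pos % 8)));
  }
}
```

Processing a buffer in one call, or as two chunks in either order, yields the same bytes, provided each chunk is given its correct stream offset. Whether a real design would split work within a circuit or only across circuits is a separate question that this sketch does not address.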

Child Tickets

Ticket | Status | Owner | Summary | Component
#1760 | closed | | Parallel Crypto: Design a good crypto parallelization plan and architecture | Core Tor/Tor
#26296 | assigned | chelseakomlo | Refactor cell crypto to pre/post crypto operations | Core Tor/Tor

Attachments (5)

relay.2.c (1.6 KB) - added by towelenee 5 years ago.
relay.c (1.6 KB) - added by towelenee 5 years ago.
Here are my changes for relay.c. It uses multiple cores via OpenMP, but it needs changes in Makefile.am.
relay.3.c (2.5 KB) - added by towelenee 5 years ago.
relay.4.c (2.2 KB) - added by towelenee 5 years ago.
The last patch doesn't use OpenMP, just pthreads.
MultiThreadedCrypto.png (100.1 KB) - added by chelseakomlo 6 months ago.

Change History (32)

comment:1 Changed 8 years ago by nickm

Owner: set to nickm
Status: changed from new to accepted
Type: changed from defect to task

comment:2 Changed 8 years ago by nickm

Description: modified (diff)

comment:3 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-final

At least the relay crypto part of this should happen in 0.2.3.x

comment:4 Changed 7 years ago by karsten

Summary: changed from "Project: Split relay and link crypto across multiple CPU cores" to "Split relay and link crypto across multiple CPU cores"
Type: changed from task to project

comment:5 Changed 7 years ago by nickm

Milestone: changed from Tor: 0.2.3.x-final to Tor: unspecified

comment:6 Changed 6 years ago by nickm

Milestone: changed from Tor: unspecified to Tor: 0.2.4.x-final
Priority: changed from normal to major

comment:7 Changed 6 years ago by nickm

Keywords: tor-relay added

comment:8 Changed 6 years ago by nickm

Component: changed from Tor Relay to Tor

comment:9 Changed 6 years ago by nickm

Milestone: changed from Tor: 0.2.4.x-final to Tor: unspecified

Added a sub-ticket for the relay component.

comment:10 Changed 5 years ago by elgo

May I suggest raising this to critical priority?
21st-century crypto software can't afford not to be fully threaded ;)
No CPU sold today is single-core anymore, and I'm sure few people would run a dedicated Tor relay around the clock only to see it used at 1/n of its capacity.

Changed 5 years ago by towelenee

Attachment: relay.2.c added

Changed 5 years ago by towelenee

Attachment: relay.c added

Here are my changes for relay.c. It uses multiple cores via OpenMP, but it needs changes in Makefile.am.

Changed 5 years ago by towelenee

Attachment: relay.3.c added

Changed 5 years ago by towelenee

Attachment: relay.4.c added

The last patch doesn't use OpenMP, just pthreads.

comment:11 Changed 3 years ago by nickm

Keywords: 6s194 added

comment:12 Changed 3 years ago by nickm

Keywords: term-project-ideas added; 6s194 removed

These tickets were tagged "6s194" as ideas for possible term projects for students in MIT subject 6.S194 spring 2016. I'm retagging with term-project-ideas, so that the students can use the 6s194 tag for tickets they're actually working on.

comment:13 Changed 18 months ago by nickm

Keywords: threads performance added
Points: 10
Severity: Normal

comment:14 Changed 6 months ago by chelseakomlo

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

comment:15 in reply to:  14 Changed 6 months ago by cypherpunks

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

Isis offered a glimpse of the answer: https://blog.torproject.org/comment/269723#comment-269723

comment:16 in reply to:  14 ; Changed 6 months ago by nickm

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

It's not so well modularized right now. The big problem here is that the code is written with the assumption that relay crypto finishes immediately, but with this change, we'd sometimes have to wait on another thread before we had cells to send on a given circuit.

comment:17 in reply to:  16 Changed 6 months ago by chelseakomlo

Replying to nickm:

Replying to chelseakomlo:

How likely is it that this functionality (or parts of it) can be implemented in Rust? Would it require a lot of refactoring or is it already fairly modularized?

It's not so well modularized right now. The big problem here is that the code is written with the assumption that relay crypto finishes immediately, but with this change, we'd sometimes have to wait on another thread before we had cells to send on a given circuit.

OK, understood. It seems like relay_crypt_one_payload is the place where this would happen, and instead of blocking, it would emit an event once the relay crypto finishes.

I'll dig more into https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto, and will come up with a mini implementation plan for review.
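
Editor's sketch (not from the thread): the change discussed in comments 16 and 17 implies that a circuit may hold cells whose crypto has not finished yet, and that completions from worker threads may arrive out of order. One way to picture the "emit an event once the relay crypto finishes" idea is a small per-circuit reorder window on the main thread. Everything below is hypothetical: `sketch_circuit_t`, `sketch_submit`, `sketch_on_crypto_done`, and `PENDING_WINDOW` are illustrative names and are not Tor structures or functions. The threading itself is left out; only the completion bookkeeping is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_WINDOW 8 /* hypothetical cap on in-flight cells per circuit */

/* Hypothetical per-circuit bookkeeping; not a real Tor structure. */
typedef struct {
  uint64_t next_to_submit;   /* sequence number for the next queued cell */
  uint64_t next_to_send;     /* lowest sequence number not yet released */
  bool done[PENDING_WINDOW]; /* done[seq % PENDING_WINDOW]: crypto finished */
} sketch_circuit_t;

/* Callback invoked for each cell that becomes sendable; may be NULL. */
typedef void (*sketch_send_fn)(uint64_t seq, void *arg);

void sketch_circuit_init(sketch_circuit_t *c)
{
  assert(c != NULL);
  *c = (sketch_circuit_t){0};
}

/* Main thread: reserve a sequence number before handing a cell to a
 * worker. Returns false if too many cells are already in flight. */
bool sketch_submit(sketch_circuit_t *c, uint64_t *seq_out)
{
  assert(c != NULL && seq_out != NULL);
  if (c->next_to_submit - c->next_to_send >= PENDING_WINDOW)
    return false;
  *seq_out = c->next_to_submit++;
  c->done[*seq_out % PENDING_WINDOW] = false;
  return true;
}

/* Main thread: handle the "crypto finished" event for cell `seq`.
 * Releases, in order, every cell whose predecessors are all done.
 * Returns how many cells were released by this event. */
size_t sketch_on_crypto_done(sketch_circuit_t *c, uint64_t seq,
                             sketch_send_fn send, void *arg)
{
  assert(c != NULL);
  size_t released = 0;
  if (seq < c->next_to_send || seq >= c->next_to_submit)
    return 0; /* stale or unknown sequence number: ignore it */
  c->done[seq % PENDING_WINDOW] = true;
  while (c->next_to_send < c->next_to_submit &&
         c->done[c->next_to_send % PENDING_WINDOW]) {
    c->done[c->next_to_send % PENDING_WINDOW] = false;
    if (send)
      send(c->next_to_send, arg);
    c->next_to_send++;
    released++;
  }
  return released;
}
```

With three cells submitted, a completion for the third releases nothing, a completion for the first releases one cell, and a completion for the second then releases the remaining two. In this sketch the main thread never blocks; it simply has nothing to send on that circuit until the event for the oldest outstanding cell arrives.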

comment:18 Changed 6 months ago by chelseakomlo

Owner: changed from nickm to chelseakomlo
Status: changed from accepted to assigned

comment:19 Changed 6 months ago by Samdney

Cc: Samdney added

Adding myself as an observer. I have already spent some time on this. Maybe I can help :)

comment:20 Changed 6 months ago by chelseakomlo

Ok, I have a start of a plan which I'm looking forward to discussing/further refining in Seattle. I took a large amount from https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto but there are some things which are out of date (circuit priority logic, for example) so any further pointers on what is different between when that wiki was written and where we are today would be helpful.

Below is a pad with a high level plan/starting implementation ideas; I've also attached a high level (pretty rough, sorry!) proposed architectural diagram to this ticket. Looking forward to further discussion, particularly around the proposal to use Rust and any Rust/C integration issues that could be particularly painful, and also any better ideas about how to cleanly register/edge trigger events.

https://pad.riseup.net/p/MultiThreadedCrypto_ImplementationPlan-keep

Changed 6 months ago by chelseakomlo

Attachment: MultiThreadedCrypto.png added

comment:21 Changed 6 months ago by dgoulet

I have a few comments about the proposed design. This is something I thought about a while back but never got cycles to implement.

Where should I discuss the plan? I would avoid using the ticket for that. I think a tor-dev@ thread would be ideal here?

comment:22 Changed 6 months ago by Hello71

one small thing: as I said on IRC, my informal profiling seems to show that a significant amount of CPU time is spent in the kernel networking stack, including TCP/IP and waiting for the network device. it's possible that that last bit is partially because of an old virtio-net though. it could potentially be easier to integrate recv multithreading at this point; or maybe not! maybe it would be easier (even in total) to just do crypto first, and then the other stuff.

comment:23 in reply to:  22 Changed 6 months ago by chelseakomlo

Replying to Hello71:

one small thing: as I said on IRC, my informal profiling seems to show that a significant amount of CPU time is spent in the kernel networking stack, including TCP/IP and waiting for the network device. it's possible that that last bit is partially because of an old virtio-net though. it could potentially be easier to integrate recv multithreading at this point; or maybe not! maybe it would be easier (even in total) to just do crypto first, and then the other stuff.

That is a good point, though it is a separate piece of work from this specific task. However, it would probably be good to open an issue for "Recent profiling outcomes" so that we can take a closer look and track/make other issues for discoveries like this.

comment:24 Changed 5 months ago by nickm

Keywords: 035-roadmap-master added
Milestone: changed from Tor: unspecified to Tor: 0.3.5.x-final

comment:25 Changed 4 months ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:26 Changed 4 months ago by Vort

Hello71, chelseakomlo: I created such a ticket some time ago: #23433.

comment:27 Changed 2 months ago by nickm

Milestone: changed from Tor: 0.3.5.x-final to Tor: unspecified