Opened 6 years ago

Last modified 11 months ago

#7572 assigned defect

Make relay crypto run on multiple CPU cores

Reported by: nickm Owned by: yawning
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay, term-project-ideas, tor-dos multithreading performance cpu
Cc: chelseakomlo Actual Points:
Parent ID: Points: 4.5
Reviewer: Sponsor: SponsorU-can


Splitting relay crypto across multiple CPU cores is less essential than maybe it once was, now that we can use aesni, but it can still be critical-path. It's likely to become important again if we shift to something like Salsa20, or some large-block cipher based on it.

For information on the planned architecture, see

Child Tickets

Attachments (2)

opreport-tor.log (81.5 KB) - added by mo 6 years ago.
opreport -g -l /usr/sbin/tor
opreport-l.log (404.0 KB) - added by mo 6 years ago.
opreport -l

Download all attachments as: .zip

Change History (29)

comment:1 Changed 6 years ago by mo

BTW, I'm still hitting CPU limits on Gbit with boxes that support AES-NI, so it's still "essential" for me.

comment:2 in reply to:  1 Changed 6 years ago by nickm

Replying to mo:

BTW, I'm still hitting CPU limits on Gbit with boxes that support AES-NI, so it's still "essential" for me.

Ow. Do you have any profiles from those boxes?

comment:3 Changed 6 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: unspecified

(FWIW, Andrea made som progress here, but it didn't make the 0.2.4 big feature window, so it'll have to be in 0.2.5)

Changed 6 years ago by mo

Attachment: opreport-tor.log added

opreport -g -l /usr/sbin/tor

Changed 6 years ago by mo

Attachment: opreport-l.log added

opreport -l

comment:4 Changed 6 years ago by mo

Are these useful?

comment:5 Changed 6 years ago by nickm

Heck yeah! Could you tell me as much as possible about which version of Tor they were made with, built how, using which versions of openssl and libevent, built how?

comment:6 Changed 6 years ago by nickm

Actually, let's move discussion of profiling stuff and reacting to it in general to bug #7727.

comment:7 Changed 6 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.2.5.x-final

comment:8 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:9 Changed 4 years ago by nickm

Priority: normalmajor

comment:10 Changed 3 years ago by jsturgix

Severity: Normal

I attempted to partially address this ticket, but could use additional insight from someone more experienced with Tor. I am learning the Tor daemon code, but I understand the cell crypto is done (primarily) in relay_crypt() for middle relays and circuit_package_relay_cell() for exit relays. My changes ONLY address the relay_crypt() case. I realize my code is not up to Tor project coding standards, but so far, I've been focused on learning the Tor code base and trying to get this to work.

I've been developing and testing on Linux and branched from tags/tor- My github branch can be found here:

In general, I refactored circuit_receive_relay_cell() in relay.c (which calls relay_crypt() and eventually the AES crypt routines) to use the workqueue.c infrastructure similar to cpuworker.c.

When the refactored code runs in single threaded mode, all seems good in limited tests. Once I activate the thread pool and start sending it work with threadpool_queue_work(), it Bootstraps 100% okay and runs for several minutes before crashing on cells it doesn't handle properly. It seems to pass several cells successfully, but then crashes on the bandwidth test(?).

In my branch, commit 842edc9 shows my refactored, single threaded version. Commit 940d1bd

shows my attempt at pushing relay_crypt() into a thread pool of 1.

In a separate post, I'll write up some explanations of what I was trying to do.

Last edited 3 years ago by jsturgix (previous) (diff)

comment:11 Changed 3 years ago by jsturgix

I looked for an approach that I could generalize and apply to both the relay_crypt() case and the circuit_package_relay_cell() case. At first glance, I didn't see anything easy, and since there were already a number of moving parts unfamiliar to me, I focused on the relay_crypt() case.

In general, this was my thought process and approach:

(1) I created new files src/or/cryptothreads.c and src/or/cryptothreads.h. These are modeled after src/or/cpuworker.c and create the thread pool. cpuworker.c is big and I thought cryptothreads.c might also become big. Now it is small and it might make sense to roll cryptothreads.c into another existing source file like src/or/relay.c.

(2) From src/or/main.c, I call crypto_threads_init() (in cryptothreads.c) to initialize the events and thread pool handling.

(3) In command_process_relay_cell() (src/or/command.c), I encapsulated and moved everything after the call to circuit_receive_relay_cell() into circuit_receive_relay_cell_post() (relay.c). The idea was circuit_receive_relay_cell() would eventually queue the crypto task, but circuit_receive_relay_cell_post() would still be executed by the thread pool callback function in the context of the main thread. In other words, command_process_relay_cell() needs unwind and eventually return back to event loop monitoring; and circuit_receive_relay_cell_post() is still called but asynchronously.

(4) I basically broke circuit_receive_relay_cell() (relay.c) into two parts: cryptothread_threadfn() and cryptothread_replyfn(). cryptothread_threadfn() is run by a thread in the thread pool and calls down relay_crypt() -> relay_crypt_one_payload() -> crypto_cipher_crypt_inplace() and so forth into AES routines. When cryptothread_threadfn() finishes, the main thread (through its event loop) is signaled task complete and the main thread then calls cryptothread_replyfn(). There is some glue to make this happen such as queue_job_for_cryptothread() (reply.c) and replyqueue_process_cb() (cryptothread.c), but uses the existing src/common/workqueue.c implementation as modeled by cpuworker.c.

Initially, I did not think relay_crypt() accessed any resources shared by the main thread, so I have *NOT* added any synchronized access of shared data and I suspect this is the problem. All/most? access of shared data seemed to be done in the main thread's context after responding to an event (to include the thread pool callback function cryptothread_replyfn()) but admittedly I don't have a good grasp of the cell structures and cell/circuit queues used in the main thread. Me thinks I have reasoned incorrectly since the differences between the refactored single-thread version and the multiple thread version are relatively few.

From what I remember (or perhaps assumed), the functionality in src/common/workqueue.c is properly synchronized because it is already being used (but less intensely?).

Also, I have read the wiki article but I have not fully merged these ideas with the newer(?) workqueue/cpuworker implementation.

comment:12 Changed 3 years ago by yawning

Owner: changed from andrea to yawning
Status: newassigned

I'll take this, probably starting from next week due to travel fun times.

comment:13 Changed 3 years ago by nickm

Keywords: 6s194 added
Milestone: Tor: 0.2.???Tor: 0.2.9.x-final

guessing a milestone?

comment:14 Changed 3 years ago by nickm

Keywords: term-project-ideas added; 6s194 removed

These tickets were tagged "6s194" as ideas for possible term projects for students in MIT subject 6.S194 spring 2016. I'm retagging with term-project-ideas, so that the students can use the 6s194 tag for tickets they're actually working on.

comment:15 Changed 3 years ago by isabela

Sponsor: SponsorU-can

comment:16 Changed 3 years ago by nickm

Points: medium/large

comment:17 Changed 3 years ago by nickm

Keywords: tor-dos added

comment:18 Changed 3 years ago by isabela

Points: medium/large4.5

comment:19 Changed 3 years ago by nickm

Parent ID: #1749#17293

comment:20 Changed 3 years ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

Hi, Yawning! I'm deferring these tickets assigned to you from 0.2.9 to 0.2.???, since you're going to be out for September. But if you wind up wanting to do any of them for 0.2.9 anyway, please feel free to move them back.

(This is my ticket-deferring afternoon)

comment:21 Changed 3 years ago by nickm

Parent ID: #17293

Unparenting these from #17293; holding for future work.

comment:22 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:23 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:24 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:25 Changed 2 years ago by nickm

Keywords: nickm-deferred-20160905 removed

comment:26 Changed 23 months ago by nickm

Keywords: multithreading performance cpu added

comment:27 Changed 11 months ago by chelseakomlo

Cc: chelseakomlo added
Note: See TracTickets for help on using tickets.