Splitting relay crypto across multiple CPU cores is less essential than it once was, now that we can use aesni, but it can still be on the critical path. It's likely to become important again if we shift to something like Salsa20, or some large-block cipher based on it.
Heck yeah! Could you tell me as much as possible about which version of Tor they were made with, how it was built, and which versions of openssl and libevent it was built against?
I attempted to partially address this ticket, but could use additional insight from someone more experienced with Tor. I am still learning the Tor daemon code, but my understanding is that the cell crypto is done (primarily) in relay_crypt() for middle relays and circuit_package_relay_cell() for exit relays. My changes ONLY address the relay_crypt() case. I realize my code is not up to Tor project coding standards; so far I've been focused on learning the Tor code base and trying to get this to work.
In general, I refactored circuit_receive_relay_cell() in relay.c (which calls relay_crypt() and eventually the AES crypt routines) to use the workqueue.c infrastructure similar to cpuworker.c.
When the refactored code runs in single-threaded mode, all seems good in limited tests. Once I activate the thread pool and start sending it work with threadpool_queue_work(), it bootstraps to 100% okay and runs for several minutes before crashing on cells it doesn't handle properly. It seems to pass several cells successfully, but then crashes on the bandwidth test(?).
In my branch, commit 842edc9 shows my refactored, single-threaded version. Commit 940d1bd shows my attempt at pushing relay_crypt() into a thread pool of 1.
In a separate post, I'll write up some explanations of what I was trying to do.
Trac: Username: jsturgix; Sponsor: N/A to N/A; Severity: N/A to Normal
I looked for an approach that I could generalize and apply to both the relay_crypt() case and the circuit_package_relay_cell() case. At first glance, I didn't see anything easy, and since there were already a number of moving parts unfamiliar to me, I focused on the relay_crypt() case.
In general, this was my thought process and approach:
(1) I created new files src/or/cryptothreads.c and src/or/cryptothreads.h. These are modeled after src/or/cpuworker.c and create the thread pool. cpuworker.c is big and I thought cryptothreads.c might also become big. Now it is small and it might make sense to roll cryptothreads.c into another existing source file like src/or/relay.c.
(2) From src/or/main.c, I call crypto_threads_init() (in cryptothreads.c) to initialize the events and thread pool handling.
(3) In command_process_relay_cell() (src/or/command.c), I encapsulated and moved everything after the call to circuit_receive_relay_cell() into circuit_receive_relay_cell_post() (relay.c). The idea was that circuit_receive_relay_cell() would eventually queue the crypto task, but circuit_receive_relay_cell_post() would still be executed by the thread pool callback function in the context of the main thread. In other words, command_process_relay_cell() needs to unwind and eventually return to event loop monitoring, while circuit_receive_relay_cell_post() is still called, but asynchronously.
(4) I basically broke circuit_receive_relay_cell() (relay.c) into two parts: cryptothread_threadfn() and cryptothread_replyfn(). cryptothread_threadfn() is run by a thread in the thread pool and calls down through relay_crypt() -> relay_crypt_one_payload() -> crypto_cipher_crypt_inplace() and so forth into the AES routines. When cryptothread_threadfn() finishes, the main thread (through its event loop) is signaled that the task is complete, and the main thread then calls cryptothread_replyfn(). There is some glue to make this happen, such as queue_job_for_cryptothread() (relay.c) and replyqueue_process_cb() (cryptothreads.c), but it uses the existing src/common/workqueue.c implementation as modeled by cpuworker.c.
Initially, I did not think relay_crypt() accessed any resources shared with the main thread, so I have NOT added any synchronized access to shared data, and I suspect this is the problem. All (most?) access to shared data seemed to be done in the main thread's context after responding to an event (including the thread pool callback function cryptothread_replyfn()), but admittedly I don't have a good grasp of the cell structures and cell/circuit queues used in the main thread. Methinks I have reasoned incorrectly, since the differences between the refactored single-thread version and the multiple-thread version are relatively few.
From what I remember (or perhaps assumed), the functionality in src/common/workqueue.c is itself properly synchronized, since it is already in use by cpuworker.c (though perhaps less intensely?).
These tickets were tagged "6s194" as ideas for possible term projects for students in MIT subject 6.S194 spring 2016. I'm retagging with term-project-ideas, so that the students can use the 6s194 tag for tickets they're actually working on.