Initial pass. I still need to figure out what I want to do when the maximum number of tasks is already queued and we get a CREATED cell, and also to support canceling CREATED processing when the circuit gets removed while it's pending. I think the way things are structured now, the code is merely inefficient instead of crashing, but adding cancellation support is the right thing to do.
Now that I think about it, I'll probably go and rename a bunch of stuff (again) so I don't have lots of pain when I decide to parallelize onion_skin_create (needed when #17272 happens)...
So, this branch will get force-pushed at some point. At least it shows how I think this should work...
Ok. I just squashed down my branch. One day in the future we should strongly consider also parallelizing onion_skin_create, but it's reasonably fast for now (~25 usec) compared to the client/server-specific parts of the ntor handshake (~160 usec).
One side benefit of the new code is that there is now a centralized place (origin_pending_add()/origin_next_task()) to add rate limiting for outgoing handshakes, if we decide to do something different under extreme load in the future.
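To make that concrete, here's roughly the kind of hook I mean (a minimal sketch only: the token-bucket scheme, the cap, and handshake_rate_allows() are all made up for illustration, not part of the branch):

{{{
#include <time.h>

/* Hypothetical per-second cap; real tuning is a separate question. */
#define MAX_HANDSHAKES_PER_SEC 64

static unsigned handshake_tokens;
static time_t last_refill;

/* Return 1 if origin_next_task() may hand out another handshake this
 * second, 0 if the task should stay queued until the bucket refills. */
static int
handshake_rate_allows(time_t now)
{
  if (now != last_refill) {        /* New second: refill the bucket. */
    handshake_tokens = MAX_HANDSHAKES_PER_SEC;
    last_refill = now;
  }
  if (handshake_tokens == 0)
    return 0;
  --handshake_tokens;
  return 1;
}
}}}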
I have reviewed the code: it makes sense to me and I don't see any issues:
There is some code duplication between the client and server onionskins, but I think that's ok.
The starvation-avoidance mechanism in queue_pending_tasks works as described (see the sketch after these notes). I wonder whether we will want to favour the server side, the client side, or the longer queue in the future.
Someone who is more familiar with cpuworkers or circuit crypto could give a more thorough review than I can.
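For the record, here's roughly the shape of the mechanism as I read it (a sketch only: the smartlist queues and dispatch_task_to_cpuworker() are illustrative stand-ins for the real structures, and a real queue would want FIFO order rather than pop_last):

{{{
#include "container.h"

/* Hypothetical stand-in for handing a task to the threadpool. */
static void dispatch_task_to_cpuworker(void *task);

/* Alternate between the client and server queues so neither starves,
 * stopping as soon as the worker budget is full or both queues are
 * empty -- which also makes the loop's end condition explicit. */
static void
queue_pending_tasks_sketch(smartlist_t *client_q, smartlist_t *server_q,
                           int *total_pending, int max_pending)
{
  int from_client = 1;
  while (*total_pending < max_pending &&
         (smartlist_len(client_q) > 0 || smartlist_len(server_q) > 0)) {
    smartlist_t *q = from_client ? client_q : server_q;
    if (smartlist_len(q) == 0)
      q = (q == client_q) ? server_q : client_q;
    dispatch_task_to_cpuworker(smartlist_pop_last(q));
    ++*total_pending;
    from_client = !from_client;
  }
}
}}}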
In the future I was thinking of adding some form of active queue management to the client-side queue (we detect extreme backlog for the server-side queue and start dropping after a while), but that's a separate ticket, and I need to figure out what sort of algorithm to use (CoDel suggests itself as reasonable behavior, with minor modifications).
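To sketch the CoDel idea (not an implementation: only the 5 ms target and 100 ms interval are CoDel's stock parameters; the function and the timestamp plumbing are made up, and real CoDel also escalates the drop rate under persistent overload):

{{{
#include <stdint.h>

/* CoDel's stock parameters: 5 ms target sojourn time, 100 ms interval. */
#define CODEL_TARGET_USEC   5000
#define CODEL_INTERVAL_USEC 100000

static uint64_t first_above_time; /* 0 = sojourn not currently above target. */

/* Core CoDel drop test: drop a dequeued task only once its queue
 * sojourn time has stayed above the target for a full interval. */
static int
codel_should_drop(uint64_t now_usec, uint64_t enqueued_usec)
{
  uint64_t sojourn = now_usec - enqueued_usec;
  if (sojourn < CODEL_TARGET_USEC) {
    first_above_time = 0;
    return 0;
  }
  if (first_above_time == 0) {
    first_above_time = now_usec + CODEL_INTERVAL_USEC;
    return 0;
  }
  return now_usec >= first_above_time;
}
}}}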
Until we can get profiler output again, this branch is probably the last of the low hanging fruit for HS performance optimization. :(
Comment here should be REPLY instead of "RESPONSE".
{{{
+ /** Magic number; must be CPUWORKER_[type]_[REQUEST,RESPONSE]_MAGIC. */
}}}
The rest lgtm. I have to agree that there is a lot of code duplication here but I'm also OK with having server/client separate here even though it's almost the same.
I'm also not a super fan of having this ++total_pending_tasks; happen before we actually successfully queue the task, but I think that was there before this patch anyway (and it's highly unlikely to matter...).
> Comment here should be REPLY instead of "RESPONSE".
> {{{
> /** Magic number; must be CPUWORKER_[type]_[REQUEST,RESPONSE]_MAGIC. */
> }}}
Fixed in a fixup commit.
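For reference, the fixed comment presumably now reads:

{{{
/** Magic number; must be CPUWORKER_[type]_[REQUEST,REPLY]_MAGIC. */
}}}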
> The rest lgtm. I have to agree that there is a lot of code duplication here but I'm also OK with having server/client separate here even though it's almost the same.
Yeah, there's a good amount of boilerplate that's needed, but I was reluctant to merge the code further.
> I'm also not a super fan of having this ++total_pending_tasks; happen before we actually successfully queue the task, but I think that was there before this patch anyway (and it's highly unlikely to matter...).
1b150bcb9d54ccfe132b3e58ee9ef5f001f70221 -- looks ok.
3e75d22bd496c1238cabee25b8616ea6b02805d8:
The fields in origin_entry_t need documentation. What are they for? Who sets them? The type itself could also use some explanation. How do they get removed, and what happens when the origin circuit dies?
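To show the level of documentation I'm hoping for, something along these lines (the fields and lifecycle here are guessed from context, not from the actual code):

{{{
/** A circuit whose CREATED handshake is waiting for a cpuworker slot.
 *  Added by origin_pending_add() when all slots are busy, handed out
 *  by origin_next_task(), and removed by origin_pending_remove() --
 *  including when the origin circuit is freed while still pending. */
typedef struct origin_entry_t {
  /** The waiting circuit; owned by the circuitlist, not this queue. */
  origin_circuit_t *circ;
  /** The hop whose handshake we still need to compute. */
  crypt_path_t *hop;
} origin_entry_t;
}}}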
origin_entry_t and origin_queue could use a friendlier name, maybe with "pending_cpu" or "pending_crypto" in it.
All arguments of origin_pending_add should be documented. It should document which items it takes ownership of, and whether it hands items to the cpuworkers or, if not, who does.
In origin_next_task, must the caller also free hop_out?
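Whatever the answer is, a doxygen comment would settle it; e.g. (the signature and the ownership rule here are only illustrative):

{{{
/** Return the next pending origin circuit, or NULL if none is queued.
 *  On success, set *<b>hop_out</b> to the hop whose handshake we need.
 *  The caller does NOT own *<b>hop_out</b>; it still belongs to the
 *  circuit's cpath and must not be freed. */
static origin_circuit_t *
origin_next_task(crypt_path_t **hop_out);
}}}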
origin_pending_remove has no documentation.
I am fine with having this linear search in origin_pending_remove(), but we should probably have an XXX about it.
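Something like:

{{{
/* XXX: This is a linear search; fine while the pending queue stays
 * short, but revisit this if the queue can ever grow large. */
}}}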
Since CIRC_STATE_ONIONSKIN_PENDING can now mean something a little different than before, its documentation should be updated.
I am not in love with having total_pending_tasks and max_pending_tasks be shared by both types of work. (This speaks to the 'queue_pending_tasks' thing.)
I would like to have queue_pending_*_tasks have a more obvious end condition; I'm worried about infinite loops.
[META]: I assume this is duplicated from existing code, but I don't like how assign_onionskin_client_to_cpuworker() can return -1 both for "I queued the work for later" and for "I tried to add the job, but threadpool_queue_work() failed!"
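If that ever gets cleaned up, distinct return codes would help; e.g. (hypothetical names):

{{{
/* Hypothetical tri-state result, so callers can tell "queued for
 * later" apart from "threadpool_queue_work() actually failed". */
typedef enum {
  ASSIGN_ONIONSKIN_OK = 0,
  ASSIGN_ONIONSKIN_QUEUED = 1,  /* No worker free; task was queued. */
  ASSIGN_ONIONSKIN_FAILED = -1, /* threadpool_queue_work() failed. */
} assign_onionskin_result_t;
}}}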
META: Do we really need to have two separate queues here, one for pre-queued stuff and one for stuff that's on a workqueue? Why not just put stuff straight on the workqueue?
Hi, Yawning! I'm deferring these tickets assigned to you from 0.2.9 to 0.2.???, since you're going to be out for September. But if you wind up wanting to do any of them for 0.2.9 anyway, please feel free to move them back.