Opened 3 months ago

Closed 9 days ago

#23709 closed defect (fixed)

channel: `outgoing_queue` and `incoming_queue` are always empty

Reported by: dgoulet Owned by: dgoulet
Priority: High Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched
Cc: Actual Points:
Parent ID: #23993 Points:
Reviewer: nickm Sponsor: SponsorV

Description

  1. Let's start with the incoming_queue of a channel_t:

The channel_queue_(var)cell() function is the one taking a single cell and putting it in the in queue.

If the queue is empty, it processes the cell immediately.
If the queue is NOT empty, it puts the cell in the queue and processes the cell immediately if we have cell handlers which is currently always the case.

The "process cell" function is in charge of removing the cell from the queue. So I think you can clearly see the problem with this code flow ;).

  1. Now outgoing_queue of a channel_t:

Inserting a cell in that queue is with channel_write_cell_queue_entry() which does it IFF the outgoing_queue is not empty and channel is open. If the queue is empty, the cell is processed immediately with the write_cell handler.

If the cell was for whatever reason not able to be sent by the write handler, the cell is then put in the outgoing_queue. However, the only possible handler we have right now is channel_tls_write_cell_method() (and packed/var friends) that always return 1 if the tlschan->conn exists. Empirical test shows that it is really ALWAYS the case that a conn exists on the tls channel.

---

Bottom line, those queues are as good as local to a function but we do consider them everywhere in the code such as in channel_flush_some_cells() or channel_more_to_flush().

We should either get rid of them or use them properly. For KIST, this is a big deal though to have them work properly because once data goes from a cmux queue to the outbuf of the connection, libevent will kick in to send() the data from the outbuf of a connection. KIST needs to have this steps where once it puts the data in the outbuf it then knows it has to write to kernel. If it can NOT write to kernel, the data needs to stay in the out queue else libevent will write the data from the outbuf to kernel without the scheduler knowing about it.

But because those queues are basically not usable outside their functions, everytime we queue a cell, it is put immediately in the outbuf. Fortunately, KIST does control a bit of it by doing the flush from the cmux to the outbuf (channel_flush_some_cells()) and then tries to write to kernel what has been flushed. But is seems that tor can put cells in the outbuf in other places which is not ideal...

If we can make the scheduler the *only* one capable of flushing the cmux to the out buf, we could get rid of those queues and only use the cmux queue. The scheduler does NOT care about the incoming_queue at all.

Child Tickets

Change History (12)

comment:1 Changed 8 weeks ago by dgoulet

Sponsor: SponsorV

Tagging all scheduler (and KIST) related tickets for SponsorV

comment:2 Changed 8 weeks ago by dgoulet

Owner: set to dgoulet
Status: newaccepted

Taking the scheduler related tickets.

comment:3 Changed 8 weeks ago by dgoulet

Parent ID: 23993

comment:4 Changed 6 weeks ago by dgoulet

Parent ID: 23993#23993

comment:5 Changed 4 weeks ago by dgoulet

On going work is in: ticket23709_033_01

comment:6 Changed 3 weeks ago by dgoulet

Status: acceptedneeds_review

Ooook! This is pretty big but mostly code removal and new tests.

Because the channel subsystem is quite critical to Tor, I've added a load of unit tests raising the coverage to 83.6% for channel.c. More will be added as more things get fixed (like #23993).

In order to pull off this queue removal, I had to cleanup many things for which I believe channel.c is now much nicer and simpler. A lot of dead or unused code has been removed as well as an attempt to simplify everything.

This branch has been running for two days or so on my fast tor relay. So far, no memleaks nor issues I can see.

This is the summary of git diff --stat master. The insertions are at least 95% of unit tests. All in all, this makes things simpler, more tested, better documented and one less abstraction on which the cells transit. Will help us greatly for the next step of cleanly separating channel_t and channeltls_t. One extra bonus point, I now understand that subsystem :).

 11 files changed, 884 insertions(+), 2752 deletions(-)

Branch: ticket23709_033_01
OnionGit: https://oniongit.eu/dgoulet/tor/merge_requests/11

comment:7 Changed 12 days ago by nickm

Status: needs_reviewneeds_revision

This was less frightening than I had feared.

I left some comments and question on oniongit.

comment:8 Changed 11 days ago by dgoulet

Reviewer: nickm
Status: needs_revisionneeds_information

Commented back. This needs some discussion.

comment:9 Changed 9 days ago by dgoulet

Status: needs_informationneeds_review

comment:10 Changed 9 days ago by nickm

Status: needs_reviewneeds_revision

Looks good, but now I think there's an incorrect comment hiding there. (See note on oniongit.)

Also, this branch needs a changes file.

comment:11 Changed 9 days ago by dgoulet

Status: needs_revisionneeds_review

Done. Fixup commit from your review and extra commit for the changes file are in the branch.

Once merged, I'll monitor this closely especially for memleak overtime. So far so good on my public relay, running without a problem for more than 2 weeks now.

comment:12 Changed 9 days ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Okay, I think this is looking fairly good. Time to merge!

Note: See TracTickets for help on using tickets.