Opened 7 years ago

Last modified 4 months ago

#7743 needs_revision defect

Avoid needless wasted space in cells

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay, bwbug, nagle, sponsor8-removed
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

In #7707, cypherpunks notes that when packaging cells, we can do some really dumb things when bandwidth is low, such reading incoming bytes in a super-slow trickle, and packaging them into cells aggressively.

There are two comparatively easy things I want to do to prevent this:

  • We should use FIONREAD to determine the number of cells waiting on an incoming edge socket. If we determine that more reading in the future will let us fit more bytes in a cell, we shouldn't package bytes in a cell yet.
  • We could avoid packaging incoming bytes from a stream for so long as that circuit has any queued cells that originated here, in case more incoming bytes arrive before we'd transmit the current cell.

This feels like a backport candidate to 0.2.3.

Child Tickets

Attachments (3)

216769.txt (917 bytes) - added by nickm 7 years ago.
11734484.txt (986 bytes) - added by nickm 7 years ago.
bug7743_notes.txt (107.9 KB) - added by andrea 7 years ago.

Download all attachments as: .zip

Change History (37)

comment:1 Changed 7 years ago by nickm

Owner: set to nickm
Status: newaccepted

I'm going to try to write patches for these before I go to bed.

comment:2 Changed 7 years ago by nickm

Status: acceptedneeds_review

Branch bug7743_023 now has a patch for the first idea above.

comment:3 Changed 7 years ago by nickm

Keywords: regression added
Priority: majorcritical

This is urgent to backport to 0.2.3 because of the shorter read bucket refill times. (Alternatively, the authorities could set the refill time to, say, 500 msec until 0.2.3 is obsolete, or tell 0.2.3 to use 500 msec intervals and give 0.2.4 Tors with this patch a new consensus variable to look at.)

The second fix I suggest above is less urgent, since the issue it fixes isn't a regression. It's also trickier to do, since the most obvious implementations would seem to allow a loud stream to starve a quiet stream if they share a circuit.

comment:4 Changed 7 years ago by nickm

Keywords: bwbug added

Adding a bwbug tag for this tranche of bandwidth bugs.

comment:5 Changed 7 years ago by nickm

Oops; there is no such consensus parameter; this is really going to need a backport, I think.

Also, we should probably do this ioctl hack at most once per cell, or else a trickle of incoming bytes might keep us from sending a cell in a timely way. Attaching a variant patch (submitted pseudonymously) to do that.

Changed 7 years ago by nickm

Attachment: 216769.txt added

comment:6 Changed 7 years ago by arma

Right, we never did the consensus param since our refill code couldn't handle being changed.

That said, do we have any evidence that this feared bug is for real much? connection_bucket_round_robin() tries pretty hard to fill its cell.

comment:7 in reply to:  6 Changed 7 years ago by nickm

Replying to arma:

Right, we never did the consensus param since our refill code couldn't handle being changed.

That said, do we have any evidence that this feared bug is for real much? connection_bucket_round_robin() tries pretty hard to fill its cell.

It tries, but it can leave cells low when the per-tick bandwidth isn't huge. Please engage with the bug reporter if you don't believe me.

comment:8 Changed 7 years ago by nickm

The current code also delays in the case where data arrives after recv but before the ioctl. That might not be the best choice; it might instead be a good idea to do the ioctl earlier, at the time when we do connection_bucket_read_limit

Then again, maybe not.

comment:9 Changed 7 years ago by nickm

One way forward, pending a simulation or something in 0.2.3, would be to get this in 0.2.4, along with a patch to track how many wasted bytes it is preventing, and log that as part of the heartbeat.

comment:10 Changed 7 years ago by nickm

Alternate version in bug7743_023_alt2. Basically untested, probably buggy.

comment:11 Changed 7 years ago by nickm

20:20 <@nickm> I need somebody to try running master (as of 7a99d26c798a2223) 
               on an exit node.  It doesn't have to be a big one -- a smaller 
               one would be fine.
20:21 <@nickm> Ideal even, maybe.
20:21 <@nickm> There's a new heartbeat message of the form 
20:21 <@nickm> Dec 18 15:15:13.000 [notice] Average packaged cell fullness: 
               91.479%
20:22 <@nickm> If that number is low in practice, I want to know whether 
               running branch bug7743_024_test improves matters.

comment:12 Changed 7 years ago by nickm

If that helps but not enough (bug7743_024_test is just bug7743_023_alt2 merged onto master), there are more things we can try to do in order to improve cell performance, including:

  • Remove the loop from connection_handle_read_impl. It doesn't seem to do anything useful.
  • If, when setting avail_to_package, we find that it is more than RELAY_PAYLOAD_SIZE, round it down to the nearest RELAY_PAYLOAD_SIZE.
  • When reading, round at_most down to avail.

Changed 7 years ago by nickm

Attachment: 11734484.txt added

comment:13 Changed 7 years ago by nickm

Attaching an additional patch, pseudonymously contributed, to handle the two rounding ideas.

comment:14 Changed 7 years ago by cypherpunks

Resolution: worksforme
Status: needs_reviewclosed

comment:15 Changed 7 years ago by aagbsn

Resolution: worksforme
Status: closedreopened

comment:16 Changed 7 years ago by andrea

Stack dump for SIGSEGV seen while testing:

#0 0x0000000000539321 in tor_strlower (s=0x0) at src/common/util.c:517
#1 0x00000000004d56c4 in connection_exit_begin_conn (cell=0x7fffffffde30, circ=0x1bec960)

at src/or/connection_edge.c:2482

#2 0x0000000000429143 in connection_edge_process_relay_cell (cell=0x7fffffffde30, circ=0x1bec960, conn=0x0,

layer_hint=0x0) at src/or/relay.c:1190

#3 0x0000000000425e66 in circuit_receive_relay_cell (cell=0x7fffffffde30, circ=0x1bec960,

cell_direction=CELL_DIRECTION_OUT) at src/or/relay.c:193

#4 0x00000000004b00a9 in command_process_relay_cell (cell=0x7fffffffde30, chan=0x1d9c0e0) at src/or/command.c:407
#5 0x00000000004af32d in command_process_cell (chan=0x1d9c0e0, cell=0x7fffffffde30) at src/or/command.c:147
#6 0x0000000000485fac in channel_queue_cell (chan=0x1d9c0e0, cell=0x7fffffffde30) at src/or/channel.c:2516
#7 0x000000000048cb48 in channel_tls_handle_cell (cell=0x7fffffffde30, conn=0x22fc5e0) at src/or/channeltls.c:921
#8 0x00000000004db608 in connection_or_process_cells_from_inbuf (conn=0x22fc5e0) at src/or/connection_or.c:1937
#9 0x00000000004d80a6 in connection_or_process_inbuf (conn=0x22fc5e0) at src/or/connection_or.c:461
#10 0x00000000004cbba9 in connection_process_inbuf (conn=0x22fc5e0, package_partial=1) at src/or/connection.c:3954
#11 0x00000000004c95c9 in connection_handle_read_impl (conn=0x22fc5e0) at src/or/connection.c:2819
#12 0x00000000004c96f4 in connection_handle_read (conn=0x22fc5e0) at src/or/connection.c:2860
#13 0x000000000040a26f in conn_read_callback (fd=22, event=2, _conn=0x22fc5e0) at src/or/main.c:722
#14 0x00007ffff772f930 in event_process_active (base=0x7e4c70, flags=<value optimized out>) at event.c:395
#15 event_base_loop (base=0x7e4c70, flags=<value optimized out>) at event.c:547
#16 0x000000000040cc77 in do_main_loop () at src/or/main.c:1989
#17 0x000000000040e237 in tor_main (argc=3, argv=0x7fffffffe658) at src/or/main.c:2701
#18 0x0000000000408844 in main (argc=3, argv=0x7fffffffe658) at src/or/tor_main.c:30

Not sure if this is related to this patch or not yet.

comment:17 Changed 7 years ago by nickm

At first glance that seems likely to be related to the BEGIN cell changes in 0.2.4.7-alpha.

comment:18 Changed 7 years ago by nickm

Opened as #7814.

comment:19 Changed 7 years ago by andrea

I'm attaching some data from the series of test runs I did.

Changed 7 years ago by andrea

Attachment: bug7743_notes.txt added

comment:20 Changed 7 years ago by nickm

Keywords: backport-023 regression removed
Priority: criticalnormal
Status: reopenedneeds_review

Okay, it doesn't *appear* that there are horrendous cases of this happening right now, though there are probably opportunities to do better than we're doing now and save some bandwidth. I should re-review all the versions of the code above for the small features deadline if I can.

comment:21 Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

The results don't appear to show a big change; deferring further analysis to 0.2.5.

comment:22 Changed 6 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:23 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:24 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:25 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:26 Changed 2 years ago by nickm

Keywords: nagle sponsor8-maybe added
Severity: Normal

comment:27 Changed 2 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.2.x-final
Sponsor: Sponsor8-can

comment:28 Changed 2 years ago by nickm

Keywords: review-group-21 added

comment:29 Changed 2 years ago by nickm

Status: needs_reviewneeds_revision

comment:30 Changed 2 years ago by nickm

Keywords: review-group-21 removed

comment:31 Changed 2 years ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: unspecified

Mark some needs_revision tickets as unspecified. If/when the revisions happen, they can go back into a live milestone.

comment:32 Changed 9 months ago by gaba

Keywords: sponsor8-removed added; sponsor8-maybe removed
Sponsor: Sponsor8-can

comment:33 Changed 4 months ago by nickm

Owner: nickm deleted
Status: needs_revisionassigned

comment:34 Changed 4 months ago by nickm

Status: assignedneeds_revision

None of these revisions are in my near-term plans.

Note: See TracTickets for help on using tickets.