In #7707 (moved), cypherpunks notes that when packaging cells, we can do some really dumb things when bandwidth is low, such as reading incoming bytes in a super-slow trickle and packaging them into cells aggressively.
There are two comparatively easy things I want to do to prevent this:
We should use FIONREAD to determine the number of bytes waiting on an incoming edge socket. If we determine that reading more in the future would let us fit more bytes into a cell, we shouldn't package bytes into a cell yet (see the sketch after this list).
We could avoid packaging incoming bytes from a stream for as long as that circuit has any queued cells that originated here, in case more incoming bytes arrive before we'd transmit the current cell.
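To make point 1 concrete, here is a minimal sketch of the FIONREAD check, assuming a POSIX socket. This is not the attached patch; the helper names are made up, and RELAY_PAYLOAD_SIZE is Tor's relay-cell payload size (498 bytes).

#include <sys/ioctl.h>
#include <stddef.h>

#ifndef RELAY_PAYLOAD_SIZE
#define RELAY_PAYLOAD_SIZE 498 /* CELL_PAYLOAD_SIZE (509) - RELAY_HEADER_SIZE (11) */
#endif

/* Return how many bytes the kernel has queued on <b>fd</b>, or -1 if we
 * can't tell. */
static int
bytes_waiting_on_socket(int fd)
{
  int n = 0;
  if (ioctl(fd, FIONREAD, &n) < 0)
    return -1;
  return n;
}

/* Point 1 above: if the cell we would build right now would be partly
 * empty, but the kernel still has unread bytes for this edge socket, hold
 * off on packaging so the next read can fill the cell further. */
static int
should_delay_packaging(size_t buffered, int fd)
{
  if (buffered >= RELAY_PAYLOAD_SIZE)
    return 0;                          /* We can already fill a whole cell. */
  if (bytes_waiting_on_socket(fd) <= 0)
    return 0;                          /* Nothing more to read, or unknown. */
  return 1;                            /* More bytes are coming; wait. */
}

Note that the unknown case (-1 from the ioctl) errs on the side of packaging right away, which is just the current behavior.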
This feels like a backport candidate to 0.2.3.
This is urgent to backport to 0.2.3 because of the shorter read bucket refill times. (Alternatively, the authorities could set the refill time to, say, 500 msec until 0.2.3 is obsolete, or tell 0.2.3 to use 500 msec intervals and give 0.2.4 Tors with this patch a new consensus variable to look at.)
The second fix I suggest above is less urgent, since the issue it fixes isn't a regression. It's also trickier to do, since the most obvious implementations would seem to allow a loud stream to starve a quiet stream if they share a circuit.
Trac: Priority: major to critical; Keywords: backport-023 tor-relay deleted, backport-023 tor-relay regression added
Oops; there is no such consensus parameter; this is really going to need a backport, I think.
Also, we should probably do this ioctl hack at most once per cell, or else a trickle of incoming bytes might keep us from sending a cell in a timely way. Attaching a variant patch (submitted pseudonymously) to do that.
The current code also delays in the case where data arrives after the recv but before the ioctl. That might not be the best choice; it might instead be a good idea to do the ioctl earlier, at the time when we do connection_bucket_read_limit().
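A sketch of the "at most once per cell" guard, just to show the shape of the logic: the struct and field names are hypothetical, and should_delay_packaging() is the helper from the earlier sketch.

/* Per-edge-connection flag: have we already done the FIONREAD check for the
 * cell currently being assembled?  (Hypothetical struct; in Tor this would
 * hang off the edge connection.) */
typedef struct packaging_state_t {
  unsigned int checked_this_cell : 1;
} packaging_state_t;

/* Consult the kernel at most once per cell, so a trickle of incoming bytes
 * can't postpone a cell indefinitely. */
static int
maybe_defer_packaging(packaging_state_t *st, size_t buffered, int fd)
{
  if (st->checked_this_cell)
    return 0;                      /* Already asked once: package now. */
  st->checked_this_cell = 1;
  return should_delay_packaging(buffered, fd);
}

/* Call whenever a cell is actually packaged, so the next cell is allowed to
 * defer once as well. */
static void
cell_was_packaged(packaging_state_t *st)
{
  st->checked_this_cell = 0;
}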
One way forward, pending a simulation or something in 0.2.3, would be to get this in 0.2.4, along with a patch to track how many wasted bytes it is preventing, and log that as part of the heartbeat.
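On the tracking side, something along these lines would do; the counter and function names here are illustrative rather than what the patch actually uses, and RELAY_PAYLOAD_SIZE is as in the first sketch.

#include <stdint.h>
#include <stddef.h>

/* Totals since startup, bumped each time a RELAY_DATA cell is packaged. */
static uint64_t total_data_cells_packaged = 0;
static uint64_t total_data_bytes_packaged = 0;

/* Record that one cell was packaged carrying <b>n_bytes</b> of stream data. */
static void
note_data_cell_packaged(size_t n_bytes)
{
  ++total_data_cells_packaged;
  total_data_bytes_packaged += n_bytes;
}

/* Average cell fullness as a percentage, for the heartbeat.  The shortfall
 * from 100% is exactly the wasted bytes we want to measure. */
static double
packaged_cell_fullness_pct(void)
{
  if (!total_data_cells_packaged)
    return 100.0;
  return 100.0 * (double)total_data_bytes_packaged /
         ((double)total_data_cells_packaged * RELAY_PAYLOAD_SIZE);
}

This is the number that shows up in the heartbeat line quoted below.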
20:20 <@nickm> I need somebody to try running master (as of 7a99d26c798a2223) on an exit node. It doesn't have to be a big one -- a smaller one would be fine.
20:21 <@nickm> Ideal even, maybe.
20:21 <@nickm> There's a new heartbeat message of the form
20:21 <@nickm> Dec 18 15:15:13.000 [notice] Average packaged cell fullness: 91.479%
20:22 <@nickm> If that number is low in practice, I want to know whether running branch bug7743_024_test improves matters.
If that helps but not enough (bug7743_024_test is just bug7743_023_alt2 merged onto master), there are more things we can try in order to improve cell fullness, including:
Remove the loop from connection_handle_read_impl. It doesn't seem to do anything useful.
If, when setting avail_to_package, we find that it is more than RELAY_PAYLOAD_SIZE, round it down to the nearest multiple of RELAY_PAYLOAD_SIZE (see the sketch below).
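The rounding in that last item is a one-liner; as a sketch (the wrapper function is only for illustration):

/* If more than one cell's worth of data is available, keep only a whole
 * number of cells' worth, so every cell we package from it comes out full.
 * RELAY_PAYLOAD_SIZE as in the earlier sketches. */
static size_t
round_down_to_whole_cells(size_t avail_to_package)
{
  if (avail_to_package > RELAY_PAYLOAD_SIZE)
    avail_to_package -= avail_to_package % RELAY_PAYLOAD_SIZE;
  return avail_to_package;
}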
#0  0x0000000000539321 in tor_strlower (s=0x0) at src/common/util.c:517
#1  0x00000000004d56c4 in connection_exit_begin_conn (cell=0x7fffffffde30, circ=0x1bec960)
    at src/or/connection_edge.c:2482
#2  0x0000000000429143 in connection_edge_process_relay_cell (cell=0x7fffffffde30, circ=0x1bec960, conn=0x0,
    layer_hint=0x0) at src/or/relay.c:1190
#3  0x0000000000425e66 in circuit_receive_relay_cell (cell=0x7fffffffde30, circ=0x1bec960,
    cell_direction=CELL_DIRECTION_OUT) at src/or/relay.c:193
#4  0x00000000004b00a9 in command_process_relay_cell (cell=0x7fffffffde30, chan=0x1d9c0e0) at src/or/command.c:407
#5  0x00000000004af32d in command_process_cell (chan=0x1d9c0e0, cell=0x7fffffffde30) at src/or/command.c:147
#6  0x0000000000485fac in channel_queue_cell (chan=0x1d9c0e0, cell=0x7fffffffde30) at src/or/channel.c:2516
#7  0x000000000048cb48 in channel_tls_handle_cell (cell=0x7fffffffde30, conn=0x22fc5e0) at src/or/channeltls.c:921
#8  0x00000000004db608 in connection_or_process_cells_from_inbuf (conn=0x22fc5e0) at src/or/connection_or.c:1937
#9  0x00000000004d80a6 in connection_or_process_inbuf (conn=0x22fc5e0) at src/or/connection_or.c:461
#10 0x00000000004cbba9 in connection_process_inbuf (conn=0x22fc5e0, package_partial=1) at src/or/connection.c:3954
#11 0x00000000004c95c9 in connection_handle_read_impl (conn=0x22fc5e0) at src/or/connection.c:2819
#12 0x00000000004c96f4 in connection_handle_read (conn=0x22fc5e0) at src/or/connection.c:2860
#13 0x000000000040a26f in conn_read_callback (fd=22, event=2, _conn=0x22fc5e0) at src/or/main.c:722
#14 0x00007ffff772f930 in event_process_active (base=0x7e4c70, flags=) at event.c:395
#15 event_base_loop (base=0x7e4c70, flags=) at event.c:547
#16 0x000000000040cc77 in do_main_loop () at src/or/main.c:1989
#17 0x000000000040e237 in tor_main (argc=3, argv=0x7fffffffe658) at src/or/main.c:2701
#18 0x0000000000408844 in main (argc=3, argv=0x7fffffffe658) at src/or/tor_main.c:30
Not yet sure whether this is related to this patch.