Opened 7 years ago

Closed 7 years ago

#6465 closed project (implemented)

Build abstraction layer around TLS

Reported by: andrea Owned by: andrea
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Keywords: tor-relay
Cc: sjmurdoch Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Right now the code in connection.c/connection_or.c is tightly coupled to TLS. This makes it hard to experiment with different transports. A new abstraction, channel_t, should be introduce which hides the details of how cells get from one node to the next.

Possible follow-on: add mechanism for nodes to advertise support for multiple transports, replace IPv4 addresses in cells with flexible variable-length transport-specific address fields, investigate path selection if multi-transport network means the connectivity graph is no longer a complete graph on the set of all nodes.

Child Tickets

Change History (47)

comment:1 Changed 7 years ago by arma

Cc: sjmurdoch added

comment:2 Changed 7 years ago by nickm

I'll talk a little about the abstraction layer that I think we discussed, so that we have a record and so that sjmurdoch can let us know whether it works well.

The simplest abstraction is for a "channel" to have an interface like (abstractly) "Start opening a new channel to host X"; "Send a cell on this channel." It should have callbacks like "The channel opened successfully" and "The channel closed unexpectedly" and "We received this cell on this channel."

More (potentially) controversially, the channel itself should take the responsibility for authentication, encryption, etc. The channel should not report itself as having opened until it has done (approximately) everything that's currently done on OR connections through the NETINFO exchange.

Each channel is also responsible for making sure that it interacts properly with the rate-limiting system (by asking how much it's allowed to read/write before it does so, and by informing the rate-limiting system how much it has read/written).

That's the plan, at least.

comment:3 Changed 7 years ago by andrea

For my own reference, here is a list of all places outside connection_or.c that call into it, with a brief note on what will need to be done to it:

circuitbuild.c:1956 calls connection_or_get_for_extend()
circuitbuild.c:1971 calls connection_or_connect()

  • These are in circuit_handle_first_hop(), will need to be replaced with channel_*() calls.

circuitbuild.c:2469 calls connection_or_get_for_extend()
circuitbuild.c:2489 calls connection_or_connect()

  • These are in circuit_extend(), similar to above

circuitbuild.c:2853 calls connection_or_nonopen_was_started_here()

  • Looks like it uses this to test whether a connection was locally initiated, to decide whether to call router_orport_found_reachable(). I'll have to figure out something analogous for channelized circuits I guess.

circuitbuild.c:5537 calls connection_or_set_bad_connections()

  • This is in entries_retry_helper() and involved in declaring a node down; this will need to be changed to an abstract channel level (not per channel subclass, since a node could speak multiple transports and is presumably down for all of them) mechanism.

circuitlist.c:927 calls connection_or_unlink_all_active_circs()

  • This is circuit_unlink_all_from_or_conn(), which will need to be replaced with circuit_unlink_all_from_channel()

circuitlist.c:1310 calls connection_or_send_destroy()
circuitlist.c:1337 calls connection_or_send_destroy()

  • These are both in _circuit_mark_for_close(); connection_or_send_destroy() will need to be replaced with channel_send_destroy()

command.c:368 calls connection_or_send_destroy()

  • This is when processing a CREATE cell while shutting down; will need to replace with channel_send_destroy() and make sure these command_process_* functions get called from the channel receive callback mechanism.

command.c:380 calls connection_or_send_destroy()
command.c:393 calls connection_or_send_destroy()

  • Other CREATE-denial cases similar to above

command.c:642 calls connection_or_nonopen_was_started_here()

  • enter_v3_handshake_with_cell(): how are we going to handle this? TLS handshakes need to live below the channel layer but cell handling is above.

command.c:672 calls connection_or_nonopen_was_started_here()
command.c:732 calls connection_or_send_netinfo()
command.c:767 calls connection_or_send_versions()
command.c:774 calls connection_or_send_certs_cell()
command.c:781 calls connection_or_send_auth_challenge_cell()
command.c:788 calls connection_or_send_netinfo()

  • command_process_versions_cell() this will have to get called by the channel callback mechanism handling received cells. The connection_or_send_netinfo() call will need to become something like channel_send_netinfo(). Similar for other connection_or_send_* functions.

command.c:843 calls connection_or_set_circid_type()
command.c:845 calls connection_or_init_conn_from_address()
command.c:926 calls connection_or_set_state_open()

  • These are in command_process_netinfo_cell(); need to figure out how this bookkeeping stuff works; the state probably needs to become per channel rather than per or_connection_t.

command.c:1073 calls connection_or_set_circid_type()
command.c:1077 calls connection_or_client_learned_peer_id()
command.c:1116 calls connection_or_send_netinfo()

  • These are all in command_process_certs_cell(); similar comments to above.

command.c:1193 calls connection_or_send_authenticate_cell()
command.c:1204 calls connection_or_send_netinfo()

  • These are in command_process_auth_challenge_cell(); the connection_or_send_* functions need to become channel_send_*.

command.c:1275 calls connection_or_compute_authenticate_cell_body()
command.c:1333 calls connection_or_set_circid_type()
command.c:1336 calls connection_or_init_conn_from_address()

  • In command_process_authenticate_cell(); connection_or_compute_authenticate_cell_body() should probably become channel_compute_authenticate_cell_body().

config.c:1688 calls connection_or_update_token_buckets()

  • This is called while parsing config files if bandwidth limits change; does rate-limiting happen above or below the channel abstraction?

connection.c:556 calls connection_or_remove_from_identity_map()
connection.c:589 calls connection_or_remove_from_identity_map()

  • These are involved in freeing connections; they should check for the CONN_TYPE_OR case as now, but that will become part of a subclass of the channel_t mechanism, so they'll have to make a callback to a close handler through that, which is where these will happen instead.

connection.c:621 calls connection_or_about_to_close()

  • Similar to above I think.

connection.c:2678 calls connection_or_connect_failed()

  • Part of connection_handle_read_impl(); the CONN_TYPE_OR case will need to relay cells up to a channel_t callback now.

connection.c:3027 calls connection_or_connect_failed()

  • Will have to pass the error on up through a callback here too.

connection.c:3202 calls connection_or_connect_failed()

  • Error handling for connection_handle_write_impl(); similar to other error cases.

connection.c:3828 calls connection_or_process_inbuf()

  • This can stay unchanged, but connection_or_process_inbuf() will have to pass the cells up to the right callback instead of whatever it does with them now.

connection.c:3857 calls connection_or_flushed_some()

  • I think this can stay unchanged, but connection_or_flushed_some() will have to be checked.

connection.c:3887 calls connection_or_finished_flushing()
connection.c:3925 calls connection_or_finished_connecting()
connection.c:3943 calls connection_or_reached_eof()

  • Similar to above.

connection.c:4284 calls connection_or_clear_identity_map()

  • This will have to become a callback freeing all channels I think.

connection_edge.c:3072 calls connection_or_digest_is_known_relay()

  • This is a check while opening a new exit connection; it should be per-channel I think.

control.c:4702 calls connection_or_report_broken_states()

  • This should probably remain, since connection_or.c will still handle TLS stuff

dirserv.c:3386 calls connection_or_connect()
dirserv.c:3398 calls connection_or_connect()

  • These open connections for IPv4 and IPv6 respectively; should call a channel subclass instead.

main.c:399 calls connection_or_remove_from_identity_map()

  • Connection cleanup in connection_unlink(); this gets used for lots of different connection types; do we need to split CONN_TYPE_OR out into something separate that handles channel_t? What calls connection_unlink()?

main.c:1054 calls connection_or_connect_failed()
main.c:1095 calls connection_or_write_cell_to_buf()

  • run_connection_housekeeping(): closing bad connections and sending keepalives; these become channel_* functions.

main.c:1484 calls connection_or_set_bad_connections()

  • run_scheduled_events(): also related to housekeeping stuff; probably should be tracking bad channels instead.

networkstatus.c:1840 calls connection_or_update_token_buckets()

  • Handle updated info about rate limits in networkstatus_set_current_consensus(); where are we handling rate- limiting?

relay.c:1236 calls connection_or_send_destroy()

  • In connection_edge_process_relay_cell() for RELAY_COMMAND_TRUNCATE; should become channel_send_destroy().

relay.c:2286 calls connection_or_unlink_all_active_circs()

  • This is part of how we track which circuits on an or_connection_t have pending cells; it should become channel_unlink_all_active_circs().

relay.c:2364 calls connection_or_flush_from_first_active_circuit()

  • Similar to preceding; should become channel_flush_from_first_active_circuit().

relay.c:2543 calls connection_or_flush_from_first_active_circuit()

  • This is in append_cell_to_circuit_queue(), the orconn parameter needs to become a channel_t.

comment:4 Changed 7 years ago by andrea

Current status of Tor with channel_t (see branch bug6465 in my repository):

Status:

  • Bootstrap works
  • Client functionality works
  • Hidden services work as client
  • Hidden services work as server
  • Middle relay works
    • Completed 4 hour test run at 0425 UTC 2012-09-11
    • Completed 30 minute Valgrind test run at 1020 UTC 2012-09-11
      • No leaks detected
  • Exit relay works
    • Completed 4 hour test run at 0937 UTC 2012-09-11
    • Completed 30 minute Valgrind test run at 1053 UTC 2012-09-11
      • No leaks detected

comment:5 Changed 7 years ago by andrea

Remaining issues:

  • Add queries for channels in control.c paralleling those for connections
  • Figure out relay.c:2475 assert (tmp == cell_ewma)
    • This has been observed in both middle and exit configurations
    • It seems to be very sporadic, occuring no more than about once every six hours of operation

comment:6 Changed 7 years ago by andrea

Sample debug log with some extra logging added for the tmp == cell_ewma assert:

Sep 11 04:25:16.000 [debug] conn_write_callback(): socket 26 wants to write.
Sep 11 04:25:16.000 [debug] flush_chunk_tls(): flushed 512 bytes, 0 ready to flush, 0 remain.
Sep 11 04:25:16.000 [debug] connection_handle_write_impl(): After TLS write of 512: 0 read, 586 written
Sep 11 04:25:16.000 [debug] channel_write_packed_cell(): Writing packed_cell_t 0xd921b8 to channel 0x1996790 with global ID 206
Sep 11 04:25:16.000 [debug] conn_write_callback(): socket 26 wants to write.
Sep 11 04:25:16.000 [debug] flush_chunk_tls(): flushed 512 bytes, 0 ready to flush, 0 remain.
Sep 11 04:25:16.000 [debug] connection_handle_write_impl(): After TLS write of 512: 0 read, 586 written
Sep 11 04:25:16.000 [debug] channel_write_packed_cell(): Writing packed_cell_t 0xd923c8 to channel 0x1996790 with global ID 206
Sep 11 04:25:16.000 [debug] conn_write_callback(): socket 26 wants to write.
Sep 11 04:25:16.000 [debug] flush_chunk_tls(): flushed 512 bytes, 0 ready to flush, 0 remain.
Sep 11 04:25:16.000 [debug] connection_handle_write_impl(): After TLS write of 512: 0 read, 586 written
Sep 11 04:25:16.000 [debug] channel_write_packed_cell(): Writing packed_cell_t 0xd8b468 to channel 0x1996790 with global ID 206
Sep 11 04:25:16.000 [debug] channel_write_packed_cell(): Writing packed_cell_t 0xd8b678 to channel 0x1996790 with global ID 206
Sep 11 04:25:16.000 [debug] channel_flush_from_first_active_circuit(): tmp == cell_ewma assert about to be hit: 0x19af218, 0x1b568b8
Sep 11 04:25:16.000 [err] channel_flush_from_first_active_circuit(): Bug: relay.c:2480: channel_flush_from_first_active_circuit: Assertion tmp == cell_ewma failed; aborting.

comment:7 Changed 7 years ago by andrea

Note in the previous log segment that the cell that triggers the assert is the second call to channel_write_packed_cell() after that invocation of connection_handle_write_impl().

Is the trigger for this calling channel_flush_from_first_active_circuit() and letting it write more than one cell? When invoked that way, it only picks the circuit once at the start of the function, but then updates the priority queue after each cell and asserts that it still matches the originally selected circuit. What happens when the head of the priority queue changes between iterations due to whatever vagary of the EWMA code?

comment:8 Changed 7 years ago by andrea

If that hypothesis is correct, though, then this bug was present prior to channels. The same loop structure and assert are present in the relay.c that the channel_t branch forked from: https://gitweb.torproject.org/user/andrea/tor.git/blob/08e65ce04fbc68389095c627ae752a9784f5739a:/src/or/relay.c#l2383

Possibility: the bug was always present, but the way the channel code affected the write path made it more likely to call channel_flush_from_first_active_circuit() and ask for more than one cell than ever was the case with connection_or_flush_from_first_active_circuit() ?

comment:9 Changed 7 years ago by andrea

The bug is that with max > 1 channel_flush_from_first_active_circuit() only picks a circuit once, and then flushes cells from it in a loop, and if the EWMA code causes the priority order to change partway through that loop, a different circuit will come to the head of the pqueue and the assert will fail.

This was a previously existing bug, but the old code always called with max == 1, so it was not exposed before channels. This function could use a bit of a refactor anyway, I think. I'm going to do that and fix the bug, and since it would create merge conflicts it's best if that gets done based on the channel_t code and merged into master at the same time, but it's logically an independent task so I'm going to break it out into a separate ticket.

comment:10 Changed 7 years ago by andrea

This is now rebased against the latest master: https://gitweb.torproject.org/user/andrea/tor.git/shortlog/refs/heads/bug6465_rebased

I've tested that client functionality works in the rebased branch, as well as hidden services as client and server. I'll test middle and exit relay configurations next, and then make a squashed branch and work on a fix for 6816 (https://trac.torproject.org/projects/tor/ticket/6816).

comment:11 Changed 7 years ago by andrea

Rebased version successfully completes two hour test run as middle relay.

comment:12 Changed 7 years ago by andrea

Rebased version successfully completes two hour test run as exit relay

comment:13 Changed 7 years ago by andrea

Valgrind with rebased branch as a client and providing a hidden service finds no evident leaks, but some channels (Valgrind reports memory as still rechable) are left open at exit time. I should add a channel cleanup wherever connections get cleaned up when handling SIGTERM.

comment:14 Changed 7 years ago by andrea

Results for Valgrind on the rebased branch in a two-hour test run as a middle relay are similar to running as a client. A few channels are left open at exit time, but I do not believe any resource leaks leading to increased usage over time exist.

comment:15 in reply to:  13 Changed 7 years ago by nickm

Replying to andrea:

Valgrind with rebased branch as a client and providing a hidden service finds no evident leaks, but some channels (Valgrind reports memory as still rechable) are left open at exit time. I should add a channel cleanup wherever connections get cleaned up when handling SIGTERM.

By convention, if resources allocated at the module level, we have a MODULE_free_all() function that gets called from tor_free_all() before exiting.

comment:16 Changed 7 years ago by andrea

Channels now clean up properly at exit in my bug6465_rebased branch, no channel-related unfreed blocks at exit are apparent in Valgrind output for client and hidden service configurations. I'm testing relay configs now.

comment:17 Changed 7 years ago by andrea

The dumpstats() function in main.c now calls a new channel_dumpstats() to spam the log with copious information on active channels.

comment:18 Changed 7 years ago by andrea

My new bug6465_rebased_2 branch is rebased against the latest master and includes the fix for 6341; I expect that function to be substantially rewritten for 6816 before this is merged, though.

comment:19 Changed 7 years ago by andrea

Squashed branch bug6465_squashed (https://gitweb.torproject.org/user/andrea/tor.git/shortlog/refs/heads/bug6465_squashed) is now available. This will be the basis for a new bug6816 branch to fix channel_flush_from_first_active() and refactor circuit selection, then that should be assert-free and ready to merge.

comment:20 Changed 7 years ago by andrea

Status: newneeds_review

comment:21 Changed 7 years ago by nickm

General notes as I read. There will likely be a few of these as this is a big patch. I'm reading though the _squashed branch with git log -p --reverse, so if I say something now that a later commit in that branch will make nonsensical, that's probably why.

PART 1:

  • Functions like channel_register probably want to note not only what they do, but when you must call them. Similarly, when there's a field in a data structure that shouldn't ever be changed except by a particular group of functions, that field's documentation should say so.
  • Your doxygen usage style isn't the one we've been using. Some stuff (like using @param and @return) is cool and we should consider whether we want to do that more. But some stuff (like dropping the imperative-sentence) style shouldn't proliferate. This doesn't need a cleanup before we can merge this branch, but should get made more uniform sooner or later once we decide what to do.
  • You're doing a printf of global_identifier using the %lu format. That won't work on systems where long isn't 64 bits, right? To format a uint64 portably, you need to do the dance with U64_FORMAT and U64_PRINTF_ARG.
  • channel_remove_from_digest_map is very very complex. Can anything be done to make it simpler? For instance, could the error-case code get extracted into a separate function or something?
  • Assignments like "chan->u.cell_chan.prev_with_same_id->u.cell_chan.next_with_same_id = chan->u.cell_chan.next_with_same_id;" are pretty verbose and make me wonder whether we've got some lack-of-genericity problem. Not sure if we need to solve them now or what the right solution is.
  • I don't see the point of the rv=tmp dance in channel_find_by_remote_digest.
  • channel_find_by_remote_nickname really shouldn't exist: it shouldn't be necessary to look up a channel by the peer's nickname, since nicknames are pretty fragile. (I note that it seems to have no callers.
  • Style issue: you seem to like single-point of return a little more than me. That's fine; but please don't think you need to do it on my account. For instance, channel_next_with_digest could probably have its body replaced with the two asserts and "return chan->u.cell_chan.next_with_same_id"
  • Should listener channels be a different type entirely from cell channels? They seem to have very little in common. IOW, so far in my reading, it looks like there aren't a lot of the same operations that it makes sense to invoke on one as on another, or a lot of code that makes sense to share. There are also a lot of functions that have to assert that the channel_t they got is really one or another type of channel_t, which also suggests that using the type system here could be right. I also get this impression from looking at the data structure.
  • By convention, it's always a no-op to pass NULL to any of Tor's foo_free() functions. channel_free() doesn't follow this convention.
  • If channel_force_free() should only be called from channel_free_all(), should it be static?
  • In channel_free, we free the active_circuit_pqueue smartlist, but don't do anything to its members. Should it be documented what does mark/free them?
  • In channel_force_free(), we tor_free one or more cell_queue_entry_t items, which is presumably done elsewhere for cases where we would hit channel_free(). Can/should that be extracted into its own function?
  • When we're freeing cell_queue_entry_t items in that function, what frees the underlying cell_t/var_cell_t/packed_cell_t ? Maybe there should be a cell_queue_entry_free() function
  • Doing a SMARTLIST_DEL_CURRENT() when freeing items in a list that you're about to smartlist_free() is usually unnecessary.
  • re channel_get_listener: For functions that return function pointers, I strongly prefer to have a typedef for the function type so as to avoid the Trickiest Syntax In C. Also, we're using the word "listener" here both to mean a kind of a channel and a function. Maybe the function that returns the function should be called channel_get_listener_fn() or channel_get_listener_cb() or channel_get_listener_handler() or something?
  • Let's eliminate every function to set the channel cell handlers other than channel_set_cell_handlers(). That's the only one we currently use, and I don't see why you'd need the other two. As they stand, they're mostly duplicated code. If we need them later, we can reimplement them by calling e.g. channel_set_sell_handlers(chan, channel_get_old_cell_handler(chan), new_var_cell_handler);
  • By convention, channel_request_close() seems like what we would name channel_mark_for_close. Is it different enough from the existing mark_for_close() functions that we should keep its current name?
  • In channel_closed(), we only call "circuit_n_chan_done(chan, 0);" in one of the close paths. What tells those circuits that the channel is closing otherwise? Or am I missing something?
  • channel_clear_identity_digest makes me kind of wish that we had a bit on the channel_t to tell us whether the channel should be in the id-digest map. Looking at states like this feels a little fragile.
  • re channel_write_cell: As I understand it, we have one function here that'll want to queue a cell, and another that will want to deliver it. From the documentation, I can't tell which one this function is.
  • In channel_write_cell: It looks like there are only two possible return values from write_cell looking at this function's use of it. I would have at least expected "I could write it" "I couldn't write it", and "I got an error." Must investigate later.
  • Using a smartlist of cells for an outgoing queue makes me a little nervous. Popping the first item from this queue is going to be O(N).
  • Probably there should be a cell_queue_entry_new() function [or family of functions] to allocate cell_queue_entry_t objects.
  • Does connection_write_cell() potentially double-flush? That is, can it try to call write_cell, then call channel_flush_cells() on the same cell even if the first write didn't work? If so, is that a problem?
  • The channel_write*cell() functions are mostly duplicate code with each other. That needs to get fixed; there should be one implementation function here and two-three wrappers, I think.
  • The comment "+ /* That shouldn't happen; bail out */" in channel_flush_some_cells_from_outgoing_queue seems orphaned?
  • The swtich statement in channel_flush_some_cells_from_outgoing_queue seems like it could stand to be refactored into its own function, maybe with some of the common code between the different cases eliminated.

comment:22 Changed 7 years ago by nickm

PART 2:

A general thing: this branch doesn't actually compile for me. I get lots of warnings like this:

src/or/channel.c:1595: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 7 has type ‘uint64_t’

Please try building on a more recent GCC and/or a 64-bit platform and/or a 32-bit platform and/or with a recent clang -- whatever you haven't tried yet.

Another general thing: I'm not done reading yet, but I'm a little nervous about queues and buffers here. As I understand it, there will be 5 layers where cells get queued:

  • The circuit's queue of cells
  • The channel's queue of cells [*]
  • The or_connection_t's queue of data to encrypt (conn->outbuf)
  • SSL's write buffer. (ssl->wbuf)
  • The kernel's write buffer.

The one marked with [*] above is new; what does it give us that conn->outbuf doesn't? And what are we doing to make sure it doesn't get too big? We don't want it getting large, since we want to use the circuit pqueue mechanism to choose an active circuit late. Filling buffers aggressively here means that loud circuits fill the queue, and quiet low-latency circuits don't get heard.

Last general thing: Some of these functions should probably take const channel_t*, not channel_t*.

Other stuff, going on with my reading:

  • In channel_process_incoming, do we want to process that queue in-order? It seems that doing the DEL_CURRENT approach is likely to have weird consequences. Perhaps it would be better to extract all the members into a separate list (smartlist_add_all(), smartlist_clear()), then process the separate list, then free it.
  • Should channel_more_to_flush() check outgoing_queue rather than cell_queue?
  • Should cell_queue be renamed incoming_queue for consistency? I think it should.
  • In channel_process_cells, I feel like maybe we're going to hit trouble if there isn't a way to break out in the middle of processing all those cells? I guess that might be happening in the middle of the processing functions, where they note if there's an error and don't process anything if an error has occurred.
  • It seems like channel_process_cells, channel_queue_cell, and channel_queue_var_cell have some common code that should be extracted.
  • In channel_send_destroy, I am confused about why we're calling channel_write_cell instead of channel_queue_cell. In fact, it's likely that I'm going to have this confusion about every channel_write_cell/channel_queue_cell instance. Hmm. I wonder if there's any way we can make those more distinct/obvious/safe?
  • I'm confused about why channel_send_destroy() is here, but there aren't other channel_send_foo() functions.
  • The first part of channel_free_all() seems redundant with channel_run_cleanup().
  • Typo in comment in channel.h : "see thw channel_t"
  • Crazy idea: How badly, if at all, would we break things by making the channel_s definition internal so that only channel.c (and later others if needed) was allowed to look at it? It's not how we do most stuff right now, but I bet it would make us happier in the long run.

comment:23 Changed 7 years ago by nickm

PART 3:

Woo. 29% done.

General stuff: I'm perpetually terrified of breaking the handshake in a way to allow us to count as authenticated, or to process cells we shouldn't, or to send data we shouldn't, without actually completing the TLS handshake and verifying the other party with the Tor handshake. I'm also perpetually afraid of breaking the v2 or v1 TLS handshakes and not noticing because I only tested master against master.

  • Weird thing: connection_or.c in your branch is 2322 lines long. In master, it's 2290 lines long. I would have expected it to get shorter as code moved out.
  • The BASE_CHAN_TO_TLS(c) and TLS_CHAN_TO_BASE(c) macros are dicey; look at how connection_t and circuit_t handle these for safety.
  • The new log_debug in connection_mark_for_close makes me wonder: can we be checking for this case and warning/asserting if it hits? If the new rule is "never call connection_mark_for_close on an or_conn directly", that's going to be fragile.
  • We'd better audit to make sure there are no more conn->state changes in connection_or.c
  • Where did that big block of code in connection_or_set_state_open go?

comment:24 Changed 7 years ago by nickm

PART 4:

I'm up to fdf8a75acea4795027ee3c815b49912fadd2afc4.

Oh, a subtle point I should have added to the list of stuff I worry about: I worry about making sure that there can be no data on the TLS connection before the handshake, and no data on the TLS connection before the VERSIONS cell. If anything gets sent first, the connection needs to close.

  • The removal of stats_n_vpadding_cells_processed and stats_n_padding_cells_processed is a little worrisome; those cell types are still real, generic cell types. Must investigate to make sure they're still handled right. I think these probably need to go back in command.c

Ok, I think I'm through the branch. Pfew!

comment:25 Changed 7 years ago by andrea

Responses to part 1 (points I generally agree with/am fine with going ahead and adjusting without debate):

Functions like channel_register probably want to note not only what they do,
but when you must call them. Similarly, when there's a field in a data
structure that shouldn't ever be changed except by a particular group of
functions, that field's documentation should say so.

Yeah, will do.

You're doing a printf of global_identifier using the %lu format. That won't
work on systems where long isn't 64 bits, right? To format a uint64
portably, you need to do the dance with U64_FORMAT and U64_PRINTF_ARG.

Yeah, that's true. I think I meant to fix that and forgot to make a note of it.

channel_remove_from_digest_map is very very complex. Can anything be done
to make it simpler? For instance, could the error-case code get extracted
into a separate function or something?

Assignments like "chan->u.cell_chan.prev_with_same_id->
u.cell_chan.next_with_same_id = chan->u.cell_chan.next_with_same_id;" are
pretty verbose and make me wonder whether we've got some lack-of-genericity
problem. Not sure if we need to solve them now or what the right solution
is.

Hmm, it doesn't feel all that complex to me because it's a straightforward
doubly linked list pattern, but I see now that you mention it. It got that
way because the old version had a singly linked list attached to
or_connection_t that I moved to channel_t, and then picked up the previous
link for something I didn't end up using. Looks like we can simplify some
of that.

channel_find_by_remote_nickname really shouldn't exist: it shouldn't be
necessary to look up a channel by the peer's nickname, since nicknames are
pretty fragile. (I note that it seems to have no callers.

Yeah, that can go.

By convention, it's always a no-op to pass NULL to any of Tor's foo_free()
functions. channel_free() doesn't follow this convention.

Okay, will change that.

If channel_force_free() should only be called from channel_free_all(), should
it be static?

Yeah, I think so.

In channel_force_free(), we tor_free one or more cell_queue_entry_t items,
which is presumably done elsewhere for cases where we would hit
channel_free(). Can/should that be extracted into its own function?

Yeah, I think so.

When we're freeing cell_queue_entry_t items in that function, what frees the
underlying cell_t/var_cell_t/packed_cell_t ? Maybe there should be a
cell_queue_entry_free() function.

Yeah.

Probably there should be a cell_queue_entry_new() function [or family of
functions] to allocate cell_queue_entry_t objects.

Agreed.

Let's eliminate every function to set the channel cell handlers other than
channel_set_cell_handlers(). That's the only one we currently use, and I
don't see why you'd need the other two. As they stand, they're mostly
duplicated code. If we need them later, we can reimplement them by calling
e.g. channel_set_sell_handlers(chan, channel_get_old_cell_handler(chan),
new_var_cell_handler);

Yeah, okay.

I don't see the point of the rv=tmp dance in channel_find_by_remote_digest.

Yeah, me neither. :)

The comment "+ /* That shouldn't happen; bail out */" in
channel_flush_some_cells_from_outgoing_queue seems orphaned?

Yeah, looks like it.

Should listener channels be a different type entirely from cell channels?
They seem to have very little in common. IOW, so far in my reading, it looks
like there aren't a lot of the same operations that it makes sense to invoke
on one as on another, or a lot of code that makes sense to share. There are
also a lot of functions that have to assert that the channel_t they got is
really one or another type of channel_t, which also suggests that using the
type system here could be right. I also get this impression from looking at
the data structure.

Yeah, I had a bit of the same sense myself, so if you saw it the same way it
should probably get changed.

Doing a SMARTLIST_DEL_CURRENT() when freeing items in a list that you're
about to smartlist_free() is usually unnecessary.

Okay.

re channel_get_listener: For functions that return function pointers, I
strongly prefer to have a typedef for the function type so as to avoid the
Trickiest Syntax In C. Also, we're using the word "listener" here both to
mean a kind of a channel and a function. Maybe the function that returns
the function should be called channel_get_listener_fn() or
channel_get_listener_cb() or channel_get_listener_handler() or something?

Okay.

Your doxygen usage style isn't the one we've been using. Some stuff (like
using @param and @return) is cool and we should consider whether we want
to do that more. But some stuff (like dropping the imperative-sentence)
style shouldn't proliferate. This doesn't need a cleanup before we can
merge this branch, but should get made more uniform sooner or later once
we decide what to do.

Okay; I'm fine with either.

Style issue: you seem to like single-point of return a little more than
me. That's fine; but please don't think you need to do it on my account.
For instance, channel_next_with_digest could probably have its body replaced
with the two asserts and "return chan->u.cell_chan.next_with_same_id"

Okay, sure.

By convention, channel_request_close() seems like what we would name
channel_mark_for_close. Is it different enough from the existing
mark_for_close() functions that we should keep its current name?

Yeah, maybe better to be consistent there.

The channel_write*cell() functions are mostly duplicate code with each
other. That needs to get fixed; there should be one implementation function
here and two-three wrappers, I think.

Okay.

re channel_write_cell: As I understand it, we have one function here that'll
want to queue a cell, and another that will want to deliver it. From the
documentation, I can't tell which one this function is.

Okay, I should make that clearer then.

The swtich statement in channel_flush_some_cells_from_outgoing_queue seems
like it could stand to be refactored into its own function, maybe with some
of the common code between the different cases eliminated.

Yeah, that is kinda hairy.

comment:26 Changed 7 years ago by andrea

Responses to part 1 (points in potential need of further discussion):

In channel_free, we free the active_circuit_pqueue smartlist, but don't do
anything to its members. Should it be documented what does mark/free them?

The members are circuits, and they shouldn't be freed there because that
queue just tracks which are ready to send; they get added/removed all the
time other than from channel_free(). It's a bit ugly, but it was that way
on or_connection_t and just got moved, and it's going to go away when I
finish 6816 anyway.

Does connection_write_cell() potentially double-flush? That is, can it try
to call write_cell, then call channel_flush_cells() on the same cell even
if the first write didn't work? If so, is that a problem?

(Assuming you meant channel_write_cell()) It could, but only if write_cell()
failed the first time, so if channel_flush_cells() calls it again it should
either succeed or be idempotent.

channel_clear_identity_digest makes me kind of wish that we had a bit on the
channel_t to tell us whether the channel should be in the id-digest map.
Looking at states like this feels a little fragile.

Change that to 'function to compute whether a channel should be in the id-
digest map' and I'll go with it; adding a bit just means a bit that needs to
be kept in sync with the other state and creates room for bugs if it isn't.

In channel_closed(), we only call "circuit_n_chan_done(chan, 0);" in one
of the close paths. What tells those circuits that the channel is closing
otherwise? Or am I missing something?

The circuit_unlink_all_from_channel(); I think it's the same logic
or_connection_t used to have.

Using a smartlist of cells for an outgoing queue makes me a little nervous.
Popping the first item from this queue is going to be O(N).

Agreed, but you were complaining about me having all that doubly-linked list
logic elsewhere. Either make up your mind, or we need to add some stuff to
src/common/container.c (and while we're at it I wouldn't mind a balanced tree
usable for maps and sets for something with 6816).

In channel_write_cell: It looks like there are only two possible return
values from write_cell looking at this function's use of it. I would have
at least expected "I could write it" "I couldn't write it", and "I got an
error." Must investigate later.

'Must investigate later'? I think your code review has a bug there. :)

Anyway, if there's an error that can be retried, the lower layer should just
return 'I couldn't write it' and it'll get retried, and if it can't, then
the lower layer can close the channel for it. Should there be some other
behavior?

comment:27 Changed 7 years ago by andrea

Responses to part 2 (points I generally agree with/am fine with going ahead and adjusting without debate):

Last general thing: Some of these functions should probably take const
channel_t*, not channel_t*.

Okay.

A general thing: this branch doesn't actually compile for me. I get lots
of warnings like this:

src/or/channel.c:1595: warning: format %lu expects type long unsigned int,
but argument 7 has type uint64_t

Please try building on a more recent GCC and/or a 64-bit platform and/or a
32-bit platform and/or with a recent clang -- whatever you haven't tried yet.

Yeah, that should change.

Typo in comment in channel.h : "see thw channel_t"

Yeah, I obviously meant 'thaw channel_t' :)

Should channel_more_to_flush() check outgoing_queue rather than cell_queue?
Should cell_queue be renamed incoming_queue for consistency? I think it
should.

Yeah, yes to both I think. Good catch with channel_more_to_flush().

In channel_process_incoming, do we want to process that queue in-order? It
seems that doing the DEL_CURRENT approach is likely to have weird
consequences. Perhaps it would be better to extract all the members into a
separate list (smartlist_add_all(), smartlist_clear()), then process the
separate list, then free it.

Yeah, I think that's probably better.

It seems like channel_process_cells, channel_queue_cell, and
channel_queue_var_cell have some common code that should be extracted.

Yeah, probably.

The first part of channel_free_all() seems redundant with
channel_run_cleanup()

Yeah, that can change.

comment:28 Changed 7 years ago by andrea

Responses to part 2 (points in potential need of further discussion):

Another general thing: I'm not done reading yet, but I'm a little nervous
about queues and buffers here. As I understand it, there will be 5 layers
where cells get queued:

The circuit's queue of cells
The channel's queue of cells [*]
The or_connection_t's queue of data to encrypt (conn->outbuf)
SSL's write buffer. (ssl->wbuf)
The kernel's write buffer.

The one marked with [*] above is new; what does it give us that conn->outbuf
doesn't? And what are we doing to make sure it doesn't get too big? We don't
want it getting large, since we want to use the circuit pqueue mechanism to
choose an active circuit late. Filling buffers aggressively here means that
loud circuits fill the queue, and quiet low-latency circuits don't get heard.

Well, under most circumstances the channel's outgoing cell queue can't get
too large because if the channel is in the OPEN state and write_cell succeeds
then it can go straight on down. The normal path is for the lower layer to
decide it wants more cells, call channel_flush_some_cells(), which drains the
outgoing queue if any, and then calls channel_flush_from_first_active_circuit()
to do the ewma circuit selection thing and get more. The only way for cells
to get into the queue other than by request from the lower layer like that is a
few places where they get queued for events that only happen once per channel
lifetime or once per circuit lifetime, mostly calls to channel_send_destroy()
from various places.

I wouldn't totally mind finding a way to simplify this queueing structure,
though.

Crazy idea: How badly, if at all, would we break things by making the
channel_s definition internal so that only channel.c (and later others if
needed) was allowed to look at it? It's not how we do most stuff right now,
but I bet it would make us happier in the long run.

I'd like that better and tried to do it that way originally, and had to move
it because channel_t had to carry all the circuit queue stuff or_connection_t
used to and the circuit code reaches into that in a bazillion places. It
might get a lot easier to deal with it after I refactor for 6816 though.

I'm confused about why channel_send_destroy() is here, but there aren't other
channel_send_foo() functions.

Because that's the only one that gets called from things that happen above the
channel layer. There are still a bunch of connection_or_send_foo() functions,
but they only get used when handshaking in channeltls.c.

In channel_process_cells, I feel like maybe we're going to hit trouble if
there isn't a way to break out in the middle of processing all those cells?
I guess that might be happening in the middle of the processing functions,
where they note if there's an error and don't process anything if an error
has occurred.

The latter; the cell handlers that calls are just the old code path in
command.c, less the handshaking stuff that went to channeltls.c.

In channel_send_destroy, I am confused about why we're calling
channel_write_cell instead of channel_queue_cell. In fact, it's likely
that I'm going to have this confusion about every channel_write_cell/
channel_queue_cell instance. Hmm. I wonder if there's any way we can make
those more distinct/obvious/safe?

Hmm, how would you feel about s/channel_write_cell/channel_send_cell/g and
s/channel_queue_cell/channel_recv_cell/g ?

comment:29 Changed 7 years ago by andrea

Responses to part 3 (points I generally agree with/am fine with going ahead and adjusting without debate):

We'd better audit to make sure there are no more conn->state changes
in connection_or.c

I already did that, more or less, but it couldn't hurt to have a second look.

The BASE_CHAN_TO_TLS(c) and TLS_CHAN_TO_BASE(c) macros are dicey; look
at how connection_t and circuit_t handle these for safety.

Okay.

comment:30 Changed 7 years ago by andrea

Responses to part 3 (points in potential need of further discussion):

Weird thing: connection_or.c in your branch is 2322 lines long. In master,
it's 2290 lines long. I would have expected it to get shorter as code moved
out.

Yeah, but connection_or.c didn't lose that much; the biggest loser was
command.c. Looking at the diff on that file, it looks like the
connection_or_send_destroy(), connection_or_set_circid_type(),
connection_or_mark_bad_for_new_circs(), connection_or_get_for_extend() and
connection_or_is_better() functions went away, but a lot of functions got
a little longer because of additional channel-related logic, and it gained
some glue logic it needed to let things that called it elsehwere pass
through to channels (connection_or_get_num_circuits(),
connection_or_change_state(), connection_or_notify_error(),
connection_or_close_normally(), connection_or_close_for_error(),
connection_or_client_used()), and a laundry list of other functions gained
a few extra lines here and there. The latter outweighed the former a bit.

The new log_debug in connection_mark_for_close makes me wonder: can we be
checking for this case and warning/asserting if it hits? If the new rule is
"never call connection_mark_for_close on an or_conn directly", that's going
to be fragile.

If you do that it'll assert in the channel code because you'll be closing it
when it isn't in the CLOSING state. I added that debug message to track down
where things were doing that.

General stuff: I'm perpetually terrified of breaking the handshake in a way
to allow us to count as authenticated, or to process cells we shouldn't, or
to send data we shouldn't, without actually completing the TLS handshake and
verifying the other party with the Tor handshake. I'm also perpetually afraid
of breaking the v2 or v1 TLS handshakes and not noticing because I only
tested master against master.

Hmm, good point. Got any suggestions on how to test stuff like this?

Where did that big block of code in connection_or_set_state_open go?

It's in channel_do_open_actions() now.

comment:31 Changed 7 years ago by nickm

Replying to andrea:

Responses to part 2 (points in potential need of further discussion):

Hi! On this and on part one above, please assume anything that I don't respond to, I agree with. (If it seems especially tricky, it might deserve a comment.)

In channel_process_cells, I feel like maybe we're going to hit trouble if
there isn't a way to break out in the middle of processing all those cells?
I guess that might be happening in the middle of the processing functions,
where they note if there's an error and don't process anything if an error
has occurred.

The latter; the cell handlers that calls are just the old code path in
command.c, less the handshaking stuff that went to channeltls.c.

ok; could use a comment then.

In channel_send_destroy, I am confused about why we're calling
channel_write_cell instead of channel_queue_cell. In fact, it's likely
that I'm going to have this confusion about every channel_write_cell/
channel_queue_cell instance. Hmm. I wonder if there's any way we can make
those more distinct/obvious/safe?

Hmm, how would you feel about s/channel_write_cell/channel_send_cell/g and
s/channel_queue_cell/channel_recv_cell/g ?

Oh hm. I think of channel_recv_cell as meaning that a cell is coming from the network. Is *that* what it means?

comment:32 in reply to:  26 Changed 7 years ago by nickm

Replying to andrea:

Using a smartlist of cells for an outgoing queue makes me a little nervous.
Popping the first item from this queue is going to be O(N).

Agreed, but you were complaining about me having all that doubly-linked list
logic elsewhere. Either make up your mind, or we need to add some stuff to
src/common/container.c (and while we're at it I wouldn't mind a balanced tree
usable for maps and sets for something with 6816).

As discussed on IRC, let's think about grabbing a nicely licensed instance of sys/queue.h and using it as appropriate for our linked list stuff.

comment:33 in reply to:  31 Changed 7 years ago by andrea

Replying to nickm:

Replying to andrea:

Responses to part 2 (points in potential need of further discussion):

Hi! On this and on part one above, please assume anything that I don't respond to, I agree with. (If it seems especially tricky, it might deserve a comment.)

Okay, will do.

The latter; the cell handlers that calls are just the old code path in
command.c, less the handshaking stuff that went to channeltls.c.

ok; could use a comment then.

Okay.

Hmm, how would you feel about s/channel_write_cell/channel_send_cell/g and
s/channel_queue_cell/channel_recv_cell/g ?

Oh hm. I think of channel_recv_cell as meaning that a cell is coming from the network.
Is *that* what it means?

Yeah. The lower layer calls channel_queue_cell() to deliver an incoming cell.

comment:34 Changed 7 years ago by nickm

Replying to andrea:

General stuff: I'm perpetually terrified of breaking the handshake in a way
to allow us to count as authenticated, or to process cells we shouldn't, or
to send data we shouldn't, without actually completing the TLS handshake and
verifying the other party with the Tor handshake. I'm also perpetually afraid
of breaking the v2 or v1 TLS handshakes and not noticing because I only
tested master against master.

Hmm, good point. Got any suggestions on how to test stuff like this?

Right now, a combination of trying with different Tor versions and Tor versions with some of the handshakes disabled is the only way I know to to make sure the old handshakes work. I don't know a way other than code audit to make completely sure that we can't handle any commands without having first authenticated; whenever it seems unclear, we could add defensive programming to make sure that it's impossible.

We could maybe at some point hack together in the scripting language of anybody's choice an implementation of the various handshakes, and of some of their broken variants, as an attempt to better fuzz and test this stuff.

comment:35 Changed 7 years ago by andrea

Responses to part 4 (points in potential need of further discussion):

The removal of stats_n_vpadding_cells_processed and
stats_n_padding_cells_processed is a little worrisome; those cell types are
still real, generic cell types. Must investigate to make sure they're still
handled right. I think these probably need to go back in command.c

Those ended up in channeltls.c; those are generic enough we should have them in
command.c, you think?

Oh, a subtle point I should have added to the list of stuff I worry about:
I worry about making sure that there can be no data on the TLS connection
before the handshake, and no data on the TLS connection before the VERSIONS
cell. If anything gets sent first, the connection needs to close.

See channel_tls_handle_cell() in channeltls.c; we set handshaking =
(TO_CONN(conn)->state != OR_CONN_STATE_OPEN), which was the same test
used in the old command_process_cell(), and then if we see anything other
than NETINFO or VERSIONS with that true we kill the connection.

comment:36 in reply to:  35 ; Changed 7 years ago by nickm

Replying to andrea:

Responses to part 4 (points in potential need of further discussion):

The removal of stats_n_vpadding_cells_processed and
stats_n_padding_cells_processed is a little worrisome; those cell types are
still real, generic cell types. Must investigate to make sure they're still
handled right. I think these probably need to go back in command.c

Those ended up in channeltls.c; those are generic enough we should have them in
command.c, you think?

I think so. By the specification, padding cells are part of the Tor protocol.

Oh, a subtle point I should have added to the list of stuff I worry about:
I worry about making sure that there can be no data on the TLS connection
before the handshake, and no data on the TLS connection before the VERSIONS
cell. If anything gets sent first, the connection needs to close.

See channel_tls_handle_cell() in channeltls.c; we set handshaking =
(TO_CONN(conn)->state != OR_CONN_STATE_OPEN), which was the same test
used in the old command_process_cell(), and then if we see anything other
than NETINFO or VERSIONS with that true we kill the connection.

ok. There also used to be a DOS issue where you could send a bunch of data to bloat a server's buffers and get it ignored while the server was waiting for a v2 tls handshake. But it's unlikely we reintroduced that and kept the v3 handshake working too.

comment:37 in reply to:  36 Changed 7 years ago by andrea

Replying to nickm:

I think so. By the specification, padding cells are part of the Tor protocol.

Yeah, but so are the auth/version/netinfo cells. To some extent, anything you could possibly do with this [1] other than just wedge it between circuit_t and or_connection_t as an abstraction layer is about experimenting with different protocols on the wire. The combination of channel_t and channel_tls_t/or_connection_t implements the protocol in the specification, and if it doesn't it's a bug - but the whole point of this is to allow experimentation with alternatives, so the choice of how much to genericize and handle above the channel layer and how much to handle below is relevant too and orthogonal to whether the channel_t/channel_tls_t/or_connection_t stack here implements the same protocol the old or_connection_t did.

I opted for as much flexibility about different lower layers as possible, possibly at the expense of making it more work to implement new ones, so the interface
channel_t exposes is pretty simple: it just passes the cells concerned with circuits and relaying things over them, namely CELL_CREATE, CELL_CREATE_FAST, CELL_CREATED, CELL_CREATED_FAST, CELL_RELAY, CELL_RELAY_EARLY or CELL_DESTROY, as you can see from command_process_cell() in command.c. The things above channel_t assume if a channel gets to CHANNEL_STATE_OPEN it's authenticated the remote end and so we can trust that we're really speaking to the OR the channel's identity_digest says we are, but it's up to the or_connection_t/channel_tls_t lower layer to handle the authentication, and any alternate lower layer that gets implemented would have to define its own protocol and handle authentication too.

Anyway, the case of the padding cell types seemed analogous, that I had to make a decision about whether to treat them as generic enough that they would be a feature of any reasonable protocol that could deliver cells to circuits and should be handled above channel_t, or if they were specific to the particular protocol we use now and should go below it. Since they were treated as no-ops on reception except for incrementing a counter, it seemed like the latter.

[1] Except maybe some stuff about experimenting with alternate implementations of the specification protocol like the thing ioerror suggested about privilege separation.

See channel_tls_handle_cell() in channeltls.c; we set handshaking =
(TO_CONN(conn)->state != OR_CONN_STATE_OPEN), which was the same test
used in the old command_process_cell(), and then if we see anything other
than NETINFO or VERSIONS with that true we kill the connection.

ok. There also used to be a DOS issue where you could send a bunch of data to bloat a
server's buffers and get it ignored while the server was waiting for a v2 tls handshake.
But it's unlikely we reintroduced that and kept the v3 handshake working too.

No, I don't think it is. The channel code doesn't even start receiving any cells until after the handshake is done and it goes to CHANNEL_STATE_OPEN. Everything before the handshake is pretty much the same logic it was before, except the parts that used to be in command.c are in channeltls.c now and some of the variables that used to be in the or_connection_t struct are now in channel_t.

comment:38 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:39 Changed 7 years ago by nickm

Component: Tor RelayTor

comment:40 Changed 7 years ago by andrea

Rebased against latest master in bug6465_rebased_3; changes per code review to follow.

comment:41 Changed 7 years ago by andrea

I've identified the cause of the last known assert: when connection_connect() fails, channel_tls_connect() doesn't notice, but the normal 'connection close' callbacks never happen, so the channel gets wedged in the opening state with a pointer to a bad connection forever. The connection gets freed at some point and the assert gets triggered because the thing connection_close() is being called on is not a connection any more.

comment:42 Changed 7 years ago by andrea

The assert on exit was being caused by connection_or_connect() setting tlschan->conn early so some calls it makes work, but then returning NULL on a connection_connect() error later on. When channel_tls_connect() calls it, it (wrongly) assumed that tlschan->conn == NULL iff error, so in some error cases (host unreachable during connection_connect() in the instance I observed), it would leave a channel_tls_t hanging around in CHANNEL_STATE_OPENING with a bad pointer to a freed or_connection_t forever, and then during channel_free_all() would trigger the assert by trying to access that or_connection_t. Fix is straight forward (check return value rather than tlschan->conn in channel_tls_connect()), and implemented in bug6465_draft_2. Changes per code review to follow in that branch.

comment:43 Changed 7 years ago by andrea

I won't be eliminating the prev_with_same_id field because channel_remove_from_digest_map() would have to walk the list to find the previous entry to set its next pointer without the doubly-linked list. I'll try to find other simplifications for that function though.

comment:44 Changed 7 years ago by sjmurdoch

Minor comment on 47adaddf8d966b4d3cbca4155d1f56df5572c5d5 of bug6465_draft_2:

Under LLVM 4.2.1 pretending to be GCC on MacOS X 10.7.5, the build of Tor using the ./configure option --enable-gcc-warnings fails with:

make  all-am
  CC       src/or/channel.o
cc1: warnings being treated as errors
src/or/channel.c: In function ‘channel_flush_some_cells’:
src/or/channel.c:1924: warning: implicit conversion shortens 64-bit value into a 32-bit value
make[1]: *** [src/or/channel.o] Error 1
make: *** [all] Error 2

Presumably this is because ssize_t is 64 bit and int is 32 bit. The following change makes it compile, but is likely not the correct solution:

diff --git a/src/or/channel.c b/src/or/channel.c
index 3e29717..985ee8a 100644
--- a/src/or/channel.c
+++ b/src/or/channel.c
@@ -1921,7 +1921,7 @@ channel_flush_some_cells(channel_t *chan, ssize_t num_cells)
       num_cells_from_circs =
         channel_flush_from_first_active_circuit(chan,
             (unlimited ? MAX_CELLS_TO_GET_FROM_CIRCUITS_FOR_UNLIMITED :
-                         (num_cells - flushed)));
+                         ((int)num_cells - (int)flushed)));
 
       /* If it claims we got some, process the queue again */
       if (num_cells_from_circs > 0) {

comment:45 Changed 7 years ago by andrea

A subset of the code review changes are ready to merge in bug6465_draft_2_squashed; see also forthcoming rebased bug6816 branch.

comment:46 in reply to:  44 Changed 7 years ago by andrea

Replying to sjmurdoch:

Minor comment on 47adaddf8d966b4d3cbca4155d1f56df5572c5d5 of bug6465_draft_2:

Under LLVM 4.2.1 pretending to be GCC on MacOS X 10.7.5, the build of Tor using the ./configure option --enable-gcc-warnings fails with:

make  all-am
  CC       src/or/channel.o
cc1: warnings being treated as errors
src/or/channel.c: In function ‘channel_flush_some_cells’:
src/or/channel.c:1924: warning: implicit conversion shortens 64-bit value into a 32-bit value
make[1]: *** [src/or/channel.o] Error 1
make: *** [all] Error 2

Presumably this is because ssize_t is 64 bit and int is 32 bit. The following change makes it compile, but is likely not the correct solution:

diff --git a/src/or/channel.c b/src/or/channel.c
index 3e29717..985ee8a 100644
--- a/src/or/channel.c
+++ b/src/or/channel.c
@@ -1921,7 +1921,7 @@ channel_flush_some_cells(channel_t *chan, ssize_t num_cells)
       num_cells_from_circs =
         channel_flush_from_first_active_circuit(chan,
             (unlimited ? MAX_CELLS_TO_GET_FROM_CIRCUITS_FOR_UNLIMITED :
-                         (num_cells - flushed)));
+                         ((int)num_cells - (int)flushed)));
 
       /* If it claims we got some, process the queue again */
       if (num_cells_from_circs > 0) {

The right thing is probably to clamp num_cells to MAX_CELLS_TO_GET_FROM_CIRCUITS_FOR_UNLIMITED, I think. Will implement.

comment:47 Changed 7 years ago by andrea

Resolution: implemented
Status: needs_reviewclosed

This is finished and merged now.

Note: See TracTickets for help on using tickets.