Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#2205 closed defect (fixed)

private network of nodes running master doesn't allow client requests to complete

Reported by: Sebastian Owned by:
Priority: High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: iang@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

According to my bisect, a9172c87beaf94119b0c0dc280267d9c76b957b7 has a bug that will cause application requests by clients to always fail. Reproduced in a private network. I don't have a more thorough analysis at this point, but I believe the relay side is the issue, not the client side.

Child Tickets

Attachments (3)

logs_for_2205.diff (1.3 KB) - added by nickm 9 years ago.
another_idea_v1_2205.diff (564 bytes) - added by cypherpunks 9 years ago.
idea_raw_for_2205.diff (2.2 KB) - added by cypherpunks 9 years ago.

Download all attachments as: .zip

Change History (35)

comment:1 Changed 9 years ago by nickm

This is with or without bufferevents turned on? Also, what versions of libevent and openssl is this tested with? Feel free to wait till #2204 is solved to answer, of course.

comment:2 Changed 9 years ago by nickm

Milestone: Tor: 0.2.3.x-final

comment:3 Changed 9 years ago by Sebastian

This is with bufferevents disabled, openssl either 1.0.0b or 0.9.8l; libevent 2.0.8-rc-dev. Not quite sure which libevent commit, if it matters I can try latest master.

comment:4 Changed 9 years ago by nickm

Priority: normalmajor

If bufferevents are off, the libevent version shouldn't matter too much.

comment:5 Changed 9 years ago by nickm

If http://archives.seul.org/or/dev/Nov-2010/msg00045.html is not _this_ bug, it is has the same symptoms.

comment:6 Changed 9 years ago by nickm

re that email: The identity_digest field is *supposed* to get set by connection_or_set_identity_digest(), which is called from connection_or_init_conn_from_address and connection_or_check_valid_tls_handshake (on outgoing connections only, so it's not there). connection_or_init_conn_from_digest is called from connection_or_connect [outgoing only again] and connection_tls_finish_handshake(). That last one -- connection_tls_finish_handshake -- is where the identity digest is supposed to be set on incoming connections.

I'm not seeing the problem in the code here yet; maybe adding a debugging log message to connection_tls_finish_handshake would help track down the issue.

comment:7 Changed 9 years ago by Sebastian

Just as a note: The connection sometimes succeeding could mean that my bisect is way off, I bisected by starting a private network on that commit. Looks like we should consider more commits than the one posted above.

comment:8 Changed 9 years ago by nickm

Maybe incoming connections get no identity_digest set, but if we connect to the other router first, we will set identity_digest, and the connection will work?

I guess the right bisection thing to try would be to run a number of tests on each version. Or we could test whether a9172c87be is really the culprit by picking the first commit before it and verifying that that one doesn't fail at all, but a9172c87 does sometimes fail.

comment:9 in reply to:  6 Changed 9 years ago by rransom

Replying to nickm:

I'm not seeing the problem in the code here yet; maybe adding a debugging log message to connection_tls_finish_handshake would help track down the issue.

I think logging the value of conn->tls->server_handshake_count early in connection_or_handle_event_cb would be more helpful.

comment:10 Changed 9 years ago by nickm

That function is only invoked when using bufferevents; this bug occurs when not using bufferevents.

comment:11 Changed 9 years ago by cypherpunks

_Version №1_

          /* improved handshake, but not a client. */
          tor_tls_set_renegotiate_callback(conn->tls,
                                           connection_or_tls_renegotiated_cb,
                                           conn);
          conn->_base.state = OR_CONN_STATE_TLS_SERVER_RENEGOTIATING;
          connection_stop_writing(TO_CONN(conn));
          connection_start_reading(TO_CONN(conn));

Wait for reneg.

happened: connection_handle_read().
going: connection_read_to_buf(),read_to_buf_tls():
(lets assume handshake state already was SSL3_ST_SW_SRVR_HELLO_A, so tor_tls_server_info_callback() set tls->got_renegotiate = 1. and now state is SSL3_ST_SR_CERT_A)

Handshake after reneg still is incomplete.

connection_process_inbuf(conn, 1), connection_or_process_inbuf(TO_OR_CONN(conn)):

    case OR_CONN_STATE_TLS_SERVER_RENEGOTIATING:
      if (tor_tls_server_got_renegotiate(conn->tls))
        connection_or_tls_renegotiated_cb(conn->tls, conn);

connection_tls_finish_handshake(conn):

    if (!started_here) {
      connection_or_init_conn_from_address(conn, &conn->_base.addr,
                                           conn->_base.port, digest_rcvd, 0);

connection_or_set_identity_digest(conn, id_digest),
No client-side cert, zeroes inside id_digest.

comment:12 Changed 9 years ago by iang

Cc: iang@… added

comment:13 Changed 9 years ago by nickm

So, if I understand right, the problem is you think we're calling the renegotate_callback too early: when we're in state SSl3_ST_SR_CERT_A, but we haven't actually read any client certificates yet. Then connection_or_check_valid_tls_handshake will say that we got a valid handshake from the client and the client didn't authenticate (when really it hasn't authenticated _yet_). And it will leave digest_rcvd set to 0.

That certainly seems possible!

comment:14 Changed 9 years ago by nickm

(please let me know if I didn't understand.)

comment:15 Changed 9 years ago by nickm

(By the way, I am going to be very slow answering stuff until Monday because of the US holiday. If somebody else can jump in and come up with a correct patch faster, that would be excellent. Remember to test both with and without bufferevents on, of course. :/ )

Changed 9 years ago by nickm

Attachment: logs_for_2205.diff added

comment:16 Changed 9 years ago by nickm

I'm uploading a quick and dirty patch to add a couple of log messages that, if I understand right, should confirm whether this is the problem.

comment:17 Changed 9 years ago by cypherpunks

Wrong patch.

-      if (tls->negotiated_callback)
+      if (tls->negotiated_callback) {
+        log_notice(LD_HANDSHAKE, "Calling renegotiated_callback while in "
+                   "state %d [%s]", tls->ssl->state,
+                   ssl_state_to_string(tls->ssl->state));
         tls->negotiated_callback(tls, tls->callback_arg);
+      }

tor_tls_read()?

  r = SSL_read(tls->ssl, cp, (int)len);
  if (r > 0) {

Of course, such log_notice happen after success handshake only.

But!

Problem is connection_or_tls_renegotiated_cb() while it happen inside connection_or_process_inbuf(), for any non fatal result of SSL_read() while it does handshake.

comment:18 Changed 9 years ago by cypherpunks

I'm uploading a raw _idea_ for patch to fix the bug.

comment:19 Changed 9 years ago by cypherpunks

I wanted a consistent logic for tor's code while it uses bufferevents and w/o it. There are still a some unpredicted edges for non-bufferevent case in the logic with idea_raw_for_2205.

I'm uploading another an orthogonal idea.

Changed 9 years ago by cypherpunks

Attachment: another_idea_v1_2205.diff added

Changed 9 years ago by cypherpunks

Attachment: idea_raw_for_2205.diff added

comment:20 Changed 9 years ago by nickm

So here's a thought. Since our problem is that we're detecting the renegotiation as having happened before it is really done, what if we change the rule from "ST_SW_SRVR_HELLO_A means renegotiating!" to a new rule: "ST_SW_SRVR_HELLO_A means that the next ST_OK means that renegotiation is done" ? This way we keep all the complexity inside the tortls code.

I've uploaded something that might work to my public repo in branch "bug2205_idea"; what do you think? It's totally untested, so it might fail for reasons related or unrelated to the basic idea.

comment:21 Changed 9 years ago by nickm

Status: newneeds_review

comment:22 Changed 9 years ago by Sebastian

I just tried nickm's idea, but with that patch applied the client can't even fetch a consensus anymore (this works on master without the patch).

comment:23 Changed 9 years ago by Sebastian

After removing the log_info call in https://trac.torproject.org/projects/tor/raw-attachment/ticket/2205/idea_raw_for_2205.diff (to make it compile) this patch makes the network work again.

comment:24 Changed 9 years ago by rransom

nickm on IRC:

Okay, here is what I think is up, as translated from boboper via conversation and code

When running in non-bufferevent mode, we'd repeatedly call tor_tls_read when we saw bytes on the buffer. That's cool, but it means that we would call connection_process_inbuf whenever we did, even if we didn't actually see any bytes

so we would wind in connection_or_process_inbuf while we were still renegotiating, and we would call connection_or_tls_renegotiated_cb too early, which would clear the renegotiate_cb status of the tls, which would make the other code that was supposed to call the renegotiate_cb from inside tortls never happen.

boboper's fix is to:

  • remove the renegotiation code from tortls.c, and have all configurations of Tor use the version in connection_or_process_inbuf
  • revise the code in connection.c that calls connection_process_inbuf for non-bufferevent SSL connections so that it only does so when data has arrived

I think this approach is correct.

It means, I think, that there is more code we can rip out as unneeded in 0.2.3

comment:25 in reply to:  20 ; Changed 9 years ago by cypherpunks

Replying to nickm:

branch "bug2205_idea"; what do you think?

it's working too at least as idea, overcomplex but still valid. wonder why it's non working in practice, mystery.

comment:26 in reply to:  25 ; Changed 9 years ago by cypherpunks

Replying to cypherpunks:

Replying to nickm:

branch "bug2205_idea"; what do you think?

it's working too at least as idea, overcomplex but still valid. wonder why it's non working in practice, mystery.

Openssl calling a callback with another type for SSL_ST_OK:

		case SSL_ST_OK:
			/* clean a few things up */
			ssl3_cleanup_key_block(s);

..[skiped while pasted]..			

			ret = 1;
			goto end;

end:
	/* BIO_flush(s->wbio); */

	s->in_handshake--;
	if (cb != NULL)
		cb(s,SSL_CB_ACCEPT_EXIT,ret);

Tor waiting for SSL_CB_ACCEPT_LOOP, not SSL_CB_ACCEPT_EXIT.

comment:27 in reply to:  26 Changed 9 years ago by cypherpunks

Replying to cypherpunks:

Openssl calling a callback with another type for SSL_ST_OK:

Skiped to much, skiped another cb too:

		case SSL_ST_OK:
			/* clean a few things up */
			ssl3_cleanup_key_block(s);

			BUF_MEM_free(s->init_buf);
			s->init_buf=NULL;

			/* remove buffering on output */
			ssl_free_wbio_buffer(s);

			s->init_num=0;

			if (s->new_session == 2) /* skipped if we just sent a HelloRequest */
				{
				/* actually not necessarily a 'new' session unless
				 * SSL_OP_NO_SESSION_RESUMPTION_ON_RENEGOTIATION is set */
				
				s->new_session=0;
				
				ssl_update_cache(s,SSL_SESS_CACHE_SERVER);
				
				s->ctx->stats.sess_accept_good++;
				/* s->server=1; */
				s->handshake_func=ssl3_accept;

				if (cb != NULL) cb(s,SSL_CB_HANDSHAKE_DONE,1);
				}
			
			ret = 1;
			goto end;
			/* break; */

SSL_CB_HANDSHAKE_DONE more usefull for fixing of "bug2205_idea".

comment:28 Changed 9 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Sebastian has confirmed that the nicely minimal "another_idea_v1_2205.diff" works for him. I think for now we should just apply that one, though the current code path is inelegant. We should look into a combination of bug2205_idea+cypherpunks's suggestion above and idea_raw branch in order to make the code better and more robust, but for now, the simpler fix will do. I'm applying it, closing a new bug, and opening a new (less severe) bug to mention the cleanup opportunities.

comment:29 Changed 9 years ago by nickm

New ticket opened as #2231.

comment:30 Changed 9 years ago by iang

Just in case, I confirm it works for me as well.

comment:31 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:32 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.