Opened 11 years ago

Closed 10 years ago

Last modified 8 years ago

#1346 closed defect (fixed)

Reachability Testing passes, yet Relay is marked as down by authorities

Reported by: Sebastian
Owned by:
Priority: Low
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor
Version: 0.2.2.10-alpha
Severity:
Keywords: tor-client
Cc: Sebastian, nickm, BarkerJr, data, arma, RazorsEdge
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by arma)

We've had reports from many operators of historically stable and well-running
relays that their relays have lately been falling out of the consensus. They do
test reachability successfully for both dirport and orport, and publish a
descriptor to the authorities, which the authorities acknowledge as valid.

In debugging and testing this more, I looked at
BarkerJrParis B1BFFE96D67CC1BD7EF6A9D4AC618AF681012A3E 92.243.8.139:21
Trying to use this relay as a bridge fails, as circuit building times out. Using
openssl s_client -connect 92.243.8.139:21 gives a valid-looking cert, though;
and the connecting Tor doesn't provide any debug log messages that
would have helped me to track down the cause.

BarkerJr mentioned that this occurred after restarting Tor. A few weeks earlier,
he had updated openssl, but had not restarted Tor since then. I have no data
from the other relay operators on the openssl versions, platforms, Tor versions,
etc. that they're using, but will try to get them to update the bug report here.

My current theory is that something prevents a circuit from being established
with the relay after the initial connection is made, and that this
connection and sending of certs is enough to make reachability testing
pass, thus confusing the relay into believing that it is in fact reachable.

[Automatically added by flyspray2trac: Operating System: All]

Change History (14)

comment:1 Changed 11 years ago by BarkerJr

I rebooted another of my relays and it did not come back up either. Both relays run CentOS 5.4 Linux. I have downgraded BarkerJrParis to a fresh compile of 0.2.1.25 with --enable-openbsd-malloc --disable-asciidoc, but that didn't help, so this seems to impact the stable branch as well.

The relay I rebooted today is:
BarkerJrCoast2 A8A63DE7C4875FA96DD5A2FF9703E427F67393A9 74.207.247.39:110

comment:2 Changed 11 years ago by Sebastian

Scott Bennett advises that it might be openssl 1.0.0 that produces this problem.
See this thread http://archives.seul.org/or/relays/Apr-2010/msg00009.html

comment:3 Changed 11 years ago by data

Ok, you might be on to something here. I updated to openssl-0.9.8n on March 30th, after which the restart brought me the described behaviour.
I will post more information in 18-24 hours, when I have some more time.

comment:4 Changed 11 years ago by BarkerJr

I ran "yum downgrade openssl openssl-devel" and my relay is working fine now.

comment:5 Changed 11 years ago by RazorsEdge

Running CentOS 5.4 on i686 and tor 0.2.1.19-3.el5 from the EPEL repository.

Updated several packages and rebooted.
Mar 28 18:47:38 Installed: kernel-2.6.18-164.15.1.el5.i686
Mar 28 18:47:46 Updated: kernel-headers-2.6.18-164.15.1.el5.i386
Mar 28 18:47:53 Updated: openssl-0.9.8e-12.el5_4.6.i686
Mar 28 18:47:54 Updated: cyrus-sasl-lib-2.1.22-5.el5_4.3.i386
Mar 28 18:48:02 Updated: nspr-4.8.4-1.el5_4.i386
Mar 28 18:48:07 Updated: httpd-2.2.3-31.el5.centos.4.i386
Mar 28 18:48:09 Updated: nss-3.12.6-1.el5.centos.i386
Mar 28 18:48:10 Updated: cyrus-sasl-gssapi-2.1.22-5.el5_4.3.i386
Mar 28 18:48:10 Updated: cyrus-sasl-md5-2.1.22-5.el5_4.3.i386
Mar 28 18:48:12 Updated: cyrus-sasl-2.1.22-5.el5_4.3.i386
Mar 28 18:48:13 Updated: cyrus-sasl-plain-2.1.22-5.el5_4.3.i386
Mar 28 18:48:14 Updated: gnutls-1.4.1-3.el5_4.8.i386
Mar 28 18:48:15 Updated: 1:mod_ssl-2.2.3-31.el5.centos.4.i386

Things appeared to function fine. A few days later I noticed that my bandwidth utilization had dropped to nothing. Normal Tor connections had been ~350+ and were now ~12. I had been running a relay fine for years, and now my router nickname is no longer in the consensus documents.

RazorsEdge 205ED2C309999F0F18767A1ECCD384B580070BA9

Upgraded to tor-0.2.1.25-tor.0.rh5_4.i386, and this shows the same symptoms (see http://archives.seul.org/or/relays/Apr-2010/msg00009.html).

My wild guess is the RHEL/CentOS openssl patch:
http://archives.seul.org/tor/relays/Apr-2010/msg00022.html
http://rhn.redhat.com/errata/RHSA-2010-0162.html

Maybe rebuilding the tor-0.2.1.25-tor.0.rh5_4.i386 RPM against the new openssl errata would fix things?

comment:6 Changed 11 years ago by Sebastian

Looks like the plot thickens. When I build my tor against openssl 0.9.8n,
it is able to bootstrap off of Bryon's node without any problems. When I
build against 0.9.8l, it doesn't work. A part of the openssl changelog for
0.9.8m might help to explain the problem:

If client attempts to renegotiate and doesn't support RI respond with
a no_renegotiation alert as required by RFC5746. Some renegotiating
TLS clients will continue a connection gracefully when they receive
the alert. Unfortunately OpenSSL mishandled this alert and would hang
waiting for a server hello which it will never receive. Now we treat a
received no_renegotiation alert as a fatal error. This is because
applications requesting a renegotiation might well expect it to succeed
and would have no code in place to handle the server denying it so the
only safe thing to do is to terminate the connection.

comment:7 Changed 11 years ago by nickm

That would explain why the client is hanging instead of connecting, but it doesn't explain why the server refuses
to renegotiate. It's supposed to accept it anyway if SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION is present.
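
The interaction described by the changelog quoted in comment:6 and by the paragraph above can be summarized as a small decision model. This is only a sketch under stated assumptions, not OpenSSL or Tor code: the enum, the function, and all parameter names are hypothetical, and the model encodes nothing beyond what the changelog entry and this comment say.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for one client-initiated renegotiation. */
typedef enum {
    RENEG_OK,            /* server accepts and the renegotiation proceeds */
    RENEG_CLIENT_HANGS,  /* alert mishandled: client waits for a server hello
                            that never arrives */
    RENEG_CONN_CLOSED    /* alert treated as fatal: connection is terminated */
} reneg_outcome_t;

/*
 * Illustrative model (all names are assumptions made for this sketch):
 *   client_supports_ri    - client sends the RFC 5746 renegotiation indication
 *   server_allows_legacy  - SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION is in
 *                           effect on the server
 *   client_alert_is_fatal - client library follows the 0.9.8m changelog and
 *                           treats a no_renegotiation alert as a fatal error
 */
reneg_outcome_t
renegotiation_outcome(bool client_supports_ri,
                      bool server_allows_legacy,
                      bool client_alert_is_fatal)
{
    if (client_supports_ri || server_allows_legacy)
        return RENEG_OK;

    /* Otherwise the server answers with a no_renegotiation alert. */
    return client_alert_is_fatal ? RENEG_CONN_CLOSED : RENEG_CLIENT_HANGS;
}
```

In this model, the open question in this comment is why the relay ends up in the second branch at all, since the server is expected to have legacy renegotiation allowed.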

If I read the IRC backlogs right, Byron's node is using one of two different patched versions of openssl-0.9.8e. But
if the later patched one includes the RFC5746 support, that will cause a bug when we use the version number to
detect which method of re-enabling renegotiation to use.

Our code says "No, we can't just set option 0x00040000L everywhere: before 0.9.8m, it meant something else."
But looking through the code, I can't find any version in our supported range (0.9.7 and on) that actually uses that
value. Instead, they all seem to have a gap there. I'll research more thoroughly, and have a look at 1.0.0 betas
as well.
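
To make the version-number concern concrete, here is a minimal sketch of how a version-gated choice can go wrong when a distribution backports RFC 5746 support into a library that still reports an older version. Everything except the option value 0x00040000L (quoted above) and the stock OPENSSL_VERSION_NUMBER encodings is an assumption made for illustration: the struct, the function names, and the two policies are hypothetical and are not taken from Tor's source. The unconditional policy also rests on the observation above that no supported version appears to use that bit for anything else, which had not yet been fully verified at this point in the ticket.

```c
#include <stdbool.h>

/* Option bit quoted in this comment; OpenSSL 0.9.8m and later name it
 * SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION. */
#define LEGACY_RENEG_OPTION_BIT 0x00040000L

/* OPENSSL_VERSION_NUMBER values of stock releases (layout 0xMNNFFPPS). */
#define STOCK_0_9_8E 0x0090805fL
#define STOCK_0_9_8M 0x009080dfL

/* Hypothetical description of a library build. */
typedef struct {
    long reported_version;  /* what the library reports about itself */
    bool enforces_rfc5746;  /* whether the RFC 5746 change is actually present */
} ssl_build_t;

/* Hypothetical version-gated policy: set the bit only for 0.9.8m or later. */
long options_version_gated(ssl_build_t b)
{
    return b.reported_version >= STOCK_0_9_8M ? LEGACY_RENEG_OPTION_BIT : 0L;
}

/* Hypothetical unconditional policy: always set the bit. */
long options_unconditional(ssl_build_t b)
{
    (void)b;
    return LEGACY_RENEG_OPTION_BIT;
}

/* True when the build enforces RFC 5746 but the bit was not set, i.e. the
 * server would refuse legacy renegotiation. */
bool legacy_reneg_refused(ssl_build_t b, long options)
{
    return b.enforces_rfc5746 && !(options & LEGACY_RENEG_OPTION_BIT);
}
```

For a build described as `{ STOCK_0_9_8E, true }` (a patched 0.9.8e with the backport), the version-gated policy leaves the bit unset and renegotiation is refused, while the unconditional policy avoids that outcome.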

comment:8 Changed 11 years ago by Sebastian

Looks like your fix in bug1346.use_ssl_option works! Let's
see if BarkerJrParis makes it back into the consensus.

comment:9 Changed 11 years ago by BarkerJr

Looks good. My bandwidth use is climbing.

comment:10 Changed 11 years ago by arma

Great. I'm going to call this one solved then. Thanks!

comment:11 Changed 10 years ago by nickm

Milestone: Tor: 0.2.2.x-final

comment:12 Changed 10 years ago by arma

Description: modified (diff)
Resolution: None → fixed
Status: new → closed

We still find people running CentOS and Tor 0.2.1.19, but at this point it's pretty rare. I'm going to close this.

comment:13 Changed 8 years ago by nickm

Keywords: tor-client added

comment:14 Changed 8 years ago by nickm

Component: Tor Client → Tor