Opened 5 years ago

Closed 5 years ago

#8443 closed defect (fixed)

SSL handshake filtered when MAX_SSL_KEY_LIFETIME_ADVERTISED is 365 days

Reported by: arma
Owned by:
Priority: Medium
Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-bridge
Cc: cda, phw
Actual Points:
Parent ID: #3972
Points:
Reviewer:
Sponsor:

Description

I spent some time this afternoon with cda, doing Tor handshakes from inside Iran. The handshake completed, but then the TCP connection got cut, when the SSL cert had a lifetime of 365 days.

When I changed the 365 to 65 in or.h, on the bridge, the TCP connection survived.

(But that wasn't sufficient, since for some reason the directory request wasn't getting through, or the response wasn't getting through.)

In any case, we should take steps to randomize our SSL link cert lifetime.

This is the follow-on ticket to #4014 (which we knew we'd need to do one day, and this is the day).

Change History (20)

comment:1 Changed 5 years ago by arma

(I picked 0.2.4 as the target milestone, rather than 0.2.3, since this fix doesn't fully solve the problem. Once we have sufficient fixes, we can re-assess.)

comment:2 Changed 5 years ago by arma

Incidentally, somebody with new enough crypto libs on both sides should check to see how our new ECC link handshake (added in 0.2.4.8-alpha) fares here. I believe I don't have new enough crypto on my side, and I don't know whether cda does.

comment:3 Changed 5 years ago by arma

Status: new → needs_review

See my bug8443 branch.

comment:4 Changed 5 years ago by arma

Hmmmmm.

Seems to be working great (running it on my bridge now).

But I also noticed that my SSL cert was issued today. That's another piece of low-hanging fruit that's probably worth dealing with.

comment:5 Changed 5 years ago by nickm

Status: needs_review → needs_revision

I did a little spot-checking to see whether it is more usual for certificates to have notBefore/notAfter times at more or less random intervals, or to have notBefore/notAfter times that are an exact duration apart.

I checked 4 or 5 well-known websites and found that their certificates in the wild are all over the map. More research could be warranted.

Here's the script I used:

echo | openssl s_client -connect "$HOST:443" | perl -ne 'if (/^-----BEGIN/) { $p=1 }; print if $p; if (/^-----END/) { $p=0 }' | openssl asn1parse | grep UTCTIME

(First, set HOST to the host you want to connect to.)

This prints stuff like:

  232:d=3  hl=2 l=  13 prim: UTCTIME           :121017000000Z
  247:d=3  hl=2 l=  13 prim: UTCTIME           :131018235959Z

where the notBefore time is 121017000000Z (that is, 2012-10-17 00:00:00 GMT) and the notAfter time is 131018235959Z (that is, 2013-10-18 23:59:59 GMT).

Those are the intervals I got for amazon. I found other stuff too. We could stand to do a little more spot checking before we settle on 'random' IMO. Nearly nobody has a 1-day lifetime AFAICT.
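To compare lifetimes across many hosts, the two UTCTIME strings can be converted to seconds and subtracted. The following is only a sketch, not part of the script above: the function names `days_from_civil` and `parse_utctime` are made up for illustration, the parser accepts only the 13-character `YYMMDDHHMMSSZ` form shown in the output, and it does not range-check the individual fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Days since 1970-01-01 for a Gregorian calendar date (no leap seconds). */
int64_t days_from_civil(int64_t y, int m, int d)
{
  int64_t era, yoe, doy, doe;
  y -= (m <= 2);                    /* treat Jan/Feb as part of the prior year */
  era = (y >= 0 ? y : y - 399) / 400;
  yoe = y - era * 400;                                   /* [0, 399] */
  doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* [0, 365] */
  doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + doe - 719468;
}

/* Parse an ASN.1 UTCTIME of the form "YYMMDDHHMMSSZ" into seconds since
 * the epoch. Returns 0 on success and -1 on malformed input. */
int parse_utctime(const char *s, int64_t *out)
{
  int f[6], i;
  assert(s && out);
  if (strlen(s) != 13 || s[12] != 'Z')
    return -1;
  for (i = 0; i < 6; i++) {
    char a = s[2 * i], b = s[2 * i + 1];
    if (a < '0' || a > '9' || b < '0' || b > '9')
      return -1;
    f[i] = (a - '0') * 10 + (b - '0');
  }
  /* Two-digit years: 50-99 mean 19YY, 00-49 mean 20YY (RFC 5280). */
  f[0] += (f[0] >= 50) ? 1900 : 2000;
  *out = days_from_civil(f[0], f[1], f[2]) * 86400
         + f[3] * 3600 + f[4] * 60 + f[5];
  return 0;
}
```

Applied to the two values printed above, the difference between notAfter and notBefore is 366 days, 23:59:59.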

Is it guaranteed that we'll get a new link certificate at least daily? If not, the "one day" minimum lifetime is too short.

The patch looks okay otherwise, but it needs a patch to tor.1.txt to accompany it.

comment:6 Changed 5 years ago by cda

Cleared the torrc changes from this afternoon and confirmed Tor v0.2.2.35 would not progress past

Mar 10 04:11:23.195 [notice] Bootstrapped 10%: Finishing handshake with directory server.

Added your bridge and would stall at:

Mar 10 04:15:15.180 [notice] Bootstrapped 50%: Loading relay descriptors.
Mar 10 04:15:22.186 [notice] new bridge descriptor 'bridge2' (fresh)
Mar 10 04:15:22.186 [notice] I learned some more directory information, but not enough to build a circuit: We have no network-status consensus.
Mar 10 04:16:36.103 [notice] I learned some more directory information, but not enough to build a circuit: We have no network-status consensus.
Mar 10 04:16:36.698 [notice] I learned some more directory information, but not enough to build a circuit: We have only 0/3169 usable descriptors.

Updated to v0.2.4.10-alpha, was able to open a circuit.
Removed bridge, cleared /var/lib/tor; could not progress past 10%.

comment:7 Changed 5 years ago by phw

I downloaded the EFF's SSL Observatory data and calculated the certificate lifetimes. Here are the 20 most common lifetimes, in ascending order of frequency (count, then lifetime):

5159   1825 days, 0:00:00
5895   790 days,  23:59:59
6552   761 days,  0:00:00
7199   366 days,  23:59:59
7569   1461 days, 23:59:59
8503   760 days,  23:59:59
9101   369 days,  23:59:59
10190  1099 days, 23:59:59
10472  425 days,  23:59:59
14865  395 days,  23:59:59
15284  1826 days, 23:59:59
19428  731 days,  0:00:00
22130  1095 days, 0:00:00
51588  1096 days, 0:00:00
65542  730 days,  0:00:00
79855  1095 days, 23:59:59
85521  730 days,  23:59:59
85526  1826 days, 0:00:00
94504  365 days,  0:00:00
157614 365 days,  23:59:59

One year seems to be the most popular lifetime. A censor that simply dropped such certificates would cause a lot of collateral damage, so we are probably still missing something.

comment:8 Changed 5 years ago by phw

Some more statistics:

Out of all 1,533,359 certificates, 497,650 (~32%) have a lifetime which does not end in 0:00:00 or 23:59:59. A couple thousand are close to these values, but most of the 32% are all over the place. These could be called "random lifetimes".
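The criterion used here can be written as a one-line check on the lifetime in seconds. This is a sketch only; the function name `lifetime_is_irregular` is invented for illustration and is not taken from the analysis scripts.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Return 1 if the time-of-day part of a lifetime is neither 0:00:00 nor
 * 23:59:59, i.e. the lifetime is not (almost) a whole number of days. */
int lifetime_is_irregular(int64_t lifetime_seconds)
{
  int64_t rem;
  assert(lifetime_seconds >= 0);
  rem = lifetime_seconds % SECONDS_PER_DAY;
  return rem != 0 && rem != SECONDS_PER_DAY - 1;
}
```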

The above is just an unbiased view of all certificates. We should also consider well-known web sites that are important in Iran, as nickm did above.

comment:9 Changed 5 years ago by nickm

phw -- while you're at it, what's the distribution on time of day at which certs start and end? I bet that a large number start or end at the start or end of a day, and a large number start or end at the start or end of an hour.

comment:10 Changed 5 years ago by arma

I've been thinking something like

@@ -632,7 +633,7 @@ tor_tls_create_certificate(crypto_pk_t *rsa,
 
   tor_tls_init();
 
-  start_time = time(NULL);
+  start_time = time(NULL) - crypto_rand_int(cert_lifetime);
 
   tor_assert(rsa);
   tor_assert(cname);
@@ -667,7 +668,7 @@ tor_tls_create_certificate(crypto_pk_t *rsa,
 
   if (!X509_time_adj(X509_get_notBefore(x509),0,&start_time))
     goto error;
-  end_time = start_time + cert_lifetime;
+  end_time = time(NULL) + cert_lifetime;
   if (!X509_time_adj(X509_get_notAfter(x509),0,&end_time))
     goto error;
   if (!X509_set_pubkey(x509, pkey))

would be wise, and sufficient to get rid of my "gosh, your cert was born within the past 2 hours" worry. It's sort of a hack though -- it makes your cert valid for 1 to 365 days in the future, and 0 to that-previous-number days in the past.
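The arithmetic in that diff can be isolated from OpenSSL to see what window it produces. This is a sketch under assumptions: `cert_window_t` and `backdated_window` are hypothetical names, and the random offset that the diff takes from `crypto_rand_int(cert_lifetime)` is passed in as a parameter so the result is reproducible.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical container for the two validity timestamps. */
typedef struct {
  time_t not_before;
  time_t not_after;
} cert_window_t;

/* Compute the validity window the way the diff above does: notBefore is
 * pushed back by `backdate` seconds, while notAfter still counts the
 * full lifetime forward from `now`. `backdate` is assumed to lie in
 * [0, cert_lifetime). */
cert_window_t backdated_window(time_t now, time_t cert_lifetime,
                               time_t backdate)
{
  cert_window_t w;
  assert(cert_lifetime > 0);
  assert(backdate >= 0 && backdate < cert_lifetime);
  w.not_before = now - backdate;
  w.not_after = now + cert_lifetime;
  return w;
}
```

Note that the total validity period is `cert_lifetime + backdate`, so it can approach twice the nominal lifetime, which is one reason to choose the lifetime first and then pick a position inside it.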

comment:11 Changed 5 years ago by nickm

I think that would combine nicely with your previous patch: determine the lifetime, *then* determine how far into the lifetime we are. But see also phw's and my comments above.

comment:12 in reply to:  9 Changed 5 years ago by phw

Replying to nickm:

phw -- while you're at it, what's the distribution on time of day at which certs start and end? I bet that a large number start or end at the start or end of a day, and a large number start or end at the start or end of an hour.

When ignoring year/month/day, 42% of all certificates start at 00:00:00 *and* end at 23:59:59.

42% of all certificates end at x:59:59. There's only a negligible amount of end times other than x:59:59. Pretty much the same applies to the start times - just that the time is x:00:00, of course.

So it looks like your first guess is true. Starting and ending around midnight is very popular. Your second guess does not seem to be true, though. The number of certificates starting or ending around the start/end of an hour (+/- 1 second) other than midnight is < 0.6%.

comment:13 Changed 5 years ago by arma

Status: needs_revision → needs_review

Ok, I added a man page entry, made us start part-way through the lifetime period, clipped defaults to start and end on day boundaries, and flipped a coin to decide if we end at 23:59:59 or at midnight.

New commits on my bug8443 branch.
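One way to read the three steps described in this comment is sketched below. It is not the code on the bug8443 branch: `day_aligned_window`, its parameters, and the two-day safety margin are assumptions made for illustration, and the random choices (how far into the lifetime we are, and the coin flip) are passed in by the caller.

```c
#include <assert.h>
#include <time.h>

#define SECONDS_PER_DAY (24 * 3600)

/* Hypothetical container for the two validity timestamps. */
typedef struct {
  time_t not_before;
  time_t not_after;
} cert_window_t;

/* Build a validity window that starts `elapsed` seconds before `now`,
 * rounded down to a day boundary, and ends either at midnight or at
 * 23:59:59 depending on `end_before_midnight`. */
cert_window_t day_aligned_window(time_t now, int lifetime_days,
                                 time_t elapsed, int end_before_midnight)
{
  cert_window_t w;
  time_t lifetime = (time_t)lifetime_days * SECONDS_PER_DAY;
  /* Rounding down can move the start by almost a day, so keep two days
   * of margin; this leaves the certificate valid for at least a day. */
  assert(lifetime_days >= 2);
  assert(elapsed >= 0 && elapsed <= lifetime - 2 * SECONDS_PER_DAY);
  assert(now - elapsed >= 0);
  w.not_before = now - elapsed;
  w.not_before -= w.not_before % SECONDS_PER_DAY;  /* clip to midnight */
  w.not_after = w.not_before + lifetime;           /* also midnight */
  if (end_before_midnight)
    w.not_after -= 1;                              /* 23:59:59 instead */
  return w;
}
```

Because `SECONDS_PER_DAY` is a parenthesized macro, the `%` in the rounding step applies to the whole day length.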

comment:14 Changed 5 years ago by arma

Parent ID: #3972

comment:15 Changed 5 years ago by arma

See also #4583. I believe this ticket obsoletes that one by now.

comment:16 Changed 5 years ago by arma

(It seems we're screwed either way here, if the new firewall strategy is to look for a collection of properties. By sticking to the day boundary we're blending in better but still reducing our entropy. By *not* sticking to the day boundary we blend in worse, but at first glance we're harder to fingerprint. The trouble is that the new fingerprint should be "X, Y, and also doesn't use a day boundary". This is a good example of why playing the "look like ssl" arms race is unwinnable.)

comment:17 Changed 5 years ago by nickm

+  start_time -= (start_time % 24*3600);

In addition to the order of operations issue here, are we sure it actually produces the result we expect? Did you look at these certificates and make sure they come out right? I guess they must, since leap-seconds aren't included in time_t if I understand correctly.

Other than that, it looks okay to me. If it tests out okay, I say we at least merge into 0.2.4.
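The order-of-operations issue in the quoted line can be seen in isolation. In C, `%` and `*` have equal precedence and associate left to right, so `start_time % 24*3600` parses as `(start_time % 24) * 3600`. The sketch below uses hypothetical function names to compare the two readings.

```c
#include <assert.h>
#include <time.h>

/* The expression as written in the quoted line: this subtracts
 * (t % 24) * 3600, which is some whole number of hours. */
time_t round_down_as_written(time_t t)
{
  return t - (t % 24*3600);
}

/* The presumably intended expression: subtract the number of seconds
 * since the most recent UTC midnight. This works because time_t does
 * not count leap seconds, so every day is exactly 86400 seconds. */
time_t round_down_to_day(time_t t)
{
  return t - (t % (24*3600));
}
```

For example, with `t = 1363000000` the parenthesized version returns 1362960000, which is a multiple of 86400, while the version as written returns 1362942400, which is not.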

comment:18 Changed 5 years ago by arma

I'm running it on https://128.31.0.34:9010 and on moria1 right now.

Looks like it is working as expected.

comment:19 Changed 5 years ago by nickm

Then feel free to squash/merge as appropriate.

comment:20 Changed 5 years ago by arma

Resolution: fixed
Status: needs_review → closed

Merged. Thanks!
