wiki:doc/HTTPSEverywhere/SSLObservatorySubmission

Version 58 (modified by pde, 7 years ago)

--

Design

This is a privacy-preserving mechanism for checking HTTPS certificates against the EFF SSL Observatory to see whether they might be malicious or are known to have been compromised. The Observatory also uses it as a collection mechanism to detect certificates that our data-center scans of public IPs cannot see (we expect that most man-in-the-middle attacks are not publicly visible).

The feature can run from Firefox extensions like HTTPS Everywhere or Torbutton. This is an opt-in feature, and we suggest you run it over Tor. You can run it without Tor if you would like.

When a participating client sees a new HTTPS TLS certificate, it compares its fingerprint against a local list of the top T most popular TLS fingerprints. If it does not find the fingerprint in this list, it submits the entire certificate chain to the EFF SSL Observatory using Tor or other available proxies.

To prevent submission of private infrastructure certificates, the client also maintains a list of fingerprints for the superset of root CAs trusted by all versions of Firefox, as well as popular third-party CAs such as CACert. If a certificate chain is rooted in a CA not in this set, it is assumed to be private, and the certificates it signs are not submitted. Additionally, if the browser detects that the domain in question resolved to an RFC 1918 IP address, the certificate is not submitted by default. (XXX: Should this resolution be done over Tor, or should it always be done using the user's current local proxy settings/local resolver? The latter leaves them open to an MITM attack by someone who owns their router, but the former leaks potentially private DNS names to Tor exits.)
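The default filtering described above can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the function names, the idea of passing the leaf and root fingerprints separately, and the use of plain sets for the two fingerprint lists are all assumptions made for the example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip_text):
    """Return True if ip_text is an IPv4 address inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # unknown or unparsable address, e.g. "-1"
    return addr.version == 4 and any(addr in net for net in RFC1918_NETWORKS)


def should_submit(leaf_fp, root_fp, resolved_ip, popular_fps, trusted_root_fps):
    """Decide whether a newly seen chain is submitted under default settings.

    All parameter names are hypothetical; fingerprints are opaque values.
    """
    if leaf_fp in popular_fps:
        return False  # already among the top T popular fingerprints
    if root_fp not in trusted_root_fps:
        return False  # unknown root: assumed to be private infrastructure
    if is_rfc1918(resolved_ip):
        return False  # domain resolved to a private address
    return True
```

The sketch does not address the open XXX question of where the DNS resolution happens; it only consumes whatever address the caller obtained.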

The certificate is POSTed to https://observatory.eff.org/submit_cert. The EFF also runs a Tor Exit Enclave on this host, which prevents certain circuit activity correlation attacks against Tor.

Client UI and Configuration Variables

This version of HTTPS Everywhere can use EFF's SSL Observatory to try to detect suspicious HTTPS certificates for sites you visit. It uses Tor to do this anonymously, but the SSL Observatory may keep and publish these certificates for analysis and security research.

[x] Check certificates anonymously
[ ] Check certificates even if Tor is not available

Advanced options >>

If you use corporate networks that have secret HTTPS servers or HTTPS-inspecting antivirus software, you should keep at least the first two of the following options enabled. Otherwise, you can turn them off.

[x] Do not check/submit certificates for private DNS domains
[x] Do not check/submit certificates with non-standard root CAs
[ ] Do not check/submit certificates that are self-signed

Submission API

submit_cert

POST /submit_cert

Arguments

domain:

The host portion of the URL. If a port is specified, it follows the host after a ':'.

server_ip:

The Server IP. May be -1 if unknown, may also be inaccurate due to API limitations.

certlist:

A JSON-encoded Array of the base64 representation of each certificate in the chain.

client_asn:

The Autonomous System number of the client (or of their Tor exit, if the cert was seen through Tor). May be -1 if unknown.

private_opt_in:

Whether the client opts in to submitting private certs (either a "1" or "0").

padding:

An arbitrary amount of random data, used to pad the POST to a total of 8192*2^n bytes.
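As a rough sketch of how a client might assemble these arguments, the example below builds a request body and pads it to the next size of the form 8192*2^n. Several details are assumptions not stated on this page: that the arguments are sent form-encoded, that certificates are base64-encoded from their DER bytes, that the padded total refers to the body length, and the helper names themselves.

```python
import base64
import json
import secrets
import string
from urllib.parse import urlencode

BLOCK = 8192
# Characters that urlencode leaves untouched, so padding length is exact.
PAD_ALPHABET = string.ascii_letters + string.digits


def padded_size(length):
    """Smallest value of the form 8192 * 2**n (n >= 0) that is >= length."""
    size = BLOCK
    while size < length:
        size *= 2
    return size


def build_post_body(domain, server_ip, der_certs, client_asn=-1,
                    private_opt_in=False):
    """Build a form-encoded /submit_cert body padded to 8192 * 2**n bytes."""
    fields = {
        "domain": domain,
        "server_ip": server_ip,
        # JSON array of base64 strings, one per certificate in the chain.
        "certlist": json.dumps(
            [base64.b64encode(cert).decode("ascii") for cert in der_certs]
        ),
        "client_asn": str(client_asn),
        "private_opt_in": "1" if private_opt_in else "0",
    }
    unpadded = urlencode(fields) + "&padding="
    fill = padded_size(len(unpadded)) - len(unpadded)
    padding = "".join(secrets.choice(PAD_ALPHABET) for _ in range(fill))
    return unpadded + padding
```

Padding to a small set of fixed sizes means an observer of the encrypted request learns only a coarse size bucket rather than the exact length of the submitted chain.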

Response

Normally, the status code is 200, with the body being "1" or "0".

The return value is 1 if (cert_sha256 not in certs) and (private_opt_in or visible_in_DNS(domain))

If the fplist contains a certificate that the Observatory knows to be dangerous (e.g., revoked or using a broken key), the status code is 403 and a textual explanation is provided. The first portion of this explanation is a "."-delimited, client-parseable list for internationalisation purposes, with values like "Revoked.", "Broken key.", "Malicious CA.", or "Revoked.Malicious CA."

If there is an error, an appropriate response code is set and the error message is included in the body.
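A client might interpret these responses along the following lines. This is a minimal sketch: the function name and return shape are hypothetical, and it assumes the "."-delimited reason list occupies the first line of the 403 body, which this page does not actually specify.

```python
def interpret_response(status, body):
    """Map a /submit_cert response to a (kind, detail) tuple.

    kind is "ok", "dangerous", or "error" (names chosen for this sketch).
    """
    if status == 200:
        # The body is "1" or "0"; detail is True when it is "1".
        return ("ok", body.strip() == "1")
    if status == 403:
        # Assumption: the reason list is the first line of the explanation.
        first_line = body.splitlines()[0] if body else ""
        reasons = [part for part in first_line.split(".") if part]
        return ("dangerous", reasons)
    # Any other status: the body carries the error message.
    return ("error", body)
```

For example, a 403 whose body starts with "Revoked.Malicious CA." yields the reasons "Revoked" and "Malicious CA", which the UI can then translate individually.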

Server Side Design

Input SQL schema

CREATE TABLE certs (
  fp binary(36) NOT NULL, -- sha1(cert) + md5(cert); because sha256 is not easily available in the client
  raw_cert blob NOT NULL,
  known_bad varchar(255) NOT NULL, 
  PRIMARY KEY (fp)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8;

CREATE TABLE reports (
  id int(11) NOT NULL AUTO_INCREMENT,
  fp binary(36) NOT NULL,  -- sha1(cert) + md5(cert)
  server_ip varchar(39) NOT NULL,
  domain varchar(255) NOT NULL,
  client_asn int(11) NOT NULL DEFAULT '-1',
  timestamp datetime,
  PRIMARY KEY (id),
  KEY (fp)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8;
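The 36-byte fp column is the concatenation of a 20-byte SHA-1 digest and a 16-byte MD5 digest. A small sketch of computing it is shown below; it assumes the hashes are taken over the raw (DER) certificate bytes, and the function name is made up for the example.

```python
import hashlib


def cert_fp(der_cert):
    """Return the 36-byte fp value: SHA-1 digest followed by MD5 digest."""
    return hashlib.sha1(der_cert).digest() + hashlib.md5(der_cert).digest()
```

Concatenating two digests gives a fixed-width key that fits binary(36) while relying only on hash functions that the schema comment says are readily available in the client.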

DB Integration With Observatory

The Observatory project's non-distributed system uses a schema that is partly dynamically generated. This is more suitable to a system which has discrete scans that complete infrequently, and it will need to be adjusted to handle the higher frequency of updates the distributed observatory has, as well as to include information about the source of the certificate chains.

During a data import operation, certificate chains are processed, the SSL messages are interpreted, certificates are extracted, and their contents are added to "certs" tables. More than one of these certs tables exists, allowing the process to be run over a period of time and in parallel. These certs tables are created by hack_parse.py and use BLOB or TEXT for most data types (things like subject, issuer, and extensions). Validation of certificates and their chains happens in multiple passes, and is mostly coordinated by stich_tables.py, which also creates more usable tables like valid_certs. This "finishing process" involves a few other scripts, and results in time stamps, extraction of names, creation of other summary tables like roots, seen and all_certs, and the addition of indexes. The process leaves the certs tables in place, as they are still sometimes needed.

Key weaknesses impacting integration:

  • Determining certificate chain order
  • Dealing with data from different collection times

Currently the only reliable way to determine certificate chains is to group the certificates by the path from which they were loaded. This can be done in the individual certs tables or on the seen table, but not in tables that hold only unique certificates, like valid_certs or all_certs. Path is just the name of the results file that holds the transcript of the SSL connection. The "id" values of the entries in a certs table are sequential with respect to any given path, and the order of the ids gives the order of the certificates as presented in the chain. This makes working with certificate chains and determining certificate positions slow and burdensome, and it is unlikely to work with chains that lack a path, such as those from the distributed observatory.
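The grouping described above can be illustrated with a short sketch. It works on in-memory tuples rather than the real tables, and the (id, path, fingerprint) row shape and the function name are assumptions made for the example.

```python
from collections import defaultdict


def chains_by_path(rows):
    """Group (id, path, fingerprint) rows into chains ordered as presented.

    Rows sharing a path belong to one chain; ascending id gives the order
    in which the server presented the certificates.
    """
    grouped = defaultdict(list)
    for cert_id, path, fingerprint in rows:
        grouped[path].append((cert_id, fingerprint))
    return {
        path: [fingerprint for _, fingerprint in sorted(entries)]
        for path, entries in grouped.items()
    }
```

Because everything hinges on path, a submission that arrives without a results file has nothing to group on, which is the limitation the paragraph above points out.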

The "seen" table has an entry for each certificate, each time it is seen in a chain. Its fields are currently only IP, fingerprint, fetchtime, path and valid. This table has a lot of rows (12.67M in the December 2010 dataset). To determine which certificate a seen entry refers to, a join on fingerprint is used. This isn't enough information to distinguish between submissions to a distributed observatory and scans from a central one, or to handle ideas like source ASN.

Improving the schema

To make certificate chains easier to find and certificate order easier to ascertain, and to support a wider description of data sources, we need to make chains a clearer concept. The seen table could be augmented to do this and to include the optional data from distributed observatory submissions.

Proposed additions to "seen"

  • domain - varchar(255)

Optional domain, available in distributed observatory

  • client_asn int(11)

Optional ASN if known from distributed observatory. Might include this for centralized scans too as it seems useful.

  • ChainID - int(11) (type used for other numeric keys like certid)

A value unique to each certificate chain, and shared by all the seen entries for certs in that chain

  • CertPosition - tinyint

A small number, 0 for the leaf, and monotonically increasing down the cert chain. The combination of ChainID and CertPosition is unique.

  • Source

Not sure how to represent this, but an indicator of where this cert came from (distributed obs? which one? central scan?)
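To make the proposed fields concrete, the sketch below turns one submitted chain into "seen" rows that share a ChainID and carry a CertPosition. It is purely illustrative: the in-process counter stands in for whatever ID allocation the database would provide, Source is shown as a free-form string only because its representation is still undecided, and the function name is hypothetical.

```python
from itertools import count

# Hypothetical allocator; a real import script would get IDs from the DB.
_chain_ids = count(1)


def seen_rows_for_chain(fingerprints, source, domain=None, client_asn=None):
    """Build one seen row per certificate, leaf first.

    All rows share a ChainID; CertPosition is 0 for the leaf and increases
    along the chain, so (ChainID, CertPosition) is unique.
    """
    chain_id = next(_chain_ids)
    return [
        {
            "ChainID": chain_id,
            "CertPosition": position,
            "fingerprint": fingerprint,
            "domain": domain,          # optional, distributed observatory
            "client_asn": client_asn,  # optional, if known
            "Source": source,          # placeholder representation
        }
        for position, fingerprint in enumerate(fingerprints)
    ]
```

With rows shaped like this, a chain can be recovered by selecting on ChainID and ordering by CertPosition, without relying on path.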

Proposed change to "seen"

  • The "fetchtime" timestamp could be renamed to "collectiontime". It would remain as the time at which a cert chain was received.

When new submissions are available from the distributed observatory, an import script similar to hack_parse could process them into a certs table, and then re-run the same finishing process used to create valid_certs, validation information and other useful tables. The loading process could be altered to either:

  • Store DER encodings of each cert in the DB
  • Store text representations of OpenSSL's interpretation of each cert in the DB

Storing a parsed representation would make it a little simpler to do higher risk certificate parsing operations on a separate machine from some of the other finishing tasks.

Related projects