I'm running a recent tor master as an authority in a tor testing network:
[notice] Tor 0.3.1.0-alpha-dev (git-0266c4ac819d9c83) running on Darwin with Libevent 2.1.8-stable, OpenSSL 1.0.2k, Zlib 1.2.11, Liblzma N/A, and Libzstd N/A.
I get this warning every so often:
[warn] Received a bad CERTS cell: Link certificate does not match TLS certificate
Is this expected?
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Confirmed, I get that warning on moria1. Here's some more context:
May 18 13:33:47.231 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 171.25.193.9:80; Sending cells: CERTSMay 18 13:33:47.231 [warn] Received a bad CERTS cell: Link certificate does not match TLS certificateMay 18 13:33:47.232 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 171.25.193.9:80: Invalid certificate chain!May 18 13:33:47.232 [info] conn_close_if_marked(): Conn (addr "171.25.193.9", fd 6541, type OR, state 7) marked, but wants to flush 1322 bytes. (Marked at src/or/connection_or.c:1319)May 18 13:33:47.232 [info] conn_close_if_marked(): We stalled too much while trying to write 1322 bytes to address "171.25.193.9". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 6541, type OR, state 7, marked at src/or/connection_or.c:1319).
Note that it's warn because I was making a connection to maatuska (an authority).
But now that you mention it, I have thousands of those lines at info-level log.
Here is one case, which looks legitimate and normal (I assume the person changed their key but kept the same IP:port):
May 31 09:31:46.987 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 203.59.211.197:9001; Sending cells: CERTSMay 31 09:31:46.987 [info] connection_or_client_learned_peer_id(): learned peer id for 0x7fdacfc47df0 (203.59.211.197): 0D3A46F079C9A8FF1F8506E6EFE4859B093086C6, <null>May 31 09:31:46.987 [info] connection_or_client_learned_peer_id(): Tried connecting to router at 203.59.211.197:9001, but RSA identity key was not as expected: wanted 31DAC214419D790CA6E200FC7937C9F836D67B24 + no ed25519 key but got 0D3A46F079C9A8FF1F8506E6EFE4859B093086C6 + no ed25519 key.May 31 09:31:46.987 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 203.59.211.197:9001: Problem setting or checking peer idMay 31 09:31:46.987 [info] conn_close_if_marked(): Conn (addr "203.59.211.197", fd 3903, type OR, state 7) marked, but wants to flush 1327 bytes. (Marked at src/or/connection_or.c:1319)May 31 09:31:46.987 [info] conn_close_if_marked(): We stalled too much while trying to write 1327 bytes to address "203.59.211.197". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 3903, type OR, state 7, marked at src/or/connection_or.c:1319).
May 31 09:40:36.170 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 23.252.50.179:9001; Sending cells: CERTSMay 31 09:40:36.170 [info] Certificate already expired. Either their clock is set wrong, or your clock is wrong.May 31 09:40:36.170 [info] (certificate lifetime runs from Sep 29 00:00:00 2015 GMT through Feb 23 00:00:00 2016 GMT. Your time is May 31 13:40:36 2017 UTC.)May 31 09:40:36.170 [info] or_handshake_certs_rsa_ok(): Received a bad CERTS cell: The link certificate was not validMay 31 09:40:36.170 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 23.252.50.179:9001: Invalid certificate chain!May 31 09:40:36.170 [info] conn_close_if_marked(): Conn (addr "23.252.50.179", fd 1639, type OR, state 7) marked, but wants to flush 1321 bytes. (Marked at src/or/connection_or.c:1319)May 31 09:40:36.170 [info] conn_close_if_marked(): We stalled too much while trying to write 1321 bytes to address "23.252.50.179". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 1639, type OR, state 7, marked at src/or/connection_or.c:1319).
Here's another case. Also legit looking, but it makes me wonder if the relay operator got told anything in their logs:
May 31 09:42:26.114 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 81.65.22.106:9001; Sending cells: CERTSMay 31 09:42:26.114 [info] rsa_ed25519_crosscert_check(): Received a bad RSA->Ed25519 crosscert: Crosscert is expiredMay 31 09:42:26.114 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: Invalid RSA->Ed25519 crosscertMay 31 09:42:26.114 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 81.65.22.106:9001: Invalid certificate chain!May 31 09:42:26.114 [info] conn_close_if_marked(): Conn (addr "81.65.22.106", fd 8175, type OR, state 7) marked, but wants to flush 1321 bytes. (Marked at src/or/connection_or.c:1319)May 31 09:42:26.114 [info] conn_close_if_marked(): We stalled too much while trying to write 1321 bytes to address "81.65.22.106". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 8175, type OR, state 7, marked at src/or/connection_or.c:1319).
And yet another case. Man, there are a lot of ways we can be upset with the CERTS cell we get.
May 31 10:33:36.957 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 2.86.150.234:41648; Sending cells: CERTSMay 31 10:33:36.957 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: At least one Ed25519 certificate was badly signedMay 31 10:33:36.957 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 2.86.150.234:41648: Invalid certificate chain!May 31 10:33:36.957 [info] conn_close_if_marked(): Conn (addr "2.86.150.234", fd 8039, type OR, state 7) marked, but wants to flush 1321 bytes. (Marked at src/or/connection_or.c:1319)May 31 10:33:36.957 [info] conn_close_if_marked(): We stalled too much while trying to write 1321 bytes to address "2.86.150.234". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 8039, type OR, state 7, marked at src/or/connection_or.c:1319).
Between what versions of Tor do these errors occur?
Are they relay<->relay only, or can they happen relay<->client?
Between a given pair of Tor instances, do they happen every time the pair tries to connect, or only some times?
Between a pair of Tor instances, do they occur no matter who initiates the connection, or do they occur only if the connection starts in a certain direction?
Between which versions of Tor exactly can these occur? They're impossible with 0.2.9, I believe, since they seem to depend on the ed handshake introduced in 0.3.0.x, but do they occur only with some 0.3.x.y<->0.3.x.y pairs, or all?
Another data point. For every [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: At least one Ed25519 certificate was badly signe, I get this two lines above:
[info] channel_register(): Channel 0x616000030680 (global ID 30923) in state opening (1) registered with no identity digest
As far as I can tell on my testnet dirauth, this only happens with >= 030. I also see that weirdly only with unstable relays. For instance (IP removed because from testnet):
May 29 23:27:08.000 [info] channel_tls_process_versions_cell(): Negotiated version 4 with <IP>:80; Sending cells: CERTSMay 29 23:27:08.000 [warn] Received a bad CERTS cell: At least one Ed25519 certificate was badly signedMay 29 23:27:08.000 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from <IP>:80: Invalid certificate chain!May 29 23:27:08.000 [info] conn_close_if_marked(): Conn (addr "<IP>", fd 21, type OR, state 7) marked, but wants to flush 1334 bytes. (Marked at src/or/connection_or.c:1319)May 29 23:27:08.000 [info] conn_close_if_marked(): We stalled too much while trying to write 1334 bytes to address "<IP>". If this happens a lot, either something is wrong with your network connection, or something is wrong with theirs. (fd 21, type OR, state 7, marked at src/or/connection_or.c:1319).May 29 23:27:08.000 [info] connection_or_note_state_when_broken(): Connection died in state 'handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPEN'May 29 23:27:08.000 [info] connection_handle_write_impl(): in-progress connect failed. Removing. (Connection timed out)May 29 23:27:08.000 [info] connection_close_immediate(): fd 24, type Directory, state connecting, 1370 bytes on outbuf.May 29 23:27:08.000 [info] connection_handle_write_impl(): in-progress connect failed. Removing. (Connection timed out)May 29 23:27:08.000 [info] connection_close_immediate(): fd 22, type Directory, state connecting, 1373 bytes on outbuf.
I asked dgoulet what he meant by "unstable", and he said that they are ones that come and go often.
I wonder: do the relays only seem to come and go because of this bug? Or do the relays cause this bug by starting and stopping a lot?
One possible cause for this bug would be if there's some problem with keeping keys and certs in sync as we save them to disk, reload them, generate new ones, etc.
Ok, I picked out a random 0.3.0.7 relay to investigate:
r NAVcoinKAction AMwQ/0JOq/Nc3H/asneNQUzzlaE BqKk6i3DDnYFzMbH1PrN1MEkCCM 2017-05-31 04:44:34 193.233.60.159 443 80s Exit Fast Running Stable V2Dir Validv Tor 0.3.0.7
moria1 is voting Running for it currently.
There is one instance, in the past few weeks, of being unhappy with its certs:
May 26 18:05:31.511 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 193.233.60.159:443; Sending cells: CERTSMay 26 18:05:31.511 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 193.233.60.159:443: Invalid certificate chain!
Whereas both earlier than that and later than that there are successes, e.g.:
May 26 18:15:08.709 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 193.233.60.159:443; Sending cells: CERTSMay 26 18:15:08.709 [info] connection_or_client_learned_peer_id(): learned peer id for 0x7fdacda3a6b0 (193.233.60.159): 00CC10FF424EABF35CDC7FDAB2778D414CF395A1, K4sLttyh2+YfOPF/70MZGNTRQ+Iy6tYui/BdUYVs0ksMay 26 18:15:08.709 [info] dirserv_orconn_tls_done(): Found router $00CC10FF424EABF35CDC7FDAB2778D414CF395A1~NAVcoinKAction at 193.233.60.159 to be reachable at 193.233.60.159:443. Yay.May 26 18:15:08.709 [info] channel_tls_process_certs_cell(): Got some good certificates from 193.233.60.159:443: Authenticated it with RSA and Ed25519May 26 18:15:08.709 [info] channel_tls_process_auth_challenge_cell(): Got an AUTH_CHALLENGE cell from 193.233.60.159:443: Sending authentication type 3May 26 18:15:08.709 [info] channel_tls_process_netinfo_cell(): Got good NETINFO cell from 193.233.60.159:443; OR connection is now open, using protocol version 4. Its ID digest is 00CC10FF424EABF35CDC7FDAB2778D414CF395A1. Our address is apparently 128.31.0.34.
I can also say that this doesn't happen every time with a relay. The logs show it happened twice in two hours for a specific relay and then nothing for that relay even though it is up and running.
It doesn't look like that relay has restarted during that time:
May 26 12:06:28.959 [info] directory_handle_command_post(): New descriptor post, because: bandwidth has changedMay 26 12:06:28.960 [info] dirserv_add_descriptor(): Added descriptor from 'NAVcoinKAction' (source: 193.233.60.159): Valid server updated.[...]May 26 16:05:29.172 [info] directory_handle_command_post(): New descriptor post, because: bandwidth has changedMay 26 16:05:29.172 [info] dirserv_add_descriptor(): Added descriptor from 'NAVcoinKAction' (source: 193.233.60.159): Valid server updated.
My branch bug22460_diagnostic_1 is based on maint-0.3.0, and logs the certificates in the cell when the thing fails. If you can get me the certs for ONE OR TWO failing cases, with the attendant log messages, that'd rock.
May 31 14:39:49.919 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 216.218.222.12:443; Sending cells: CERTSMay 31 14:39:49.919 [info] or_handshake_certs_rsa_ok(): Received a bad CERTS cell: The link certificate didn't match the TLS public keyMay 31 14:39:49.919 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 216.218.222.12:443: Invalid certificate chain!May 31 14:39:49.919 [info] dump_certs_cell(): certificate 1/2: type 1, body 308201BE30820127A003020102020811A589D8189F7A5C300D06092A864886F70D01010505003027312530230603550403131C7777772E6B6B79737A6D6761357571666E6D6F73346E70712E636F6D301E170D3137303231383030303030305A170D3137303932313233353935395A301C311A3018060355040313117777772E6C786E6366746D32762E6E657430819F300D06092A864886F70D010101050003818D0030818902818100C8752AFAB82E416B18C29F375AC502AEFFE29C420AA4B279915E129EF2C14F574EB99862874ED3EDE216C94453704153564032CE7A7036400F2593411B5BE16A0EED9EFD3E3CA98F7C3F402C10EAD0A0F0342C8857278F23F0B93D714E71FFCD2291CA5910E90688EEE5BC48DECD018A87D86BF5E59A40BF3AF4E6F8AB7F2B730203010001300D06092A864886F70D010105050003818100638BA8A0138EA477E7F0722A88FFB7805909FDD932F4E8D8560BB7DC68C405843A4C36C3BDDC0B476B5D08FF3AC5389C5A3C58CB09A8E4014E442D522CCAD25C4DF816FB950FBA3EEFC4C7127C1CED7291F409A590179D4BCA35C8AEBD01A2F3372F547549B7FE10F5D8E70DC5537799D54E1AA929530951BDD55926C367540DMay 31 14:39:49.919 [info] dump_certs_cell(): certificate 2/2: type 2, body 308201CA30820133A003020102020900F3C0E09F5BECE149300D06092A864886F70D01010505003027312530230603550403131C7777772E6B6B79737A6D6761357571666E6D6F73346E70712E636F6D301E170D3137303132303030303030305A170D3138303132303030303030305A3027312530230603550403131C7777772E6B6B79737A6D6761357571666E6D6F73346E70712E636F6D30819F300D06092A864886F70D010101050003818D0030818902818100C6B225FCD79B8E1B1722B43C6A614AF692D16396617625377AB738CD00591D0161BFD35A9A8162CC7E7D2AEF09903648F9CCF0449FA451B66BD2E6239BE7B4ACABD7B712632AAAA06CD3FA171AC9C3D43D17ABF897C828C772FF5BCB9AEE516C617E0AD5F2B71CFFFC855C4922A69CF9FFBC8656B44DB97F525F9FCE11D135A10203010001300D06092A864886F70D0101050500038181006744BDA3DA62CDC6F14BC216212A7D57E19F1D0F59F23366EDB975B6FB3594A7CD8715C770098E696A2E90FAB455601028E85FC170F6C5213408221B35B98D1C85235F3888284B806B856B4B27806600AA60477A6040DC376D0268A464CFDF443F9CB932D346CBDDFB53AF4EB62F81FEEE3D89D37E07E7983B493CA73407574F
May 31 15:20:00.993 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 78.52.211.211:443; Sending cells: CERTSMay 31 15:20:00.993 [info] rsa_ed25519_crosscert_check(): Received a bad RSA->Ed25519 crosscert: Crosscert is expiredMay 31 15:20:00.993 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: Invalid RSA->Ed25519 crosscertMay 31 15:20:00.993 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 78.52.211.211:443: Invalid certificate chain!May 31 15:20:00.993 [info] dump_certs_cell(): certificate 1/5: type 1, body 30820244308201ADA00302010202081259849D5624B00F300D06092A864886F70D01010B050030253123302106035504030C1A7777772E68626D366C63356E33786F7062733462726C2E636F6D301E170D3137303431313030303030305A170D3137313132383030303030305A3020311E301C06035504030C157777772E66767962726B77793570326B772E6E657430820122300D06092A864886F70D01010105000382010F003082010A0282010100D02AEF06951D49654A67C777E56F4E2A0B3249B5D0B560A1AB69A4135C2BEA9160C7E94AFDE033CCA4F7E8C56805D5E25CF3B4D1D4FB2F51EACEECA9A000C4EEAEE52AE3F7FAC61FF50B3B62819339ACACEDD1BEFD352F2D6886AAB68962E99FEC1E2C2DE6CF0204361BF6A6E3F5BC317639A101E762469DB8A73A651B1609EEBAFB5D06158450DF3152455E5551580A9D66BCB758882ADAE9408B6863BAF21ECFA6FD608E5F224468F70B5F9DC340D8F8F45A86476BE6F10FDBC44471B26651297EBEAC01C70E8BB8354C0A334EC48C32474380278690CCDEC71C46684B5775320E945F141ABF2074FDF9E663E71705CC0448704028E4AF943A29EE1201609B0203010001300D06092A864886F70D01010B0500038181006969DEBA09A81EBB32F5EF51495C93EB879398422BCF5993EA8086D9EBA2C573E9551953E78BA203D35791EFC8E117EA3B3370D205080AC6CDCF70863A7995B77CCFD067A311071F5F9C9A0F09DC4E1B0FF02ED34F71F59B4EB5B9E7D326D6A18371C536521644EAA19BBE1A0E1EDC8596D8491D6963ACF756060D91B9E53A19May 31 15:20:00.993 [info] dump_certs_cell(): certificate 2/5: type 2, body 308201C63082012FA003020102020900A12C926977359128300D06092A864886F70D01010B050030253123302106035504030C1A7777772E68626D366C63356E33786F7062733462726C2E636F6D301E170D3137303130373030303030305A170D3138303130373030303030305A30253123302106035504030C1A7777772E68626D366C63356E33786F7062733462726C2E636F6D30819F300D06092A864886F70D010101050003818D0030818902818100C8857E852A4991705CA30D3460172F42815C5F1E659692B795BDD1E4A00AD319017B565DB2F25C2E8959DF87E4493C4E1455D91C7BB68FF55E6E7FCF818D8308C6D4B70BD066F44FAE33519AD10C7404AC05497DA4E0F20E7F0FAB0A9FC7E320D859D62410CF95E045553730ACCBCFE34C52A3C1AD0ECF0EF8AAD94672685A0F0203010001300D06092A864886F70D01010B0500038181009BD2EFFDBABA39AC6365AEC34976AF2BF2BEB8E3718AF9F113A6ABEB66885441D0DF455B55C4835855BA50E053A42C81003DCDF4C7B6C2AB900CB4D14088575B32DF39A63DB7C6360D8D3893D85070123D63AB18DA42961A7B091F41597A762675C4A37FC5B45B275A15D1F7386AE2F7915896CE716BC26BA09FD3958C679911May 31 15:20:00.993 [info] dump_certs_cell(): certificate 3/5: type 4, body 0104000658E501A85A541FF2B5D4FBC156155D939779733E3AB55E8607D99942D470EBA1E79D96010020040006232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF0824974D7B657073ACEBB35EC2B12C0DA6BC3E602A7AAB3F8523633E073CFAD3E099100B33C9B5DBA09346D5CADD577A0216E0A09BF7895534B01566DBB796907May 31 15:20:00.993 [info] dump_certs_cell(): certificate 4/5: type 5, body 0105000657BA011D59E5BFECC9EC1C894AD8F97E38524120526B48891CF6EF4794FD8006595A12004CD4EEA7DE9517EC1D862154465A8BECFF321547A8BAB3C9C824A09919903CE3CC79460ED5B743F664D8CD5E6007C86CCAEE1502C81C93EE8423A4AD1D82BD05May 31 15:20:00.993 [info] dump_certs_cell(): certificate 5/5: type 7, body 06232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF00001563180463B800A78747A0759A51E037CA30C5253F5CAE555B8B3E9C50520FC3C72259E50339FA76474BC6A693043E443BDEA73F82A82CD94FD550945E9690BC610DB1938E12926781D37B5E72BB0F1991ACD376F45D29B9B8837CC49F5A128130CD553017BF1A4CE9770EE694403F9CE9E9A3C362EC59142B42DC3982A17653ABB64C5
{{{
May 31 15:20:00.993 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 78.52.211.211:443; Sending cells: CERTS
May 31 15:20:00.993 [info] rsa_ed25519_crosscert_check(): Received a bad RSA->Ed25519 crosscert: Crosscert is expired
May 31 15:20:00.993 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: Invalid RSA->Ed25519 crosscert
May 31 15:20:00.993 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 78.52.211.211:443: Invalid certificate chain!
May 31 15:20:00.993 [info] dump_certs_cell(): certificate 3/5: type 4, body 0104000658E501A85A541FF2B5D4FBC156155D939779733E3AB55E8607D99942D470EBA1E79D96010020040006232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF0824974D7B657073ACEBB35EC2B12C0DA6BC3E602A7AAB3F8523633E073CFAD3E099100B33C9B5DBA09346D5CADD577A0216E0A09BF7895534B01566DBB796907
...
}}}
This is the master cert; it says that the master ID key is 06232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF0, and that the current signing key is A85A541FF2B5D4FBC156155D939779733E3AB55E8607D99942D470EBA1E79D96. The expiration type is 3600 * 0x000658E5, or Jun 15 at 5 am.
{{{
...
May 31 15:20:00.993 [info] dump_certs_cell(): certificate 5/5: type 7, body 06232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF00001563180463B800A78747A0759A51E037CA30C5253F5CAE555B8B3E9C50520FC3C72259E50339FA76474BC6A693043E443BDEA73F82A82CD94FD550945E9690BC610DB1938E12926781D37B5E72BB0F1991ACD376F45D29B9B8837CC49F5A128130CD553017BF1A4CE9770EE694403F9CE9E9A3C362EC59142B42DC3982A17653ABB64C5
}}}
This one is the RSA->Ed crosscert. The signed key is 06232608577AC3AF530DF8B046C51722C0C9529C5C98557F5515ACEB195ABCF0, which is what we had hoped for. But the expiration date here is 0x15631 * 3600 == 1979, Dec 30, 1am! That's very wrong.
Roger reports another relay is doing the "invalid RSA->Ed crosscert" thing too , and pastebinned it at https://paste.debian.net/956374/ . That relay is also reporting an expiration time of 0x15631 in its RSA->Ed crosscert, which is very weird.
May 31 16:01:10.334 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 179.43.158.176:443; Sending cells: CERTSMay 31 16:01:10.334 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: At least one Ed25519 certificate was badly signedMay 31 16:01:10.334 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 179.43.158.176:443: Invalid certificate chain!May 31 16:01:10.334 [info] dump_certs_cell(): certificate 1/5: type 1, body 3082023A308201A3A0030201020208403CFF88309D1952300D06092A864886F70D01010B0500301E311C301A060355040313137777772E6466636D627A34363567322E636F6D301E170D3137303432333030303030305A170D3137303632343233353935395A301D311B3019060355040313127777772E64696E35717A617533632E6E657430820122300D06092A864886F70D01010105000382010F003082010A02820101009FDA187D1D0E46D09ABDD3952047D134DDF2EE849E59C19D4965E94EAAF25759C8A3A64AADA5382971A05EAE73E8577D591C6551BA6AE844320086F3C2676754AF70808038D4CD621710F8DE027095BC8F688121373A01FB3B3A04518D258F90963AB502BB6E180B3BAC00086223442EBB8FDA00CAC4F1D4C52243E3E6774A35729DCC3C7216E1D808E8A341F99EDE1CCF282104901BC70537BE7DB6C13B95A5A6E5084BC944C9C349939AB8A4976F4988A242AAD64C7A556524543D5E713B5800DF8395C947681F7BE57B0A02B55319A2E41E157B2A1EC2B40F70D210FD0BC86D053FF8EEDE8020241B72626D275C1A5CC096A14EB56FB2C8F808D751B25C4D0203010001300D06092A864886F70D01010B050003818100853CAC504C71B7327648D93C2385A119A1314936492FCD92E3E24D1DF25B3DE12D21EEAB0E0DE4A20142FB944DDF8F3000DC416169658F81FD4391C2F4E2D32C21CD88AD906009B30F55ED6E63A9A81C9BF00B3EEF362A2F25C47853370C7228E4A93598CFB29032FCC9C3564238526484731BBF15C738E8355741273DE67642May 31 16:01:10.334 [info] dump_certs_cell(): certificate 2/5: type 2, body 308201B730820120A00302010202086B8573A0774DCC4B300D06092A864886F70D01010B0500301E311C301A060355040313137777772E6466636D627A34363567322E636F6D301E170D3137303132323030303030305A170D3138303132323030303030305A301E311C301A060355040313137777772E6466636D627A34363567322E636F6D30819F300D06092A864886F70D010101050003818D0030818902818100CE66F4ED20424502B424EB5841B588971CA3C5AC9A1D21EB2F8411F80F6CFA4B4B616BE457CDC5F542360987BA9B10BD7AF9C22C58E1E8949FC9AAC4CD7903DFE95F9884230B50309DC284F409DF08366048AC5BF2A8EAD92870086609365A70B6AF38D1426E4586C517F4BB29F32F8C53AD5BB66A610CC79978E73E05C0EEA90203010001300D06092A864886F70D01010B0500038181001FA628267EAE49A1D4B661D2355A30FB2AB093E8EC5F154C9F922D3E0B04E8B238904FF5F1B69224806C77194C5DD1C6C6F2EEDCCE34913D20864676A615D6B26E664351FD50428E0F2EE653011294536580F96EC0F2B8464FC24CD9F53F5B2E584609DF0D79B2C161B22194D75E854ED47622DE5312A2E2CAE9E5F92E57CE0EMay 31 16:01:10.334 [info] dump_certs_cell(): certificate 3/5: type 4, body 010400065A5D0113828B6B956E21DFC1627EC1650723219DE19D0605FBBFC222E2C742AC81A31D01002004009D6030D9CD31261C33BFAD4571D6171C83ADEF6F2D1FAD82EC9451301FA6EAD0F148CDA928E407A428B3DCE870A3297C74FD08ECD07AF95197CEFB57B4E4BEA0681222BC2D4C13F79785AF2736B572435D246543591935FF11765C570E1EC30CMay 31 16:01:10.334 [info] dump_certs_cell(): certificate 4/5: type 5, body 0105000657BB018FA6D469EA74707EC7EBBC8AE79A75434287ADBDAF1678B003736535228D76C200FE2D6FF56E4A9D5D15105B7891C6B58A284178DBA0EE9FEFEF1A60DBC571BA1FCAD6BC5D5C22F76314DE1D9A7A7518ABC36EB60B1FDBC45D7ACB722FA9520709May 31 16:01:10.334 [info] dump_certs_cell(): certificate 5/5: type 7, body 9D6030D9CD31261C33BFAD4571D6171C83ADEF6F2D1FAD82EC9451301FA6EAD00007AB37800EA89BB84B2256FEE30A6842B4AE0F236489C826D4162C43EF434B8F9D965A1F4DDC44785B097E0431471F88783BFE0042214A619BD1037961441CA2F6AFEE56C042C5B6EC679BFC2262A37E65F8ECCC9C76014D56650040C1FD98A697B3DB956B7EE9B25C3A6CF30C33036E27FFD88F85C42C9D54547E1E0401A6FA5549A578r greenlantern aYuHCM9OoC0nygknTzNRtPZqWDQ CkAtDfZZdApxyUSzF6FOq9OeC9A 2017-05-31 13:56:02 179.43.158.176 443 80s Fast Guard HSDir Running Stable V2Dir Validv Tor 0.3.0.6
{{{
May 31 16:01:10.334 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 179.43.158.176:443; Sending cells: CERTS
May 31 16:01:10.334 [info] or_handshake_certs_ed25519_ok(): Received a bad CERTS cell: At least one Ed25519 certificate was badly signed
May 31 16:01:10.334 [info] channel_tls_process_certs_cell(): Received a bad CERTS cell from 179.43.158.176:443: Invalid certificate chain!
May 31 16:01:10.334 [info] dump_certs_cell(): certificate 4/5: type 5, body 0105000657BB018FA6D469EA74707EC7EBBC8AE79A75434287ADBDAF1678B003736535228D76C200FE2D6FF56E4A9D5D15105B7891C6B58A284178DBA0EE9FEFEF1A60DBC571BA1FCAD6BC5D5C22F76314DE1D9A7A7518ABC36EB60B1FDBC45D7ACB722FA9520709
This checks out okay; it's a master->signing cert, with an expiration time of 0x00065A5D ('Fri Jun 30 21:00:00 2017')
{{{
May 31 16:01:10.334 [info] dump_certs_cell(): certificate 5/5: type 7, body 9D6030D9CD31261C33BFAD4571D6171C83ADEF6F2D1FAD82EC9451301FA6EAD00007AB37800EA89BB84B2256FEE30A6842B4AE0F236489C826D4162C43EF434B8F9D965A1F4DDC44785B097E0431471F88783BFE0042214A619BD1037961441CA2F6AFEE56C042C5B6EC679BFC2262A37E65F8ECCC9C76014D56650040C1FD98A697B3DB956B7EE9B25C3A6CF30C33036E27FFD88F85C42C9D54547E1E0401A6FA5549A578
}}}
This one also looks okay (this time as a sign->link) but the signature does not validate as being correctly signed with the signing key from the first cert. Maybe it was signed with a different key? Maybe it's munged? This one has expiration of 0x657BB ('Fri Jun 2 19:00:00 2017').
The RSA->Ed cert seems to be a also signing the claimed identity key.
{{{
...
r greenlantern aYuHCM9OoC0nygknTzNRtPZqWDQ CkAtDfZZdApxyUSzF6FOq9OeC9A 2017-05-31 13:56:02 179.43.158.176 443 80
s Fast Guard HSDir Running Stable V2Dir Valid
v Tor 0.3.0.6
}}}
BINGO. I have a diagnosis for at least part of the bug. Specifically, this would explain the "At least one Ed25519 certificate was badly signed" case.
In load_ed_keys(), and in generate_ed_link_cert(), we check to see whether our SIGNING->AUTH certificate and our SIGNING->LINK certificate have a signed key that matches our current key... but we don't make sure that the signing key matches the current signing key. So when the signing key is updated, we'll keep serving certificates signed with the old signing key.
Furthermore, in rotate_x509_certificate_callback(), we don't call generate_ed_link_cert(), which means that our link cert will remain outdated for a while.
The above explanation explains the "At least one Ed25519 certificate was badly signed" thing, and it explains the "Link certificate does not match TLS certificate" thing.
I bet that the "The link certificate didn't match the TLS public key" thing is similar, but I'm not sure. More investigation needed.
The "Crosscert is expired" case is still mysterious. It looks as if we were passing 0 or -1 to load_ed_keys() for "now", but looking at the code in maint-0.3.0, I don't see how we could actually do that.
Bugfix for "At least one Ed25519 certificate was badly signed" thing, and it explains the "Link certificate does not match TLS certificate" in bug22460_030_01.
I bet that the "The link certificate didn't match the TLS public key" thing is similar, but I'm not sure.
It's similar, I think. I believe this bug happens when we send an x.509 cert as part of a TLS handshake, then rotate our TLS context, then send our certs cell from connection_or_send_certs_cell(). This timing for this rotation means that the link certificate in our certs cell won't match the one from the TLS handshake.
I bet that the "The link certificate didn't match the TLS public key" thing is similar, but I'm not sure.
It's similar, I think. I believe this bug happens when we send an x.509 cert as part of a TLS handshake, then rotate our TLS context, then send our certs cell from connection_or_send_certs_cell(). This timing for this rotation means that the link certificate in our certs cell won't match the one from the TLS handshake.
I'm trying to think of a good bugfix for this one that doesn't have a race condition. The problem with this case is that it means that my fix above is incomplete -- we need to update the signing->link certificate on TLS rotation, yes, but we should still serve the old signing->link certificate on all connections that existed before the TLS context rotated.
I think the answer may be to cache the signing->link certificate at the time that the connection is created? It's not perfectly elegant, but it would work.
Current patch list: bug22460_030_01 and bug22460_case2_029_01. The latter breaks the unit tests, the former needs a tweak as described above. Both are worth reviewing.
Trac: Summary: Received a bad CERTS cell: Link certificate does not match TLS certificate to Link handshake trouble: certificates and keys can get out of sync
lgtm bug22460_030_01. I unfortunately cannot confirm if this resolves the situation until I deploy this on the testnet (or arma on moria1) but code looks good.
For bug22460_case2_029_01, the comment of this function should mention that a newly allocated object is returned. We aren't returning a global value or something in a state somewhere.
+/** Return the cerficate we used on the connection, or NULL if somehow+ * we didn't use one. */+MOCK_IMPL(tor_x509_cert_t *,+tor_tls_get_own_cert,(tor_tls_t *tls))
Second thing, maybe tor_x509_cert_dup() should be unit test only for now? It's dead code if no unit tests.
bug22460_030_01: Looks good, but I get a bit anxious where we conditionally set s->own_link_cert, but then we unconditionally use it in connection_or_send_certs_cell():
Are we sure that there is no chance we will leave own_link_cert uninitialized? If there is such a chance, should we default to get_current_link_cert_cert() if own_link_cert is unset in connection_or_send_certs_cell()?
bug22460_case2_029_01: Seems like tor_tls_get_peer_cert() and tor_tls_get_own_cert() are the same function but with a different log message? Am i right, that we did that so that we can mock one function but not the other?
And is it for the same mocking reason we use both functions here, even tho the functionality is the same?
I guess the code duplication is a bit naughty here, as well as the name similarity between tor_tls_get_own_cert() and tor_tls_get_my_certs() (even tho they do a different thing IIUC). But if this is just for 0.2.9 then it's probably OK.
Thanks for the reviews! Except as noted, I've made the requested changes. George, you successfully found a major bug in the "bug22460_case2_029_01" branch: I should have been calling SSL_get_certificate(), not SSL_get_peer_certificate().
Second thing, maybe tor_x509_cert_dup() should be unit test only for now? It's dead code if no unit tests.
Good catch. If you don't mind, I'd like to leave it in: there are a few other places where we should be using it IIRC where we have silly kludges instead.
Are we sure that there is no chance we will leave own_link_cert uninitialized?
Take another look at add_ed25519_cert(): it is a no-op if cert is NULL. I'll update the documentation comment to make the behavior explicit, and add a tor_assert_nonfatal().
The make test-network is failing. It appears that some node aren't able to start due to this abort:
Tor 0.2.9.10-dev (git-ee0bd9760ba51f2e) died: Caught signal 11/home/dgoulet/Documents/git/tor/src/or/tor(+0x147869)[0x561a20e9a869]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(i2c_ASN1_INTEGER+0xf)[0x7f5f72640c6f]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(i2c_ASN1_INTEGER+0xf)[0x7f5f72640c6f]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(asn1_ex_i2c+0x154)[0x7f5f7264bad4]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13ebca)[0x7f5f7264bbca]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_item_ex_i2d+0x1d6)[0x7f5f7264bea6]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13f3ae)[0x7f5f7264c3ae]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_item_ex_i2d+0x3ab)[0x7f5f7264c07b]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13f2d8)[0x7f5f7264c2d8]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_item_ex_i2d+0x3ab)[0x7f5f7264c07b]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13f1df)[0x7f5f7264c1df]/home/dgoulet/Documents/git/tor/src/or/tor(+0x170046)[0x561a20ec3046]/home/dgoulet/Documents/git/tor/src/or/tor(connection_or_compute_authenticate_cell_body+0x349)[0x561a20e581b9]/home/dgoulet/Documents/git/tor/src/or/tor(+0xbb340)[0x561a20e0e340]/home/dgoulet/Documents/git/tor/src/or/tor(channel_tls_handle_var_cell+0x623)[0x561a20e11013]/home/dgoulet/Documents/git/tor/src/or/tor(+0x100ba3)[0x561a20e53ba3]/home/dgoulet/Documents/git/tor/src/or/tor(+0xf802e)[0x561a20e4b02e]/home/dgoulet/Documents/git/tor/src/or/tor(+0x4253e)[0x561a20d9553e]/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x6a0)[0x7f5f72bc7420]/home/dgoulet/Documents/git/tor/src/or/tor(do_main_loop+0x22d)[0x561a20d965ad]/home/dgoulet/Documents/git/tor/src/or/tor(tor_main+0x1c25)[0x561a20d99e55]/home/dgoulet/Documents/git/tor/src/or/tor(main+0x19)[0x561a20d92049]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f5f71afd3f1]/home/dgoulet/Documents/git/tor/src/or/tor(_start+0x2a)[0x561a20d9209a]
And also sometimes:
Tor 0.2.9.10-dev (git-ee0bd9760ba51f2e) died: Caught signal 11/home/dgoulet/Documents/git/tor/src/or/tor(+0x147869)[0x559a03755869]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_template_free+0x2a)[0x7fb67edb88ca]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_template_free+0x2a)[0x7fb67edb88ca]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13e672)[0x7fb67edb8672]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_item_free+0x15)[0x7fb67edb8885]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13a3cd)[0x7fb67edb43cd]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x13e698)[0x7fb67edb8698]/lib/x86_64-linux-gnu/libcrypto.so.1.0.0(ASN1_item_free+0x15)[0x7fb67edb8885]/home/dgoulet/Documents/git/tor/src/or/tor(tor_x509_cert_free+0x2a)[0x559a0377f91a]/home/dgoulet/Documents/git/tor/src/or/tor(connection_or_compute_authenticate_cell_body+0x1a6)[0x559a03713016]/home/dgoulet/Documents/git/tor/src/or/tor(+0xbb340)[0x559a036c9340]/home/dgoulet/Documents/git/tor/src/or/tor(channel_tls_handle_var_cell+0x623)[0x559a036cc013]/home/dgoulet/Documents/git/tor/src/or/tor(+0x100ba3)[0x559a0370eba3]/home/dgoulet/Documents/git/tor/src/or/tor(+0xf802e)[0x559a0370602e]/home/dgoulet/Documents/git/tor/src/or/tor(+0x4253e)[0x559a0365053e]/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x6a0)[0x7fb67f334420]/home/dgoulet/Documents/git/tor/src/or/tor(do_main_loop+0x22d)[0x559a036515ad]/home/dgoulet/Documents/git/tor/src/or/tor(tor_main+0x1c25)[0x559a03654e55]/home/dgoulet/Documents/git/tor/src/or/tor(main+0x19)[0x559a0364d049]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb67e26a3f1]/home/dgoulet/Documents/git/tor/src/or/tor(_start+0x2a)[0x559a0364d09a]
Squashed the second one as bug22460_case2_029_01_squashed . Merged it to 0.2.9, and merged it forward with conflicts (please review d1c1dc229e189c).
Merged the first to 0.3.0 and forward. Had to add 91f49bc0f0759d0e0a794fbfe8cce5a9bb07e607 for the tests to pass again, because the assert_nonfatal() I had added was triggered by some of the code that tested the pre-ed case.
Going to put in needs_review till somebody's looked at the mentioned commits, and had a chance to test it a lot more.
For the record, this appears to be fixed on the test network with these patches.
(But we only have a few authorities and relays, so we should check newer versions work on the public network as well.)
i still receiving this for my relays running Tor 0.4.1.6
Nov 17 03:32:04.000 [warn] {PROTOCOL} Received a bad CERTS cell: Link certificate does not match TLS certificateNov 17 03:32:04.000 [warn] {PROTOCOL} Received a bad CERTS cell from 130.149.80.199:9001: Invalid certificate chain!