Opened 21 months ago

Last modified 3 months ago

#21087 accepted defect

Separate truncated descriptor(s) from next complete descriptor

Reported by: atagar
Owned by: wulder
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords: metrics-help, metrics-2018
Cc: iwakeh
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Hi Karsten, a user reached out to me because Stem's validator warns about a CollecTor tarball. In particular it's surprised by @source annotations in the server descriptors.

For example, here's the *server-descriptors-2016-09/2/2/228e3ecf654e1b7b4f01a0027e599e7ba14b216c* descriptor from the tarball:

@type server-descriptor 1.0 
router sauronkingofmortor 137.74.116.214 9001 0 9030
identity-ed25519
-----BEGIN ED25519 CERT-----
AQQABkAYAXe8xhBhoRVgI2ZswouGG50gLzYsWudXIp96bCAloSStAQAgBADs9XUH
7zgiFd+mjPWwFLUpvma8qvdtChcgp4K6WDDnU6ub3BDNZ7nGTDvYPHVmq4URzobG
uAsjOIPlf1vkU3YJdpBe0KGHy5JeuJ10TDQwlK1F761pSApIdH1ocIg4oAE=
-----END ED25519 CERT-----
master-key-ed25519 7PV1B+84IhXfpoz1sBS1Kb5mvKr3bQoXIKeCulgw51M
platform Tor 0.2.8.7 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:23:41
fingerprint 2D8A FA91 2E2B 8623 BB2C DACD 1933 2209 D524 D1A3
uptime 860586
bandwidth 12288000 12288000 7792456
extra-info-digest 2017D54A2C28B100CE173351E0799E15153B703B                                          D2vKVNwaxArp6bf11NWPRNoYGQ0lBgIwziSXNkL9TCw
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMNJzNJiDwd8y7ge4aXjkUCBKDncNhC91i5SQkxTHX4ZR/05+/liwR5O
TPgoIG0FDQSEUMYDPY92XsRmgPXkpHBSga0ojrhwnYutXAPMRuT4Dm24kpJctdbG
kwW6aovjNcoeJE3iB5ahUCv/TDnuiijioRSfjTPQsW68gHo1rOxJAgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAJhvAVj6wurlz3khW1Z/2x8sAnyr9lBdiHMp8UEAhYw+7ct1fdmuZXbA
I9aZbb7GEgR9UBW67qYd0aN1XHbDwb4OvAW+TOzcCjBmqiSLl5WACl0wIjuif7++
xNVcRw04kmmbBf7IyjmmuCc6ihjGeG02aREitZGBSkyZwt8SAz0fAgMBAAE=
-----END RSA PUBLIC KEY-----
onion-key-crosscert
-----BEGIN CROSSCERT-----
ct5RfDtMM5h5G6T6pFkRANCsJGcjwpPK+b47yWoQSdH7C0Y4yjWX5Z48l511fPK6
1v4IINEnuiCMkDp4HGpSW87aHatUaWP6MVo6pwQB2uqi8SpjPdlf6pJfSYNsvaZh
00P6ENAXzDnFFvcNla0WI7o6rIE2tuP3qd7bxazACUU=
-----END CROSSCERT-----
ntor-onion-key-crosscert 0
-----BEGIN ED25519 CERT-----
AQoABj/6Aez1dQfvOCIV36aM9bAUtSm+Zryq920KFyCngrpYMOdTANd0d0EMe6BU
CZrDB67jdOEX8P0T1MY1razuVMyvAjS1MPsM/F7uvCvgf1Su4NJFodWWPGLXWnHZ
RFSpVcHmmg8=
-----END ED25519 CERT-----
hidden-service-dir
contact luciole <luciolesauronkingofmortor@yopmail.com>
ntor-onion-key lhvzaL7Ze85GFMWMQscMgIt9IOx6srmOiXqD85kOekI=
reject *:@uploaded-at 2016-09-15 09:24:06
@source "82.1.128.70"
router torbeornottorbe 82.1.128.70 9001 0 0
platform Tor 0.2.4.27 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:24:06
fingerprint C6EE 9826 7F82 962C C2FC 1E9E 2AE5 F317 B2D2 D6F0
uptime 762082
bandwidth 1024000 2048000 106721
extra-info-digest 08D7C6A9FF860F6A5D12FB43BD2051ACC06BCE52
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMb/ajivr7C1z7cnVSz4dPe+T0cOvB6ickNb8vjquDM8eZh7mLecSACT
H1D5DO97aJ0L1Bw5oOLzU77zx/2e/UUnHftiyZ8sNLmAE7smgEdUvhqNZSY+VSgN
E1Qyc6CdBpJWdSRp1+/AbYq0XWXMTrkb7YvRyR0iuYDn03s82DU/AgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAK5CqRjTHbA+AHxLqSCWoEOpiVUNqpdiEUVTvdmu7aQgPcR2VI/fS/oc
tPmfC6L0l4eL0u1zZzbxJ8z5mop0M+0Wss8gWdpO7t7MNHu/GJ78gRRhb6Yz2JQf
jTZcVGyDsI8PJZoH+if3slVCUcq14zy85hb9sF9spaDhTEBbhx+rAgMBAAE=
-----END RSA PUBLIC KEY-----
family $5A8B78AB293475D6D55F1CBFA5D2A1CEEB09545B $EAE900D1DB28D56F4535C06F1BAEB92B9E3BFEE6
hidden-service-dir
contact A36F 07B9 285A C895 3E42 69F3 0CCC 0AF6 2FEC DB6E Random Person <nae AT blueyonder dot co   dot uk>
ntor-onion-key 90dT83YmTzH/uojnATf+KOtwJssKGURO/qdu3SR0XgE=
reject *:*
router-signature
-----BEGIN SIGNATURE-----
PohhIu5DPg4iK+5AV3/sLMbpiwCItMbnaNVWrve9nKXyHM18eskYpL1sLyj7/3Nk
YKmFheD/alawStTr3rHkopdR8yj+1LZmWPlSHTy3x/U+uAzQl+66YcECEdw1xKMY
oaYngrHlZSrCEgwDKwIS4GJ/rOYjGUl0HCC9z0OaZ5M=
-----END SIGNATURE-----

Seems this is two descriptors concatenated together with a @source in the middle? Any hints on where these come from?

Thanks!

Change History (15)

comment:1 Changed 21 months ago by karsten

Keywords: metrics-help added
Type: changed from enhancement to defect

Hi Damian!

This looks like a truncated descriptor (with the last characters being "reject *:") and another complete descriptor obtained from cached descriptor files (starting with "@uploaded-at").

CollecTor could indeed be smarter about separating those two descriptors. It currently looks for a descriptor start "^router " and the next descriptor end "\nrouter-signature\n" and treats everything in between as a single descriptor. Here's the relevant code.
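
The splitting behavior described above can be sketched roughly as follows. This is a simplified Python illustration, not CollecTor's actual (Java) code: the function name split_naive, the END_TOKEN constant, and the sample data are assumptions made up for the example. It shows how a truncated descriptor gets merged with the next complete one.

```python
START_TOKEN = "router "
SIG_TOKEN = "\nrouter-signature\n"
# Assumption for this sketch: a descriptor ends with the signature block footer.
END_TOKEN = "\n-----END SIGNATURE-----\n"


def split_naive(data):
    """Return each span from a line starting with START_TOKEN
    through the end of the next signature block."""
    descriptors = []
    pos = 0
    while True:
        # Find the next descriptor start at the beginning of a line.
        if data.startswith(START_TOKEN, pos):
            start = pos
        else:
            idx = data.find("\n" + START_TOKEN, pos)
            if idx < 0:
                break
            start = idx + 1
        # Find the next signature keyword and the end of its block.
        sig = data.find(SIG_TOKEN, start)
        if sig < 0:
            break
        end = data.find(END_TOKEN, sig)
        if end < 0:
            break
        end += len(END_TOKEN)
        descriptors.append(data[start:end])
        pos = end
    return descriptors


# Made-up sample: a truncated descriptor followed by a complete one.
SAMPLE = (
    "router truncated 192.0.2.1 9001 0 0\n"
    "reject *:@uploaded-at 2016-09-15 09:24:06\n"
    "@source \"192.0.2.2\"\n"
    "router complete 192.0.2.2 9001 0 0\n"
    "reject *:*\n"
    "router-signature\n"
    "-----BEGIN SIGNATURE-----\n"
    "AAAA\n"
    "-----END SIGNATURE-----\n"
)

merged = split_naive(SAMPLE)
print(len(merged))  # both "router" lines end up in one span
```

Because the search for the signature starts at the first "router " line and never checks for a second start in between, the truncated fragment and the complete descriptor come back as one item.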

We did not notice this issue before, because we'd have discarded the descriptor after finding that it doesn't pass metrics-lib's parser. But as of five months ago, we're keeping those descriptors anyway.

A possible fix would be to check whether there's another startToken (or rather another string "\n" + startToken) before sigToken, and if there is, treat that substring as a separate descriptor. In fact, there could be several truncated descriptors before the first complete descriptor.
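
One way the proposed check could look, as a hedged Python sketch rather than a patch for CollecTor: the function name split_at_restarts and the sample data are hypothetical, and only the "\n" + startToken idea comes from the comment above. Given a span that runs from a descriptor start to its signature, it cuts the span at every later "\n" + START_TOKEN that occurs before SIG_TOKEN.

```python
START_TOKEN = "router "
SIG_TOKEN = "\nrouter-signature\n"


def split_at_restarts(span):
    """Split a start-to-signature span at each later "\\n" + START_TOKEN.

    Returns (truncated_fragments, complete_descriptor). If the span
    contains no signature at all, every piece is treated as truncated
    and complete_descriptor is None.
    """
    sig = span.find(SIG_TOKEN)
    limit = sig if sig >= 0 else len(span)
    pieces = []
    begin = 0
    while True:
        # Only look for restarts that lie entirely before the signature.
        nxt = span.find("\n" + START_TOKEN, begin, limit)
        if nxt < 0:
            break
        pieces.append(span[begin:nxt + 1])  # keep the trailing newline
        begin = nxt + 1
    last = span[begin:]
    if sig < 0:
        return pieces + [last], None
    return pieces, last


# Made-up sample: two truncated descriptors, then a complete one.
SPAN = (
    "router first 192.0.2.1 9001 0 0\n"
    "reject *:@uploaded-at 2016-09-15 09:24:06\n"
    "@source \"192.0.2.2\"\n"
    "router second 192.0.2.2 9001 0 0\n"
    "ntor-oni@uploaded-at 2016-09-15 09:25:00\n"
    "@source \"192.0.2.3\"\n"
    "router third 192.0.2.3 9001 0 0\n"
    "reject *:*\n"
    "router-signature\n"
    "-----BEGIN SIGNATURE-----\n"
    "AAAA\n"
    "-----END SIGNATURE-----\n"
)

truncated, complete = split_at_restarts(SPAN)
print(len(truncated))
```

In this sketch the @uploaded-at and @source annotation lines stay attached to the tail of the preceding truncated fragment. Whether they should instead be moved to the following descriptor is left open here.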

This could be something that a new volunteer could hack on.

Thanks for the report!

comment:2 Changed 21 months ago by iwakeh

Maybe I overlooked something (if so, just point me there): where does the @source come from?

comment:3 Changed 21 months ago by karsten

The @source comes from cached-descriptors files and is added by directory authorities. Example from the current file:

@uploaded-at 2016-12-30 21:01:35
@source "178.32.53.94"
router ashtrayhat3 178.32.53.94 443 0 80
identity-ed25519
[...]

comment:4 Changed 19 months ago by amj703

Hi Damian,

This is happy stem user Aaron Johnson from NRL. I noticed a similar skip event while parsing a descriptor archive from CollecTor. When using stem 1.5.4 yesterday to parse this month's latest descriptor archive, I got the following skip event notification: "ERROR [server-descriptors-2017-02.tar.xz]: Line contains invalid characters: ntor-oni@uploaded-at 2017-02-02 02:45:52". This looks like it may well be the same issue. Just thought I'd let you know that I saw it as well and recently!
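
Lines like the one in that error message can be spotted before parsing with a simple text scan. The following is a rough heuristic sketch using only the Python standard library. It does not use Stem, and the function name find_glued_annotations and the sample text are hypothetical. It assumes that an "@uploaded-at " keyword preceded by other characters on the same line marks a truncated descriptor.

```python
import re

# "@uploaded-at " preceded by at least one other character on the same line.
GLUED_ANNOTATION = re.compile(r"^(?P<prefix>[^@\n]+)@uploaded-at ", re.MULTILINE)


def find_glued_annotations(text):
    """Return (line_number, truncated_prefix) for each line where
    @uploaded-at appears in the middle of the line."""
    hits = []
    for match in GLUED_ANNOTATION.finditer(text):
        # Line numbers are 1-based: count the newlines before the match.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group("prefix")))
    return hits


# Made-up sample based on the error message quoted above.
TEXT = (
    "router example 192.0.2.1 9001 0 0\n"
    "ntor-oni@uploaded-at 2017-02-02 02:45:52\n"
    "@source \"192.0.2.2\"\n"
    "router other 192.0.2.2 9001 0 0\n"
)

hits = find_glued_annotations(TEXT)
print(hits)
```

Annotation lines that correctly start with "@uploaded-at" are not reported, since the pattern requires at least one character before the keyword.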

comment:5 Changed 19 months ago by atagar

Thanks Aaron. Sadly not sure there's anything I can do on my side. Think this is in karsten's court.

comment:6 Changed 19 months ago by karsten

Priority: changed from Medium to High

amj703, thanks for the additional data point. atagar is right, this is something we should look into, rather than wait for that mystical new volunteer to appear.

comment:7 Changed 16 months ago by iwakeh

Cc: iwakeh added

Adding myself to cc so that Trac mails me updates.

comment:8 Changed 12 months ago by karsten

Summary: changed from "What is @source?" to "Separate truncated descriptor(s) from next complete descriptor"

Changed the summary to reflect what a possible solution could be here.

comment:9 Changed 12 months ago by karsten

Keywords: metrics-2018 added

comment:10 Changed 12 months ago by karsten

Keywords: metrics-2017 added; metrics-2018 removed

comment:11 Changed 9 months ago by iwakeh

Keywords: metrics-2018 added; metrics-2017 removed

Will be completed in 2018.

comment:12 Changed 7 months ago by karsten

Priority: changed from High to Medium

This ticket has not been modified in the last 2 months or even longer. Setting priority to medium.

comment:13 Changed 3 months ago by wulder

I think I'll pick up this ticket if no one else is working on it.

comment:14 Changed 3 months ago by irl

wulder: Please do. (:

If you click "Modify Ticket" and then select "accept" the owner of the ticket will be set to you.

comment:15 Changed 3 months ago by wulder

Owner: changed from metrics-team to wulder
Status: changed from new to accepted