Opened 2 years ago

Last modified 2 weeks ago

#21087 accepted defect

Separate truncated descriptor(s) from next complete descriptor

Reported by: atagar
Owned by: wulder
Priority: Medium
Component: Metrics/CollecTor
Severity: Normal
Keywords: metrics-help, metrics-2018
Cc: iwakeh

Description

Hi Karsten, a user reached out to me because Stem's validator warns about a CollecTor tarball. In particular it's surprised by @source annotations in the server descriptors.

Here's the *server-descriptors-2016-09/2/2/228e3ecf654e1b7b4f01a0027e599e7ba14b216c* descriptor from the tarball for an example...

@type server-descriptor 1.0 
router sauronkingofmortor 137.74.116.214 9001 0 9030
identity-ed25519
-----BEGIN ED25519 CERT-----
AQQABkAYAXe8xhBhoRVgI2ZswouGG50gLzYsWudXIp96bCAloSStAQAgBADs9XUH
7zgiFd+mjPWwFLUpvma8qvdtChcgp4K6WDDnU6ub3BDNZ7nGTDvYPHVmq4URzobG
uAsjOIPlf1vkU3YJdpBe0KGHy5JeuJ10TDQwlK1F761pSApIdH1ocIg4oAE=
-----END ED25519 CERT-----
master-key-ed25519 7PV1B+84IhXfpoz1sBS1Kb5mvKr3bQoXIKeCulgw51M
platform Tor 0.2.8.7 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:23:41
fingerprint 2D8A FA91 2E2B 8623 BB2C DACD 1933 2209 D524 D1A3
uptime 860586
bandwidth 12288000 12288000 7792456
extra-info-digest 2017D54A2C28B100CE173351E0799E15153B703B                                          D2vKVNwaxArp6bf11NWPRNoYGQ0lBgIwziSXNkL9TCw
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMNJzNJiDwd8y7ge4aXjkUCBKDncNhC91i5SQkxTHX4ZR/05+/liwR5O
TPgoIG0FDQSEUMYDPY92XsRmgPXkpHBSga0ojrhwnYutXAPMRuT4Dm24kpJctdbG
kwW6aovjNcoeJE3iB5ahUCv/TDnuiijioRSfjTPQsW68gHo1rOxJAgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAJhvAVj6wurlz3khW1Z/2x8sAnyr9lBdiHMp8UEAhYw+7ct1fdmuZXbA
I9aZbb7GEgR9UBW67qYd0aN1XHbDwb4OvAW+TOzcCjBmqiSLl5WACl0wIjuif7++
xNVcRw04kmmbBf7IyjmmuCc6ihjGeG02aREitZGBSkyZwt8SAz0fAgMBAAE=
-----END RSA PUBLIC KEY-----
onion-key-crosscert
-----BEGIN CROSSCERT-----
ct5RfDtMM5h5G6T6pFkRANCsJGcjwpPK+b47yWoQSdH7C0Y4yjWX5Z48l511fPK6
1v4IINEnuiCMkDp4HGpSW87aHatUaWP6MVo6pwQB2uqi8SpjPdlf6pJfSYNsvaZh
00P6ENAXzDnFFvcNla0WI7o6rIE2tuP3qd7bxazACUU=
-----END CROSSCERT-----
ntor-onion-key-crosscert 0
-----BEGIN ED25519 CERT-----
AQoABj/6Aez1dQfvOCIV36aM9bAUtSm+Zryq920KFyCngrpYMOdTANd0d0EMe6BU
CZrDB67jdOEX8P0T1MY1razuVMyvAjS1MPsM/F7uvCvgf1Su4NJFodWWPGLXWnHZ
RFSpVcHmmg8=
-----END ED25519 CERT-----
hidden-service-dir
contact luciole <luciolesauronkingofmortor@yopmail.com>
ntor-onion-key lhvzaL7Ze85GFMWMQscMgIt9IOx6srmOiXqD85kOekI=
reject *:@uploaded-at 2016-09-15 09:24:06
@source "82.1.128.70"
router torbeornottorbe 82.1.128.70 9001 0 0
platform Tor 0.2.4.27 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:24:06
fingerprint C6EE 9826 7F82 962C C2FC 1E9E 2AE5 F317 B2D2 D6F0
uptime 762082
bandwidth 1024000 2048000 106721
extra-info-digest 08D7C6A9FF860F6A5D12FB43BD2051ACC06BCE52
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMb/ajivr7C1z7cnVSz4dPe+T0cOvB6ickNb8vjquDM8eZh7mLecSACT
H1D5DO97aJ0L1Bw5oOLzU77zx/2e/UUnHftiyZ8sNLmAE7smgEdUvhqNZSY+VSgN
E1Qyc6CdBpJWdSRp1+/AbYq0XWXMTrkb7YvRyR0iuYDn03s82DU/AgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAK5CqRjTHbA+AHxLqSCWoEOpiVUNqpdiEUVTvdmu7aQgPcR2VI/fS/oc
tPmfC6L0l4eL0u1zZzbxJ8z5mop0M+0Wss8gWdpO7t7MNHu/GJ78gRRhb6Yz2JQf
jTZcVGyDsI8PJZoH+if3slVCUcq14zy85hb9sF9spaDhTEBbhx+rAgMBAAE=
-----END RSA PUBLIC KEY-----
family $5A8B78AB293475D6D55F1CBFA5D2A1CEEB09545B $EAE900D1DB28D56F4535C06F1BAEB92B9E3BFEE6
hidden-service-dir
contact A36F 07B9 285A C895 3E42 69F3 0CCC 0AF6 2FEC DB6E Random Person <nae AT blueyonder dot co   dot uk>
ntor-onion-key 90dT83YmTzH/uojnATf+KOtwJssKGURO/qdu3SR0XgE=
reject *:*
router-signature
-----BEGIN SIGNATURE-----
PohhIu5DPg4iK+5AV3/sLMbpiwCItMbnaNVWrve9nKXyHM18eskYpL1sLyj7/3Nk
YKmFheD/alawStTr3rHkopdR8yj+1LZmWPlSHTy3x/U+uAzQl+66YcECEdw1xKMY
oaYngrHlZSrCEgwDKwIS4GJ/rOYjGUl0HCC9z0OaZ5M=
-----END SIGNATURE-----

Seems this is two descriptors concatenated together with a @source in the middle? Any hints on where these come from?

Thanks!

Change History (16)

comment:1 Changed 2 years ago by karsten

Keywords: metrics-help added
Type: changed from enhancement to defect

Hi Damian!

This looks like a truncated descriptor (with the last characters being "reject *:") and another complete descriptor obtained from cached descriptor files (starting with "@uploaded-at").

CollecTor could indeed be smarter about separating those two descriptors. What it does is look for a descriptor start "^router " and the next descriptor end "\nrouter-signature\n", and treat everything in between as a single descriptor. Here's the relevant code.
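The behavior described above can be reproduced with a minimal Python sketch. This is not CollecTor's code (CollecTor is written in Java); the function name `split_naive`, the `END_TOKEN` constant, and the sample data are assumptions made for illustration. Only the two tokens "router " and "\nrouter-signature\n" come from the description.

```python
START_TOKEN = "router "
SIG_TOKEN = "\nrouter-signature\n"
# Assumption: a descriptor ends after the signature block's closing line.
END_TOKEN = "\n-----END SIGNATURE-----\n"


def split_naive(data):
    """Cut from a line-initial "router " to the end of the next signature.

    Any further "router " line found before that signature is NOT treated
    as a new descriptor, which is the behavior this ticket is about.
    """
    descriptors = []
    pos = 0
    while True:
        if data.startswith(START_TOKEN, pos):
            start = pos
        else:
            idx = data.find("\n" + START_TOKEN, pos)
            if idx < 0:
                break
            start = idx + 1
        sig = data.find(SIG_TOKEN, start)
        if sig < 0:
            break
        end = data.find(END_TOKEN, sig)
        if end < 0:
            break
        end += len(END_TOKEN)
        descriptors.append(data[start:end])
        pos = end
    return descriptors


# Made-up miniature input: one truncated descriptor followed by one
# complete descriptor taken from a cached-descriptors file.
sample = (
    "router truncated 10.0.0.1 9001 0 0\n"
    "reject *:@uploaded-at 2016-09-15 09:24:06\n"
    '@source "10.0.0.2"\n'
    "router complete 10.0.0.2 9001 0 0\n"
    "reject *:*\n"
    "router-signature\n"
    "-----BEGIN SIGNATURE-----\n"
    "AAAA\n"
    "-----END SIGNATURE-----\n"
)

merged = split_naive(sample)
print(len(merged))  # 1: both descriptors end up in a single chunk
```

The single chunk starts with the truncated descriptor and ends with the signature of the complete one, which matches the shape of the file attached to this ticket.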

We did not notice this issue before, because we'd have discarded the descriptor after finding that it doesn't pass metrics-lib's parser. But as of five months ago, we're keeping those descriptors anyway.

A possible fix would be to check whether there's another startToken (or rather another string "\n" + startToken) before sigToken, and if there is, treat that substring as a separate descriptor. In fact, there could be several truncated descriptors before the first complete descriptor.
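A rough Python sketch of that idea follows, written as an illustration rather than as a patch for CollecTor (which is Java). The names `find_start` and `split_with_truncated`, the `END_TOKEN` constant, and the sample data are hypothetical; the logic only assumes that a new descriptor begins at a line starting with "router ".

```python
START_TOKEN = "router "
SIG_TOKEN = "\nrouter-signature\n"
# Assumption: a complete descriptor ends after the signature block.
END_TOKEN = "\n-----END SIGNATURE-----\n"


def find_start(data, pos):
    """Index of the next line-initial "router " at or after pos, or -1."""
    at_line_start = pos == 0 or data[pos - 1] == "\n"
    if at_line_start and data.startswith(START_TOKEN, pos):
        return pos
    idx = data.find("\n" + START_TOKEN, pos)
    return -1 if idx < 0 else idx + 1


def split_with_truncated(data):
    """Return (truncated, complete) lists of descriptor strings."""
    truncated, complete = [], []
    start = find_start(data, 0)
    while start >= 0:
        sig = data.find(SIG_TOKEN, start)
        nxt = find_start(data, start + len(START_TOKEN))
        if nxt >= 0 and (sig < 0 or nxt < sig):
            # Another descriptor starts before any signature shows up,
            # so the current one must be truncated.
            truncated.append(data[start:nxt])
            start = nxt
            continue
        end = data.find(END_TOKEN, sig) if sig >= 0 else -1
        if end < 0:
            # No signature block left: trailing truncated descriptor.
            truncated.append(data[start:])
            break
        end += len(END_TOKEN)
        complete.append(data[start:end])
        start = find_start(data, end)
    return truncated, complete


# Made-up input: two truncated descriptors, then a complete one.
sample = (
    "router t1 10.0.0.1 9001 0 0\n"
    "reject *:@uploaded-at 2016-09-15 09:24:06\n"
    '@source "10.0.0.2"\n'
    "router t2 10.0.0.2 9001 0 0\n"
    "ntor-oni@uploaded-at 2017-02-02 02:45:52\n"
    '@source "10.0.0.3"\n'
    "router complete 10.0.0.3 9001 0 0\n"
    "reject *:*\n"
    "router-signature\n"
    "-----BEGIN SIGNATURE-----\n"
    "AAAA\n"
    "-----END SIGNATURE-----\n"
)

truncated, complete = split_with_truncated(sample)
print(len(truncated), len(complete))  # 2 1
```

Note that in this sketch each truncated piece still carries the "@uploaded-at" and "@source" annotations that belong to the descriptor after it. Moving those annotations to the right descriptor would be a separate step and is not handled here.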

This could be something that a new volunteer could hack on.

Thanks for the report!

comment:2 Changed 2 years ago by iwakeh

Maybe, I overlooked something, just point me there: where does the @source come from?

comment:3 Changed 2 years ago by karsten

The @source comes from cached-descriptors files and is added by directory authorities. Example from the current file:

@uploaded-at 2016-12-30 21:01:35
@source "178.32.53.94"
router ashtrayhat3 178.32.53.94 443 0 80
identity-ed25519
[...]

comment:4 Changed 2 years ago by amj703

Hi Damian,

This is happy stem user Aaron Johnson from NRL. I noticed a similar skip event while parsing a descriptor archive from CollecTor. When using stem 1.5.4 to parse the latest descriptor archive from this month yesterday, I got the following skip event notification: "ERROR [server-descriptors-2017-02.tar.xz]: Line contains invalid characters: ntor-oni@uploaded-at 2017-02-02 02:45:52". This looks like it may well be the same issue. Just thought I'd let you know that I saw it as well and recently!

comment:5 Changed 2 years ago by atagar

Thanks Aaron. Sadly not sure there's anything I can do on my side. Think this is in karsten's court.

comment:6 Changed 2 years ago by karsten

Priority: changed from Medium to High

amj703, thanks for the additional data point. atagar is right, this is something we should look into, rather than wait for that mystical new volunteer to appear.

comment:7 Changed 2 years ago by iwakeh

Cc: iwakeh added

Adding myself to cc to make trac mail me updates.

comment:8 Changed 19 months ago by karsten

Summary: changed from "What is @source?" to "Separate truncated descriptor(s) from next complete descriptor"

Changed the summary to reflect what a possible solution could be here.

comment:9 Changed 19 months ago by karsten

Keywords: metrics-2018 added

comment:10 Changed 19 months ago by karsten

Keywords: metrics-2017 added; metrics-2018 removed

comment:11 Changed 16 months ago by iwakeh

Keywords: metrics-2018 added; metrics-2017 removed

Will be completed in 2018.

comment:12 Changed 14 months ago by karsten

Priority: changed from High to Medium

This ticket has not been modified in the last 2 months or even longer. Setting priority to medium.

comment:13 Changed 11 months ago by wulder

I think I'll pick up this ticket if no one else is working on it.

comment:14 Changed 11 months ago by irl

wulder: Please do. (:

If you click "Modify Ticket" and then select "accept" the owner of the ticket will be set to you.

comment:15 Changed 11 months ago by wulder

Owner: changed from metrics-team to wulder
Status: changed from new to accepted

comment:16 Changed 2 weeks ago by wulder

If we are keeping the truncated descriptors and treating them as separate descriptors we will need an alternative end token to determine the number of bytes to send to the archive writer.

Currently "\nrouter-signature\n" is used as the end token but we won't have this for the truncated descriptors. Is it safe to use "reject *:" instead? Will this always be included at the end of a truncated descriptor?
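For reference, the two truncation points quoted in this ticket differ: the description shows "reject *:@uploaded-at ..." while comment 4 reports "ntor-oni@uploaded-at ...". The Python sketch below only illustrates that difference. The helper `truncated_tail` is hypothetical, and it assumes the descriptor following a truncated one comes from a cached-descriptors file and therefore begins with an "@uploaded-at " annotation.

```python
ANNOTATION = "@uploaded-at "


def truncated_tail(line):
    """Return the text before a glued-on "@uploaded-at " annotation.

    Returns None when the annotation is absent or starts the line,
    i.e. when the line does not look like a truncation point.
    """
    idx = line.find(ANNOTATION)
    if idx <= 0:
        return None
    return line[:idx]


lines = [
    "reject *:@uploaded-at 2016-09-15 09:24:06",  # from the description
    "ntor-oni@uploaded-at 2017-02-02 02:45:52",   # from comment 4
    "@uploaded-at 2016-12-30 21:01:35",           # regular annotation line
    "reject *:*",                                 # regular exit policy line
]

tails = [truncated_tail(line) for line in lines]
print(tails)  # ['reject *:', 'ntor-oni', None, None]
```

Based on these two samples alone, a fixed "reject *:" end token would miss the second case, so the position of the next descriptor start (or of its leading annotation) may be a more reliable boundary.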

Last edited 2 weeks ago by wulder