Opened 3 years ago

Closed 2 years ago

#20412 closed defect (fixed)

Skip bad archived descriptors rather than aborting the entire import

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords: metrics-2017
Cc: iwakeh Actual Points:
Parent ID: #20548 Points:
Reviewer: iwakeh Sponsor:

Description

The Onionoo mirror broke in late September for some reason I don't know, and the host didn't come back afterwards. We only noticed two weeks later and had to reimport September and October data. However, the September archives contain a bad descriptor that breaks the import. Here's the bad descriptor (3/8/384f93dbac20fdf293a731b391b3fc0757d9f78a in server-descriptors-2016-09.tar.xz):

@type server-descriptor 1.0
router Pegasus70 78.142.19.172 443 0 80
platform Tor 0.2.5.12 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 08:31:03
fingerprint E9C5 8383 DB9A E52A DAF3 5F91 88B2 741A 05F5 A02F
uptime 1016140
bandwidth 8746942 8955284 7957930
extra-info-digest C07612E283D3157219DA1DDDF3AE125268206412
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAPNQHBnVRiPl7H5cRC/GOMKIeGRSAuM/3Jzuxrg0idlL1YPoQtKAfaqI
LY9cGSEk88FGcOkgZdDiwSL9LAtBF1hpYB2ajGjNhTQkae00DC1NlWGzi89wkA/R
4qxSCm4mjoY7EEmfOLI/X/Rp9FE8rL7X39XK6q+nv5uyHI+T/7GHAgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMojXIJSwwavFOc6afWILpgIc4sAXd8KSsOh966rLjuGZyUsN3+gqta9
2QLV4HOBy9L24NRE3iXmySlfTiT2pxiSXo+h/B18Gw2clSewHx7xC1QnkT69xxL2
6AvOu5NDbu5SxHOtOi95FEnuE6VmKZhBawHD3KG6j6euZINilojrAgMBAAE=
-----END RSA PUBLIC KEY-----
family $9D14BAC27FFE7170601FC0EC792A927E1FC11A1D
hidden-service-dir
ntor-onion-key vk7nDH5FVjFWxflyapUT+9+em+CGO/aaYjaO6LGJ3B0=
reject 0.0.0.0/8:*
reject 169.254.0.0/16:*
reject 127.0.0.0/8:*
reject 192.168.0.0/16:*
reject 10.0.0.0/8:*
reject 172.16.0.0/12:*
reject 78.142.19.172:*
accept *:20-21
accept *:43
accept *:53
accept *:79-81
accept *:88
accept *:110
accept *:143
accept *:194
accept *:220
accept *:389
accept *:443
accept *:464
accept *:531
accept *:543-544
accept *:554
accept *:563
accept *:636
accept *:706
accept *:749
accept *:873
accept *:902-904
accept *:981
accept *:1194
accept *:1220
accept *:1293
accept *:1500
accept *:1533
accept *:1677
accept *:1723
accept *:1755
accept *:1863
accept *:2082
accept *:2083
accept *:2086-2087
accept *:2095-2096
accept *:2102-2104
accept *:3128
accept *:3389
accept *:3690
accept *:4321
accept *:4643
accept *:5050
accept *:5190
accept *:5222-5223
accept *:5228
accept *:5900
accept *:6697
accept *:8008
accept *:8074
accept *:8080
accept *:8082
accept *:8087-8088
@uploaded-at 2016-09-15 08:32:03
@source "79.134.255.35"
router Beluga 79.134.255.35 1979 0 0
identity-ed25519
-----BEGIN ED25519 CERT-----
AQQABkDiAfjGVdzeISYHVC86lkA1GRbNmgn80ndEWHoNfqq3apelAQAgBACmRaXq
1UdBqrNx7dYOhs62167xULhT4QoThd/IgiZw18mYn19eCtf0qfGiDmYv1v4d1INq
drh+i4yS1XGw8oypyLU27mt9BI5ezMXHMeKkEvRdgNwg5K2Levzw7PhK6go=
-----END ED25519 CERT-----
master-key-ed25519 pkWl6tVHQaqzce3WDobOtteu8VC4U+EKE4XfyIImcNc
platform Tor 0.2.8.7 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 08:31:58
fingerprint EC69 7C3D 5819 B16B B899 D29A 18B9 E7B6 095D FAEC
uptime 129602
bandwidth 2097152 3145728 1286199
extra-info-digest 387513B8F45EFB6711F40FA6869146DE62B058D5 l+9BGbujAcdSANxivZN210RJSHsHSQCQMPqOYg4VNSA
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALKRXi8ClqAACiYtBCF+Ot4154CxhykXufXQFGEYR2KkyEI4wPp2E/hV
izLQjrjmIq+akyFUGNE/u/OY5seeUlcFtFnBHfsotrtBkL8yqMqmyheL5OG1CWX3
ROKd6UtzMP1ebIcalS0hdc7nlpOlzxd91IJjlE5eI/jKJyTKl4C1AgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBALL8VZK/RpoV8XkaSFTjFfchYDeTrzToWgiE8fFm68Pato+iQ5xArjkW
gKaj4DAqrUMZgR2rz0joiD9U6lEssEFhXM0laqlLpcuhoBB+6BbiLOFbcm6MHPcr
eVuNRjcwRIr71SCASCih52LHuyOCDwqMnNpdIyCLse+AAtdqmJ3TAgMBAAE=
-----END RSA PUBLIC KEY-----
onion-key-crosscert
-----BEGIN CROSSCERT-----
FZg86wGaQqd4v8nBC0GQm1bHo0ooZFh79XnZPVYzw/gBUIguy4vbumIb07sXd7+1
C7cNBlfpy/1UQH0t4l9Crqj6LqL9DNJr8xOKx0hbgw5dxDHJNu16+qMTsdo30RVQ
O1XBS12Wh6Or/suW8D+wRQ8TT804c0wdc5TMTJIe9Wo=
-----END CROSSCERT-----
ntor-onion-key-crosscert 0
-----BEGIN ED25519 CERT-----
AQoABj/5AaZFperVR0Gqs3Ht1g6GzrbXrvFQuFPhChOF38iCJnDXAJ4T5u2HXRQF
1B8/CNz0+VNfa1C+kP/CoAT+qxud/wJCpoUo3au/YzSZSrjTtstKTs8lv7chn+QT
JiqklO+iegM=
-----END ED25519 CERT-----
hidden-service-dir
contact BTC 1EfzsAj6rLnvYMuAZeTLBmhZ3gHcjxfUkp
ntor-onion-key V8SPDfH90+Fa8Q21xYzG5qqovaOsKqP2aZKdHBWAWBI=
reject *:*
tunnelled-dir-server
router-sig-ed25519 4CUXRGMVxOXkR+qgv/W+Jsz7WgGWVloER9qZgfqbNNEA3UyAo2W7odoAcBpwrrPFarAYoza1T7I2WmsigneFDw
router-signature
-----BEGIN SIGNATURE-----
aH+lYDSsISil+cUXxbXzv2H/M5rDguHOMKbMSZMBRSTIUAV6zrxaSEJgQweOJyr4
CiqAJjxp4sTcnUvsaqFZwrO8xsv+LZHabIbNu+DnuPtS66Xngh3q/wEJbx/VLWfq
yAexVH/GnxAfwplvB99GIiHqE20r+nLAdrgGnZ4x/9M=
-----END SIGNATURE-----

The issue is that this file contains one partial descriptor and another full descriptor without a @type annotation. Onionoo doesn't know how to process it. However, when we attempt to process this descriptor, we abort the entire import process. Even worse, when aborting the import process we're letting the descriptor reader thread continue reading descriptors until its queue runs full and it waits for us to accept more parsed descriptors. The result is that the Onionoo process, which ran in --single-run mode, did not finish within a few days until I killed it. Oh wow.

Quick fix: Skip the bad descriptor, that is, continue; rather than break;.

Longer-term fix: Come up with better rules how to handle bad input data, which is somewhat related to #19834.

Follow-up question: What went wrong that CollecTor produced this file?

Child Tickets

Change History (11)

comment:1 Changed 3 years ago by iwakeh

Parent ID: #20548

Depends on parent #20548.

comment:2 Changed 2 years ago by karsten

Summary: Don't let a single bad archived descriptor break the importSkip bad archived descriptors rather than aborting the entire import

Just closed #21370 as near-duplicate. FWIW, the suggestion to skip the bad descriptor rather than aborting the import refers to this line.

comment:3 Changed 2 years ago by karsten

Keywords: metrics-2018 added

comment:4 Changed 2 years ago by karsten

Keywords: metrics-2017 added; metrics-2018 removed

comment:5 Changed 2 years ago by karsten

Owner: set to metrics-team
Status: newassigned

comment:6 Changed 2 years ago by karsten

Status: assignedneeds_review

I just ran another test with the broken descriptor above and found that it even gets us an IllegalStateException:

Exception in thread "main" java.lang.IllegalStateException: Operation is not permitted before finishing to read.
	at org.torproject.descriptor.impl.DescriptorReaderImpl.getExcludedFiles(DescriptorReaderImpl.java:64)
	at org.torproject.onionoo.updater.DescriptorQueue.writeHistoryFile(DescriptorQueue.java:113)
	at org.torproject.onionoo.updater.DescriptorSource.readArchivedDescriptors(DescriptorSource.java:182)
	at org.torproject.onionoo.updater.DescriptorSource.readDescriptors(DescriptorSource.java:84)
	at org.torproject.onionoo.cron.Main.updateStatuses(Main.java:179)
	at org.torproject.onionoo.cron.Main.run(Main.java:128)
	at org.torproject.onionoo.cron.Main.runOrScheduleExecutions(Main.java:102)
	at org.torproject.onionoo.cron.Main.main(Main.java:34)

Eek. Let's do the quick fix now and skip any bad descriptors that we run into. Please review my task-20412 branch.

This quick fix may even be the longer-term fix, because we have to accept that invalid/unparseable descriptors may exist in the archives (as much as they exist in CollecTor's recent descriptors, where we simply skip them).

The follow-up question, what went wrong that CollecTor produced this file, is going to be answered in #21087.

I'd say as soon as the quick fix is merged, there's nothing else to do here.

comment:7 Changed 2 years ago by iwakeh

Reviewer: iwakeh

comment:8 Changed 2 years ago by iwakeh

Status: needs_reviewmerge_ready

The fix looks ok and ready for merge.
(Was the initial implementation just chance or was there a reason for halting the import?)

I'm wondering, if the actual issue isn't in metrics-lib.
Shouldn't there be a method finishReading that halts the reading process on client request? Clients shouldn't be forced to read all available descriptors. Afaik, the ISE is just due to calling the statistics methods before all was read. New ticket for metrics-lib?

comment:9 in reply to:  8 ; Changed 2 years ago by karsten

Replying to iwakeh:

The fix looks ok and ready for merge.

Will do! Thanks for checking!

(Was the initial implementation just chance or was there a reason for halting the import?)

Hmm? Do you mean the initial import? If so, I guess it worked out, because we only started including unparseable descriptors in CollecTor's output a few years ago.

I'm wondering, if the actual issue isn't in metrics-lib.
Shouldn't there be a method finishReading that halts the reading process on client request? Clients shouldn't be forced to read all available descriptors. Afaik, the ISE is just due to calling the statistics methods before all was read. New ticket for metrics-lib?

Yes, that's another issue that we should fix. To be clear, this fix here is also a valid one: we shouldn't stop the import just because we ran into a single unparseable descriptor.

But what you point out is another issue: we shouldn't be forced to read all available descriptors. I'm not certain what the best fix is there. Maybe a finishReading or abortReading might work. But that's something that we can discuss on the new ticket. Can you open one? Thanks!

comment:10 in reply to:  9 Changed 2 years ago by iwakeh

Replying to karsten:

Replying to iwakeh:

The fix looks ok and ready for merge.

Will do! Thanks for checking!

(Was the initial implementation just chance or was there a reason for halting the import?)

Hmm? Do you mean the initial import? If so, I guess it worked out, because we only started including unparseable descriptors in CollecTor's output a few years ago.

True.

I'm wondering, if the actual issue isn't in metrics-lib.
Shouldn't there be a method finishReading that halts the reading process on client request? Clients shouldn't be forced to read all available descriptors. Afaik, the ISE is just due to calling the statistics methods before all was read. New ticket for metrics-lib?

Yes, that's another issue that we should fix. To be clear, this fix here is also a valid one: we shouldn't stop the import just because we ran into a single unparseable descriptor.

Yes, my question above was an aside; not directed at the solution in this ticket.

But what you point out is another issue: we shouldn't be forced to read all available descriptors. I'm not certain what the best fix is there. Maybe a finishReading or abortReading might work. But that's something that we can discuss on the new ticket. Can you open one? Thanks!

See #24166.

comment:11 Changed 2 years ago by karsten

Resolution: fixed
Status: merge_readyclosed

Merged! Looks like all remaining work will be done on #24166. Closing. Thanks!

Note: See TracTickets for help on using tickets.