Add support for webstats tarballs
I started creating tarballs containing .xz-compressed webstats files. When I attempt to feed them into DescriptorReader, it fails with an exception like the following:
Cannot parse descriptor file 'in/webstats-2016-01.tar'.
[unreadable binary data omitted]
at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
at java.lang.Thread.run(Thread.java:745)
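The unreadable text in the message suggests that the parser was handed still-compressed bytes. As a side note, .xz data can be recognized by its six-byte stream header magic (FD 37 7A 58 5A 00). The following is a minimal sketch of such a check; the class and method names are hypothetical and not part of metrics-lib:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of metrics-lib. */
public class XzMagicCheck {

  // The .xz stream header starts with these six bytes: FD '7' 'z' 'X' 'Z' 00.
  private static final byte[] XZ_MAGIC = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00};

  /** Returns true if data starts with the xz stream header magic. */
  static boolean startsWithXzMagic(byte[] data) {
    if (data == null || data.length < XZ_MAGIC.length) {
      return false;
    }
    // Compare only the first six bytes against the magic.
    return Arrays.equals(Arrays.copyOf(data, XZ_MAGIC.length), XZ_MAGIC);
  }

  public static void main(String[] args) {
    byte[] xzStart = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00, 0x00, 0x04};
    byte[] plainText = "plain text".getBytes(StandardCharsets.US_ASCII);
    System.out.println(startsWithXzMagic(xzStart));   // prints "true"
    System.out.println(startsWithXzMagic(plainText)); // prints "false"
  }
}
```

Checking content rather than names would be an alternative to name-based detection; whether that fits DescriptorParserImpl is an open question.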
The tarballs I created contain files as follows:
$ tar tf webstats-2016-01.tar
[...]
webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz
When I extract tarball files before reading them with DescriptorReader, this works just fine.
I think the issue is that DescriptorParserImpl#detectTypeAndParseDescriptors() looks at descriptorFile rather than fileName to obtain the file name. As a result, it sees the tarball's file name rather than the file name of the contained log file:
- if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
+ if (fileName.contains(LogDescriptorImpl.MARKER)
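To illustrate the suspected difference, here is a standalone sketch that runs the same kind of check against both names. The MARKER value below is an assumption for illustration only; the real LogDescriptorImpl.MARKER constant may be different, and the class and method names are hypothetical:

```java
import java.io.File;

/** Hypothetical illustration, not metrics-lib code. */
public class MarkerCheckSketch {

  // Assumed marker value; the real LogDescriptorImpl.MARKER may differ.
  static final String MARKER = ".log";

  static boolean looksLikeLog(String name) {
    return name.contains(MARKER);
  }

  public static void main(String[] args) {
    // The tarball passed to DescriptorReader.
    File descriptorFile = new File("in/webstats-2016-01.tar");
    // The name of one entry inside that tarball.
    String fileName = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // Current check: the tarball name "webstats-2016-01.tar" has no marker.
    System.out.println(looksLikeLog(descriptorFile.getName())); // prints "false"
    // Proposed check: the entry name contains the marker.
    System.out.println(looksLikeLog(fileName)); // prints "true"
  }
}
```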
The above is untested and probably insufficient; it's just meant to start the bug hunting. Priority is medium, because we can extract tarballs for now. But it's a bug, and it may confuse users as soon as we provide these tarballs without working code to process them.
This is also related to #22695.
Assigning to iwakeh who said they'd like to grab it.