Stem's DescriptorReader misses 10% of descriptors in tarballs
When I use DescriptorReader to parse the server descriptors of a metrics tarball, once from the tarball itself and once from its extracted directory, I get different numbers of descriptors:
from stem.descriptor.reader import DescriptorReader

# Count the descriptors read directly from the tarball.
descriptors = 0
with DescriptorReader('server-descriptors-2012-12.tar') as reader:
    for descriptor in reader:
        descriptors += 1
print("%d descriptors in tarball." % descriptors)

# Count the descriptors read from the extracted directory.
descriptors = 0
with DescriptorReader('server-descriptors-2012-12/') as reader:
    for descriptor in reader:
        descriptors += 1
print("%d descriptors in extracted directory." % descriptors)
250048 descriptors in tarball.
279042 descriptors in extracted directory.
What happens to the other 28994 descriptors?
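One way to narrow this down is to check whether the tarball and the directory even contain the same number of files, independently of stem. The following is a minimal sketch using only the standard library (`tarfile`, `os`); the helper names `count_tar_files`, `count_dir_files` and `compare` are hypothetical. It assumes that each file in the archive holds exactly one descriptor, so that equal file counts would point at DescriptorReader rather than at the archive.

```python
import os
import tarfile


def count_tar_files(tar_path):
    """Count regular-file members in a tarball; directory entries are skipped."""
    # Mode 'r' auto-detects compression, so .tar and .tar.bz2 both work.
    with tarfile.open(tar_path) as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())


def count_dir_files(dir_path):
    """Count all non-directory entries anywhere under a directory."""
    total = 0
    for _root, _dirs, files in os.walk(dir_path):
        total += len(files)
    return total


def compare(tar_path, dir_path):
    """Return (files in tarball, files on disk).

    Assumption: one descriptor per file. If the two numbers match, the
    missing descriptors are most likely lost inside the reader, not the archive.
    """
    return count_tar_files(tar_path), count_dir_files(dir_path)
```

For example, `print(compare('server-descriptors-2012-12.tar', 'server-descriptors-2012-12/'))` prints both counts side by side.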
This bug means that most of #7828 must be re-run. :(