Stem's DescriptorReader should provide an option to provide statuses vs. status entries

Labels: component::archived/stem, owner::atagar, priority::medium, resolution::implemented, status::closed, type::enhancement

Hi Karsten. I just pushed something that should make everyone happy...

https://gitweb.torproject.org/stem.git/commitdiff/ea0b73a5aa221fadafc2ba718a0ef42e151e5ad6

The DescriptorReader and parse_file() now accept a 'document_handler' argument with three options:

  * ENTRIES: give me the router status entries
  * DOCUMENT: give me a document with the router status entries
  * BARE_DOCUMENT: give me a document without reading the router status entries

https://stem.torproject.org/api/descriptor/descriptor.html#stem.descriptor.init.DocumentHandler

To use the new argument, simply pass one of the enum values. For instance:

from stem.descriptor import parse_file, DocumentHandler

# parse_file() returns an iterator; with DocumentHandler.DOCUMENT a consensus
# file yields a single NetworkStatusDocumentV3
with open('/path/to/my/cached-consensus', 'rb') as document_file:
  document = next(parse_file(document_file, "network-status-consensus-3 1.0", document_handler = DocumentHandler.DOCUMENT))
  print("document version %i, had %i routers" % (document.version, len(document.routers)))

The next() call is needed because parse_file() returns an iterator. In this case it yields a single value, a NetworkStatusDocumentV3 instance.
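To make the difference between the three handler options concrete, here is a toy model in plain Python that runs without Stem. It is a sketch under stated assumptions, not Stem's implementation: `Handler`, `ToyDocument`, `ToyEntry` and `toy_parse` are hypothetical names whose behaviour only mirrors the three options described above. It also shows why the document case needs next(): the parser is a generator even when it yields just one value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Union


class Handler(Enum):
    """Hypothetical stand-in mirroring the three document_handler options."""
    ENTRIES = "entries"              # yield the router status entries themselves
    DOCUMENT = "document"            # yield one document that holds its entries
    BARE_DOCUMENT = "bare_document"  # yield one document, entries never read


@dataclass(eq=False)
class ToyDocument:
    header: dict
    routers: list = field(default_factory=list)


@dataclass(eq=False)
class ToyEntry:
    nickname: str
    document: ToyDocument


def toy_parse(header: dict, nicknames: List[str],
              handler: Handler) -> Iterator[Union[ToyEntry, ToyDocument]]:
    """Hypothetical parser: a generator no matter which handler is chosen."""
    document = ToyDocument(header)
    if handler is Handler.BARE_DOCUMENT:
        yield document  # the entries are skipped entirely
        return
    entries = [ToyEntry(name, document) for name in nicknames]
    if handler is Handler.ENTRIES:
        yield from entries  # every entry points back to the same document
    else:
        document.routers = entries
        yield document


# With DOCUMENT the generator yields exactly one value, hence next().
doc = next(toy_parse({"version": 3}, ["moria1", "tor26"], Handler.DOCUMENT))
print(doc.header["version"], len(doc.routers))  # 3 2
```

The eq=False dataclasses keep identity-based equality and hashing, which matches how entries share one document object.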

Feel free to reopen if this isn't what you wanted.

> The alternative, to iterate over status entries and look at every referenced status document to see if I saw that before or not, seems complicated.

Not really. Entries from the same file all reference the same document object, so you could simply have kept a set of the documents you've seen:

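seen_documents = set()

# my_descriptor_reader is your DescriptorReader, yielding router status entries
for entry in my_descriptor_reader:
  if entry.document not in seen_documents:
    seen_documents.add(entry.document)

    # ... do stuff with entry.document ...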
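A self-contained version of the set approach, runnable without Stem. `Document`, `Entry` and `distinct_documents` are hypothetical stand-ins for the real document and status entry classes; the one assumption carried over from the comment is that entries from the same file share a single document object, so a set of those objects collects each document exactly once.

```python
from typing import Iterable, List


class Document:
    """Hypothetical stand-in; default object hashing is identity-based."""

    def __init__(self, name: str) -> None:
        self.name = name


class Entry:
    """Hypothetical stand-in for a router status entry."""

    def __init__(self, nickname: str, document: Document) -> None:
        self.nickname = nickname
        self.document = document


def distinct_documents(entries: Iterable[Entry]) -> List[Document]:
    """Return each referenced document once, in first-seen order."""
    seen_documents = set()
    ordered = []
    for entry in entries:
        if entry.document not in seen_documents:
            seen_documents.add(entry.document)
            ordered.append(entry.document)  # first entry from this document
    return ordered


first, second = Document("first"), Document("second")
entries = [Entry("moria1", first), Entry("tor26", first), Entry("moria1", second)]
print([doc.name for doc in distinct_documents(entries)])  # ['first', 'second']
```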

> It probably doesn't even work for bandwidth weights which are parsed after the status entries.

As mentioned in our email exchange, this is wrong. Stem reads the header and footer first, then the router status entries in the middle.
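The reading order described here can be sketched with a toy parser over plain consensus text. This is not Stem's parser: `ToyConsensus` is hypothetical, the sample text is made up, and the sketch relies only on two facts of the v3 consensus layout, namely that each router status entry begins with an `r` line and that the footer begins with a `directory-footer` line. Because the header and footer are parsed up front, footer fields such as `bandwidth-weights` are already available while the entries are still being iterated lazily.

```python
from typing import Dict, Iterator, List


class ToyConsensus:
    """Hypothetical: header and footer parsed eagerly, entries on demand."""

    def __init__(self, text: str) -> None:
        lines = text.splitlines()
        # The first 'r ' line ends the header; 'directory-footer' starts the footer.
        first_entry = next(i for i, line in enumerate(lines) if line.startswith("r "))
        footer_start = lines.index("directory-footer")
        self.header = self._keywords(lines[:first_entry])
        self.footer = self._keywords(lines[footer_start + 1:])
        self._entry_lines = lines[first_entry:footer_start]  # left unparsed for now

    @staticmethod
    def _keywords(lines: List[str]) -> Dict[str, str]:
        # Map each line's keyword to the rest of the line (repeated keywords overwrite).
        return dict((line.split(" ", 1) + [""])[:2] for line in lines)

    def entries(self) -> Iterator[List[str]]:
        # Lazily yield one entry at a time: an 'r ' line plus the lines after it.
        block: List[str] = []
        for line in self._entry_lines:
            if line.startswith("r ") and block:
                yield block
                block = []
            block.append(line)
        if block:
            yield block


# Made-up, heavily abbreviated consensus text for illustration only.
SAMPLE = "\n".join([
    "network-status-version 3",
    "vote-status consensus",
    "r relay1 placeholder-fields",
    "s Running",
    "r relay2 placeholder-fields",
    "s Running",
    "directory-footer",
    "bandwidth-weights Wbd=0 Wbe=0",
])

consensus = ToyConsensus(SAMPLE)
print(consensus.footer["bandwidth-weights"])  # Wbd=0 Wbe=0
for entry in consensus.entries():
    print(entry[0])  # the footer was already parsed before this loop started
```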

Cheers! -Damian

Trac:
Status: new to closed
Resolution: N/A to implemented

Looks awesome! I'm mostly interested in using DescriptorReader with the new document handler. Here's what I did, and it worked just fine:

from stem.descriptor import DocumentHandler
from stem.descriptor.reader import DescriptorReader

with DescriptorReader('in/consensuses-2013-01/',
    document_handler=DocumentHandler.DOCUMENT) as reader:
  for document in reader:
    print("document version %i, had %i routers" % (
        document.version, len(document.routers)))
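For a quick cross-check of the router counts without Stem, a rough stdlib-only approximation is to count the lines starting with `r ` in each consensus file, since each router status entry begins with exactly one such line. This is a swapped-in technique, not what DescriptorReader does: `count_routers` is a hypothetical helper, it performs no validation, and it assumes the directory (here the `in/consensuses-2013-01/` path from the example) holds only plain-text consensus files.

```python
import os
from typing import Dict


def count_routers(directory: str) -> Dict[str, int]:
    """Hypothetical helper: count 'r ' lines in every file under `directory`."""
    counts = {}
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as consensus_file:
                # Each router status entry begins with exactly one 'r ' line.
                counts[path] = sum(1 for line in consensus_file if line.startswith(b"r "))
    return counts


if __name__ == "__main__":
    for path, routers in sorted(count_routers("in/consensuses-2013-01/").items()):
        print("%s had %i routers" % (path, routers))
```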

Thanks!
