Opened 4 years ago

Closed 4 years ago

#8815 closed defect (fixed)

Stem's DescriptorReader should handle relative paths in processed files when given a target with a relative path

Reported by: wfn Owned by: atagar
Priority: Low Milestone:
Component: Core Tor/Stem Version:
Severity: Keywords:
Cc: atagar@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


A bugfix for DescriptorReader._handle_file() when (one of the) target(s) descriptor directory is given by a relative path. Need to make sure it is an absolute path when comparing to the (always absolute) paths in _processed_files. Please find the linked commit and attached git diff.

A (probably unnecessarily) longer explanation: when stem.descriptor.reader.DescriptorReader is initialized with a relative path for a target, e.g.:

from stem.descriptor.reader import DescriptorReader
reader = DescriptorReader(['server-descriptors'], persistence_path='./used_desc')

The DescriptorReader._handle_file() method (which is used when the reader is accessed as an iterator, etc.) will skip over the loaded _processed_files, because the check for a given file (as 'target', which will be a relative path) will mismatch the one in the processed files dictionary (as '_processed_files', where the paths are always absolute) - stem/descriptor/, line 462, which attempts to get the 'previously last used' timestamp for a given target file:

last_used = self._processed_files.get(target)

Here, 'target' would in our example something of the following kind:


However in _processed_files (and in the 'used_desc' file), the corresponding key would be e.g.


We need to make 'target' always be an absolute path to avoid this kind of issue, and also to make sure that our 'new_processed_files' (to be used when e.g. the iterator is to be called again, i.e. when e.g. we want to re-iterate over our reader to see if anything new came up) also stores absolute paths.

Here is a link to a commit that makes sure the relevant paths are always absolute:

Ran Stem unit tests incl. for just in case, all good.

Attached please also find a sample script which makes use of this functionality by supplying a relative path to DescriptorReader, just in case. (I rsync'd 'relay-descriptors' in 'recent' for my Stem experiments.) See attached sample_output.txt

I'm also attaching a git diff output (git diff 1773ebaab470206653ce6d84c3ef1276f81c5d0a , last commit in just in case.

Child Tickets

Attachments (3)

wfn_relative_path.diff (620 bytes) - added by wfn 4 years ago. (645 bytes) - added by wfn 4 years ago.
sample_output.txt (1.0 KB) - added by wfn 4 years ago.
Sample output from

Download all attachments as: .zip

Change History (5)

Changed 4 years ago by wfn

Changed 4 years ago by wfn

Changed 4 years ago by wfn

Sample output from

comment:1 Changed 4 years ago by wfn

Hah, serves me right for being evil - in sample_output.txt, reported file count increases by 1 after creating a new bogus descriptor. I mistyped commands in my shell, so figured I'd just copy-paste the lines in the editor - kinda evil.

For what it's worth, the file count does indeed increase to 31748 in the script's output when I run it, meaning that the new file was successfully added to and subsequently read from used_desc.

/too much text for such a small thing, oh well..

comment:2 Changed 4 years ago by atagar

  • Resolution set to fixed
  • Status changed from new to closed

Hi wfn. Sorry about the delay and thanks for the patch! I've moved the normalization of targets to be absolute paths up into the constructor so the rest of the module (and skip listeners) can expect absolute paths. Thanks for the catch!

Note: See TracTickets for help on using tickets.