Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#13774 closed enhancement (implemented)

Mention pickling in "Mirror Mirror on the Wall"?

Reported by: mmcc
Owned by: atagar
Priority: Very Low
Milestone:
Component: Archived/Stem
Version:
Severity:
Keywords: Stem, Python, pickle, docs
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


This is a very minor suggestion, but it might be useful to mention Python's pickle module when discussing stem.descriptor.remote.DescriptorDownloader() in the doc page "Mirror Mirror on the Wall". As you likely know, this module can save just about any object to a file in a one-liner, and reload it any time in the future with another one-liner.
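The one-liners referred to here are not preserved in this ticket. As a rough sketch of what was probably meant, using only the standard-library pickle module (the helper names save_object and load_object are hypothetical, and whether every Stem descriptor object pickles cleanly is an assumption not confirmed here):

```python
import pickle


def save_object(obj, path):
    """Serialize obj to path (hypothetical helper, not part of Stem)."""
    with open(path, 'wb') as pickle_file:
        pickle.dump(obj, pickle_file)


def load_object(path):
    """Load an object previously written by save_object()."""
    with open(path, 'rb') as pickle_file:
        return pickle.load(pickle_file)
```

Pickle files should only be loaded if you created them yourself, since unpickling untrusted data can execute arbitrary code.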

It was the solution I used when considering how to get descriptors for research. Mentioning it in the docs would probably prevent some lazy or inexperienced Python users from putting a lot of load on directory authorities by constantly downloading descriptors.

Change History (4)

comment:1 Changed 6 years ago by atagar

Hi mmcc. The purpose of the DescriptorDownloader is to get the current descriptors. As for saving them to disk and reloading them, personally I'd just dump the descriptor...

with open('/path/to/persist/it', 'w') as dump_file:
  dump_file.write(str(my_descriptor))

Then assuming that it's, say, a server descriptor you'd load it like...

from stem.descriptor import parse_file

# parse_file() yields descriptors, so take the first (and only) one
with open('/path/to/persist/it', 'rb') as dump_file:
  my_descriptor = next(parse_file(dump_file, 'server-descriptor 1.0'))

(this is just off the top of my head - no promises the above doesn't have a typo)

I suppose pickling might have an advantage in terms of space or read time (... or could be worse in those regards - would need some testing), but regardless I'd trust the above a tad more.

All that said, adding a section on persisting and reloading descriptors sounds like a fine idea. Let me know if you can think of a compelling reason to do pickling rather than the above.

comment:2 Changed 6 years ago by mmcc

Hi, atagar. I was under the impression that Tor clients don't download full descriptors anymore, right? I was just thinking of a dead simple way to store all descriptors at once if they're going to be used repeatedly in a script that can use data that's a few minutes/hours old. Especially with dynamic languages like Python, one often ends up running scripts a bunch of times, and it would be too bad if people were downloading >6 MB of descriptors every time they tested something.

I wasn't aware of how versatile parse_file() is, though. That's probably the best solution.

Would it be worth it to briefly mention the ability to store the result of the download using parse_file() in the DescriptorDownloader section? It's definitely not a big deal either way.
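The "data that's a few minutes/hours old" idea above can be sketched as a small file cache that only re-downloads when the saved copy is too old. This is a minimal illustration rather than anything from the Stem docs: load_or_fetch, its parameters, and the one-hour default are all hypothetical, and fetch stands in for whatever callable returns the descriptor text (for instance, one that downloads descriptors and joins their str() output).

```python
import os
import time


def load_or_fetch(cache_path, fetch, max_age_seconds=3600):
    """Return cached text if it is fresh enough, otherwise call fetch().

    Hypothetical helper: fetch is any zero-argument callable returning a str.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age <= max_age_seconds:
            with open(cache_path) as cache_file:
                return cache_file.read()
    except OSError:
        pass  # no usable cache yet, fall through and fetch

    content = fetch()
    with open(cache_path, 'w') as cache_file:
        cache_file.write(content)
    return content
```

Repeated runs of a script within the allowed age then read from disk instead of contacting the directory authorities again.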

comment:3 Changed 6 years ago by atagar

Resolution: implemented
Status: new → closed

comment:4 Changed 6 years ago by mmcc

@atagar: Thanks! That looks great.
