Controller's cache isn't thread safe

added component::archived/stem controller easy owner::atagar priority::medium resolution::fixed status::closed type::defect labels

I've tried to make cache requests thread safe. Please take a look here:

https://github.com/axitkhurana/stem/commit/7384db06fe1167db3978262e5dfbac4720d689a6

Two functions created:

for reads: _get_cache for writes: _update_cache

Trac:
Username: axitkhurana
Status: new to needs_review

Hi Akshit. I'm presently at work so I was only able to give this a quick look, but we should definitely add a test for the CONF_CHANGED fix. The change you wrote actually looks to be a no-op due to whitespace (the add_event_listener() call is indented too far, so it's actually in the _conf_changed_listener block).

No tests necessary for the concurrency fix (that would be hard to repro). Let me know when the CONF_CHANGED addition has a test and I'll give it a more through review.

Thanks for the fixes! -Damian

PS. Minor nitpick, but please start your commit messages with a capital letter. Also, it would be nice if they mentioned the ticket that they're resolving.

Trac:
Status: needs_review to needs_revision

Thanks for taking a quick look, the CONF_CHANGED fix is failing one of the integration tests, I'll fix it, add a test and post it on it's ticket: https://trac.torproject.org/7713 . Sorry about the extra indentation, my vimrc was configured for 4 spaces per indent then.

For the concurrency fix, my earlier commit was breaking integration tests, but now they are passing. Also, capitalized the first letter of commit messages and added ticket link. Please take a look here: https://github.com/axitkhurana/stem/commit/4e0ec8cf702f5db6449d49cccc262fb8b6b5f97f

Thanks :) Akshit

Trac:
Username: axitkhurana
Status: needs_revision to needs_review

Great. Mostly just a few nitpicks.

+  def _get_cache(self, params, func=None):
+    """
+    Reads cached items

This, and the comment for _update_cache(), aren't terribly descriptive. It doesn't need to be much, but please at least make it a complete sentence with punctuation. ;)

+      for param in params:
+        if func:
+          cache_key = "%s.%s" % (func, param)
+        else:
+          cache_key = param
+        if cache_key in self._request_cache:
+          cached_values[param] = self._request_cache[cache_key]
+      return cached_values

Please don't scrunch code like this. Have an empty line before the second 'if' and 'return' clauses. This will help readability.

+  def _update_cache(self, params, strtolist=True):
+    """
+    Updates cache
+
+    :param dict params: **dict** of 'cache_key => value' pairs to be cached
+    """

'strtolist' isn't mentioned in the pydocs. Maybe the caller should be doing the list wrapper?

+        if cache_key.lower().endswith("exitpolicy") and "exit_policy" in self._request_cache:
+          del self._request_cache["exit_policy"]

This endswith() check might catch things that it isn't supposed to (ie, a cache key like 'something.new.exitpolicy').

Stem has a style checker. Please always run 'python run_tests.py --all' prior to submitting patches. The style checker requires pep8...

atagar@morrigan:~/Desktop/stem$ ./run_tests.py --style
======================================================================
                             INITIALISING                             
======================================================================

Performing startup activities...
  checking for orphaned .pyc files... done

STYLE ISSUES
* stem/control.py
  line 1326 - E128 continuation line under-indented for visual indent

TESTING PASSED (39 seconds)

Cheers! -Damian

Trac:
Status: needs_review to needs_revision

Updated the doc strings, I hope they are OK now.
Added empty line after if/else statements, long docstrings
Yeah, caller should do a list wrapper. Fixed.
Have to pass func argument to fix the endswith() check. Shouldn't be a problem now.
Oh, sorry about the pep8 errors, didn't realize it went through.
Also, renamed _update_cache to _set_cache.

Code: https://github.com/axitkhurana/stem/commit/7835e08f5122bf3215d36e68a4b98e015d953ac9

Sorry for the delay, was working on my final year project. Thanks for taking the time to review, will try to avoid these in future :)

Cheers

Trac:
Username: axitkhurana
Status: needs_revision to needs_review

Looks better! I tend to be very picking during code reviews so it isn't unusual for patches to take several iterations. ;)

-      cache_key = "getinfo.%s" % param.lower()
+    reply = self._get_cache(params, "getinfo")

We seem to be losing case insensitivity for these cache lookups (so get_info('version') followed by get_info('VERSION') won't be a cache hit).

+      else:
+        cache = self._get_cache(["version"])
+        if cache:
+          version = cache["version"]
+        else:
+          version = stem.version.Version(self.get_info("version"))
+          self._set_cache({"version": version})
 
-      return self._request_cache["version"]
+      return version

Looks good, just another spacing nit-pick. ;)

Also, this could be a little better without the 'cache' temporary parameter...

else:
  version = self._get_cache(["version"]).get("version", None)

  if not version:
    version = stem.version.Version(self.get_info("version"))
    self._set_cache({"version": version})

  return version

-        if not "exit_policy" in self._request_cache:
+        config_policy = self._get_cache(["exit_policy"])
+        if not "exit_policy" in config_policy:
           policy = []
 
           if self.get_conf("ExitPolicyRejectPrivate") == "1":
@@ -872,10 +879,10 @@ class Controller(BaseController):
             policy += policy_line.split(",")
 
           policy += self.get_info("exit-policy/default").split(",")
+          config_policy = stem.exit_policy.get_config_policy(policy)
+          self._set_cache({"exit_policy": config_policy})
 
-          self._request_cache["exit_policy"] = stem.exit_policy.get_config_policy(policy)
-
-        return self._request_cache["exit_policy"]
+        return config_policy

It looks to me like when you have a cache hit you're returning a dict of {"exit_policy": exit_policy} rather than the exiting policy itself. Maybe our tests need to be expanded to cover this. :/

Between this and the get_version() cache usage I'm starting to wonder if we actually want two methods for querying our cache, one for looking up a single key and another for looking up multiple. Thoughts?

Trac:
Status: needs_review to needs_revision

Thanks for reviewing :)

Moved all case lowering to cache functions (set/get), this should fix the cache lookup problem.
Yeah, .get version is cleaner, changed
get_exit_policy returns the value in case of cache hit now.

commit: https://github.com/axitkhurana/stem/commit/b4753ac54c40d992212e1428c19770061e5d1397

I've added another function for single key, similar to get_conf_map and get_conf. Is this the best way of handling single, multiple params?

commit: https://github.com/axitkhurana/stem/commit/2346c2464e66edfde147c4844ddd9286d297eb69

Cheers, Akshit

Trac:
Username: axitkhurana
Status: needs_revision to needs_review

Great! Just to set expectations I'm visiting with family this weekend, so I'll try to get to this early next week.

Cheers! -Damian

Hi Akshit. I'm really sorry about the delay. Most mornings what limited time I have for tor stuff has been gobbled up by GSoC correspondence.

+        cache_params = {}

Lets rename 'cache_params' to 'to_cache' so its purpose is a little clearer.

     # Config options are case insensitive and don't contain whitespace. Using
     # strip so the following check will catch whitespace-only params.
 
-    param = param.lower().strip()
+    param = param.strip()

I'd rather if we didn't move case insensitivity into the cache lookup helper. It's quite possible that we'll want case sensitive lookups later, the caveat that I brought up was that GETCONF and GETINFO queries are case insensitive. This is really the only gotcha I see (ready to push once this is changed).

I've added another function for single key, similar to get_conf_map and get_conf. Is this the best way of handling single, multiple params?

Yup, that looks perfect! I was going to suggest an isinstance() check so we could overload the method but on reflection I like your solution better.

Trac:
Status: needs_review to needs_revision

I'd rather if we didn't move case insensitivity into the cache lookup helper. It's quite possible that we'll want case sensitive lookups later, the caveat that I brought up was that GETCONF and GETINFO queries are case insensitive. This is really the only gotcha I see (ready to push once this is changed).

How about adding an extra parameter for case insensitivity in get and set cache functions, wouldn't that handle both case sensitive and insensitive lookups, remove duplicate code?

It would look something like this:

def _set_cache(self, params, case_insensitive=False, func=None):
  ....
    if case_insensitive:
      param = param.lower()
  ....

Trac:
Username: axitkhurana

Why not simply have the functions (get_info() and get_conf()) call lower() on their parameters?

Sorry for the delay, got busy with final exams.

Moved case insensitivity back to get_info and get_conf
- Used _case_insensitive_lookup to get original cached params back
Rebased over current master resolving conflicts like 'Except as' instead of comma

Commits:

I hope this fixes everything :)

Trac:
Username: axitkhurana
Status: needs_revision to needs_review

Thanks Akshit! Patch pushed with some revisions...

reply = {}

Unnecessary, "reply = {}" was already assigned above.

for key in cached_results.keys():

When you iterate over a dict it's by the keys so there's no need to call ".keys()" here.

>>> foo = {'name': 'damian', 'job': 'software engineer'}
>>> for field in foo:
...   print field
... 
job
name

user_expected_key = _case_insensitive_lookup(params, key, key)

We know that params has the key. If it doesn't then that probably indicates a stem bug so it would be good for this to raise an exception so we can more easily discover it.

The get_conf_map() change puzzles me a bit...

-        for key, value in response.entries.items():
-          self._request_cache["getconf.%s" % key.lower()] = value
+        to_cache = {}
+        for param, value in response.entries.items():
+          param = param.lower()
 
+          if isinstance(value, (bytes, unicode)):
+            value = [value]
+          to_cache[param] = value
+
+        self._set_cache(to_cache, "getconf")

In the old version it caches the response.entries values while in the new version it assumes that response.entries could have either list or str values and normalizes as lists. GetConfResponse's pydocs say that the values are always lists and by my read of the code that certainly seems to be the case. Narrowed this down to just...

if self.is_caching_enabled():
  to_cache = dict((k.lower(), v) for k, v in response.entries.items())
  self._set_cache(to_cache, "getconf")

if key.lower() == "exitpolicy" and "exit_policy" in self._request_cache: del self._request_cache["exit_policy"]

Moving this to _set_cache() seems unsafe to me. This means that if we, say, call 'GETINFO exitpolicy' it'll wipe our 'exit_policy' cache entry. Moved this back up to where we call SETCONF.

The _confchanged_listener also didn't account for the new thread safe methods. The new methods make it a lot cleaner. :)

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Thanks for the revisions and for being so patient with me. :)

Trac:
Username: axitkhurana

closed

Controller's cache isn't thread safe

Child items ...

Activity