Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#8607 closed defect (fixed)

Controller's cache isn't thread safe

Reported by: atagar Owned by: atagar
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Keywords: controller easy
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The Controller's _request_cache attribute isn't used in a thread safe manner. Once upon a time we only added to it so this wasn't an issue, but now that we're doing cache invalidation we need to be more careful. For instance...

# check for cached results
for param in list(lookup_params):
  cache_key = "getconf.%s" % param.lower()

  if cache_key in self._request_cache:
    reply[param] = self._request_cache[cache_key]
    lookup_params.remove(param)

Querying and changing the cache should be under methods that acquire a cache lock.

Child Tickets

Change History (14)

comment:1 Changed 5 years ago by axitkhurana

Status: newneeds_review

I've tried to make cache requests thread safe. Please take a look here:

https://github.com/axitkhurana/stem/commit/7384db06fe1167db3978262e5dfbac4720d689a6

Two functions created:

for reads: _get_cache
for writes: _update_cache

comment:2 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Hi Akshit. I'm presently at work so I was only able to give this a quick look, but we should definitely add a test for the CONF_CHANGED fix. The change you wrote actually looks to be a no-op due to whitespace (the add_event_listener() call is indented too far, so it's actually *in* the _conf_changed_listener block).

No tests necessary for the concurrency fix (that would be hard to repro). Let me know when the CONF_CHANGED addition has a test and I'll give it a more through review.

Thanks for the fixes! -Damian

PS. Minor nitpick, but please start your commit messages with a capital letter. Also, it would be nice if they mentioned the ticket that they're resolving.

comment:3 Changed 5 years ago by axitkhurana

Status: needs_revisionneeds_review

Thanks for taking a quick look, the CONF_CHANGED fix is failing one of the integration tests, I'll fix it, add a test and post it on it's ticket: https://trac.torproject.org/7713 . Sorry about the extra indentation, my vimrc was configured for 4 spaces per indent then.

For the concurrency fix, my earlier commit was breaking integration tests, but now they are passing. Also, capitalized the first letter of commit messages and added ticket link. Please take a look here:
https://github.com/axitkhurana/stem/commit/4e0ec8cf702f5db6449d49cccc262fb8b6b5f97f

Thanks :)
Akshit

comment:4 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Great. Mostly just a few nitpicks.

+  def _get_cache(self, params, func=None):
+    """
+    Reads cached items

This, and the comment for _update_cache(), aren't terribly descriptive. It doesn't need to be much, but please at least make it a complete sentence with punctuation. ;)

+      for param in params:
+        if func:
+          cache_key = "%s.%s" % (func, param)
+        else:
+          cache_key = param
+        if cache_key in self._request_cache:
+          cached_values[param] = self._request_cache[cache_key]
+      return cached_values

Please don't scrunch code like this. Have an empty line before the second 'if' and 'return' clauses. This will help readability.

+  def _update_cache(self, params, strtolist=True):
+    """
+    Updates cache
+
+    :param dict params: **dict** of 'cache_key => value' pairs to be cached
+    """

'strtolist' isn't mentioned in the pydocs. Maybe the caller should be doing the list wrapper?

+        if cache_key.lower().endswith("exitpolicy") and "exit_policy" in self._request_cache:
+          del self._request_cache["exit_policy"]

This endswith() check might catch things that it isn't supposed to (ie, a cache key like 'something.new.exitpolicy').

Stem has a style checker. Please always run 'python run_tests.py --all' prior to submitting patches. The style checker requires pep8...

atagar@morrigan:~/Desktop/stem$ ./run_tests.py --style
======================================================================
                             INITIALISING                             
======================================================================

Performing startup activities...
  checking for orphaned .pyc files... done

STYLE ISSUES
* stem/control.py
  line 1326 - E128 continuation line under-indented for visual indent

TESTING PASSED (39 seconds)

Cheers! -Damian

comment:5 Changed 5 years ago by axitkhurana

Status: needs_revisionneeds_review
  • Updated the doc strings, I hope they are OK now.
  • Added empty line after if/else statements, long docstrings
  • Yeah, caller should do a list wrapper. Fixed.
  • Have to pass func argument to fix the endswith() check. Shouldn't be a problem now.
  • Oh, sorry about the pep8 errors, didn't realize it went through.
  • Also, renamed _update_cache to _set_cache.

Code: https://github.com/axitkhurana/stem/commit/7835e08f5122bf3215d36e68a4b98e015d953ac9

Sorry for the delay, was working on my final year project. Thanks for taking the time to review, will try to avoid these in future :)

Cheers

comment:6 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Looks better! I tend to be very picking during code reviews so it isn't unusual for patches to take several iterations. ;)

-      cache_key = "getinfo.%s" % param.lower()
+    reply = self._get_cache(params, "getinfo")

We seem to be losing case insensitivity for these cache lookups (so get_info('version') followed by get_info('VERSION') won't be a cache hit).

+      else:
+        cache = self._get_cache(["version"])
+        if cache:
+          version = cache["version"]
+        else:
+          version = stem.version.Version(self.get_info("version"))
+          self._set_cache({"version": version})
 
-      return self._request_cache["version"]
+      return version

Looks good, just another spacing nit-pick. ;)

Also, this could be a little better without the 'cache' temporary parameter...

else:
  version = self._get_cache(["version"]).get("version", None)

  if not version:
    version = stem.version.Version(self.get_info("version"))
    self._set_cache({"version": version})

  return version
-        if not "exit_policy" in self._request_cache:
+        config_policy = self._get_cache(["exit_policy"])
+        if not "exit_policy" in config_policy:
           policy = []
 
           if self.get_conf("ExitPolicyRejectPrivate") == "1":
@@ -872,10 +879,10 @@ class Controller(BaseController):
             policy += policy_line.split(",")
 
           policy += self.get_info("exit-policy/default").split(",")
+          config_policy = stem.exit_policy.get_config_policy(policy)
+          self._set_cache({"exit_policy": config_policy})
 
-          self._request_cache["exit_policy"] = stem.exit_policy.get_config_policy(policy)
-
-        return self._request_cache["exit_policy"]
+        return config_policy

It looks to me like when you have a cache hit you're returning a dict of {"exit_policy": exit_policy} rather than the exiting policy itself. Maybe our tests need to be expanded to cover this. :/

Between this and the get_version() cache usage I'm starting to wonder if we actually want two methods for querying our cache, one for looking up a single key and another for looking up multiple. Thoughts?

comment:7 Changed 5 years ago by axitkhurana

Status: needs_revisionneeds_review

Thanks for reviewing :)

  • Moved all case lowering to cache functions (set/get), this should fix the cache lookup problem.
  • Yeah, .get version is cleaner, changed
  • get_exit_policy returns the value in case of cache hit now.

commit: https://github.com/axitkhurana/stem/commit/b4753ac54c40d992212e1428c19770061e5d1397

I've added another function for single key, similar to get_conf_map and get_conf. Is this the best way of handling single, multiple params?

commit: https://github.com/axitkhurana/stem/commit/2346c2464e66edfde147c4844ddd9286d297eb69

Cheers,
Akshit

comment:8 Changed 5 years ago by atagar

Great! Just to set expectations I'm visiting with family this weekend, so I'll try to get to this early next week.

Cheers! -Damian

comment:9 Changed 4 years ago by atagar

Status: needs_reviewneeds_revision

Hi Akshit. I'm really sorry about the delay. Most mornings what limited time I have for tor stuff has been gobbled up by GSoC correspondence.

+        cache_params = {}

Lets rename 'cache_params' to 'to_cache' so its purpose is a little clearer.

     # Config options are case insensitive and don't contain whitespace. Using
     # strip so the following check will catch whitespace-only params.
 
-    param = param.lower().strip()
+    param = param.strip()

I'd rather if we didn't move case insensitivity into the cache lookup helper. It's quite possible that we'll want case sensitive lookups later, the caveat that I brought up was that GETCONF and GETINFO queries are case insensitive. This is really the only gotcha I see (ready to push once this is changed).

I've added another function for single key, similar to get_conf_map and get_conf. Is this the best way of handling single, multiple params?

Yup, that looks perfect! I was going to suggest an isinstance() check so we could overload the method but on reflection I like your solution better.

comment:10 in reply to:  9 Changed 4 years ago by axitkhurana

I'd rather if we didn't move case insensitivity into the cache lookup helper. It's quite possible that we'll want case sensitive lookups later, the caveat that I brought up was that GETCONF and GETINFO queries are case insensitive. This is really the only gotcha I see (ready to push once this is changed).

How about adding an extra parameter for case insensitivity in get and set cache functions, wouldn't that handle both case sensitive and insensitive lookups, remove duplicate code?

It would look something like this:

def _set_cache(self, params, case_insensitive=False, func=None):
  ....
    if case_insensitive:
      param = param.lower()
  ....

comment:11 Changed 4 years ago by atagar

Why not simply have the functions (get_info() and get_conf()) call lower() on their parameters?

comment:12 Changed 4 years ago by axitkhurana

Status: needs_revisionneeds_review

Sorry for the delay, got busy with final exams.

  • Moved case insensitivity back to get_info and get_conf
    • Used _case_insensitive_lookup to get original cached params back
  • Rebased over current master resolving conflicts like 'Except as' instead of comma

Commits:

  1. cache lock: https://github.com/axitkhurana/stem/commit/5132265177cc2ac95bf5d8b2edac4568c3a43be0
  2. get_cache_map: https://github.com/axitkhurana/stem/commit/7744c92770ff5f3ca54d7d6fcf66bdb593418cf8

I hope this fixes everything :)

comment:13 Changed 4 years ago by atagar

Resolution: fixed
Status: needs_reviewclosed

Thanks Akshit! Patch pushed with some revisions...

reply = {}

Unnecessary, "reply = {}" was already assigned above.

for key in cached_results.keys():

When you iterate over a dict it's by the keys so there's no need to call ".keys()" here.

>>> foo = {'name': 'damian', 'job': 'software engineer'}
>>> for field in foo:
...   print field
... 
job
name

user_expected_key = _case_insensitive_lookup(params, key, key)

We know that params has the key. If it doesn't then that probably indicates a stem bug so it would be good for this to raise an exception so we can more easily discover it.

The get_conf_map() change puzzles me a bit...

-        for key, value in response.entries.items():
-          self._request_cache["getconf.%s" % key.lower()] = value
+        to_cache = {}
+        for param, value in response.entries.items():
+          param = param.lower()
 
+          if isinstance(value, (bytes, unicode)):
+            value = [value]
+          to_cache[param] = value
+
+        self._set_cache(to_cache, "getconf")

In the old version it caches the response.entries values while in the new version it assumes that response.entries could have either list or str values and normalizes as lists. GetConfResponse's pydocs say that the values are always lists and by my read of the code that certainly seems to be the case. Narrowed this down to just...

if self.is_caching_enabled():
  to_cache = dict((k.lower(), v) for k, v in response.entries.items())
  self._set_cache(to_cache, "getconf")

if key.lower() == "exitpolicy" and "exit_policy" in self._request_cache:

del self._request_cacheexit_policy?

Moving this to _set_cache() seems unsafe to me. This means that if we, say, call 'GETINFO exitpolicy' it'll wipe our 'exit_policy' cache entry. Moved this back up to where we call SETCONF.

The _confchanged_listener also didn't account for the new thread safe methods. The new methods make it a lot cleaner. :)

comment:14 Changed 4 years ago by axitkhurana

Thanks for the revisions and for being so patient with me. :)

Note: See TracTickets for help on using tickets.