Stem getconf cache doesn't clear for CONF_CHANGED events; probably should set value

added component::archived/stem controller owner::dmr priority::medium resolution::fixed reviewer::atagar severity::normal status::closed type::defect labels

Here's the code as of [[https://gitweb.torproject.org/stem.git/tree/stem/control.py?id=82b22046f40f49579ff37c4247db50050334998a#n1071|82b22046f40f49579ff37c4247db50050334998a]]:

    def _confchanged_listener(event):
      if self.is_caching_enabled():
        self._set_cache(dict((k, None) for k in event.config), 'getconf')

        # ...

    self.add_event_listener(_confchanged_listener, EventType.CONF_CHANGED)

This code actually usually won't clear the cache when it intends to. The cache key is lower()ed when set in set_options() ([link]):

      if self.is_caching_enabled():
        to_cache = {}

        for param, value in params:
          param = param.lower()

          # ...

          to_cache[param] = value

          # ...

        # reset any getinfo parameters that can be changed by a SETCONF

        self._set_cache(dict([(k.lower(), None) for k in CACHEABLE_GETINFO_PARAMS_UNTIL_SETCONF]), 'getinfo')
        self._set_cache(None, 'listeners')

        self._set_cache(to_cache, 'getconf')
        self._set_cache({'get_custom_options': None})

... but we can't guarantee that the event we receive will have a given case. In my tests, it was always the canonical torrc case, but that's not concretely specified in the control spec ([keywords are case-sensitive")].

At first, this looks like a simple change to fix (use k.lower()), but when we consider what happens when a config options is set, we see it's not so straightforward - the cache gets cleared.

Here's code for an adhoc test:

# run this example interactively (don't copy/paste everything) in tor-prompt
def print_getconf_cache_contents():
  global controller
  
  with controller._cache_lock:
    cache_keys = [k for k in controller._request_cache.keys() if k.startswith('getconf.')]
    if cache_keys:
      print("printing cache")
      for k in cache_keys:
        value = controller._request_cache[k]
        print("config cache %s has value: %s" % (k, value))
    else:
      print("cache is empty")

def print_conf_event_and_cache(event):
  # blank line to make prettier at a prompt
  print('')
  
  for k,v in event.config.items():
    print("Config %s set to: %s" % (k, v))
  
  print_getconf_cache_contents()

# in tor-prompt, controller is simply available for us to use
controller.add_event_listener(print_conf_event_and_cache, stem.control.EventType.CONF_CHANGED)

# for example simplicity WAIT for the output between each set_conf() call
controller.set_conf('ContactInfo', 'hi there')
# ^ generate output via our listener
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# simulate a SETCONF from another controller
SETCONF ContactInfo="hello back"
# ^ generate output via our listener
# with unchanged implementation, note that the cache is still set to "hi there"
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# simulate a SETCONF from another controller
SETCONF ContactInfo=
# ^ generate output via our listener
# with unchanged implementation, note that the cache is still set to "hi there"
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# clean up
controller.remove_event_listener(print_conf_event_and_cache)

To do a simple change to the implementation, change this line in _confchanged_listener() from:

        self._set_cache(dict((k, None) for k in event.config), 'getconf')

to:

        self._set_cache(dict((k.lower(), None) for k in event.config), 'getconf')

(adding .lower() after k)

You can see that the cache doesn't clear, but when it does with a simple .lower() change... it doesn't really retain data for long (because the set_conf() causes a CONF_CHANGED, which clears the cache).

So I propose that stem changes its cache behavior to //set// the 'getconf' cache //only// within _confchanged_listener(), to the value(s) provided by the event. It seems superfluous to do it within set_options() if it's going to be set soon after.

However, one advantage I could see to setting the cache in set_options() is so that code like this would work intuitively without issuing a GETCONF again:

SETCONF ContactInfo=me

def some_actions_together():
  global controller
  
  contact_1 = controller.get_conf('ContactInfo')
  controller.set_conf('ContactInfo', 'someone else')
  contact_2 = controller.get_conf('ContactInfo')
  print(contact_2)

some_actions_together()

But code like this //currently doesn't// work intuitively because load_conf() doesn't have the configuration values [in its response]:

SETCONF ContactInfo=me

def some_actions_together():
  global controller
  
  contact_1 = controller.get_conf('ContactInfo')
  
  new_config = controller.get_info('config-text').replace("ContactInfo me", "ContactInfo someone else")
  controller.load_conf(new_config)
  
  contact_2 = controller.get_conf('ContactInfo')

some_actions_together()

(that could be improved if load_conf() clears the 'getconf' cache.)

No matter what we pick, I think there's potential for race conditions in complex scenarios...

atagar: thoughts?

atagar: FYI, I'm happy to take on this ticket, but I won't be able to get to it for another week or so.

Hi dmr. Apologies, this is pretty long but if this concerns the #25423 (closed) caching bug then I expect this is handled by...

https://gitweb.torproject.org/stem.git/commit/?id=331406cb5ffe8e4e216e6d1396664f34d6c3cb28

If this is still an issue then feel free to reopen.

Trac:
Resolution: N/A to fixed
Status: new to closed

Replying to atagar:

If this is still an issue then feel free to reopen.

Re-opening. atagar and I discussed this over IRC.

Some points from the discussion below...

Replying to dmr:

So I propose that stem changes its cache behavior to //set// the 'getconf' cache //only// within _confchanged_listener(), to the value(s) provided by the event. It seems superfluous to do it within set_options() if it's going to be set soon after.

_confchanged_listener() will set the cache to value(s) provided in the event
caching will be removed from set_options()

However, one advantage I could see to setting the cache in set_options() is so that code like this would work intuitively without issuing a GETCONF again: {{{ ... }}}

GETCONF is cheap/fast; we don't need to worry about that

But code like this //currently doesn't// work intuitively because load_conf() doesn't have the configuration values [in its response]: {{{ ... }}} (that could be improved if load_conf() clears the 'getconf' cache.)

to be filed as separate issue
load_conf() should clear the entire cache (not just 'getconf' cache)

Trac:
Resolution: fixed to N/A
Status: closed to reopened

Assigning self with atagar as reviewer.

Trac:
Status: reopened to assigned
Reviewer: N/A to atagar
Owner: atagar to dmr

Hi dmr, the bit about 'as reviewer' caught my eye. Checked your repo but not spotting anything. Is there something for me to review?

Ok, pushed changes; please review :). And sorry for the confusion before! I thought I'd assign you as a reviewer before the needs_review step.

== git request-pull output I'm planning to submit patch notifications in this boilerplate format now, to reduce overhead for prepping. Let me know if I should change it! git request-pull master https://github.com/dmr-x/stem 25821-getconf-cache:

The following changes since commit 38ed5bca90ad6a71b9e45d8916030ea972513b21:

  Reword changelog description of recent CONF_CHANGED fix (2018-05-17 10:29:15 -0700)

are available in the git repository at:

  https://github.com/dmr-x/stem 25821-getconf-cache

for you to fetch changes up to f03f001d4b412d597e766fae4734b863ce4c938d:

  Changelog entry for configuration-changing cache improvements (#25821) (2018-05-18 17:22:27 +0000)

----------------------------------------------------------------
Me (6):
      Refactor dropping uncacheable GETCONF params into _set_cache()
      Refactor cache invalidation resulting from config change into new method
      Fix bug in cache invalidation for CONF_CHANGED
      Save value rather than invalidate getconf cache in CONF_CHANGED events
      Drop cache values for set_options() params instead of setting cache
      Changelog entry for configuration-changing cache improvements (#25821)

 docs/change_log.rst |  1 +
 stem/control.py     | 71 ++++++++++++++++++++++++++++++++++++++++++++---------------------------
 2 files changed, 45 insertions(+), 27 deletions(-)

== Commits background/discussion Here's a bit more background on the commits; some refactors in there, that also probably fix bugs.

==== Refactor 1: into _set_cache() - commit 9e517998c59a437e7050ebfb685066d249abb21e I did this refactor because I wanted new code to be DRY and not rely on every implementer having to remember to include it. It's a bit unclean - _set_cache() previously was namespace-agnostic and now has namespace-specific behavior, but I think it's the best overall solution.

I think this refactor //also// fixed a bug: set_options() calls self._set_cache(to_cache, 'getconf') without removing the UNCACHEABLE_GETCONF_PARAMS from to_cache first. So I believe moving the code here fixed a subtle bug in that case.

But the aforementioned code in set_options() has been removed from another commit, so it's //moot// now.

==== Refactor 2: into new _confchanged_cache_invalidation() - commit fdd34f81b986fdb21e782993ca380d5e65036494 Upon reviewing set_options(), realized that there are a lot of cache-invalidation things that occur when a config option changes. As far as I can tell, these are all applicable to our _confchanged_listener() too.

So this "refactor" probably fixes a few bugs, too.

==== Bug fix - intermediate commit: 17782bc44f9afbb7dea49085afad29a936b7a1a3 This fixes a bug, but the code gets replaced thereafter. Separated out just for posterity.

==== Use change and unset attributes - commit fa1614d20ab57c57e55fcbdb39bece229026bc2b The bulk of the functional work - now CONF_CHANGED should work well for single- or multiple-controller cases.

==== Drop cache, set_options() - commit 1eb527579d83ac6d2bc5c15eceeaabe0fa5f927b It also seems important to drop existing cache values to reduce the timing window for race conditions, in set_options(). Race conditions still can exist; discussed later.

==== Changelog only - commit f03f001d4b412d597e766fae4734b863ce4c938d //No further explanation needed.//

== Other notes Some other notes about the changes.

==== Lack of testing Overall I didn't get to writing good tests here. I'd like to use some of the test code from #25423 (closed) that never got merged in. Just didn't get a chance to do that. The cache overall doesn't have much in the way of testing.

==== Potential for race conditions There's still a lot of potential here for race conditions. //(I don't think we can fully eliminate that, by the very nature of the controlport architecture.)//

I haven't fully thought about it, but a case that pops out to me is non-atomicity of values because of the caching layer. Ideally we should change the API to allow for consumers to request that calling get_conf_map() retrieves all values fresh, so that the set is atomic - for consumers that need/care about that. Perhaps as get_conf_map(self, params, default = UNDEFINED, multiple = True, force_request=False).

Trac:
Status: assigned to needs_review

Might as well add a keyword.

Trac:
Keywords: N/A deleted, controller added

Thanks Dave, looks good! Merged. Honestly over time our Controller's caching code has grown more complicated than I'd like. I'll give thought to simplifying it at some point but no reason that needs to be part of this. :)

Trac:
Resolution: N/A to fixed
Status: needs_review to closed

Hey Damian, the merge //functionally// looks good, but the commit message mentions the wrong ticket (#17873 (moved)); see [[https://gitweb.torproject.org/stem.git/commit/?id=fd5f1cd120fce96c8a4a58de2ea277b4337e899c|gitweb]].

Not sure how you want to handle this, if at all. My commits referenced the Trac number in them and the changelog has it too, so the history is still fairly clear. Maybe add a git commit --allow-empty commit afterwards with just a message?

closed

Stem getconf cache doesn't clear for CONF_CHANGED events; probably should set value

Child items ...

Activity