Opened 8 months ago

Closed 7 months ago

Last modified 7 months ago

#25821 closed defect (fixed)

Stem getconf cache doesn't clear for CONF_CHANGED events; probably should set value

Reported by: dmr Owned by: dmr
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Normal Keywords: controller
Cc: atagar Actual Points:
Parent ID: Points:
Reviewer: atagar Sponsor:

Description

As mentioned in #25423, I found a cache-invalidation bug in stem.control.Controller.

The tl;dr is that the getconf cache currently doesn't get cleared properly by the listener that intends to clear it, but clearing probably isn't what we actually want to do.

Child Tickets

Change History (11)

comment:1 Changed 8 months ago by dmr

Here's the code as of `82b22046f40f49579ff37c4247db50050334998a`:

    def _confchanged_listener(event):
      if self.is_caching_enabled():
        self._set_cache(dict((k, None) for k in event.config), 'getconf')

        # ...

    self.add_event_listener(_confchanged_listener, EventType.CONF_CHANGED)

This code actually usually won't clear the cache when it intends to. The cache key is lower()ed when set in set_options() (code link):

      if self.is_caching_enabled():
        to_cache = {}

        for param, value in params:
          param = param.lower()

          # ...

          to_cache[param] = value

          # ...

        # reset any getinfo parameters that can be changed by a SETCONF

        self._set_cache(dict([(k.lower(), None) for k in CACHEABLE_GETINFO_PARAMS_UNTIL_SETCONF]), 'getinfo')
        self._set_cache(None, 'listeners')

        self._set_cache(to_cache, 'getconf')
        self._set_cache({'get_custom_options': None})

... but we can't guarantee that the event we receive will have a given case. In my tests, it was always the canonical torrc case, but that's not concretely specified in the control spec ("most keywords are case-sensitive").

At first, this looks like a simple change to fix (use k.lower()), but when we consider what happens when a config options is set, we see it's not so straightforward - the cache gets cleared.

Here's code for an adhoc test:

# run this example interactively (don't copy/paste everything) in tor-prompt
def print_getconf_cache_contents():
  global controller
  
  with controller._cache_lock:
    cache_keys = [k for k in controller._request_cache.keys() if k.startswith('getconf.')]
    if cache_keys:
      print("printing cache")
      for k in cache_keys:
        value = controller._request_cache[k]
        print("config cache %s has value: %s" % (k, value))
    else:
      print("cache is empty")

def print_conf_event_and_cache(event):
  # blank line to make prettier at a prompt
  print('')
  
  for k,v in event.config.items():
    print("Config %s set to: %s" % (k, v))
  
  print_getconf_cache_contents()

# in tor-prompt, controller is simply available for us to use
controller.add_event_listener(print_conf_event_and_cache, stem.control.EventType.CONF_CHANGED)

# for example simplicity WAIT for the output between each set_conf() call
controller.set_conf('ContactInfo', 'hi there')
# ^ generate output via our listener
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# simulate a SETCONF from another controller
SETCONF ContactInfo="hello back"
# ^ generate output via our listener
# with unchanged implementation, note that the cache is still set to "hi there"
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# simulate a SETCONF from another controller
SETCONF ContactInfo=
# ^ generate output via our listener
# with unchanged implementation, note that the cache is still set to "hi there"
# with k.lower() implementation, note that the cache doesn't have 'getconf.contactinfo'

# clean up
controller.remove_event_listener(print_conf_event_and_cache)

To do a simple change to the implementation, change this line in _confchanged_listener() from:

        self._set_cache(dict((k, None) for k in event.config), 'getconf')

to:

        self._set_cache(dict((k.lower(), None) for k in event.config), 'getconf')

(adding .lower() after k)

You can see that the cache doesn't clear, but when it does with a simple .lower() change... it doesn't really retain data for long (because the set_conf() causes a CONF_CHANGED, which clears the cache).

comment:2 Changed 8 months ago by dmr

So I propose that stem changes its cache behavior to set the 'getconf' cache only within _confchanged_listener(), to the value(s) provided by the event.
It seems superfluous to do it within set_options() if it's going to be set soon after.

However, one advantage I could see to setting the cache in set_options() is so that code like this would work intuitively without issuing a GETCONF again:

SETCONF ContactInfo=me

def some_actions_together():
  global controller
  
  contact_1 = controller.get_conf('ContactInfo')
  controller.set_conf('ContactInfo', 'someone else')
  contact_2 = controller.get_conf('ContactInfo')
  print(contact_2)

some_actions_together()

But code like this currently doesn't work intuitively because load_conf() doesn't have the configuration values specified in its response:

SETCONF ContactInfo=me

def some_actions_together():
  global controller
  
  contact_1 = controller.get_conf('ContactInfo')
  
  new_config = controller.get_info('config-text').replace("ContactInfo me", "ContactInfo someone else")
  controller.load_conf(new_config)
  
  contact_2 = controller.get_conf('ContactInfo')

some_actions_together()

(that could be improved if load_conf() clears the 'getconf' cache.)

No matter what we pick, I think there's potential for race conditions in complex scenarios...

atagar: thoughts?

comment:3 Changed 8 months ago by dmr

atagar: FYI, I'm happy to take on this ticket, but I won't be able to get to it for another week or so.

comment:4 Changed 7 months ago by atagar

Resolution: fixed
Status: newclosed

Hi dmr. Apologies, this is pretty long but if this concerns the #25423 caching bug then I expect this is handled by...

https://gitweb.torproject.org/stem.git/commit/?id=331406cb5ffe8e4e216e6d1396664f34d6c3cb28

If this is still an issue then feel free to reopen.

comment:5 in reply to:  2 Changed 7 months ago by dmr

Resolution: fixed
Status: closedreopened

Replying to atagar:

If this is still an issue then feel free to reopen.

Re-opening.
atagar and I discussed this over IRC.

Some points from the discussion below...

Replying to dmr:

So I propose that stem changes its cache behavior to set the 'getconf' cache only within _confchanged_listener(), to the value(s) provided by the event.
It seems superfluous to do it within set_options() if it's going to be set soon after.

  • _confchanged_listener() will set the cache to value(s) provided in the event
  • caching will be removed from set_options()

However, one advantage I could see to setting the cache in set_options() is so that code like this would work intuitively without issuing a GETCONF again:

...
  • GETCONF is cheap/fast; we don't need to worry about that

But code like this currently doesn't work intuitively because load_conf() doesn't have the configuration values specified in its response:

...

(that could be improved if load_conf() clears the 'getconf' cache.)

  • to be filed as separate issue
  • load_conf() should clear the entire cache (not just 'getconf' cache)

comment:6 Changed 7 months ago by dmr

Owner: changed from atagar to dmr
Reviewer: atagar
Status: reopenedassigned

Assigning self with atagar as reviewer.

comment:7 Changed 7 months ago by atagar

Hi dmr, the bit about 'as reviewer' caught my eye. Checked your repo but not spotting anything. Is there something for me to review?

comment:8 Changed 7 months ago by dmr

Status: assignedneeds_review

Ok, pushed changes; please review :). And sorry for the confusion before! I thought I'd assign you as a reviewer before the needs_review step.

git request-pull output

I'm planning to submit patch notifications in this boilerplate format now, to reduce overhead for prepping. Let me know if I should change it!
git request-pull master https://github.com/dmr-x/stem 25821-getconf-cache:

The following changes since commit 38ed5bca90ad6a71b9e45d8916030ea972513b21:

  Reword changelog description of recent CONF_CHANGED fix (2018-05-17 10:29:15 -0700)

are available in the git repository at:

  https://github.com/dmr-x/stem 25821-getconf-cache

for you to fetch changes up to f03f001d4b412d597e766fae4734b863ce4c938d:

  Changelog entry for configuration-changing cache improvements (#25821) (2018-05-18 17:22:27 +0000)

----------------------------------------------------------------
Me (6):
      Refactor dropping uncacheable GETCONF params into _set_cache()
      Refactor cache invalidation resulting from config change into new method
      Fix bug in cache invalidation for CONF_CHANGED
      Save value rather than invalidate getconf cache in CONF_CHANGED events
      Drop cache values for set_options() params instead of setting cache
      Changelog entry for configuration-changing cache improvements (#25821)

 docs/change_log.rst |  1 +
 stem/control.py     | 71 ++++++++++++++++++++++++++++++++++++++++++++---------------------------
 2 files changed, 45 insertions(+), 27 deletions(-)

Commits background/discussion

Here's a bit more background on the commits; some refactors in there, that also probably fix bugs.

Refactor 1: into _set_cache() - commit 9e517998c59a437e7050ebfb685066d249abb21e

I did this refactor because I wanted new code to be DRY and not rely on every implementer having to remember to include it.
It's a bit unclean - _set_cache() previously was namespace-agnostic and now has namespace-specific behavior, but I think it's the best overall solution.

I think this refactor also fixed a bug:
set_options() calls self._set_cache(to_cache, 'getconf') without removing the UNCACHEABLE_GETCONF_PARAMS from to_cache first.
So I believe moving the code here fixed a subtle bug in that case.

But the aforementioned code in set_options() has been removed from another commit, so it's moot now.

Refactor 2: into new _confchanged_cache_invalidation() - commit fdd34f81b986fdb21e782993ca380d5e65036494

Upon reviewing set_options(), realized that there are a lot of cache-invalidation things that occur when a config option changes.
As far as I can tell, these are all applicable to our _confchanged_listener() too.

So this "refactor" probably fixes a few bugs, too.

Bug fix - intermediate commit: 17782bc44f9afbb7dea49085afad29a936b7a1a3

This fixes a bug, but the code gets replaced thereafter. Separated out just for posterity.

Use change and unset attributes - commit fa1614d20ab57c57e55fcbdb39bece229026bc2b

The bulk of the functional work - now CONF_CHANGED should work well for single- or multiple-controller cases.

Drop cache, set_options() - commit 1eb527579d83ac6d2bc5c15eceeaabe0fa5f927b

It also seems important to drop existing cache values to reduce the timing window for race conditions, in set_options().
Race conditions still can exist; discussed later.

Changelog only - commit f03f001d4b412d597e766fae4734b863ce4c938d

No further explanation needed.

Other notes

Some other notes about the changes.

Lack of testing

Overall I didn't get to writing good tests here. I'd like to use some of the test code from #25423 that never got merged in. Just didn't get a chance to do that.
The cache overall doesn't have much in the way of testing.

Potential for race conditions

There's still a lot of potential here for race conditions. (I don't think we can fully eliminate that, by the very nature of the controlport architecture.)

I haven't fully thought about it, but a case that pops out to me is non-atomicity of values because of the caching layer.
Ideally we should change the API to allow for consumers to request that calling get_conf_map() retrieves all values fresh, so that the set is atomic - for consumers that need/care about that. Perhaps as get_conf_map(self, params, default = UNDEFINED, multiple = True, force_request=False).

comment:9 Changed 7 months ago by dmr

Keywords: controller added

Might as well add a keyword.

comment:10 Changed 7 months ago by atagar

Resolution: fixed
Status: needs_reviewclosed

Thanks Dave, looks good! Merged. Honestly over time our Controller's caching code has grown more complicated than I'd like. I'll give thought to simplifying it at some point but no reason that needs to be part of this. :)

comment:11 Changed 7 months ago by dmr

Hey Damian, the merge functionally looks good, but the commit message mentions the wrong ticket (#17873); see gitweb.

Not sure how you want to handle this, if at all. My commits referenced the Trac number in them and the changelog has it too, so the history is still fairly clear. Maybe add a git commit --allow-empty commit afterwards with just a message?

Note: See TracTickets for help on using tickets.