Opened 5 years ago

Closed 4 years ago

#7713 closed defect (fixed)

Stem should invalidate GETCONF cache

Reported by: atagar Owned by: atagar
Priority: Low Milestone:
Component: Core Tor/Stem Version:
Severity: Keywords: controller easy
Cc: sreenatha.dev@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Stem caches the result of GETCONF queries, but only updates its cache if SETCONF is done through that Controller. There could be other controllers that alter tor's configuration out from under us.

Now that we have CONF_CHANGED events this should be easy enough to fix (#1692). The Controller class should add its own listener for CONF_CHANGED events that does cache updates similar to set_options().

Child Tickets

Change History (11)

comment:1 Changed 5 years ago by atagar

  • Keywords controller added

comment:2 Changed 5 years ago by atagar

  • Keywords easy added

comment:3 Changed 4 years ago by sreenatha

  • Cc sreenatha.dev@… added

Added a listener to the Controller class for CONF_CHANGED events that updates getconf cache.

https://github.com/lucyd/stem/commit/c2972edda6cd20d350cd208319611ea5195d35fa

comment:4 Changed 4 years ago by sreenatha

  • Status changed from new to needs_review

comment:5 Changed 4 years ago by atagar

  • Status changed from needs_review to needs_revision

Hi Sreenatha. This looks good except that it caused a couple issues that would have been revealed by running the tests...

  1. Your patch uses tab indentations...
atagar@morrigan:~/Desktop/stem$ ./run_tests.py --style
======================================================================
                             INITIALISING                             
======================================================================

  checking stem version...                             1.0.1-dev
  checking python version...                           2.7.1
  checking pycrypto version...                         2.6
  checking pyflakes version...                         0.5.0
  checking pep8 version...                             1.4
  checking for orphaned .pyc files...                  done
  checking for unused tests...                         done

STATIC CHECKS
* /home/atagar/Desktop/stem/stem/control.py
  line 681  - indentation has a tab
  line 683  - indentation has a tab
  line 684  - indentation has a tab
  line 685  - indentation has a tab
  line 686  - indentation has a tab
  line 687  - indentation has a tab
  line 688  - indentation has a tab
  line 689  - indentation has a tab
  line 691  - indentation has a tab
  line 692  - indentation has a tab

TESTING PASSED (32 seconds)
  1. This broke our integration tests.
======================================================================
FAIL: test_getconf
----------------------------------------------------------------------
Traceback:
  File "/home/atagar/Desktop/stem/test/integ/control/controller.py", line 441, in test_getconf
    self.assertEqual([",".join(n) for n in nodefamilies], controller.get_conf("nodefamily", multiple = True))
AssertionError: Lists differ: ['abc,xyz,pqrs', 'mno,tuv,wxyz... != ['mno,tuv,wxyz']

First differing element 0:
abc,xyz,pqrs
mno,tuv,wxyz

First list contains 1 additional elements.
First extra element 1:
mno,tuv,wxyz

- ['abc,xyz,pqrs', 'mno,tuv,wxyz']
+ ['mno,tuv,wxyz']

----------------------------------------------------------------------
Ran 31 tests in 11.210s

FAILED

Running './run_tests.py --all' is a good way of catching these kinds of issues. I've pushed a version of your patch that simply invalidates the cache (this avoids the bug the issue spotted by the integ test)...

https://gitweb.torproject.org/stem.git/commitdiff/de576cad4bbdd4901ee1857334a7420593ef2458

... however if you wouldn't mind a couple things would make this better...

  1. Mind looking into why this integ test was failing, and revise the cache updating you had to account for it?
  1. It would be useful to have a unit or integ test that demonstrates the caching bug that this patch solves so we don't have this kind of regression in the future.

Thanks! -Damian

comment:6 Changed 4 years ago by sreenatha

It seems that in test_getconf (test.integ.control.controller.TestController), where the test is failing, the following command is being sent through the controller.

SETCONF nodefamily="abc,xyz,pqrs" nodefamily="mno,tuv,wxyz"

I've also noticed that CONF_CHANGED events with the following config are being sent to the listener.

{'NodeFamily': 'mno,tuv,wxyz'}
{'NodeFamily': None}

So I'm guessing that the listener updates the cached value of 'nodefamily' parameter to 'mno,tuv,wxyz' before controller.get_conf() is called in test_getconf, resulting in a single value for 'nodefamily'. This causes the AssertionError in the integ test. One way to make the test work would be to set an option for not updating the cache in the test and then check it in the listener. Implementation for this that works for the style tests as well is available at https://github.com/lucyd/stem/commit/21e1542c8f3ae6e28b062615896dd541ed4f4ec1

I also modified the test to check if the cache is updated when another controller changes the config. https://github.com/lucyd/stem/commit/78da5aece13701d2b6b3d509d2aaeb8cdb70a040

comment:7 Changed 4 years ago by atagar

I've also noticed that CONF_CHANGED events with the following config are being sent to the listener.

{'NodeFamily': 'mno,tuv,wxyz'}
{'NodeFamily': None}

That seems really odd. Do you know where that second one is coming from? Please try connecting to a tor instance with telnet and seeing what raw CONF_CHANGED events that tor provides in response to that SETCONF. We might be uncovering a tor bug here. ;)

One way to make the test work would be to set an option for not updating the cache in the test and then check it in the listener.

If the cache ever disagrees with what 'GETCONF' provides then that is a stem bug. Disabling the cache for the test would simply mask the issue.

I also modified the test to check if the cache is updated when another controller changes the config. https://github.com/lucyd/stem/commit/78da5aece13701d2b6b3d509d2aaeb8cdb70a040

Thanks! Please add this as its own test (there's no need to bundle it with another).

comment:8 Changed 4 years ago by arma

SETCONF nodefamily="abc,xyz,pqrs" nodefamily="mno,tuv,wxyz"
650-CONF_CHANGED
650-NodeFamily=abc,xyz,pqrs
650-NodeFamily=mno,tuv,wxyz
650 OK
250 OK
getconf nodefamily
250-NodeFamily=abc,xyz,pqrs
250 NodeFamily=mno,tuv,wxyz

comment:9 Changed 4 years ago by atagar

Thanks, definitely not a tor issue then. Any idea where that anomalous event is coming from, Sreenatha?

comment:10 Changed 4 years ago by neena

The assertion is failing because it is executed before the _confchanged_listener can finish updating the cached value. One way to make the test pass would be to have it sleep for a few 100 milliseconds before running

      self.assertEqual(['mno,tuv,wxyz'], controller.get_conf("nodefamily", multiple = True))

The right way to do this would probably be to, umm, lock a lock when CONF_CHANGED occurs and wait for it to be released in the get_conf call.

comment:11 Changed 4 years ago by atagar

  • Resolution set to fixed
  • Status changed from needs_revision to closed

The right way to do this would probably be to, umm, lock a lock when CONF_CHANGED occurs and wait for it to be released in the get_conf call.

Well drats. Thanks for the catch, Ravi!

We can't really lock and await the CONF_CHANGED event since that is happening in a separate loop. The actual issue here seems to be that we're using msg() rather than set_conf(). If we used set_conf() then it would be updating the cache for us, so we could be sure the following get_conf() would be consistent.

We're probably using msg() on purpose in this test to better simulate what it is like to have a second controller attached.

Actually, now that I think about it more I'm less concerned that get_conf() provided the wrong response - if a separate controller makes asynchronous SETCONF calls then all bets are off. I am, however, concerned that we got this kind of half-way response. We didn't return either "" (the old value) or "['abc,xyz,pqrs', 'mno,tuv,wxyz']" (the new value) but rather a mix of the two because the get_conf() came *between* events.

I suspect the best way for us to handle this is the change I made (which is to invalidate the cache rather than attempt to update it). If you have a better idea then let me know!

Note: See TracTickets for help on using tickets.