Opened 5 years ago

Closed 5 years ago

#6114 closed task (implemented)

Implement GETCONF parsing in Stem

Reported by: neena Owned by: neena
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Keywords:
Cc: atagar Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Stem's controller class must be able to parse and return GETCONF responses.

I've implemented this in my git branch, here:

http://repo.or.cz/w/stem/neena.git/shortlog/refs/heads/getconf-parsing

Child Tickets

Change History (16)

comment:1 Changed 5 years ago by neena

  • Owner changed from atagar to neena
  • Status changed from new to assigned

comment:2 Changed 5 years ago by neena

  • Status changed from assigned to needs_review
  • Summary changed from Implement GETINFO parsing in Stem to Implement GETCONF parsing in Stem

Few things to note:

  • I added another exception class for when the request made by the user is invalid instead of raising a ProtocolError. (Since, this isn't really an error with the tor following protocol). If this the right way to do it, the GETINFO code should probably use it too.
  • This code could probably be made to use a dict mapping strings to classes
  import stem.response.getinfo
  import stem.response.getconf
  import stem.response.protocolinfo
  
  if not isinstance(message, ControlMessage):
    raise TypeError("Only able to convert stem.response.ControlMessage instances")
  
  if response_type == "GETINFO":
    response_class = stem.response.getinfo.GetInfoResponse
  elif response_type == "GETCONF":
    response_class = stem.response.getconf.GetConfResponse
  elif response_type == "PROTOCOLINFO":
    response_class = stem.response.protocolinfo.ProtocolInfoResponse
  else: raise TypeError("Unsupported response type: %s" % response_type)

I didn't do this because merging would be a pain (with SAFECOOKIE using the preset if-elif chain).

  • That said, atagar, would it be easier for you if I did all my controller class work in a single branch? Since, merging multiple branches dealing with the same code might be painful? Or is that not a problem?

comment:3 follow-up: Changed 5 years ago by atagar

  • Status changed from needs_review to needs_revision

Looks great! Only a few minor quibbles or questions.

  • I added another exception class for when the request made by the user is invalid instead of raising a ProtocolError. (Since, this isn't really an error with the tor following protocol). If this the right way to do it, the GETINFO code should probably use it too.

Good idea. Agreed that we should add it to GETINFO.

  • That said, atagar, would it be easier for you if I did all my controller class work in a single branch? Since, merging multiple branches dealing with the same code might be painful? Or is that not a problem?

Lets do separate branches. They'll cause minor merge conflicts, but they should be easy for me to resolve since we have tests to exercise all of this.

+ """
+ Base Exception class for invalid requests
+ """
+ pass

Turns out that you actually don't need the 'pass'. If I still have other exceptions with it then please correct them.

+ if code == '552':
+ try:
+ # to parse: 552 Unrecognized configuration key "zinc"
+ unrecognized_keywords.append(re.search('"(["]+)"', line).groups()[0])

It looks like the error message is static, and if it changes then we'll likely break, so lets instead change this to...

if code == '552' and line.startswith('552 Unrecognized configuration key "') and line.endswith('"'):

unrecognized_keywords.append(line[36:-1])

If you think that this is more hacky then I'm happy to discuss it. :)

+ raise stem.response.InvalidRequest("GETCONF request contained unrecognized keywords: %s\n" \
+ % ', '.join(unrecognized_keywords))

Hm, I wonder if the caller would find the unrecognized_keywords to be useful as an attribute. Thoughts? If so then maybe we should have an InvalidArgument or InvalidInput as a InvalidRequest subclass. On the other hand, maybe a bad idea. Up to you.

+ if '=' in line:
+ if line[line.find("=") + 1] == "\"":
+ key, value = line.pop_mapping(True)
+ else:
+ key, value = line.split("=", 1)
+ else:
+ key, value = (line, None)

Alternatively I'm pretty sure that this could be...

if line.is_next_mapping(quoted = True):

key, value = line.pop_mapping(True)

elif line.is_next_mapping(quoted = False):

key, value = line.pop_mapping(False)

else:

key, value = line.pop(), None

We could change pop_mapping to allow for "maybe its a quoted value or maybe not" if that would help.

The unit tests that you have look great, but should include one for quoted values (what is an example of a configuration option that's quoted?). Also, what about multiple getconf values (for instance, ExitPolity)?

+ :raises:
+ :class:stem.socket.ControllerError if the call fails, and we weren't provided a default response
+ :class:stem.socket.InvalidRequest if configuration option requested was invalid.

Extra punctuation (the :param: and :returns: haven't been including it so far)

+ control_port = str(runner.get_tor_socket().get_port())
+ torrc_path = runner.get_torrc_path()
+ self.assertEqual(control_port, controller.get_conf("ControlPort"))
+ self.assertEqual(control_port, controller.get_conf("ControlPort", "la-di-dah"))

The torrc_path is unused. Does this test work when you run with a ControlSocket target?

comment:4 in reply to: ↑ 3 Changed 5 years ago by neena

  • Status changed from needs_revision to needs_review

Replying to atagar:

+ if code == '552':
+ try:
+ # to parse: 552 Unrecognized configuration key "zinc"
+ unrecognized_keywords.append(re.search('"(["]+)"', line).groups()[0])

It looks like the error message is static, and if it changes then we'll likely break, so lets instead change this to...

if code == '552' and line.startswith('552 Unrecognized configuration key "') and line.endswith('"'):

unrecognized_keywords.append(line[36:-1])

Done.

+ raise stem.response.InvalidRequest("GETCONF request contained unrecognized keywords: %s\n" \
+ % ', '.join(unrecognized_keywords))

Hm, I wonder if the caller would find the unrecognized_keywords to be useful as an attribute. Thoughts? If so then maybe we should have an InvalidArgument or InvalidInput as a InvalidRequest subclass. On the other hand, maybe a bad idea. Up to you.

I did consider this, wasn't sure if I should add another exception class. Done.

+ if '=' in line:
+ if line[line.find("=") + 1] == "\"":
+ key, value = line.pop_mapping(True)
+ else:
+ key, value = line.split("=", 1)
+ else:
+ key, value = (line, None)

Alternatively I'm pretty sure that this could be...

if line.is_next_mapping(quoted = True):

key, value = line.pop_mapping(True)

elif line.is_next_mapping(quoted = False):

key, value = line.pop_mapping(False)

else:

key, value = line.pop(), None

We could change pop_mapping to allow for "maybe its a quoted value or maybe not" if that would help.

The pop_* methods treat the space as a seperator, GETCONF responses can be of the form

250 DataDirectory=/tmp/fake dir

The pop_mapping would return ("DataDirectory", "/tmp/fake") and leave "dir" in self._remainder for the next pop. Until some other control message's response has similar formatting, I'd like to leave this as it is.

The unit tests that you have look great, but should include one for quoted values (what is an example of a configuration option that's quoted?). Also, what about multiple getconf values (for instance, ExitPolity)?

Multiple values would've failed. I missed that. Implemented it now.
I'm not sure what a quoted value would be. Can't find any, yet.

+ :class:stem.socket.InvalidRequest if configuration option requested was invalid.

Extra punctuation (the :param: and :returns: haven't been including it so far)

Fixed.

The torrc_path is unused.

ugh, removed.

Does this test work when you run with a ControlSocket target?

It wouldn't have, fixed it to make it work.

comment:5 follow-up: Changed 5 years ago by atagar

  • Status changed from needs_review to needs_revision

Just a small bit of feedback since I'm heading to the bus in a few minutes.

The pop_mapping would return ("DataDirectory", "/tmp/fake") and leave "dir" in self._remainder for the next pop. Until some other control message's response has similar formatting, I'd like to leave this as it is.

Gotcha. This is actually the second time that I've wanted a 'pop everything remaining on the line as a mapping' method (I ran into this same issue with GETINFO responses). Maybe we should add a 'remainder' flag to pop_mapping() to process everything remaining on the line as the value?

I'm not sure what a quoted value would be. Can't find any, yet.

Did the spec somehow imply that there are quoted values? If so then maybe you should ask Nick (he'd, of course, be the most familiar with what it might be for).

+ Queries the control socket for the values of given configuration options. If
+ provided a default then that's returned as the if the GETCONF option is undefined

Minor grammatical issue...
s/returned as the if the GETCONF/returned if the GETCONF

+ * :class:stem.socket.InvalidArguments the request's arguments are invalid

Lets state the reply types that this exception might be raised by (since they're the minority). Also, please make the GetInfo response do this too.

+def _getval(dictionary, key):
+ try:
+ return dictionary[key]
+ except KeyError:
+ pass

You want the dictionary's get() method - it's very handy, suppressing KeyErrors and returning an optional default value (otherwise returning None if the key doesn't exist).

>>> my_dict = {"hello": "world"}
>>> my_dict.get("hello")
'world'
>>> my_dict.get("damian")
>>> my_dict.get("damian", "blah")
'blah'

+def _split_line(line):
+ try:
+ if '=' in line:
+ if line[line.find("=") + 1] == "\"":
+ return line.pop_mapping(True)
+ else:
+ return line.split("=", 1)
+ else:
+ return (line, None)
+ except IndexError:
+ return (line[:-1], None)

I'm pretty sure that this can be replaced by...

if line.is_next_mapping(quoted = False):
  return line.split("=", 1).items()[0] # TODO: make this part of the ControlLine?
elif line.is_next_mapping(quoted = True):
  return line.pop_mapping(True).items()[0]
else:
  return (line.pop(), None)

Though I suspect that we can do better.

Looks close to being ready to be pushed. I'll give this a closer look after you've replied to this. Also, please rebase onto the current master (note that it includes a whitespace checker fix which might uncover issues in this change).

Cheers! -Damian

comment:6 in reply to: ↑ 5 Changed 5 years ago by neena

  • Status changed from needs_revision to needs_review

Replying to atagar:

The pop_mapping would return ("DataDirectory", "/tmp/fake") and leave "dir" in self._remainder for the next pop. Until some other control message's response has similar formatting, I'd like to leave this as it is.

Gotcha. This is actually the second time that I've wanted a 'pop everything remaining on the line as a mapping' method (I ran into this same issue with GETINFO responses). Maybe we should add a 'remainder' flag to pop_mapping() to process everything remaining on the line as the value?


Implementing the 'remaining' option was a little more work. To do it right, I had to modify _parse_entry and with the existing quoted and escaped switches, it was a little more effort than I expected. I'm going to do it this weekend.

I'm not sure what a quoted value would be. Can't find any, yet.

Did the spec somehow imply that there are quoted values? If so then maybe you should ask Nick (he'd, of course, be the most familiar with what it might be for).

Yes, it did, here https://gitweb.torproject.org/torspec.git/blob/master:/control-spec.txt#l264

 264   Value may be a raw value or a quoted string.  Tor will try to use
 265   unquoted values except when the value could be misinterpreted through
 266   not being quoted.

+ Queries the control socket for the values of given configuration options. If
+ provided a default then that's returned as the if the GETCONF option is undefined

Minor grammatical issue...
s/returned as the if the GETCONF/returned if the GETCONF

Fixed. Not sure why these are happening. :/

+ * :class:stem.socket.InvalidArguments the request's arguments are invalid

Lets state the reply types that this exception might be raised by (since they're the minority). Also, please make the GetInfo response do this too.

Done (in the same branch)

+def _getval(dictionary, key):
+ try:
+ return dictionary[key]
+ except KeyError:
+ pass

You want the dictionary's get() method - it's very handy, suppressing KeyErrors and returning an optional default value (otherwise returning None if the key doesn't exist).

Ah, yes. Done.

+def _split_line(line):
+ try:
+ if '=' in line:
+ if line[line.find("=") + 1] == "\"":
+ return line.pop_mapping(True)
+ else:
+ return line.split("=", 1)
+ else:
+ return (line, None)
+ except IndexError:
+ return (line[:-1], None)

I'm pretty sure that this can be replaced by...

if line.is_next_mapping(quoted = False):
  return line.split("=", 1).items()[0] # TODO: make this part of the ControlLine?
elif line.is_next_mapping(quoted = True):
  return line.pop_mapping(True).items()[0]
else:
  return (line.pop(), None)

Copied shamelessly.

comment:7 follow-up: Changed 5 years ago by atagar

  • Status changed from needs_review to needs_revision

Hi Ravi. Looks close! Mostly just minor issues, though the API change (and supporting the accursed HiddenService options) will require some more work.

+ def get_conf(self, param, default = None):

This should be...

  def get_conf(self, param, default = UNDEFINED):

... since None will be a commonly desired default value. The following except clause should be change too, of course.

+ :returns:
+ Response depends upon how we were called as follows...
+
+ * str with the response if our param was a str
+ * dict with the param => response mapping if our param was a list
+ * default if one was provided and our call failed

Is this right? If I query a LineList entry like the 'ExitPolicy' then do I get a str or a list? The dict response is certainly more complicated since GetConfResponse provides str => (str or list) mappings, which will be problematic for users since it means that they need to check the type before it can be used for anything.

Please see arm's src/util/torTools.py, the getOption() and getOptionMap() methods (and feel free to steal any code or pydocs you want!). I suspect that will provide a better API for users.

+ :raises:
+ :class:stem.socket.ControllerError if the call fails, and we weren't provided a default res
+ :class:stem.socket.InvalidArguments if configuration options requested was invalid

Good, though we should add stem.socket.InvalidArguments to get_info()'s pydocs too.

+ if requested_params != reply_params:
+ requested_label = ", ".join(requested_params)
+ reply_label = ", ".join(reply_params)
+
+ raise stem.socket.ProtocolError("GETCONF reply doesn't match the parameters that we requested. Queried '%s' but got '%s'." % (requested_label, reply_label))

This actually doesn't always apply to GETINFO responses. Read arm's getOptionMap() pydocs and you'll understand. The HiddenService options are a real pain in the ass.

+def _split_line(line):
+ if line.is_next_mapping(quoted = False):
+ return line.split("=", 1) # TODO: make this part of the ControlLine?
+ elif line.is_next_mapping(quoted = True):
+ return line.pop_mapping(True).items()[0]
+ else:
+ return (line.pop(), None)

Lets do this in _parse_message(). Helper methods are nice, but in this case neither the helper nor _parse_message() are especially complicated, so it mostly just gives another place look to figure out how GETCONF is parsed.

As discussed on irc, we should cite the ticket you filed about quotes here.

+ if not self.is_ok():
+ unrecognized_keywords = []
+ for code, _, line in self.content():
+ if code == "552" and line.startswith("Unrecognized configuration key \"") and line.endswith("\""):
+ unrecognized_keywords.append(line[32:-1])
+
+ if unrecognized_keywords:
+ raise stem.socket.InvalidArguments("552", "GETCONF request contained unrecognized keywords: %s\n" \
+ % ', '.join(unrecognized_keywords), unrecognized_keywords)
+ else:
+ raise stem.socket.ProtocolError("GETCONF response contained a non-OK status code:\n%s" % self)

Are you sure that Tor provides multiple 552 lines for multiple invalid keys rather than a single 552 that lists all of them? Please include a test case in both the unit and integ tests for a single request with multiple unrecognized keys. We should also include integ tests for a LineList attribute (like ExitPolicy), and if we can test for the HiddenService attributes then that would be useful since they're so weird.

+ while remaining_lines:
+ line = remaining_lines.pop(0)
+
+ key, value = _split_line(line)
+ entry = self.entries.get(key, None)
+
+ if type(entry) == str and entry != value:
+ self.entries[key] = [entry]
+ self.entries[key].append(value)
+ elif type(entry) == list and not value in entry:
+ self.entries[key].append(value)
+ else:
+ self.entries[key] = value

I see the appeal of having both a 'str => str' and 'str => list' mapping, but I suspect that it'll actually make things harder on the caller. *If* you want to switch to just 'str => set' mappings (to keep the value deduplication) then the following should do the trick...

  key, value = _split_line(line)
  self.entries.setdefault(key, set()).add(value)

'setdefault' gets the value of 'key' if it exists and, if it doesn't, inserts the second value into the dict and returns that. Something new that I learned at the python interest group last night - neat trick that I'll probably abuse immensely. :P

+InvalidRequest - Invalid request.

+ +- InvalidArguments - Invalid request parameters.

+- SocketError - Communication with the socket failed.

Missing pipe...

  +    |- InvalidRequest - Invalid request.
  +    |  +- InvalidArguments - Invalid request parameters.
       +- SocketError - Communication with the socket failed.

+ :var str code: The error code returned by Tor (if applicable)
+ :var str message: The error message returned by Tor (if applicable) or a human
+ readable error message

Very, very minor thing but please start :param: and :var: entries with a lowercase (that's what I've been doing throughout the codebase).

+class InvalidRequest(ControllerError):
+class InvalidArguments(InvalidRequest):

Now that I think about it more, what could make a request invalid besides its arguments? Maybe this division isn't so useful... thoughts?

+ :param str code: The error code returned by Tor (if applicable)
+ :param str message: The error message returned by Tor (if applicable) or a
+ human readable error message

Instead of saying '(if applicable)' it's usually better to tell the user what exactly will happen. For instance...

  :param str code: error code provided by tor, this is None if nothing was provided

However, is there any case where we won't have a status code and message? I can't think of any, so lets simplify it to...

  :param str code: error code provided by tor
  :param str message: message describing the issue

+ socket = runner.get_tor_socket()
+ if isinstance(socket, stem.socket.ControlPort):
+ socket = str(socket.get_port())
+ config_key = "ControlPort"

Dynamic languages like python allow you to reuse a variable for completely different things, but it hurts code readability. Maybe call the first 'socket' and the second 'connection_value'?

comment:8 in reply to: ↑ 7 Changed 5 years ago by neena

  • Status changed from needs_revision to needs_review

Replying to atagar:

Hi Ravi. Looks close! Mostly just minor issues, though the API change (and supporting the accursed HiddenService options) will require some more work.

... since None will be a commonly desired default value. The following except clause should be change too, of course.

Ah, yes. Fixed.

Please see arm's src/util/torTools.py, the getOption() and getOptionMap() methods (and feel free to steal any code or pydocs you want!). I suspect that will provide a better API for users.

Done.
I added optional keyword arguments to the stem.response.convert method which get passed on to the Response classes' _parse_message method.

Good, though we should add stem.socket.InvalidArguments to get_info()'s pydocs too.

Done.

+ raise stem.socket.ProtocolError("GETCONF reply doesn't match the parameters that we requested. Queried '%s' but got '%s'." % (requested_label, reply_label))

This actually doesn't always apply to GETINFO responses. Read arm's getOptionMap() pydocs and you'll understand. The HiddenService options are a real pain in the ass.

Ah. I've removed this check entirely. Were you suggesting something else?

+def _split_line(line):

...
Lets do this in _parse_message(). Helper methods are nice, but in this case neither the helper nor _parse_message() are especially complicated, so it mostly just gives another place look to figure out how GETCONF is parsed.

Fine.

Are you sure that Tor provides multiple 552 lines for multiple invalid keys rather than a single 552 that lists all of them?

Yes.

Please include a test case in both the unit and integ tests for a single request with multiple unrecognized keys.

Done. Found a bug while doing this. Fixed it.

We should also include integ tests for a LineList attribute (like ExitPolicy),

I added two fake NodeFamily options to the BASE_TORRC to do this.

and if we can test for the HiddenService attributes then that would be useful since they're so weird.

I'm thinking it would be better to this if/when we have a hidden service target?
If you want to add this to BASE_TORRC, I could do that (quickly).

*If* you want to switch to just 'str => set' mappings (to keep the value deduplication) then the following should do the trick...

Maybe doing deduplication isn't such a good idea. I initially decided to do it because

getconf log log
250-Log=notice stdout
250 Log=notice stdout

But, if something like exitpolicy has duplicate lines in the configuration file like

ControlPort 9100
ExitPolicy accept 34.3.4.5
ExitPolicy accept 3.4.53.3
ExitPolicy accept 3.4.53.3
ExitPolicy reject 23.245.54.3

then the deduplication will remove the duplicate in the configuration file

getconf exitpolicy exitpolicy
250-ExitPolicy=accept 34.3.4.5
250-ExitPolicy=accept 3.4.53.3
250-ExitPolicy=accept 3.4.53.3
250-ExitPolicy=reject 23.245.54.3
250-ExitPolicy=accept 34.3.4.5
250-ExitPolicy=accept 3.4.53.3
250-ExitPolicy=accept 3.4.53.3
250 ExitPolicy=reject 23.245.54.3

Not 100% sure if this is the right thing to do, but, I would assume if there's ever a tool that checks the configuration file for duplicates etc (a torrclint), it would want to get the duplicates as is.

Very, very minor thing but please start :param: and :var: entries with a lowercase (that's what I've been doing throughout the codebase).

Whoops, sorry. I'm trying. (When good habits turn bad...)

+class InvalidRequest(ControllerError):
+class InvalidArguments(InvalidRequest):

Now that I think about it more, what could make a request invalid besides its arguments? Maybe this division isn't so useful... thoughts?

There are other things that could make a request invalid, such as this in SETCONF responses.
https://gitweb.torproject.org/torspec.git/blob/master:/control-spec.txt#l223

 223   "513 syntax error in configuration values" reply on syntax error, or a
 224   "553 impossible configuration setting" reply on a semantic error.

However, is there any case where we won't have a status code and message? I can't think of any, so lets simplify it to...

Done. I added that because of the possibility of getting multiple status codes. In retrospect, I don't think that ever happens either.

Dynamic languages like python allow you to reuse a variable for completely different things, but it hurts code readability. Maybe call the first 'socket' and the second 'connection_value'?

Okay. Done.

comment:9 Changed 5 years ago by neena

I came across #6172 while implementing this.

We're working around this for now. Once that's fixed in Tor (hopefully 0.2.4.x) and once that meets the requirements in stem's support policy we should remove this hack.

comment:10 follow-up: Changed 5 years ago by atagar

  • Status changed from needs_review to needs_revision

+ :param bool multiple: if True, the value(s) provided are lists of all returned values,
+ otherwise this just provides the first value

Sounds good, but lets have the get_conf() handle the 'multiple' flag. There's really little need for this to influence how we parse the response.

We will have three use cases for GETCONF...

  1. Simple use case of querying a single item. This will be the majority use case, and is what arm's getOption() method does. Here we *only* want either a string or a list of strings.
  1. Request for a mapping value, of which HiddenServiceOptions is the only one that exists at present. This is what arm's getOptionMap() is for and we're still missing this capability in stem.
  1. Batch request, which like getOptionMap() produces a dictionary of 'option => value' mappings. This is something that arm did not handle, but we should. This can be nicely melded with the getOptionMap() method.

I would suggest that we add two methods to the controller...

1. get_conf with parameters...
  - param (str)
  - default (object)
  - multiple (bool)

2. get_conf_map with parameters...
  - param (str or list for batch call)
  - default (object)

Thoughts? Also please note that arm's getOption() is actually pretty smart about querying mapped values (like HiddenServiceDir). The more I look at it, the more I like how arm was handling this...

Ah. I've removed this check entirely. Were you suggesting something else?

On reflection, the check is appropriate for the first of those two methods.

I'm thinking it would be better to this if/when we have a hidden service target?
If you want to add this to BASE_TORRC, I could do that (quickly).

Are hidden service options something that we can setconf to enable, test against, then resetconf to revert them? If not, then maybe we should add a RUN_HIDDEN_SERVICE target for the integ tests.

HiddenServiceOptions is a weird use case so we really should include it in both our integ and unit tests.

Maybe doing deduplication isn't such a good idea.

Agreed. On reflection my suggestion to use a set is also fraught with peril since we'll need to retain the ordering for list values (oops).

There are other things that could make a request invalid, such as this in SETCONF responses.

Nice catch.

comment:11 in reply to: ↑ 10 Changed 5 years ago by neena

  • Status changed from needs_revision to needs_review

Replying to atagar:

+ :param bool multiple: if True, the value(s) provided are lists of all returned values,
+ otherwise this just provides the first value

Sounds good, but lets have the get_conf() handle the 'multiple' flag. There's really little need for this to influence how we parse the response.

done.

I would suggest that we add two methods to the controller...

  1. get_conf_map with parameters...
    • param (str or list for batch call)
    • default (object)

}}}

Not sure why you didn't add a 'multiple' there. Did you want it to default to multiple, or mixed (string or list of strings)?
(I added a multiple parameter)

Ah. I've removed this check entirely. Were you suggesting something else?

On reflection, the check is appropriate for the first of those two methods.

Added, to both of the methods. It checks as long as none of the requested values were weird, i.e., needed to be retrieved by querying for a different value.

I'm thinking it would be better to this if/when we have a hidden service target?

Are hidden service options something that we can setconf to enable, test against, then resetconf to revert them? If not, then maybe we should add a RUN_HIDDEN_SERVICE target for the integ tests.

HiddenServiceOptions is a weird use case so we really should include it in both our integ and unit tests.

I have added these tests in my setconf-wrapper branch.

comment:12 Changed 5 years ago by atagar

  • Status changed from needs_review to needs_revision

So many changes to review. Oh well. Once more unto the breach, dear friends, once more.

Just for clarification, I'm reviewing 'git diff origin/master neena/getconf-parsing'.

  1. get_conf_map with parameters...
    • param (str or list for batch call)
    • default (object)

}}}

Not sure why you didn't add a 'multiple' there. Did you want it to default
to multiple, or mixed (string or list of strings)?
(I added a multiple parameter)

The get_conf() method should have a multiple parameter, but I left it out of get_conf_map() because it seemed simpler for it to always provide a dict of 'key => [values...]'. I can see how it might be useful for batch calls so I'm on the fencepost about including it. Up to you.

On reflection, the check is appropriate for the first of those two methods.

Added, to both of the methods. It checks as long as none of the requested
values were weird, i.e., needed to be retrieved by querying for a
different value.

Good idea, though wouldn't it be better to check that an individual parameter isn't a key or value in self._mapped_config_keys? I'm a little unsure what your check is trying to do...

+ if not set(self._mapped_config_keys.values()) &

Won't this always evaluate to False since you're essentially evaulating...

not bool(len(set(self._mapped_config_keys.values()))) =>
not bool(len(["HiddenServiceOptions"])) =>
not bool(1) =>
not True

+ _mapped_config_keys = {
+ "HiddenServiceDir": "HiddenServiceOptions",
+ "HiddenServicePort": "HiddenServiceOptions",
+ "HiddenServiceVersion": "HiddenServiceOptions",
+ "HiddenServiceAuthorizeClient": "HiddenServiceOptions",
+ "HiddenServiceOptions": "HiddenServiceOptions"
+ }

There's no reason for this to be an instance variable. Lets make it a global constant.

+ :class:stem.socket.ControllerError if the call fails, and we weren't provided a default response

Very, very minor nitpick but we don't need a comma there (both in get_conf() and get_conf_map()).

+ if param.lower() in self._mapped_config_keys.keys():
+ return self.get_conf_map(self._mapped_config_keys[param], default, multiple)[param]
+ else:
+ return self.get_conf_map(param, default, multiple)[param]

Three things...

  1. You're checking if 'param.lower()' is among the keys but fetching just 'param', you you'll get a KeyError if the case doesn't match. We should probably just do 'param = param.lower() # case insensitive' right away.
  1. The default check for 'in' is the keys...
>>> my_dict = {"hello": "world"}
>>> "hello" in my_dict
True

>>> "world" in my_dict
False
  1. An alternative is...
param = param.lower() # case insensitive
return self.get_conf_map(self._mapped_config_keys.get(param, param), default, multiple)[param]

Though your version, with an explicit 'if/else', might be a bit more readable. Up to you.

+ def get_conf_map(self, param, default = UNDEFINED, multiple = False):

Did you find arm's description of the various config option types to be helpful? If so then lets preserve it here (modified slightly to reflect what we're doing here, further improvements welcome)...

     There's three use cases for GETCONF:
     - a single value is provided
     - multiple values are provided for the option queried
     - a set of options that weren't necessarily requested are returned (for
       instance querying HiddenServiceOptions gives HiddenServiceDir,
       HiddenServicePort, etc)
     
     The vast majority of the options fall into the first two categories, in
     which case calling get_conf() is sufficient. However, for batch queries or
     the special options that give a set of values this provides back the full
     response. As of tor version 0.2.1.25 HiddenServiceOptions was the only
     option like this.
     
     The get_conf() and get_conf_map() functions both try to account for these
     special mappings, so queried like get_conf("HiddenServicePort") should
     behave as you'd expect.

+ :returns:
+ Response depends upon how we were called as follows...
+
+ * dict with param (str) => response mappings (str) if multiple was False
+ * dict with param (str) => response mappings (list) if multiple was True
+ * dict with param (str) => default mappings if a default value was provided and our call failed

The 'dict with param (str) => ' is a bit redundant. Lets shorten this a bit to something like...

:returns:
  Response depends upon how we were called as follows...

  * dict of 'config key => value' mappings, the value is a list if 'multiple' is True and otherwise is a str of just the first value
  * default if one was provided and our call failed

+ if param == [""] or param == []:
+ raise stem.socket.InvalidRequest("Received empty parameter")

Hm. If 'param == []' then we should probably just return an empty dictionary. If 'param == [""]' then that depends, does "GETCONF " error or provide nothing? If the former then that should return an empty dictionary too.

Note that if we have a default parameter then we should *never* raise an exception (our caller won't be trying to catch one).

+ requested_params = set(map(lambda x: x.lower(), param))
+ reply_params = set(map(lambda x: x.lower(), response.entries.keys()))

You don't need the lamdas.

>>> map(str.lower, ["HELLO", "WORLD"])
['hello', 'world']

+ raise stem.socket.ProtocolError("GETCONF reply doesn't match the parameters that we requested. Queried '%s' but got '%s'." % (requested_label, reply_label))

Again, we can't raise if there's a default. How I usually handled this in arm was to have a 'result' and 'raised_exception' parameter. Then, rather than raising exceptions, I assigned it to 'raised_exception' then handled the "if I have a default do this, if not and I have an exception do that" logic at the end.

Personally I don't like this pattern, so if you're able to find something that's better then great.

+ * :class:stem.socket.InvalidArguments the arguments given as input are
+ invalid. Raised when converting GETINFO or GETCONF requests

Does this render correctly in sphinx? I've needed to sacrifice having sane line lengths at a few other places in the documentation because it then renders the first line as being part of the list, and the second line as a new paragraph.

Btw, to make the documentation all you need to do is...

cd stem/docs
make html

Then point firefox at it, in my case "file:///home/atagar/Desktop/stem/docs/_build/html/index.html". If you see something that looks out of date then do a 'make clean' first (the only thing that I've ever seen get stale due to caching is the "[source]" contents).

+ """
+ Reply for a GETCONF query.
+
+ :var dict entries: mapping between the queried options (string) and...

Not entirely true. With the hidden service options it's the real parameter rather than what we queried. Another option...

Reply for a GETCONF query.

Note that configuration parameters won't match what we queried for if it's one
of the special mapping options (ex. "HiddenServiceOptions").

:var dict entries: mapping between the config parameter (string) and...

+ else:
+ key, value = (line.pop(), None)
+
+ entry = self.entries.get(key, None)
+
+ self.entries.setdefault(key, []).append(value)

What is the 'entry = self.entries.get(key, None)' line doing here? Looks like a no-op.

I forgot that we might have None values. Guess that our get_conf() and get_conf_map() should warn about this?

+ :var str code: The error code returned by Tor (if applicable)
+ :var str message: The error message returned by Tor (if applicable) or a human
+ readable error message

Trivial nitpicks again but should start with a lowercase (actually, I'd just remove the word 'The' since it reads better without it), and I'm pretty sure that :var: entries need to be on a single line to render correctly.

+class InvalidRequest(ControllerError):
...
+ def init(self, code = None, message = None):
+ """
+ Initializes an InvalidRequest object.
+
+ :param str code: The error code returned by Tor
+ :param str message: The error message returned by Tor
+
+ :returns: object of InvalidRequest class
+ """
+
+ self.code = code
+ self.message = message

Subclasses should *always* call the init for their parent class. Doing otherwise can lead to very, very confusing errors (speaking as someone who's been bitten by that a few times :P). Same goes for InvalidArguments rather than having it set code and message itself.

+ with runner.get_tor_controller() as controller:
+ # successful single query
+ socket = runner.get_tor_socket()

Nope, definite badness here. Two issues...

  1. You're creating a control socket without closing it. This is why we almost always use the 'with' keyword when we get either a controller or socket.
  1. In essence what you're saying here is "Give me a controller with an initialized control socket" followed by "Give me an initialized control socket". Note that the controller already has a socket so you now have two. I'd suggest removing the "runner.get_tor_socket()" call and using "controller.get_socket()" instead.

Maybe we want a '_get_test_case(controller)' method which returns the (config_key, expected_value) tuple? Might make things a bit more readable - up to you.

In the future I'll probably break all of these tests up into individual use cases (ie, test_getconf_batch, test_getconf_empty, etc) but guess we should keep it similar to what I did with getinfo for now.

+NodeFamily dummystemnodexyz1x1,dummystemnodexyz1x2,dummystemnodexyz1x3
+NodeFamily dummystemnodexyz2x1,dummystemnodexyz2x2,dummystemnodexyz2x3

Could test_get_conf() issue a SETCONF for this then test against that rather than making it part of our default torrc (reverting after the test finishes)? It feels kinda wrong to include test data here.

+ try:
+ stem.response.convert("GETINFO", control_message)
+ except stem.socket.InvalidArguments, exc:
+ self.assertEqual(exc.arguments, blackhole?)

After the "stem.response.convert()" we should probably call "self.fail()" just in case we have a bug where it succeeds. Same goes for the similar getconf unit test.

Would you like us to merge your getconf-parsing, setconf-wrapper, and saveconf-wrapper branches all at once or one at a time? If the former then this'll take a while but feel free to make them all one 'tor-config' branch and make revisions on top of that.

If we are doing them individually then you'll want to make revisions to getconf-parsing then rebase the other two on top of that.

Cheers! -Damian

comment:13 Changed 5 years ago by neena

  • Status changed from needs_revision to needs_review

My commit revising the code here --> http://repo.or.cz/w/stem/neena.git/commit/5bf45dd3cddbd02b990b3b8af4d8225f0a33d4fc

Replying to atagar:

The get_conf() method should have a multiple parameter, but I left it out of get_conf_map() because it seemed simpler for it to always provide a dict of 'key => [values...]'. I can see how it might be useful for batch calls so I'm on the fencepost about including it. Up to you.

Making multiple = True the default for get_conf_map, but, including it as a parameter.

Good idea, though wouldn't it be better to check that an individual parameter isn't a key or value in self._mapped_config_keys? I'm a little unsure what your check is trying to do...

+ if not set(self._mapped_config_keys.values()) &

Won't this always evaluate to False since you're essentially evaulating...

Nope, the '&' in

not set(self._mapped_config_keys.values()) & requested_params

is an intersection of both the sets. So, that reads 'if the intersection between requested_params and the set of keys that are mapped to other keys is empty'.

+ _mapped_config_keys = {
+ "HiddenServiceDir": "HiddenServiceOptions",

There's no reason for this to be an instance variable. Lets make it a global constant.

It's a class variable when defined like that. But, if you feel it should be a global constant, I could change that.

+ :class:stem.socket.ControllerError if the call fails, and we weren't provided a default response

Very, very minor nitpick but we don't need a comma there (both in get_conf() and get_conf_map()).

Fixed.

+ if param.lower() in self._mapped_config_keys.keys():
+ return self.get_conf_map(self._mapped_config_keys[param], default, multiple)[param]
+ else:
+ return self.get_conf_map(param, default, multiple)[param]

  1. An alternative is...
    param = param.lower() # case insensitive
    return self.get_conf_map(self._mapped_config_keys.get(param, param), default, multiple)[param]
    

Much better. Using this.

     There's three use cases for GETCONF:
     - a single value is provided
     - multiple values are provided for the option queried
     - a set of options that weren't necessarily requested are returned (for
       instance querying HiddenServiceOptions gives HiddenServiceDir,
       HiddenServicePort, etc)
     
     The vast majority of the options fall into the first two categories, in
     which case calling get_conf() is sufficient. However, for batch queries or
     the special options that give a set of values this provides back the full
     response. As of tor version 0.2.1.25 HiddenServiceOptions was the only
     option like this.
     
     The get_conf() and get_conf_map() functions both try to account for these
     special mappings, so queried like get_conf("HiddenServicePort") should
     behave as you'd expect.

Stolen.

The 'dict with param (str) => ' is a bit redundant. Lets shorten this a bit to something like...

:returns:
  Response depends upon how we were called as follows...

  * dict of 'config key => value' mappings, the value is a list if 'multiple' is True and otherwise is a str of just the first value
  * default if one was provided and our call failed

Nicked.

+ if param == [""] or param == []:
+ raise stem.socket.InvalidRequest("Received empty parameter")

Hm. If 'param == []' then we should probably just return an empty dictionary. If 'param == [""]' then that depends, does "GETCONF " error or provide nothing? If the former then that should return an empty dictionary too.

I was confused about how this should behave, decided to do this because the get_info does it too
From def test_getinfo in test/integ/control/controller.py

      # empty input
      
      self.assertRaises(stem.socket.ControllerError, controller.get_info, "")
      self.assertEqual("ho hum", controller.get_info("", "ho hum"))
      
      self.assertEqual({}, controller.get_info([]))
      self.assertEqual({}, controller.get_info([], {}))

Want to talk about this on IRC before we decide what to do. Accepting empty
strings will be a little tricky

Note that if we have a default parameter then we should *never* raise an exception (our caller won't be trying to catch one).

:?
These exceptions are caught at

     except stem.socket.ControllerError, exc:
      if default != UNDEFINED: return dict([(p, default) for p in param])
      else: raise exc

and then either the default is used or the exception is raised. I borrowed this style from the get_info method you wrote, 'tis neat.

+ requested_params = set(map(lambda x: x.lower(), param))

You don't need the lamdas.

>>> map(str.lower, ["HELLO", "WORLD"])
['hello', 'world']

Ah, yes. Doing this.

+ raise stem.socket.ProtocolError("GETCONF reply doesn't match the parameters that we requested. Queried '%s' but got '%s'." % (requested_label, reply_label))

Again, we can't raise if there's a default. How I usually handled this in arm was to have a 'result' and 'raised_exception' parameter. Then, rather than raising exceptions, I assigned it to 'raised_exception' then handled the "if I have a default do this, if not and I have an exception do that" logic at the end.
Personally I don't like this pattern, so if you're able to find something that's better then great.

You did find something better. I used that here... see above.

+ * :class:stem.socket.InvalidArguments the arguments given as input are
+ invalid. Raised when converting GETINFO or GETCONF requests

Does this render correctly in sphinx? I've needed to sacrifice having sane line lengths at a few other places in the documentation because it then renders the first line as being part of the list, and the second line as a new paragraph.

Yup, looks fine. http://ompldr.org/vZWp5Ng/tracimg-doc.png

Not entirely true. With the hidden service options it's the real parameter rather than what we queried. Another option...

Reply for a GETCONF query.

Note that configuration parameters won't match what we queried for if it's one
of the special mapping options (ex. "HiddenServiceOptions").

:var dict entries: mapping between the config parameter (string) and...

Pirated.

+ else:
+ key, value = (line.pop(), None)
+
+ entry = self.entries.get(key, None)
+
+ self.entries.setdefault(key, []).append(value)

What is the 'entry = self.entries.get(key, None)' line doing here? Looks like a no-op.

I... don't remember, removed. This came in during the nasty rebase. :/

I forgot that we might have None values. Guess that our get_conf() and get_conf_map() should warn about this?

+ :var str code: The error code returned by Tor (if applicable)
+ :var str message: The error message returned by Tor (if applicable) or a human
+ readable error message

Trivial nitpicks again but should start with a lowercase (actually, I'd just remove the word 'The' since it reads better without it), and I'm pretty sure that :var: entries need to be on a single line to render correctly.

My bad, sorry. This happened during the horrible rebase too.

+class InvalidRequest(ControllerError):
...
+ def init(self, code = None, message = None):
+ """
+ Initializes an InvalidRequest object.
+
+ :param str code: The error code returned by Tor
+ :param str message: The error message returned by Tor
+
+ :returns: object of InvalidRequest class
+ """
+
+ self.code = code
+ self.message = message

Subclasses should *always* call the init for their parent class. Doing otherwise can lead to very, very confusing errors (speaking as someone who's been bitten by that a few times :P). Same goes for InvalidArguments rather than having it set code and message itself.

Done.

+ with runner.get_tor_controller() as controller:
+ # successful single query
+ socket = runner.get_tor_socket()

  1. You're creating a control socket without closing it. This is why we almost always use the 'with' keyword when we get either a controller or socket.
  2. In essence what you're saying here is "Give me a controller with an initialized control socket" followed by "Give me an initialized control socket". Note that the controller already has a socket so you now have two. I'd suggest removing the "runner.get_tor_socket()" call and using "controller.get_socket()" instead.

Fixed.

+NodeFamily dummystemnodexyz1x1,dummystemnodexyz1x2,dummystemnodexyz1x3
+NodeFamily dummystemnodexyz2x1,dummystemnodexyz2x2,dummystemnodexyz2x3

Could test_get_conf() issue a SETCONF for this then test against that rather than making it part of our default torrc (reverting after the test finishes)? It feels kinda wrong to include test data here.

yeap, okay, done.

+ try:
+ stem.response.convert("GETINFO", control_message)
+ except stem.socket.InvalidArguments, exc:
+ self.assertEqual(exc.arguments, blackhole?)

After the "stem.response.convert()" we should probably call "self.fail()" just in case we have a bug where it succeeds. Same goes for the similar getconf unit test.

Hmm, using an else clause here.

Would you like us to merge your getconf-parsing, setconf-wrapper, and saveconf-wrapper branches all at once or one at a time? If the former then this'll take a while but feel free to make them all one 'tor-config' branch and make revisions on top of that.

Seperately is probably better. Don't want to end with a large unmergable, out-of-sync branch.

If we are doing them individually then you'll want to make revisions to getconf-parsing then rebase the other two on top of that.

Yup, that's what I'm doing.

comment:14 Changed 5 years ago by neena

  • Status changed from needs_review to needs_revision

comment:15 Changed 5 years ago by cypherpunks

  • Status changed from needs_revision to needs_review

Pushed my fixes

-- neena

comment:16 Changed 5 years ago by atagar

  • Resolution set to implemented
  • Status changed from needs_review to closed

Feature merged and pushed with a little twiddling on my part...
https://gitweb.torproject.org/stem.git/commitdiff/c09b80ef912bb590a37e0d4533a34bbba14cabf9

Thanks!

Note: See TracTickets for help on using tickets.