"Arm has a readily available class for representing a Tor exit policy ( in torTools.py). We need to adapt this class to stem's coding styles, write a constructor for it (see the getExitPolicy() and getRelayExitPolicy() methods for examples), and see if we can improve on it. This will also need quite a bit of testing.."
Preliminary stab at this - Just to make sure I'm going in the right direction.
Thanks for getting this started. I agree that we should have an ExitPolicy class that contains the list of entries. Some things from the first pass...
Part of porting things from arm to stem is to fix the conventions. A few changes are...
Arm used camel case variables (myVar) and stem uses the more official, pythony underscores (my_var).
Pydocs in stem have a very specific, standardized format - see any of the other files for examples. They also have header pydocs.
Everything else in stem has the indentation of two spaces. This one has four.
Something to think about is how we want to handle descriptor exit policies verses microdescriptor exit policies. Arm had code for handling them as an ExitPolicy but they lack some information (no ip addresses or private keyword) so I'm not sure if that's the best solution.
We don't need integration tests for this (descriptor and consensus integ tests will cover it), but this module does need a lot of unit testing to cover all the weird use cases that will commonly trip it up. To start with though we need some basic smoke tests to start exercising this code.
Arm used camel case variables (myVar) and stem uses the more official, pythony underscores (my_var).
Fixed
Pydocs in stem have a very specific, standardized format - see any of the other files for examples. They also have header pydocs.
Fixed to an extent - Needs a little more work.
Everything else in stem has the indentation of two spaces. This one has four.
Fixed. Sorry, OONI uses 4 and I got a little confused.
Something to think about is how we want to handle descriptor exit policies verses microdescriptor exit policies. Arm had code for handling them as an ExitPolicy but they lack some information (no ip addresses or private keyword) so I'm not sure if that's the best solution.
I'm not sure of the difference - I'll read up on the microdescriptor exit policies, but it looks like we can subclass ExitPolicyLine for the microdescriptor exit policies?
We don't need integration tests for this (descriptor and consensus integ tests will cover it), but this module does need a lot of unit testing to cover all the weird use cases that will commonly trip it up. To start with though we need some basic smoke tests to start exercising this code.
I'll start on the unit test soon.
(Also, I'm currently adding the get_summary() method to ExitPolicy. )
I would suggest rebasing your branch onto the current master and squash what you have so far...
git rebase -i master
Generally a commit corresponds to a single task, so squashing all of your commits is appropriate (all of your work so far have been formatting fixes for the arm module we're copying).
Keeping a clean history isn't vital (I can squash or change commits myself prior to pushing), but it makes our repository easier to make sense of. Just in case you're unaware, you can easily fix your last commit with...
git commit --amend
(this takes anything that is staged and applies it to your last commit, making it handy for correcting your prior work)
exit_policies = stem.version.ExitPolicy()
Copy and paste error. ;)
Header looks good for now. It might be good to tweak it later but that is usually the last thing I do since function names and usage are likely to change a few times.
elif policy.is_ip_wildcard and self.is_port_wildcard: return False
This is just an attribute of the ExitPolicyLine so the "self.is_port_wildcard" will cause an error.
def isExitingAllowed(self):
Should follow the underscore convention.
On a quick glance the formatting looks good so we can move on to the testing. Only unit tests are needed (integ tests don't make sense for this), but we need a lot of good ones since exit policies are highly error prone. This is actually the vast majority of the work behind this task.
I'm not sure of the difference - I'll read up on the microdescriptor exit policies, but it looks like we can subclass ExitPolicyLine for the microdescriptor exit policies?
From what I understand the microdescriptor policy is simply a series of accepted and rejected port numbers or ranges (no ip addresses nor keywords like private) though you should read the spec to figure out exactly what it consists of.
My guess is that we'll want an abstract parent class (ExitPolicyLine) and a couple for the implementations (ConsensusExitPolicyLine which is what ExitPolicyLine is now and MicrodescriptorExitPolicyLine for microdescriptors). But maybe I'm wrong. Please read through the spec though and see if something else makes better sense (and better names!).
Replying to atagar:
Sorry about the delay, exams are finally over, getting back to stem hacking..
I've ported the ExitPolicy.get_summary() method.
I would suggest rebasing your branch onto the current master and squash what you have so far...
Done.
exit_policies = stem.version.ExitPolicy()
Copy and paste error. ;)
Fixed.
elif policy.is_ip_wildcard and self.is_port_wildcard: return False
This is just an attribute of the ExitPolicyLine so the "self.is_port_wildcard" will cause an error.
Fixed.
def isExitingAllowed(self):
Should follow the underscore convention.
Fixed.
My guess is that we'll want an abstract parent class (ExitPolicyLine) and a couple for the implementations (ConsensusExitPolicyLine which is what ExitPolicyLine is now and MicrodescriptorExitPolicyLine for microdescriptors). But maybe I'm wrong. Please read through the spec though and see if something else makes better sense (and better names!).
According to the microdescriptor spec - "In the first version, the microdescriptor should contain the onion-key element, and the family element from the router descriptor, and the exit policy summary as currently specified in dir-spec.txt."
I can't seem to find the exit policy summary spec in dir-spec.txt, Is there some place else I should check?
I'll get started on the unit tests once I figure out how to implement the exit policy for microdesriptors.
WIP. I don't think it's time to review it now. I just want a sanity check on the logic/API.
Still need to add exceptions, docstrings. But I have the core functionality/logic finished. Still looks hackish - I'll do some more polishing soonish.
I went with another class for microdescriptors rather than creating a new ExitPolicyLine and then subclassing that. The mircodescriptor exit policies don't care about IP's so this makes more sense.
Right now, the API looks like this
# just get the exitpolicy from the micro-descriptor and parse it> exit_policy = new MircodescriptorExitPolicy()> exit_policy.add("accept 80,443")> print exit_policyaccept 80,443# ask the user to add the exit policy rules and strip out ip address> exit_policy = new MircodescriptorExitPolicy()> exit_policy.add("accept *:80")> exit_policy.add("accept 443")> exit_policy.add("reject *:*")> print exit_policyaccept 80,443
Ok this might be a little late, but in retrospect, do we really need a MicrodescriptorExitPolicy class? What is the point of it? It doesn't really provide any useful information(we can't use it to check if exiting to a particular ip is allowed - only the ports are given in the microdesc exit policy). I think it'd be better to have just one generic exit policy class. We can easily convert the Consensus exit policy to microdesc exit policy - I think ExitPolicy.get_summary() does this, otherwise I can hack it up.
Ok this might be a little late, but in retrospect, do we really need a MicrodescriptorExitPolicy class? What is the point of it?
What we want from this ticket is this:
A class that represents an exit policy, which we can use to see 'can I exit to this ip/port'
That's all. Anything that achieves that objective is good by me. :)
The reason for having separate classes for descriptor and microdescriptor exit policies is that they have different formats and (I assume) different rules for what we can exit to since microdescriptors don't contain as much information. It's also more proper from an object-oriented aspect since they are different things.
My suggestion would be to put the bulk of the logic into an ExitPolicy class, then have just the divergent methods in the DescriptorExitPolicy and MicrodescriptorExitPolicy subclasses (or whatever you call them). This would be similar to how we do the ControlSocket...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/socket.py
What we want from this ticket is this:
A class that represents an exit policy, which we can use to see 'can I exit to this ip/port'
From what I can understand, this isn't possible with the microdescriptor exit policy because it only contains a list of ports.
I prefer having a common API for both the exit policies and we can have a ExitPolicy.convert_to_microdescriptor_policy()(or sth with a better name) to create the list of a ports in the format specified by the microdesc exit policy summary spec.
There's no point in having another class for this.
From what I can understand, this isn't possible with the microdescriptor exit policy because it only contains a list of ports.
That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?
We probably need to answer this question before we can figure out a decent object model. Sebastian might be a good person to ask since I vaguely recall that he was involved with the introduction of microdescriptor exit policies.
Note that this doesn't block us from making a class for descriptor exit policies. Making that and tests would be a fine first step (and be something that we could finally tie into the Descriptor class).
That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?
I'm pretty clueless here.
Note that this doesn't block us from making a class for descriptor exit policies. Making that and tests would be a fine first step (and be something that we could finally tie into the Descriptor class).
Okay, let me do that. Also do you want me to check the exit policy string with a regexp for errors? I'm guessing this would take a while if we have to match every other pattern and check for correctness(like octets should be less than 256 etc)?
Okay, let me do that. Also do you want me to check the exit policy string with a regexp for errors? I'm guessing this would take a while if we have to match every other pattern and check for correctness(like octets should be less than 256 etc)?
Yes and no. We should be validating the ExitPolicy string, much as we do with everything else, and giving an error if any of it is invalid (malformed ips, invalid port numbers, etc). However, we shouldn't be doing this via a regex.
That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?
Nick could answer this better, but AIUI, the purpose of an exit policy summary is to help the client make good path selection decisions without bloating directories. With any summary, there may be false positives (connections that the relay doesn't support but which the summary implied would work) and false negatives (connections that the relay would support but which aren't contained in the summary). The client will learn if the chosen relay supports the connection it attempts to make after building the circuit. The idea is to stop wasting large amounts of client bandwidth on downloading descriptors, and exit policies can waste a lot of space there.
Nick could answer this better, but AIUI, the purpose of an exit policy...
Yup, I realize that the purpose of the microdescriptors is to be smaller. However, we should get clarification from Sebastian, Nick, or someone else knowledgeable about how tor utilizes those policies since they're a subset of the information in the full policies. It would be a pity for us to develop a MicrodescriptorExitPolicy class only to base it on incorrect assumptions and need to rewrite it later.
With any summary, there may be false positives
With the normal exit policy these false positives don't exist. Does microdescriptor based path selection try a path and, if it isn't allowed, construct another circuit? Do they somehow say 'look at the descriptor instead' when they can't represent the exit policy? Something else?
Here's what Nick wrote in dir-spec.txt based on my microdescriptor specification patches (#3038 (moved)):
[With microdescriptors, clients don't learn exact exit policies:clients can only guess whether a relay accepts their request, try theBEGIN request, and might get end-reason-exit-policy if they guessedwrong, in which case they'll have to try elsewhere.]
As Karsten said, the microdesc exit policies can only guess if the exiting to a particular ip+port is allowed. I feel we must warn the users that the MicrodescriptorExitPolicy.check() can only guess, but I'm not sure how to do that?
They're similar, but not the same. An exitpattern is better defined and stricter in what it'll accept. For instance, ports are not optional and it does not contain the 'private' alias.
I'll take over this task and rewrite the parser when I'm able (unless you're eager to dig into writing a strict parser). Sorry about the confusion.
Next I plan to revise the ExitPolicy class and expand its tests, though between Strawberry Festival and Defcon I'll be a bit distracted these next couple weeks. Cheers! -Damian
The ServerDescriptor class is now using it, which by extension means that the ExitPolicy class is parsing (and validating) all of the policies within your cached descriptors when you run the integ tests. Worked on the first try with was a pleasant surprise. Probably thanks to some pretty extensive and anal unit testing. Those tests... didn't exactly work on the first try. Nor the second. Nor the third... ;)
Trac: Resolution: N/Ato implemented Status: needs_revision to closed