Opened 5 years ago

Closed 5 years ago

#5454 closed enhancement (implemented)

ExitPolicy Class for stem

Reported by: gsathya Owned by: atagar
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Keywords:
Cc: gsathya, neenaoffline@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

"Arm has a readily available class for representing a Tor exit policy ( in torTools.py). We need to adapt this class to stem's coding styles, write a constructor for it (see the getExitPolicy() and getRelayExitPolicy() methods for examples), and see if we can improve on it. This will also need quite a bit of testing.."

Preliminary stab at this - Just to make sure I'm going in the right direction.

https://gitweb.torproject.org/user/gsathya/stem.git/shortlog/refs/heads/exit_policy

Child Tickets

Change History (27)

comment:1 Changed 5 years ago by gsathya

Status: newneeds_review

comment:2 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Thanks for getting this started. I agree that we should have an ExitPolicy class that contains the list of entries. Some things from the first pass...

  • Part of porting things from arm to stem is to fix the conventions. A few changes are...
    • Arm used camel case variables (myVar) and stem uses the more official, pythony underscores (my_var).
    • Pydocs in stem have a very specific, standardized format - see any of the other files for examples. They also have header pydocs.
    • Everything else in stem has the indentation of two spaces. This one has four.
  • Something to think about is how we want to handle descriptor exit policies verses microdescriptor exit policies. Arm had code for handling them as an ExitPolicy but they lack some information (no ip addresses or private keyword) so I'm not sure if that's the best solution.
  • We don't need integration tests for this (descriptor and consensus integ tests will cover it), but this module does need a lot of unit testing to cover all the weird use cases that will commonly trip it up. To start with though we need some basic smoke tests to start exercising this code.

comment:3 in reply to:  2 Changed 5 years ago by gsathya

Status: needs_revisionneeds_review

Thanks a lot for the code review!

https://gitweb.torproject.org/user/gsathya/stem.git/commit/3bbb6cb193447f3c07129d0c1b96ebaad3db8a25

  • Arm used camel case variables (myVar) and stem uses the more official, pythony underscores (my_var).

Fixed

  • Pydocs in stem have a very specific, standardized format - see any of the other files for examples. They also have header pydocs.

Fixed to an extent - Needs a little more work.

  • Everything else in stem has the indentation of two spaces. This one has four.

Fixed. Sorry, OONI uses 4 and I got a little confused.

  • Something to think about is how we want to handle descriptor exit policies verses microdescriptor exit policies. Arm had code for handling them as an ExitPolicy but they lack some information (no ip addresses or private keyword) so I'm not sure if that's the best solution.

I'm not sure of the difference - I'll read up on the microdescriptor exit policies, but it looks like we can subclass ExitPolicyLine for the microdescriptor exit policies?

  • We don't need integration tests for this (descriptor and consensus integ tests will cover it), but this module does need a lot of unit testing to cover all the weird use cases that will commonly trip it up. To start with though we need some basic smoke tests to start exercising this code.

I'll start on the unit test soon.

(Also, I'm currently adding the get_summary() method to ExitPolicy. )

comment:4 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Hi gsathya, sorry about the delay.

I would suggest rebasing your branch onto the current master and squash what you have so far...
git rebase -i master

Generally a commit corresponds to a single task, so squashing all of your commits is appropriate (all of your work so far have been formatting fixes for the arm module we're copying).

Keeping a clean history isn't vital (I can squash or change commits myself prior to pushing), but it makes our repository easier to make sense of. Just in case you're unaware, you can easily fix your last commit with...
git commit --amend

(this takes anything that is staged and applies it to your last commit, making it handy for correcting your prior work)

exit_policies = stem.version.ExitPolicy()

Copy and paste error. ;)

Header looks good for now. It might be good to tweak it later but that is usually the last thing I do since function names and usage are likely to change a few times.

elif policy.is_ip_wildcard and self.is_port_wildcard: return False

This is just an attribute of the ExitPolicyLine so the "self.is_port_wildcard" will cause an error.

def isExitingAllowed(self):

Should follow the underscore convention.

On a quick glance the formatting looks good so we can move on to the testing. Only unit tests are needed (integ tests don't make sense for this), but we need a lot of good ones since exit policies are highly error prone. This is actually the vast majority of the work behind this task.

I'm not sure of the difference - I'll read up on the microdescriptor exit policies, but it looks like we can subclass ExitPolicyLine for the microdescriptor exit policies?

From what I understand the microdescriptor policy is simply a series of accepted and rejected port numbers or ranges (no ip addresses nor keywords like private) though you should read the spec to figure out exactly what it consists of.

My guess is that we'll want an abstract parent class (ExitPolicyLine) and a couple for the implementations (ConsensusExitPolicyLine which is what ExitPolicyLine is now and MicrodescriptorExitPolicyLine for microdescriptors). But maybe I'm wrong. Please read through the spec though and see if something else makes better sense (and better names!).

comment:5 in reply to:  4 Changed 5 years ago by gsathya

Replying to atagar:
Sorry about the delay, exams are finally over, getting back to stem hacking..

I've ported the ExitPolicy.get_summary() method.

I would suggest rebasing your branch onto the current master and squash what you have so far...

Done.

exit_policies = stem.version.ExitPolicy()

Copy and paste error. ;)

Fixed.

elif policy.is_ip_wildcard and self.is_port_wildcard: return False

This is just an attribute of the ExitPolicyLine so the "self.is_port_wildcard" will cause an error.

Fixed.

def isExitingAllowed(self):

Should follow the underscore convention.

Fixed.

My guess is that we'll want an abstract parent class (ExitPolicyLine) and a couple for the implementations (ConsensusExitPolicyLine which is what ExitPolicyLine is now and MicrodescriptorExitPolicyLine for microdescriptors). But maybe I'm wrong. Please read through the spec though and see if something else makes better sense (and better names!).

According to the microdescriptor spec - "In the first version, the microdescriptor should contain the onion-key element, and the family element from the router descriptor, and the exit policy summary as currently specified in dir-spec.txt."
I can't seem to find the exit policy summary spec in dir-spec.txt, Is there some place else I should check?

I'll get started on the unit tests once I figure out how to implement the exit policy for microdesriptors.

Thanks!

comment:6 Changed 5 years ago by atagar

I can't seem to find the exit policy summary spec in dir-spec.txt, Is there some place else I should check?

My best guess is that it's talking about exitpattern in section 2.3 (Nonterminals in router descriptors). Not sure, though.

I'll get started on the unit tests once I figure out how to implement the exit policy for microdesriptors.

Optionally we could get the normal exit policy completely done and tested, then move on to microdescriptors - up to you.

Glad to be working with you again on stem! -Damian

comment:7 in reply to:  6 Changed 5 years ago by karsten

Replying to atagar:

I can't seem to find the exit policy summary spec in dir-spec.txt, Is there some place else I should check?

My best guess is that it's talking about exitpattern in section 2.3 (Nonterminals in router descriptors). Not sure, though.

I didn't follow the previous discussion, but do you mean the exit policy summary here? https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1789

comment:8 Changed 5 years ago by atagar

I didn't follow the previous discussion, but do you mean the exit policy summary here?
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1789

Ah ha! Indeed. Missed it since I grepped for 'exit'.

comment:9 Changed 5 years ago by gsathya

Status: needs_revisionneeds_information

https://gitweb.torproject.org/user/gsathya/stem.git/blobdiff/c6b692c42ebe09bc0f522cd242d3149abe30c460..cf5978f221d85a54511be5dc55272eb74794ed97:/stem/exit_policy.py

WIP. I don't think it's time to review it now. I just want a sanity check on the logic/API.

Still need to add exceptions, docstrings. But I have the core functionality/logic finished. Still looks hackish - I'll do some more polishing soonish.

I went with another class for microdescriptors rather than creating a new ExitPolicyLine and then subclassing that. The mircodescriptor exit policies don't care about IP's so this makes more sense.

Right now, the API looks like this

# just get the exitpolicy from the micro-descriptor and parse it
> exit_policy = new MircodescriptorExitPolicy()
> exit_policy.add("accept 80,443")
> print exit_policy
accept 80,443

# ask the user to add the exit policy rules and strip out ip address
> exit_policy = new MircodescriptorExitPolicy()
> exit_policy.add("accept *:80")
> exit_policy.add("accept 443")
> exit_policy.add("reject *:*")
> print exit_policy
accept 80,443

Thanks!

comment:10 Changed 5 years ago by gsathya

Found 2 bugs -

  • Doesnt accept a rule or policy after adding policy, but accepts a policy after adding rules.
  • Single value policies aren't recognized - need to change the API a bit

Patching it up ..

comment:11 Changed 5 years ago by gsathya

Ok this might be a little late, but in retrospect, do we really need a MicrodescriptorExitPolicy class? What is the point of it? It doesn't really provide any useful information(we can't use it to check if exiting to a particular ip is allowed - only the ports are given in the microdesc exit policy). I think it'd be better to have just one generic exit policy class. We can easily convert the Consensus exit policy to microdesc exit policy - I think ExitPolicy.get_summary() does this, otherwise I can hack it up.

comment:12 Changed 5 years ago by atagar

Ok this might be a little late, but in retrospect, do we really need a MicrodescriptorExitPolicy class? What is the point of it?

What we want from this ticket is this:
A class that represents an exit policy, which we can use to see 'can I exit to this ip/port'

That's all. Anything that achieves that objective is good by me. :)

The reason for having separate classes for descriptor and microdescriptor exit policies is that they have different formats and (I assume) different rules for what we can exit to since microdescriptors don't contain as much information. It's also more proper from an object-oriented aspect since they *are* different things.

My suggestion would be to put the bulk of the logic into an ExitPolicy class, then have just the divergent methods in the DescriptorExitPolicy and MicrodescriptorExitPolicy subclasses (or whatever you call them). This would be similar to how we do the ControlSocket...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/socket.py

Thoughts?

comment:13 in reply to:  12 Changed 5 years ago by gsathya

What we want from this ticket is this:
A class that represents an exit policy, which we can use to see 'can I exit to this ip/port'

From what I can understand, this isn't possible with the microdescriptor exit policy because it only contains a list of ports.

I prefer having a common API for both the exit policies and we can have a ExitPolicy.convert_to_microdescriptor_policy()(or sth with a better name) to create the list of a ports in the format specified by the microdesc exit policy summary spec.
There's no point in having another class for this.

This common API can be -

>>> exit_policies = stem.exit_policy.ExitPolicy()
>>> exit_policies.add("accept *:80")
>>> exit_policies.add("accept *:443")
>>> exit_policies.add("reject *:*")
>>> print exit_policies
accept *:80 , accept *:443, reject *:*
>>> print exit_policies.convert_to_mircodesc_policy()
accept 80, 443
>>> exit_policies.check("74.125.236.176", "80")
True
>>> exit_policies.check("74.125.236.176", "22")
False

comment:14 Changed 5 years ago by atagar

From what I can understand, this isn't possible with the microdescriptor exit policy because it only contains a list of ports.

That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?

We probably need to answer this question before we can figure out a decent object model. Sebastian might be a good person to ask since I vaguely recall that he was involved with the introduction of microdescriptor exit policies.

Note that this doesn't block us from making a class for *descriptor* exit policies. Making that and tests would be a fine first step (and be something that we could finally tie into the Descriptor class).

comment:15 in reply to:  14 Changed 5 years ago by gsathya

That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?

I'm pretty clueless here.

Note that this doesn't block us from making a class for *descriptor* exit policies. Making that and tests would be a fine first step (and be something that we could finally tie into the Descriptor class).

Okay, let me do that. Also do you want me to check the exit policy string with a regexp for errors? I'm guessing this would take a while if we have to match _every_ other pattern _and_ check for correctness(like octets should be less than 256 etc)?

comment:16 Changed 5 years ago by atagar

Okay, let me do that. Also do you want me to check the exit policy string with a regexp for errors? I'm guessing this would take a while if we have to match _every_ other pattern _and_ check for correctness(like octets should be less than 256 etc)?

Yes and no. We should be validating the ExitPolicy string, much as we do with everything else, and giving an error if any of it is invalid (malformed ips, invalid port numbers, etc). However, we shouldn't be doing this via a regex.

You can find some helper functions for checking validity in the connection util...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/util/connection.py#l15

comment:17 in reply to:  16 Changed 5 years ago by gsathya

Replying to atagar:

You can find some helper functions for checking validity in the connection util...
https://gitweb.torproject.org/stem.git/blob/HEAD:/stem/util/connection.py#l15

Ah, perfect! I've changed these a bit - https://gitweb.torproject.org/user/gsathya/stem.git/commitdiff/7e27dd70f83699f8ac10dbdaefae9be097fa9728 and used it in - https://gitweb.torproject.org/user/gsathya/stem.git/commitdiff/86c96aafb43015ed94da437d85576364279812d0

I'm currently adding more unit tests ( https://gitweb.torproject.org/user/gsathya/stem.git/blob/bf8d24ae8ae72dd1a5995b82939616feffedaae5:/test/unit/exit_policy.py ). I'll change the status to needs_review after I add a couple of more tests, but if you can go through these changes and let me know if they look ok, then that would be great.

Thanks!

comment:18 in reply to:  14 Changed 5 years ago by karsten

Replying to atagar:

That seems weird. What is the use of the microdescriptor exit policy then? The point of an exit polity is to let the client know if you're willing to service a particular destination or not. If it doesn't answer that question, then how is it used by a tor client?

Nick could answer this better, but AIUI, the purpose of an exit policy summary is to help the client make good path selection decisions without bloating directories. With any summary, there may be false positives (connections that the relay doesn't support but which the summary implied would work) and false negatives (connections that the relay would support but which aren't contained in the summary). The client will learn if the chosen relay supports the connection it attempts to make after building the circuit. The idea is to stop wasting large amounts of client bandwidth on downloading descriptors, and exit policies can waste a lot of space there.

comment:19 Changed 5 years ago by atagar

Nick could answer this better, but AIUI, the purpose of an exit policy...

Yup, I realize that the purpose of the microdescriptors is to be smaller. However, we should get clarification from Sebastian, Nick, or someone else knowledgeable about how tor utilizes those policies since they're a subset of the information in the full policies. It would be a pity for us to develop a MicrodescriptorExitPolicy class only to base it on incorrect assumptions and need to rewrite it later.

With any summary, there may be false positives

With the normal exit policy these false positives don't exist. Does microdescriptor based path selection try a path and, if it isn't allowed, construct another circuit? Do they somehow say 'look at the descriptor instead' when they can't represent the exit policy? Something else?

comment:20 Changed 5 years ago by karsten

Here's what Nick wrote in dir-spec.txt based on my microdescriptor specification patches (#3038):

[With microdescriptors, clients don't learn exact exit policies:
clients can only guess whether a relay accepts their request, try the
BEGIN request, and might get end-reason-exit-policy if they guessed
wrong, in which case they'll have to try elsewhere.]

comment:21 Changed 5 years ago by atagar

Perfect, thanks Karsten.

comment:22 Changed 5 years ago by gsathya

Status: needs_informationneeds_review

comment:23 Changed 5 years ago by gsathya

As Karsten said, the microdesc exit policies can only *guess* if the exiting to a particular ip+port is allowed. I feel we must warn the users that the MicrodescriptorExitPolicy.check() can only guess, but I'm not sure how to do that?

comment:24 Changed 5 years ago by atagar

Status: needs_reviewneeds_revision

Damn, I screwed up on this. Arm is parsing a torrc's exit policy rather than the dir-spec's exitpattern...
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l924

They're similar, but not the same. An exitpattern is better defined and stricter in what it'll accept. For instance, ports are not optional and it does not contain the 'private' alias.

I'll take over this task and rewrite the parser when I'm able (unless you're eager to dig into writing a strict parser). Sorry about the confusion.

comment:25 Changed 5 years ago by neena

Cc: neenaoffline@… added

comment:26 Changed 5 years ago by atagar

Guess from the added cc that there's some interest in this ticket so here's a quick status report. I spent this weekend writing an ExitPolicyRule class and its associated tests. Unlike arm's implementation, this supports IPv6 and masks...
https://gitweb.torproject.org/user/atagar/stem.git/commitdiff/aa0e4c0415561161b76c65d39f480549bcb131e3

Next I plan to revise the ExitPolicy class and expand its tests, though between Strawberry Festival and Defcon I'll be a bit distracted these next couple weeks. Cheers! -Damian

comment:27 Changed 5 years ago by atagar

Resolution: implemented
Status: needs_revisionclosed

Merged and pushed the exit_policy branch...
https://gitweb.torproject.org/stem.git/commitdiff/78a997ea34aaee8cc4515382fb0f0ec4814c32a4?hp=e68adb7fdd465e1780bc4ee1243e0c226add77d9

The ServerDescriptor class is now using it, which by extension means that the ExitPolicy class is parsing (and validating) all of the policies within your cached descriptors when you run the integ tests. Worked on the first try with was a pleasant surprise. Probably thanks to some pretty extensive and anal unit testing. Those tests... didn't exactly work on the first try. Nor the second. Nor the third... ;)

Note: See TracTickets for help on using tickets.