Opened 3 years ago

Closed 3 years ago

#18881 closed defect (user disappeared)

speed up is_match() of stem/exit_policy.py

Reported by: toralf Owned by: atagar
Priority: Medium Milestone:
Component: Core Tor/Stem Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I turns out to me, that the following code snipped

        for conn in connections:
          raddr, rport, lport = conn.remote_address, conn.remote_port, conn.local_port
          if policy.can_exit_to(raddr, rport):
            Curr.setdefault(rport, []).append(str(lport) + ':' + raddr)

takes much longer than expected due to the fact, that policy.can_exit() calls is_match() which itself makes a a lot of checks eg.:

    # validate our input and check if the argument doesn't match our address type

    if address is not None:

For a relay having about 5,000 connections and about 100 ExitPolicy rules thoses checks are repeated 500,000 times w/o too much profit, or ?

I do wonder if is_match() could get a wrapper which just does the given address+port test against the exit policy ?

Child Tickets

Change History (4)

comment:1 Changed 3 years ago by atagar

Hi toralf, which part of is_match() in particular are you concerned with? An 'is None' check, even done repeatedly, should be negligible. Feel free to attach a patch if there's something you'd care to propose.

comment:2 Changed 3 years ago by toralf

Hi,

"commenting out this section :

    # validate our input and check if the argument doesn't match our address type

    """
    if address is not None:
      address_type = self.get_address_type()
      if stem.util.connection.is_valid_ipv4_address(address):
        if address_type == AddressType.IPv6:
          return False
      elif stem.util.connection.is_valid_ipv6_address(address, allow_brackets = True):
        if address_type == AddressType.IPv4:
          return False

        address = address.lstrip('[').rstrip(']')
      else:
        raise ValueError("'%s' isn't a valid IPv4 or IPv6 address" % address)

    if port is not None and not stem.util.connection.is_valid_port(port):
      raise ValueError("'%s' isn't a valid port" % port)
    """

speeds up the execution time by a factor of 2 and more.
So I do wonder if both checks have to be made always ?

Well, the background is to have a reasonable output by this script https://github.com/toralf/torutils/blob/master/ps.py w/o powering a whole CPU by 100% all the time (as it is currently doing).

comment:3 Changed 3 years ago by atagar

Hmmm. Those checks aren't purely input validation. If they were then adding a 'bypass validation' argument makes sense for such a performance boon. But if you drop this whole block you'll get the wrong answer when matching IPv4 exit policies with IPv6 addresses and vice versa (that's what most of the code does).

Per chance do you know what it is in here that's slow? If a single call (like is_valid_ipv4_address()) is the bottleneck then improving that might be a better route forward.

comment:4 Changed 3 years ago by atagar

Resolution: user disappeared
Status: newclosed

Think I'm gonna resolve this. Feel free to reopen if you'd care to help with digging in further.

Note: See TracTickets for help on using tickets.