Opened 8 years ago

Closed 8 years ago

#4127 closed defect (fixed)

get_search_urls_for_filetype(self, filetype, number) does not return a tuple

Reported by: aagbsn              Owned by: mikeperry
Priority: Medium                 Milestone:
Component: Core Tor/Torflow      Version:
Severity:                        Keywords:
Cc:                              Actual Points:
Parent ID:                       Points:
Reviewer:                        Sponsor:

Description

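get_search_urls_for_filetype(self, filetype, number) returns bare URLs rather than (url, filetype) tuples, and its callers use the result inconsistently. refill_targets() passes each returned URL straight to add_target():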

  def refill_targets(self):
    for ftype in self.scan_filetypes:
      targets_needed = self.results_per_type - len(self.targets.bykey(ftype))
      if targets_needed > 0:
        plog("NOTICE", self.proto+" scanner short on "+ftype+" targets. Adding more")
        map(self.add_target, self.get_search_urls_for_filetype(ftype,targets_needed))

For comparison, SearchBasedHTTPTest.get_targets() pairs each URL with its filetype before returning:

  def get_targets(self):
    '''
    construct a list of urls based on the wordlist, filetypes and protocol.
    '''
    plog('INFO', 'Searching for relevant sites...')

    urllist = set([])
    for filetype in self.scan_filetypes:
      urllist.update(map(lambda x: (x, filetype), self.get_search_urls_for_filetype(filetype, 

    return list(urllist)
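
The difference between the two call sites can be reproduced in isolation. This is only a minimal sketch: the function below is a hypothetical stand-in for get_search_urls_for_filetype() that returns made-up example.com URLs, and it assumes (as the snippets above suggest) that the real method yields plain URL strings.

```python
# Hypothetical stand-in for get_search_urls_for_filetype(); the URLs are
# invented for illustration and nothing is actually searched.
def get_search_urls_for_filetype(filetype, number):
    return ["http://example.com/file%d.%s" % (i, filetype) for i in range(number)]

# Pattern used by get_targets(): each URL is paired with its filetype.
paired = [(url, "pdf") for url in get_search_urls_for_filetype("pdf", 2)]

# Pattern used by refill_targets(): the bare URL strings are passed on as-is.
bare = list(get_search_urls_for_filetype("pdf", 2))

print(paired)
print(bare)
```

Any consumer written for the first shape will misbehave when handed the second, because a string can also be indexed and unpacked like a sequence.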

I am testing the following modification, which makes refill_targets() pair each URL with its filetype in the same way:

index be0dde0..aabb3e4 100755
--- a/NetworkScanners/ExitAuthority/soat.py
+++ b/NetworkScanners/ExitAuthority/soat.py
@@ -1883,9 +1883,12 @@ class SearchBasedHTTPTest(SearchBasedTest, BaseHTTPTest):
   def refill_targets(self):
     for ftype in self.scan_filetypes:
       targets_needed = self.results_per_type - len(self.targets.bykey(ftype))
+      urllist = set([])
       if targets_needed > 0:
         plog("NOTICE", self.proto+" scanner short on "+ftype+" targets. Adding more")
-        map(self.add_target, self.get_search_urls_for_filetype(ftype,targets_needed))
+        #map(self.add_target, self.get_search_urls_for_filetype(ftype,targets_needed))
+        urllist.update(map(lambda x: (x, ftype), self.get_search_urls_for_filetype(ftype, tar
+        map(self.add_target, urllist)
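
The same idea can be sketched without relying on map() for its side effects. This is not the committed fix, only an illustration under assumptions: TargetStore, fake_search and refill are hypothetical names, and add_target() is assumed to expect a (url, filetype) pair. An explicit loop also keeps working on Python 3, where map() returns a lazy iterator and would never call add_target() unless the result were consumed.

```python
class TargetStore(object):
    """Hypothetical stand-in for the scanner's target bookkeeping."""

    def __init__(self):
        self.targets = []

    def add_target(self, target):
        # Assumption: a target is a (url, filetype) pair.
        url, filetype = target
        self.targets.append((url, filetype))


def fake_search(filetype, number):
    # Hypothetical replacement for get_search_urls_for_filetype():
    # returns bare URL strings, invented for illustration.
    return ["http://example.com/doc%d.%s" % (i, filetype) for i in range(number)]


def refill(store, filetype, needed, search=fake_search):
    # Pair every URL with its filetype; the set drops duplicate pairs.
    new_targets = set((url, filetype) for url in search(filetype, needed))
    for target in new_targets:
        store.add_target(target)


store = TargetStore()
refill(store, "pdf", 3)
print(sorted(store.targets))
```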

This issue was raised in #4097 but should have its own ticket.

Change History (3)

comment:1 Changed 8 years ago by aagbsn

Here's an example of the error this causes:

INFO[Fri Sep 23 06:52:52 2011]:HTTPTest decided to fetch 1 urls of types: [u't']
INFO[Fri Sep 23 06:52:52 2011]:[(u'h', u't')]
INFO[Fri Sep 23 06:52:52 2011]:Conducting an http test with destination h
DEBUG[Fri Sep 23 06:52:52 2011]:Starting request for: h
Traceback (most recent call last):
  File "soat.py", line 379, in http_request
    reply = opener.open(request)
  File "/usr/lib/python2.6/urllib2.py", line 383, in open
    protocol = req.get_type()
  File "/usr/lib/python2.6/urllib2.py", line 244, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: h
INFO[Fri Sep 23 06:52:52 2011]:Completed HTTP Reqest for: h
DEBUG[Fri Sep 23 06:52:52 2011]:Got last exit of E7C49248095E77968BA4390DA2BBAC9D7E4560F3
NOTICE[Fri Sep 23 06:52:52 2011]:$E7C49248095E77968BA4390DA2BBAC9D7E4560F3 had error -15.0 fetching content for h
DEBUG[Fri Sep 23 06:52:52 2011]:Starting request for: h
Traceback (most recent call last):
  File "soat.py", line 379, in http_request
    reply = opener.open(request)
  File "/usr/lib/python2.6/urllib2.py", line 383, in open
    protocol = req.get_type()
  File "/usr/lib/python2.6/urllib2.py", line 244, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: h
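
One plausible reading of the log above is that a bare URL string was indexed as if it were a (url, filetype) pair, which would yield its first two characters, 'h' and 't'. The ticket does not show how add_target() consumes its argument, so the sketch below is an assumption; the URL is made up, and the code uses Python 3's urllib.request rather than the urllib2 module seen in the traceback.

```python
import urllib.request

url = "http://example.com/file.pdf"  # hypothetical URL

# Indexing the string like a (url, filetype) pair picks out single characters.
as_pair = (url[0], url[1])
print(as_pair)  # ('h', 't')

# A one-character "URL" has no scheme, so building a request for it fails.
try:
    urllib.request.Request(as_pair[0])
    error = None
except ValueError as exc:
    error = exc
print(error)
```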

comment:3 Changed 8 years ago by aagbsn

Resolution: fixed
Status: needs_review → closed