Opened 5 years ago

Last modified 9 months ago

#10609 needs_review defect

aggregate.py should ignore empty scan-data files

Reported by: ln5
Owned by:
Priority: Medium
Milestone:
Component: Core Tor/Torflow
Version:
Severity: Normal
Keywords: easy intro
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

ERROR[Sat Jan 11 06:40:03 2014]:Exception during aggregate: empty string for float()
Traceback (most recent call last):
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 876, in <module>
    main(sys.argv)
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 424, in main
    timestamp = float(fp.readline())
ValueError: empty string for float()

Could it be that we have a race here?

$ cat .git/refs/heads/master 
229e5e64680a1a3caf496ce2c1e5d064b5edd080
$ git submodule status
 4fdd2031e6b231ed4bbaa79940f67e9b8f691382 TorCtl (2013-10-16)

Change History (16)

comment:1 Changed 5 years ago by ln5

Component: changed from - Select a component to Torflow
Owner: set to aagbsn

comment:2 Changed 5 years ago by aagbsn

Can you figure out which file was being processed at the time (and perhaps attach it here)?

comment:3 Changed 5 years ago by ln5

AFAICT from reading the code, all bws-*-done* files are being processed, and there's nothing in the traceback indicating which one it's stumbling over. I have 1874 matching files on my system atm.

Furthermore, this error has shown up once, and that was several hours ago. The problematic file is apparently not problematic any more.

comment:4 Changed 5 years ago by aagbsn

Perhaps a bws-* file was still open and being written to while aggregate.py was trying to read it? We could add exclusive locking (see fcntl) to make sure that the bws-* files are written atomically.
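A minimal sketch of what advisory locking with fcntl could look like, assuming both the scanner (writer) and aggregate.py (reader) are changed to take the lock. The helper names `write_locked` and `read_locked` are hypothetical and do not exist in torflow; the sketch is Unix-only and uses Python 3 syntax.

```python
import fcntl


def write_locked(path, lines):
    """Write lines to path while holding an exclusive advisory lock."""
    # Append mode avoids truncating the file before the lock is held.
    with open(path, "a") as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)  # blocks while another process holds a lock
        try:
            fp.truncate(0)  # discard old contents only once we own the lock
            fp.writelines(line + "\n" for line in lines)
            fp.flush()
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)


def read_locked(path):
    """Read all lines from path while holding a shared advisory lock."""
    with open(path, "r") as fp:
        fcntl.flock(fp, fcntl.LOCK_SH)  # waits for a writer's exclusive lock
        try:
            return fp.read().splitlines()
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)
```

Note that flock locks are advisory: they only help if every reader and writer takes them, and a reader that wins the race for a freshly created file would still see it empty.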

comment:5 Changed 5 years ago by ln5

For the record, I'm seeing this again.

ERROR[Wed Apr 02 08:40:04 2014]:Exception during aggregate: empty string for float()
Traceback (most recent call last):
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 876, in <module>
    main(sys.argv)
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 424, in main
    timestamp = float(fp.readline())
ValueError: empty string for float()

comment:6 Changed 5 years ago by ln5

Again.

ERROR[Wed Apr 30 21:40:04 2014]:Exception during aggregate: empty string for float()
Traceback (most recent call last):
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 876, in <module>
    main(sys.argv)
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 424, in main
    timestamp = float(fp.readline())
ValueError: empty string for float()

comment:7 Changed 4 years ago by micah

I've started getting this now on longclaw. It appears that I'm not submitting bwauth data because of this. It's a slightly different error, but it looks like exactly the same part of the code:

ERROR[Sat Jan 24 21:45:02 2015]:Exception during aggregate: could not convert string to float:
Traceback (most recent call last):
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 876, in <module>
    main(sys.argv)
  File "/home/bwscanner/torflow/NetworkScanners/BwAuthority/aggregate.py", line 424, in main
    timestamp = float(fp.readline())
ValueError: could not convert string to float:

I can't tell which scanner or file it is operating on, as there are four scanners and hundreds of files. Perhaps it would be useful to add some debug information when this happens, to help narrow it down?

comment:8 Changed 4 years ago by micah

I just added an exception handler to that part of the code to print out which file it is:

  for da in argv[1:-1]:
    # First, create a list of the most recent files in the
    # scan dirs that are recent enough
    for root, dirs, f in os.walk(da):
      for ds in dirs:
        if re.match("^scanner.[\d+]$", ds):
          newest_timestamp = 0
          for sr, sd, files in os.walk(da+"/"+ds+"/scan-data"):
            for f in files:
              if re.search("^bws-[\S]+-done-", f):
                try:
                  fp = file(sr+"/"+f, "r")
                  slicenum = sr+"/"+fp.readline()
                  timestamp = float(fp.readline())
                  fp.close()
                except ValueError:
                  print("ValueError on file: "+f)

and then I ran it, and it said:

ValueError on file: bws-41.4:42.1-done-2015-01-21-17:03:13

comment:9 Changed 4 years ago by micah

I also included the information about which scanner directory was in play, by adding:

print("slicenum: "+slicenum)

that showed me:

ValueError on file: bws-41.4:42.1-done-2015-01-21-17:03:13
slicenum: /home/bwscanner/torflow/NetworkScanners/BwAuthority/data/scanner.3/scan-data/

and so looking at that file, I found:

bwscanner@longclaw:~/torflow/NetworkScanners/BwAuthority/data/scanner.3/scan-data$ ls -l bws-41.4:42.1-done-2015-01-21-17:03:13
-rw-r--r-- 1 bwscanner bwscanner 0 Jan 21 17:03 bws-41.4:42.1-done-2015-01-21-17:03:13

Aha! It's zero bytes... it was probably truncated before its contents were written, when the system was rebooted.

I removed that file and now I get no cron errors. So perhaps the code should be a little smarter here about dealing with zero-byte files?
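A rough sketch of what "a little smarter" might mean, assuming (as in the loop quoted in comment:8) that the first line of a bws-*-done-* file is the slice number and the second is the timestamp. The function name `read_scan_header` is hypothetical, not something in aggregate.py, and the sketch uses Python 3 syntax.

```python
import os


def read_scan_header(path):
    """Return (slicenum, timestamp) from a scan-data file, or None if unusable."""
    # A zero-byte file cannot contain a header, so skip it without parsing.
    if os.path.getsize(path) == 0:
        return None
    with open(path, "r") as fp:
        slicenum = fp.readline().strip()
        timestamp_line = fp.readline().strip()
    try:
        return slicenum, float(timestamp_line)
    except ValueError:
        # Partially written file: the timestamp line is missing or garbled.
        return None
```

The caller would then skip any file for which this returns None, instead of letting the ValueError abort the whole aggregation run.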

comment:10 Changed 19 months ago by teor

Severity: Blocker
Summary: changed from aggregate.py unhappy with input data to aggregate.py should ignore empty scan-data files

comment:11 Changed 19 months ago by teor

Keywords: easy intro added
Owner: changed from aagbsn to tom
Severity: changed from Blocker to Normal
Status: changed from new to assigned

comment:12 Changed 19 months ago by teor

Owner: changed from tom to teor

Please see my github branch bug10609, which ignores files that fail parsing, logs an error, and tries to remove them.
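For illustration only, the behaviour described above (ignore, log, try to remove) could be sketched roughly as follows. This is not the code from the bug10609 branch; the function name `parse_or_discard` and the logger name are hypothetical, and the sketch uses Python 3 syntax.

```python
import logging
import os

log = logging.getLogger("aggregate")  # hypothetical logger name


def parse_or_discard(path):
    """Return the timestamp from path, or None after logging and trying to remove it."""
    try:
        with open(path, "r") as fp:
            fp.readline()  # first line: slice number (not needed here)
            return float(fp.readline())
    except ValueError as e:
        log.error("Ignoring unparseable scan-data file %s: %s", path, e)
    # Only reached when parsing failed: try to remove the bad file,
    # but never let a failed removal abort the aggregation.
    try:
        os.remove(path)
    except OSError as e:
        log.warning("Could not remove %s: %s", path, e)
    return None
```

Removal is attempted on a best-effort basis, so a read-only scan-data directory would produce a warning rather than a crash.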

comment:13 Changed 19 months ago by teor

Status: changed from assigned to needs_review

comment:14 Changed 19 months ago by teor

Priorities and Severities in torflow are meaningless; setting them all to Medium/Normal.

comment:15 Changed 9 months ago by teor

Owner: teor deleted
Status: changed from needs_review to assigned

Disowning this ticket, I don't think we'll ever fix torflow.

comment:16 Changed 9 months ago by teor

Status: changed from assigned to needs_review