Opened 11 months ago

Closed 2 months ago

#24153 closed defect (fixed)

Make DescriptorCollector resume previously aborted downloads

Reported by: karsten Owned by: karsten
Priority: Medium Milestone:
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: metrics-team Actual Points:
Parent ID: Points:
Reviewer: irl Sponsor:

Description

Today I noticed that DescriptorCollector skips files when it runs into a temporary file from a previously aborted download. Example:

[WARN] Cannot fetch remote file server-descriptors-2017-10.tar.xz from https://collector.torproject.org.  Skipping that file.
java.nio.file.FileAlreadyExistsException: backup/archive/relay-descriptors/server-descriptors/.server-descriptors-2017-10.tar.xz
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
        at java.nio.file.Files.newOutputStream(Files.java:216)
        at java.nio.file.Files.copy(Files.java:3016)
        at org.torproject.descriptor.index.DescriptorIndexCollector.fetchRemoteFiles(DescriptorIndexCollector.java:123)
        at org.torproject.descriptor.index.DescriptorIndexCollector.collectDescriptors(DescriptorIndexCollector.java:83)
        at Archive.main(Archive.java:12)

This is rather unexpected. The main reason for using a temporary file is to avoid overwriting an existing file and risking to leave the user with only a broken file if the download gets interrupted. But the temporary file should not prevent us from making a download that may succeed.

Let's change the behavior here to delete any temporary files we run into or simply overwriting them.

Child Tickets

Change History (11)

comment:1 Changed 8 months ago by karsten

iwakeh, does the above make sense? (I ran into this once again today while testing something.)

comment:2 Changed 8 months ago by iwakeh

Yes. The file should just be written anew no matter if a previous temp file existed, b/c that is in an undefined state anyway. It might also be an option to use Files.createTempFile and friends?

comment:3 Changed 8 months ago by karsten

Files.createTempFile sounds promising! Want to write a patch or a quick sketch how that would work here?

comment:4 Changed 6 months ago by iwakeh

Owner: changed from metrics-team to iwakeh
Status: newaccepted

comment:5 Changed 6 months ago by irl

Cc: metrics-team added

Adding metrics-team to cc

comment:6 Changed 4 months ago by iwakeh

Owner: changed from iwakeh to metrics-team
Status: acceptedassigned

Won't be starting work on this next week. -> re-assigning to metrics-team.

comment:7 Changed 3 months ago by karsten

Owner: changed from metrics-team to karsten
Status: assignedaccepted

I wrote a tiny patch in commit 31c9250 of my task-24153 branch. That patch only fixes the bug that we skipped a remote file that we did not finish downloading before. Please review. Thanks!

What this patch does not do is clean up temporary files that we for whatever reason do not reattempt to download in a later run. That seems more like a (useful) enhancement than a defect, though. Using Files.createTempFiles seems like a good way to implement that, though I'm not entirely sure how to turn a temporary file into a permanent one using that approach. I'd say, let's create a new ticket for that and just fix this bug now.

comment:8 Changed 3 months ago by karsten

Status: acceptedneeds_review

comment:9 Changed 2 months ago by irl

Reviewer: irl

Will start review this week.

comment:10 Changed 2 months ago by irl

Status: needs_reviewmerge_ready

Looks good to me.

For the new ticket: Files.createTempFile only creates the empty file and gives you a Path. The file is actually permanent (although the operating system may clean it up once the process exits) unless you open it with DELETE_ON_CLOSE on use .deleteOnExit(). (ref) One benefit might be that using tmpfs for intermediate files reduces disk I/O and speeds things up. If we're memory constrained and it becomes an issue though, we should make sure that the default temporary directory is overridden to a location on disk.

comment:11 Changed 2 months ago by karsten

Resolution: fixed
Status: merge_readyclosed

Created #26699 for the Files.createTempFile() idea. Also rebased and pushed my branch above to resolve this issue. Closing. Thanks!

Note: See TracTickets for help on using tickets.