Opened 4 years ago

Closed 4 years ago

#14780 closed defect (fixed)

very large file

Reported by: iwakeh Owned by:
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Once in a while the backend creates huge files of several MB.
Here is the log excerpt:

o.t.o.d.DocumentStore:474 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=57396605

Inspecting the file shows regular data followed by lots of binary garbage:

"desc_published":"2015-01-25 02:23:50","last_restarted":"2015-01-22 23:19:46","bandwidth_rate":196608,"bandwidth_burst":393216,"observed_bandwidth":61090,"advertised_bandwidth":61090,"exit_policy":["reject *:*"],"contact":"ContactInfo williamsantiago at protonmail dot ch  - 1B9iz8sYqN7vqCi9MkidmmDLvwQwXyjZDL","platform":"Tor 0.2.4.23 on Windows 8","is_relay":true,"running":false,"nickname":"gandules","address":"190.88.225.164","or_addresses_and_ports":[],"first_seen_millis":1421337600000,"last_seen_millis":1421337600000,"or_port":444,"dir_port":0,"relay_flags":["Running","Valid"],"consensus_weight":20,"default_policy":"reject","port_list":"1-65535","last_changed_or_address_or_port":1421337600000,"recommended_version":true,"exit_addresses":{},"latitude":12.1,"longitude":-68.9167,"country_code":"cw","country_name":"Cura���

The logs show that this file accumulated during several runs (size at the end of each line):

Attempting to store very large document file: path='/home/onionoo/onionoo-master/out/details/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=1654280
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/out/details/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=4961024
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=2126739
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=2126739
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/out/details/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=14881256
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=2126739
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=2126739
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=6378267
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/out/details/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=44641952
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=6378267
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=6378269
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=19132853
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/out/details/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=133924042
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=19132853
 Attempting to store very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=19132853
 Retrieved very large document file: path='/home/onionoo/onionoo-master/status/details/1/3/13CE6B3790B206FD6A63023A39D3FB9F1CCBDC7C', bytes=57396605
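
To make the growth pattern in the log easier to see, here is a minimal sketch that computes the ratio between successive distinct "Retrieved" sizes listed above. The class and method names are hypothetical; only the byte counts come from the log.

```java
public class GrowthRatios {

    // Distinct "Retrieved" sizes taken from the log excerpt above, in order.
    static final long[] RETRIEVED_SIZES = {
        2126739L, 6378267L, 19132853L, 57396605L
    };

    /** Returns sizes[i + 1] / sizes[i] for each adjacent pair. */
    static double[] ratios(long[] sizes) {
        double[] result = new double[sizes.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = (double) sizes[i + 1] / sizes[i];
        }
        return result;
    }

    public static void main(String[] args) {
        for (double ratio : ratios(RETRIEVED_SIZES)) {
            System.out.printf("growth factor: %.4f%n", ratio);
        }
    }
}
```

Each factor comes out just under 3, which suggests that nearly all of the file consists of bytes that are re-expanded on every run.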

Has this behavior occurred before?

Attachments (3)

Rewrite.java (1.6 KB) - added by karsten 4 years ago.
Java class to diagnose the problem.
RewriteReloaded.java (1.8 KB) - added by iwakeh 4 years ago.
hopefully correct rewriting
0001-Fix-task-14780-file-encoding-problem-that-blows-up-f.patch (2.9 KB) - added by iwakeh 4 years ago.
patch for 14780

Change History (9)

comment:1 Changed 4 years ago by karsten

Ah, that looks exactly like the bug I fixed in 361c56c.

Here's what I think happens: whenever that relay's details status file is read and rewritten, the UTF-8 characters in the country name double in size. That's a problem with the way we read files (using FileInputStream) and write files (using FileWriter). I didn't fix that bug, though. I just made sure it doesn't get triggered anymore.
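
As an illustration of this kind of growth, here is a minimal sketch of a read-and-rewrite cycle with mismatched charsets. It is not the Onionoo code: it assumes the bytes are decoded one byte per character (modelled here with ISO-8859-1) and then encoded as UTF-8, and the class name, method name and sample string are hypothetical. The exact growth factor in a real deployment depends on how the bytes are decoded and on the platform default charset used by FileWriter.

```java
import java.nio.charset.StandardCharsets;

public class CharsetGrowthDemo {

    /** Simulates one read-and-rewrite cycle with mismatched charsets. */
    static byte[] rewriteOnce(byte[] stored) {
        // Assumption: each stored byte is turned into exactly one character,
        // which is wrong for multi-byte UTF-8 sequences.
        String misread = new String(stored, StandardCharsets.ISO_8859_1);
        // Encoding as UTF-8 turns every character >= 0x80 into two bytes.
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample country name containing one non-ASCII character.
        byte[] data = "Cura\u00e7ao".getBytes(StandardCharsets.UTF_8);
        for (int run = 0; run <= 3; run++) {
            System.out.println("run " + run + ": " + data.length + " bytes");
            data = rewriteOnce(data);
        }
    }
}
```

In this sketch the six ASCII bytes never change, while the non-ASCII part doubles on every cycle (2, 4, 8, 16 bytes), so the totals are 8, 10, 14 and 22 bytes.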

What I fixed in 361c56c was that I escaped UTF-8 characters in details status files. That way, even if a file is read and rewritten it doesn't change in size, because there are no UTF-8 characters.
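
The idea of escaping can be sketched as follows. This is not the code from 361c56c, only a minimal illustration with hypothetical names: every non-ASCII UTF-16 code unit is replaced by a JSON unicode escape, so the stored file contains only ASCII bytes and no longer depends on the charset used to read or write it.

```java
public class JsonUnicodeEscaper {

    /** Replaces every non-ASCII UTF-16 code unit with a JSON unicode escape. */
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                // Plain ASCII is copied unchanged.
                sb.append(c);
            } else {
                // Backslash, 'u', then four lowercase hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String field = "\"country_name\":\"Cura\u00e7ao\"";
        System.out.println(escapeNonAscii(field));
    }
}
```

Because JSON escapes are defined in terms of UTF-16 code units, escaping one `char` at a time also handles surrogate pairs.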

What this patch doesn't fix is the case where there are already UTF-8 characters in details status files. What I did (on the Onionoo mirror that also had this problem) was manually remove those JSON fields (there were a few dozen of them).

By the way, this case is rare, because in most cases there will be new GeoIP information for details status files. In this case, however, the relay has not been listed as running for a while (since January 15) but is still publishing descriptors (last on January 25), and we don't resolve non-running relays using the GeoIP database.

So, you could either make sure you're running 361c56c and edit the JSON file to remove the country_name field, or you could try to fix that other bug where reading and rewriting a file changes its size.
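
For the second option, one possible approach is to name the charset explicitly on both the read and the write side. The following is a minimal sketch under that assumption, not the patch attached to this ticket; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    /** Reads the whole file, decoding it explicitly as UTF-8. */
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Writes the content, encoding it explicitly as UTF-8. */
    static void writeUtf8(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Rewrites a temporary file the given number of times and returns its final size. */
    static long sizeAfterRewrites(String content, int runs) {
        try {
            Path tmp = Files.createTempFile("details", ".json");
            try {
                writeUtf8(tmp, content);
                for (int i = 0; i < runs; i++) {
                    writeUtf8(tmp, readUtf8(tmp));
                }
                return Files.size(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRewrites("Cura\u00e7ao", 5) + " bytes");
    }
}
```

With the same charset on both sides, the file size stays constant no matter how often the file is read and rewritten.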

I'm attaching the Java class that I used to diagnose the problem.

Changed 4 years ago by karsten

Attachment: Rewrite.java added

Java class to diagnose the problem.

comment:2 Changed 4 years ago by iwakeh

The attached 'RewriteReloaded.java' fixes the problem here.

Please verify.

Changed 4 years ago by iwakeh

Attachment: RewriteReloaded.java added

hopefully correct rewriting

comment:3 Changed 4 years ago by iwakeh

Status: new → needs_review

I have the attached patch now running on my mirror.

Changed 4 years ago by iwakeh

Attachment: 0001-Fix-task-14780-file-encoding-problem-that-blows-up-f.patch added

patch for 14780

comment:4 Changed 4 years ago by karsten

Cool! Pushed to my branch task-14780 with two trivial fixes. Please let me know how the patch works on your mirror over the next two or three days, and then I'll merge to master. Thanks!

comment:5 Changed 4 years ago by iwakeh

Things look fine here.

comment:6 Changed 4 years ago by karsten

Resolution: fixed
Status: needs_review → closed

Great, pushed to master. Resolving. Thank you!