Enable webstats to process large (> 2G) logfiles
Quote from #25161 (moved), comment 12: Looking at the stack trace and the input log files, I noticed that two of the log files are larger than 2G when decompressed:
3.2G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531
584K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160531.xz
2.1G in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601
404K in/webstats/archeotrichon.torproject.org/dist.torproject.org-access.log-20160601.xz
I just ran another bulk import with only those two files as input and ran into the same exception.
It seems like we shouldn't attempt to decompress these files into a byte[] in FileType.decompress, because a Java array can hold at most 2^31 - 1 (roughly 2.1 billion) elements: https://en.wikipedia.org/wiki/Criticism_of_Java#Large_arrays . Maybe we should work with streams there, not byte[].
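A minimal sketch of the streaming approach, not the actual FileType.decompress code: instead of materializing the decompressed log as one byte[], wrap the compressed file in a decompressing InputStream and consume it line by line, so memory use stays constant regardless of file size. For simplicity this uses gzip from the JDK (java.util.zip); the real .xz files would need an XZ stream, e.g. XZCompressorInputStream from Apache Commons Compress. The class and method names here are hypothetical.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class StreamDecompress {

    // Process a compressed log file line by line without ever holding the
    // whole decompressed content in a single byte[], which Java caps at
    // Integer.MAX_VALUE elements. Here we just count lines as a stand-in
    // for real per-line log processing.
    static long countLines(Path compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(compressed))))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a small gzip-compressed sample file; the real inputs would
        // be the multi-gigabyte xz-compressed access logs.
        Path tmp = Files.createTempFile("access-log", ".gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)))) {
            for (int i = 0; i < 1000; i++) {
                w.write("127.0.0.1 - - [01/Jun/2016] \"GET / HTTP/1.1\" 200 1234\n");
            }
        }
        System.out.println(countLines(tmp)); // prints 1000
        Files.delete(tmp);
    }
}
```

The key point is that nothing in this pipeline ever allocates a buffer proportional to the decompressed size, so a 3.2G log poses no problem beyond runtime.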