Use multiple threads to parse descriptors

The following idea came up when I looked a bit into #17831 (moved) to speed up metrics-lib.

When we read and parse descriptors from disk, we use a single thread for both steps. It's a daemon thread and not the application's main thread, so if the application's thread is busy processing parsed descriptors we're at least using two threads. But we could parallelize even more by using separate threads for reading and parsing, and even by using multiple threads for reading and/or for parsing. I'll leave the I/O part to #17831 (moved) and focus on the multi-threaded parsing part here.

I wrote a little patch that measures time spent on reading tarball contents in DescriptorReaderImpl#readTarballs() and then extended that by moving descriptor parsing code to a separate class that implements Runnable and that gets executed by an ExecutorService. I initialized that executor with Executors.newFixedThreadPool(n) for n = [2, 4, 8, 16, 32, 64]. I also tried n = 1, but ran out of memory due to a major issue in my simple patch: it reads all tarball contents into memory when creating Task instances, even if they cannot be executed anytime soon. What we should do is block the reader thread when the executor already has too many tasks waiting. I'm attaching my patch, but only to avoid starting from zero the next time. It needs more work.
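One possible way to block the reader thread when the executor is full is sketched below. This is not the attached patch: `BoundedParserPool`, `submit` and `shutdownAndWait` are hypothetical names that do not exist in metrics-lib, and the limit on pending tasks is an assumed parameter. The idea is to guard a fixed thread pool with a `Semaphore`, so that the reader cannot hold more than a fixed number of unparsed tarball entries in memory.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of metrics-lib: a fixed thread pool that
 * blocks the submitting (reader) thread once too many tasks are pending. */
final class BoundedParserPool {

  private final ExecutorService executor;

  /* One permit per task that is either queued or currently running. */
  private final Semaphore permits;

  BoundedParserPool(int threads, int maxPendingTasks) {
    this.executor = Executors.newFixedThreadPool(threads);
    this.permits = new Semaphore(maxPendingTasks);
  }

  /** Submits a parse task, blocking the caller until a permit is free. */
  void submit(Runnable parseTask) {
    permits.acquireUninterruptibly();
    try {
      executor.execute(() -> {
        try {
          parseTask.run();
        } finally {
          /* Free the slot even if parsing throws. */
          permits.release();
        }
      });
    } catch (RuntimeException e) {
      /* The task was rejected, so it will never release its permit. */
      permits.release();
      throw e;
    }
  }

  /** Stops accepting tasks and waits for submitted ones to finish.
   * Returns true if all tasks completed within the timeout. */
  boolean shutdownAndWait(long timeoutSeconds) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With something like this, the reader would call `submit(...)` once per tarball entry and simply stall while the parsers catch up, which should also make n = 1 usable for a baseline measurement. A `ThreadPoolExecutor` with a bounded queue and a rejection handler would be another option; the semaphore is just the smaller change to a `newFixedThreadPool(n)` setup.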

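| Separate parser threads | Read `.tar` file (s) | Parse `.tar` file (s) | Read `.tar.xz` file (s) | Parse `.tar.xz` file (s) |
|---|---|---|---|---|
| none (current code) | 35 | 159 | 9 | 162 |
| 2 | 36 | 42 | 8 | 126 |
| 4 | 41 | 13 | 7 | 96 |
| 8 | 42 | 11 | 6 | 35 |
| 16 | 41 | 11 | 10 | 28 |
| 32 | 45 | 13 | 7 | 34 |
| 64 | 41 | 13 | 6 | 38 |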

These results show that 4 threads speed up the parse time for .tar files by a factor of 12 (159 s down to 13 s), after which there's no visible improvement, and 8 threads speed up the parse time for .tar.xz files by a factor of 4.6 (162 s down to 35 s). Just from these numbers I'd suggest using 8 threads by default and making this number configurable for the application. But: needs more work.
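For anyone who wants to recompute the speedup factors, here is a small sketch that divides the single-threaded parse time by each measured parse time. The numbers are copied from the measurements above; the class and method names are made up for illustration only.

```java
import java.util.Locale;

/** Illustrative only: recomputes speedup factors from the measured parse times. */
final class ParseSpeedup {

  /* Parse times in seconds, copied from the measurements in this ticket. */
  static final int[] THREADS = {2, 4, 8, 16, 32, 64};
  static final int BASELINE_TAR = 159;
  static final int[] PARSE_TAR = {42, 13, 11, 11, 13, 13};
  static final int BASELINE_TAR_XZ = 162;
  static final int[] PARSE_TAR_XZ = {126, 96, 35, 28, 34, 38};

  /** Speedup factor relative to the current single-threaded code. */
  static double speedup(int baselineSeconds, int measuredSeconds) {
    return (double) baselineSeconds / measuredSeconds;
  }

  /** One line per thread count with both speedup factors. */
  static String report() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < THREADS.length; i++) {
      sb.append(String.format(Locale.ROOT,
          "%2d threads: .tar x%.1f, .tar.xz x%.1f%n",
          THREADS[i],
          speedup(BASELINE_TAR, PARSE_TAR[i]),
          speedup(BASELINE_TAR_XZ, PARSE_TAR_XZ[i])));
    }
    return sb.toString();
  }
}
```

Calling `ParseSpeedup.report()` lists the factors for every thread count. Each configuration was only measured once here, so differences between 8, 16 and 32 threads may well be noise.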

My recommendation would be to look more into making parsing multi-threaded and save #17831 (moved) for later. It seems like parsing is the lower-hanging fruit.

Note that reading the same tarball in extracted form using the current code took 271 seconds. In that case the lower-hanging fruit might be I/O improvements, not multi-threaded parsing. But my hope is that not many applications extract tarballs containing over 800,000 files and read them using DescriptorReader, especially not if they could just as well read the tarball directly.

Suggestions welcome! Otherwise I might pick this up again and move it forward whenever there's time.
