Changes between Version 7 and Version 8 of Ticket #15844, comment 21


Timestamp: Jun 17, 2015, 11:57:33 AM
Author: leeroy
Comment:

  • Ticket #15844, comment 21

    v7 → v8

Interestingly, a database (or full-text search) may not be the only solution here. Another option is to use in-memory compression (like zram) together with off-heap, disk-based data structures for the updater. The CPU is underused during updates, and if the server has SSD storage, even better.
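
As a rough illustration of that idea, here is a minimal sketch, not Onionoo code: the class name and the fixed record layout are made up. It keeps updater state in a memory-mapped file, so the data lives in the OS page cache rather than on the Java heap:

{{{#!java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch only: fixed-width records kept in a memory-mapped file, off the
 *  Java heap.  Class name and record layout are hypothetical. */
public class OffHeapStatusTable {

  private static final int RECORD_SIZE = 64;  // hypothetical record width in bytes
  private final MappedByteBuffer buffer;

  public OffHeapStatusTable(Path file, int maxRecords) throws IOException {
    try (FileChannel channel = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.READ,
        StandardOpenOption.WRITE)) {
      // The mapping lives outside the heap; the OS page cache (optionally
      // backed by zram or an SSD) holds the data and writes it back to disk.
      buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0,
          (long) maxRecords * RECORD_SIZE);
    }
  }

  public void put(int index, byte[] record) {
    buffer.position(index * RECORD_SIZE);
    buffer.put(record, 0, Math.min(record.length, RECORD_SIZE));
  }

  public byte[] get(int index) {
    byte[] record = new byte[RECORD_SIZE];
    buffer.position(index * RECORD_SIZE);
    buffer.get(record);
    return record;
  }
}
}}}

The heap then only holds a small wrapper object, while the mapped pages can be compressed by zram or spill to the SSD as the OS sees fit.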

(removed, v7) Okay, so I've had more time to look into this. First, it's a given that a schema exists (semi-structured works nicely). Now, don't hate me for this, but it doesn't look like the database is the solution it appears to be. Here's why I'm skeptical.

(added, v8) (June 17) Okay, so I've had more time to look into this. First, it's a given that a schema exists (semi-structured works nicely). Now, don't hate me for this, but it doesn't look like the database is the solution it appears to be. Here's why I'm skeptical.

 * ''__The two JVMs negatively affect the rest of the server.__'' The documentation encourages 4GB of heap for each of the updater and the web server. I understand why: the current implementation relies on heap-allocated objects. The problem is that, in general, the two processes combined ''will'' use 8GB, which is very wasteful compared to the actual requirements. The JVM will happily hog this memory and only invoke the GC when forced, which means the GC may not run until it's too late for server performance. Now imagine also having Postgres and Apache running (they both depend on the OS for memory management). A small heap-usage sketch follows this list.

 * __''Torproject is agile (isn't it?).''__ Adding a database isn't a simple hack. Using a database effectively would require rewriting a lot of code, and if you don't rewrite that code, the database won't perform well and quality suffers.

 * __''Onionoo already has a database.''__ The current implementation can be improved: it already provides a schemaless, semi-structured document store, which is the best you can expect from a separate database process anyway. Simply moving to disk-based data structures relieves the dependency on the heap. It also enables merging of data-processing steps, and fewer steps means less time spent importing and updating. Onionoo can do for itself what JSONB does for Postgres. (A streaming sketch also follows this list.)
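
To make the first bullet concrete, here is a small stand-alone sketch (not part of Onionoo) using the standard MemoryMXBean API. It reports how much heap the JVM has actually claimed from the OS versus how much is in use; with -Xms4g/-Xmx4g the committed figure stays near 4GB even when the live data is far smaller:

{{{#!java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Prints used vs. committed heap; illustrates how a 4GB heap stays claimed
 *  by the process even when little of it is live. */
public class HeapReport {
  public static void main(String[] args) {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    // "committed" is what the process has taken from the OS; with -Xms4g it
    // stays high regardless of "used", which is the waste described above.
    System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
  }
}
}}}

Running this once in each JVM (updater and web server) shows how much of the combined 8GB is actually live.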
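
And for the last bullet, a minimal sketch of merging import and update into one streaming pass over disk-based documents; the document directory and the update function are placeholders, not Onionoo's actual layout or logic:

{{{#!java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.UnaryOperator;

/** Sketch: one streaming pass that reads, updates and rewrites each on-disk
 *  document in turn, so heap use does not grow with the data set. */
public class StreamingUpdater {

  public static void updateAll(Path documentDir, UnaryOperator<String> update)
      throws IOException {
    try (DirectoryStream<Path> documents = Files.newDirectoryStream(documentDir)) {
      for (Path document : documents) {
        String oldJson = new String(Files.readAllBytes(document),
            StandardCharsets.UTF_8);
        String newJson = update.apply(oldJson);   // merge the new data in place
        Files.write(document, newJson.getBytes(StandardCharsets.UTF_8));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical document directory and placeholder update function.
    updateAll(Paths.get("out/details"),
        json -> json.replace("\"running\":false", "\"running\":true"));
  }
}
}}}

Only one document sits on the heap at a time, which is what allows the import and update steps to share a single pass.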

Maybe I'm being too ambitious?