Changes between Version 6 and Version 7 of Ticket #15844, comment 21


Timestamp:
Jun 17, 2015, 11:56:41 AM (4 years ago)
Author:
leeroy
  • Ticket #15844, comment 21

    v6 v7  
    1 Great, thanks for clearing that up. I had resorted to reading the multiple spec documents and the Onionoo code to make sure my understanding of the data was correct. No big deal, though.
     1Interestingly, a database (or full-text search) may not be the only solution here. Another option is in-memory compression (like zram) combined with off-heap, disk-based data structures for the updater. The CPU is underused during updates. If the server has SSD storage, even better.
    22
    3 -
     3Okay, so I've had more time to look into this. First, it's a given that a schema exists (semi-structured works nicely). Now, don't hate me for this, but the database doesn't look like the solution it appears to be. Here's why I'm skeptical.
    44
    5 > [.. ..] converting partial fingerprints between hex and base64 might be problematic because of 4 vs. 6 bits per character. I'd say never mind the storage and just put in everything you want into the database.
     5 * ''__The two JVMs negatively affect the rest of the server.__ ''The documentation encourages the use of 4GB of heap for each of the updater and web server. I understand why it's done. The current implementation relies on heap-allocated objects. The problem is that, in general, this means the two processes combined ''will'' use 8GB. This is very wasteful compared to the actual requirements. The JVM will happily hog this memory and only invoke the GC if forced, which means the GC may not run until it's too late for server performance. Imagine having Postgres and Apache running alongside (they both depend on the OS for memory management).
    66
    7 If the function index is computed on the fingerprint column, then prefix search should be identical with the right plan. Nevertheless, as you say, if storing all three is an advantage for query-time, that will be preferred to minimalism.
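
The 4 vs. 6 bits-per-character mismatch mentioned above can be made concrete with a minimal sketch. It assumes the standard base64 alphabet; the function name `hex_prefix_to_base64` and the sample value are hypothetical and are not taken from Onionoo. A hex prefix maps to a whole number of base64 characters only when its bit length is a multiple of 6; otherwise the leftover bits narrow the next base64 character to a range of candidates.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def hex_prefix_to_base64(hex_prefix):
    """Return (base64_prefix, candidates) for a partial hex fingerprint.

    base64_prefix covers the bits that fill whole 6-bit groups.
    candidates lists every possible next base64 character given the
    leftover 2 or 4 bits, or [""] when nothing is left over.
    """
    bits = "".join(format(int(ch, 16), "04b") for ch in hex_prefix)
    whole = len(bits) // 6
    prefix = "".join(
        B64_ALPHABET[int(bits[i * 6:(i + 1) * 6], 2)] for i in range(whole)
    )
    leftover = bits[whole * 6:]
    if not leftover:
        return prefix, [""]
    free = 6 - len(leftover)            # bits of the next char still unknown
    first = int(leftover, 2) << free    # smallest 6-bit value that matches
    return prefix, [B64_ALPHABET[first + i] for i in range(1 << free)]

# Arbitrary 40-hex-digit value standing in for a fingerprint (not a real relay).
fingerprint = "0123456789ABCDEF0123456789ABCDEF01234567"
encoded = base64.b64encode(bytes.fromhex(fingerprint)).decode("ascii")

# Every hex prefix must be consistent with the full base64 encoding.
for n in range(1, len(fingerprint) + 1):
    prefix, candidates = hex_prefix_to_base64(fingerprint[:n])
    assert any(encoded.startswith(prefix + c) for c in candidates)
```

With a 2-digit hex prefix, for instance, only one base64 character is fully determined and the second is narrowed to 16 candidates, so a base64 prefix search would need a range condition rather than a single string. Storing all three representations avoids that translation at query time.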
    87
    9 -
     8 * __''Torproject is agile (isn't it?).''__ It isn't a simple hack to add a database. Using the database effectively would require rewriting a lot of code. If you don't rewrite a lot of code, the database won't perform well, and quality suffers.
    109
    11 > Ordering matters if users ask for a specific ordering using the `order` parameter.  Of course, if they don't pass that, you're free to return results in whatever order you want.
    1210
    13 That's good to hear. By strictness, I wasn't sure about cases where pgnosql returns an entity's key:value pairs out of order. Suppose I get key and value sets defining ip-extra-info for a relay, like the optional ip-based data (geolocation, rdns, as). I would like to ensure I don't cause havoc if the ordering (within ordered responses) were unexpected. Similarly, appending new key:value pairs is less costly than insertion. If this isn't considered ideal, I'll use alternatives.
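
The two kinds of ordering discussed above can be separated in a small sketch. The helper `apply_order`, the sample entries, and the leading `-` for descending order are assumptions made for illustration only; just the `order` parameter itself comes from the discussion. Result ordering is applied only when requested, while the order of key:value pairs inside one entry carries no meaning for a consumer that looks values up by key.

```python
def apply_order(entries, order=None):
    """Sort entries only when the client passed an order; otherwise keep
    whatever order the backend produced.

    Assumes every entry contains the requested field, and that a leading
    "-" requests descending order (an assumption for this sketch).
    """
    if not order:
        return list(entries)
    descending = order.startswith("-")
    field = order.lstrip("-")
    return sorted(entries, key=lambda entry: entry[field], reverse=descending)

# Hypothetical entries; the second one has its keys in a different order
# and an extra pair appended at the end.
entries = [
    {"nickname": "relayA", "consensus_weight": 10},
    {"consensus_weight": 50, "nickname": "relayB", "as": "AS0"},
    {"nickname": "relayC", "consensus_weight": 30},
]

unordered = apply_order(entries)
by_weight = apply_order(entries, "-consensus_weight")
names = [entry["nickname"] for entry in by_weight]
```

Because lookups go through the key, appending a new pair such as the hypothetical `"as"` entry does not disturb existing consumers.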
     11 * __''Onionoo already has a database.''__ The current implementation can be improved. It already provides a schemaless, semi-structured db. It already provides the best you can expect from a separate db process. Simply moving to disk-based data structures provides relief from heap dependency. It also enables a merging of data processing steps. Fewer steps means less time spent importing and updating. Onionoo can do what JSONB does for Postgres.
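
As a rough illustration of what "disk-based data structures" could mean here, the sketch below keeps fixed-width records sorted in a file and looks them up through a memory map, so the operating system's page cache holds the data instead of the process heap. It is written in Python for brevity and is not Onionoo code: the record layout, `write_table`, and `lookup` are all hypothetical. In Java, the comparable building block would be `java.nio.MappedByteBuffer`.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: 20-byte fingerprint + 4-byte unsigned value.
RECORD = struct.Struct(">20sI")

def write_table(path, rows):
    """Write (fingerprint_bytes, value) rows to disk, sorted by fingerprint."""
    with open(path, "wb") as f:
        for fingerprint, value in sorted(rows):
            f.write(RECORD.pack(fingerprint, value))

def lookup(path, fingerprint):
    """Binary-search the memory-mapped file; only touched pages are read."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as table:
        lo, hi = 0, len(table) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            key, value = RECORD.unpack_from(table, mid * RECORD.size)
            if key == fingerprint:
                return value
            if key < fingerprint:
                lo = mid + 1
            else:
                hi = mid
    return None

# Small demonstration with made-up 20-byte keys.
rows = [(bytes([i]) * 20, i * 100) for i in (7, 3, 9, 1)]
table_path = os.path.join(tempfile.mkdtemp(), "relays.tbl")
write_table(table_path, rows)
found = lookup(table_path, bytes([9]) * 20)
missing = lookup(table_path, bytes([2]) * 20)
```

The process holds only the few pages the search touches, and the kernel is free to evict them under memory pressure, which is the relief from heap dependency described above.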
    1412
    15 -
    1613
    17 > Hope that helps!  Thanks for working on this!
    1814
    19 Don't sweat it. Helping you is helping me ;)
    20 
    21 -
    22 
    23 ~~Now, and I know this is ''indirectly'' related to search, but I noticed the `last_changed_address_or_port` key-value appears to be based on an address change.~~ ~~It's my understanding from dir-spec and [https://lists.torproject.org/pipermail/tor-dev/2015-April/008674.html tor-dev discussion] around fallback-directories that this could be improved.~~ Never mind, I see you store address and port together in a lastaddresses in one part and name it differently in another. I think Apple has spoiled me.
    24 
    25 Is `last_changed_address_or_port` based on the __dir-port__ and __dir-address (possibly same as or-address)__ or on the __or-port__ and __or-address__? (I keep all three address types.) According to tor-dev and the proposals for fallback-dir, it's based on or-address and or-port. But the protocol page for Onionoo documents considering both __dir__ and __or__.
    26 
    27 (Aside) If Onionoo were to include, ''in search'', complementary data to `last_changed_address_or_port`, it would more accurately enable clients to determine candidate fallback-dir for themselves. Think about it.
    28 
    29 (More important aside) Interestingly, a database (or full-text search) may not be the only solution here. Another option is in-memory compression (like zram) combined with off-heap, disk-based data structures for the updater. The CPU is underused during updates. If the server has SSD storage, even better. I'll look into this more in the next couple of days after taking a deployed server to the limit. (Snapshots make it easy to test multiple versions of a server, so it won't derail the db.)
    30 
    31 On the subject of db or off-heap solutions, why is the VirtualBox VM allocated 4GB when the host has 8GB of RAM?
     15Maybe I'm being too ambitious?