Opened 5 years ago

Closed 5 years ago

#11350 closed enhancement (implemented)

Extend Onionoo's lookup parameter to give out relays/bridges that haven't been running in the past week

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Onionoo
Version:
Severity:
Keywords:
Cc: arma, rndm
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We're currently only giving out data for relays or bridges that have been running in the past 7 days. The main reason for this is performance. We could make an exception for specific lookups by fingerprint or hashed fingerprint. Just keeping all fingerprints in memory might not be too bad. We still wouldn't support searches or other parameters for those relays.

However, this probably requires moving to faster hardware than we currently have with poor overloaded yatei.
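To make the idea concrete, here is a minimal sketch of such a two-tier index. It is not the Onionoo implementation; the class name `NodeIndex`, the `load_archived` callback, and the `nickname` field are all hypothetical. Recently running nodes keep their full status in memory and stay searchable, while older nodes are represented only by their 20-byte fingerprint and are loaded on demand for exact lookups.

```python
from typing import Callable, Dict, List, Optional, Set


class NodeIndex:
    """Hypothetical two-tier index: full data for recent nodes, bare fingerprints for the rest."""

    def __init__(self, load_archived: Callable[[str], Optional[dict]]):
        # Full status documents for nodes seen running in the past week.
        self.recent: Dict[str, dict] = {}
        # Only the 20 raw fingerprint bytes for everything older.
        self.archived: Set[bytes] = set()
        # Assumed callback that fetches an old node's status from disk.
        self._load_archived = load_archived

    def add_recent(self, fingerprint: str, status: dict) -> None:
        self.recent[fingerprint.upper()] = status

    def add_archived(self, fingerprint: str) -> None:
        self.archived.add(bytes.fromhex(fingerprint))

    def lookup(self, fingerprint: str) -> Optional[dict]:
        """Exact lookup: works for recent and archived nodes alike."""
        key = fingerprint.upper()
        if key in self.recent:
            return self.recent[key]
        if bytes.fromhex(key) in self.archived:
            return self._load_archived(key)
        return None

    def search(self, term: str) -> List[dict]:
        """Substring search on nickname: recent nodes only, by design."""
        needle = term.lower()
        return [s for s in self.recent.values()
                if needle in s.get("nickname", "").lower()]
```

As a rough sanity check on "might not be too bad": 500,000 fingerprints at 20 raw bytes each come to 10,000,000 bytes of payload, although per-object overhead in a language like Python or Java makes the real footprint several times larger.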

Change History (4)

comment:1 Changed 5 years ago by karsten

Cc: arma rndm added

I may have found a way to implement this feature without making yatei explode. See branch task-11350 in my public repository.

From the new protocol specification page:

fingerprint

Return only the relay with the parameter value matching the fingerprint or the bridge with the parameter value matching the hashed fingerprint. Fingerprints must consist of 40 hex characters, case does not matter. This parameter is quite similar to the lookup parameter with two exceptions: (1) the provided relay fingerprint or hashed bridge fingerprint must not be hashed (again) using SHA-1; (2) the response will contain any matching relay or bridge regardless of whether they have been running in the past week. Added on April 20, 2014.
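The difference between the two parameters can be sketched from the client side as follows. This is only an illustration of the quoted text, not code from Onionoo; the function names are hypothetical, and it assumes that the SHA-1 hash is computed over the 20 decoded fingerprint bytes rather than over the hex string.

```python
import hashlib
import re

# 40 hex characters, case does not matter (per the quoted specification).
FINGERPRINT_RE = re.compile(r"[0-9a-fA-F]{40}")


def normalize_fingerprint(value: str) -> str:
    """Return the upper-case fingerprint, or raise ValueError if malformed."""
    if not FINGERPRINT_RE.fullmatch(value):
        raise ValueError("fingerprint must consist of exactly 40 hex characters")
    return value.upper()


def sha1_of_fingerprint(fingerprint: str) -> str:
    """SHA-1 over the 20 raw bytes (assumption), as 40 upper-case hex characters."""
    raw = bytes.fromhex(normalize_fingerprint(fingerprint))
    return hashlib.sha1(raw).hexdigest().upper()


def lookup_params(fingerprint: str) -> dict:
    """'lookup': the client hashes the value once more before sending it."""
    return {"lookup": sha1_of_fingerprint(fingerprint)}


def fingerprint_params(fingerprint: str) -> dict:
    """'fingerprint': the value is sent as-is, without hashing it again."""
    return {"fingerprint": normalize_fingerprint(fingerprint)}
```

With these helpers, `fingerprint_params` passes a relay fingerprint or hashed bridge fingerprint through unchanged, which is exception (1) above; exception (2), returning nodes that have not been running in the past week, is purely server-side behavior.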

The implementation is a bit of a hack, though. I'll sleep on it and maybe merge and deploy tomorrow.

Cc'ing arma and rndm, because we may need this feature for the Tor relay challenge.

comment:2 Changed 5 years ago by wfn

This is not really relevant to the relay challenge task per se, so anyone can safely skip this comment.

Maybe orthogonal, but can't hurt, so fwiw, re:

+    /* TODO This is an evil hack to support looking up relays or bridges
+     * that haven't been running for a week without having to load
+     * 500,000 NodeStatus instances into memory.  Maybe there's a better
+     * way?  Or do we need to switch to a real database for this? */

Karsten, *if* you decide to do some benchmarking using a database (with whatever database schema is appropriate), I'd strongly recommend reading through the following document/tutorial:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Note that this is not any kind of 'postgres hacking'; it can be done in a purely wheezy/stable setting and is completely normal practice. The postgres defaults on Linux systems are somewhat conservative. For example, raising effective_cache_size to up to 75% of the system's total memory is normal. The shared_buffers default on Linux is usually 32MB or so. (To raise it, you do need to change /etc/sysctl.conf to increase SHMMAX[1], but again, this should not be considered fringe or esoteric practice. If this is not done, postgres assumes it can't pre-allocate more than 32MB of memory, and that's not a lot of memory.)
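As a rough illustration of those numbers, the following sketch derives candidate settings from the machine's total RAM. The function names are hypothetical. The 75% figure for effective_cache_size is the upper bound mentioned above; the 25% starting point for shared_buffers is an assumption taken from the tuning wiki's rule of thumb, not from this comment, and any real value should be benchmarked.

```python
def suggest_postgres_memory(total_ram_mb: int) -> dict:
    """Return rule-of-thumb memory settings in MB for a dedicated database host."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return {
        # Assumed starting point: a quarter of RAM (tuning wiki rule of thumb).
        "shared_buffers": total_ram_mb // 4,
        # Upper end of what the comment above calls normal: 75% of RAM.
        "effective_cache_size": total_ram_mb * 3 // 4,
    }


def min_shmmax_bytes(settings: dict) -> int:
    """Lower bound for kernel.shmmax: it must at least cover shared_buffers.

    The real value needs some extra room for other shared structures.
    """
    return settings["shared_buffers"] * 1024 * 1024


def render_conf(settings: dict) -> str:
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}MB" for name, value in settings.items())
```

For a host with 8192 MB of RAM, `render_conf(suggest_postgres_memory(8192))` yields `shared_buffers = 2048MB` and `effective_cache_size = 6144MB`, and kernel.shmmax would have to be somewhat above 2147483648 bytes.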

You once mentioned cases of indexes not fitting into memory. Beyond not using partial/functional indexes (LOWER(), SUBSTR()) and having redundant indexes, the primary reason for this is (as I've somewhat painfully discovered) not allowing postgres to actually use enough memory. (Fwiw, using pre-allocated shared memory is faster, too, though I'd need to dig up references.)

Sorry for the detour, but in case someone *does* end up experimenting with less hacky database-based solutions, don't forget to take a good look at your postgres configuration. :) (or, maybe you've already done that, and this was all redundant!)

Also, it makes sense to use intermediary tables[2], so e.g. a 'fingerprint' table for unique fingerprint lookup -> then join with status entries / whatnot. The fingerprints can just as well reside in memory of course, if they can be efficiently persisted, and so on. In-house partial-nosql-solutions. :)
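A minimal sketch of that intermediary-table idea follows. It uses SQLite from the Python standard library as a stand-in for PostgreSQL so that it runs anywhere; the table and column names are hypothetical and not taken from the linked schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a fingerprint table and status entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fingerprint (
            id INTEGER PRIMARY KEY,
            fingerprint TEXT NOT NULL UNIQUE   -- one row per node ever seen
        );
        CREATE TABLE status_entry (
            id INTEGER PRIMARY KEY,
            fingerprint_id INTEGER NOT NULL REFERENCES fingerprint(id),
            valid_after TEXT NOT NULL,
            nickname TEXT NOT NULL
        );
        CREATE INDEX status_entry_fp
            ON status_entry (fingerprint_id, valid_after);
    """)
    return conn


def add_status(conn, fingerprint: str, valid_after: str, nickname: str) -> None:
    """Insert a status entry, creating the fingerprint row on first sight."""
    fp = fingerprint.upper()
    conn.execute("INSERT OR IGNORE INTO fingerprint (fingerprint) VALUES (?)", (fp,))
    (fp_id,) = conn.execute(
        "SELECT id FROM fingerprint WHERE fingerprint = ?", (fp,)).fetchone()
    conn.execute(
        "INSERT INTO status_entry (fingerprint_id, valid_after, nickname) "
        "VALUES (?, ?, ?)", (fp_id, valid_after, nickname))


def latest_status(conn, fingerprint: str):
    """Unique fingerprint lookup first, then join to the newest status entry."""
    return conn.execute("""
        SELECT s.valid_after, s.nickname
        FROM fingerprint f
        JOIN status_entry s ON s.fingerprint_id = f.id
        WHERE f.fingerprint = ?
        ORDER BY s.valid_after DESC
        LIMIT 1
    """, (fingerprint.upper(),)).fetchone()
```

The small, uniquely indexed fingerprint table answers the "does this node exist at all?" question cheaply, and the composite index keeps the join to the latest status entry narrow.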

[1]: http://www.postgresql.org/docs/current/static/kernel-resources.html
[2]: e.g. https://github.com/wfn/torsearch/blob/master/db/db_create.sql

(hopefully this was not painful to read! Just wanted to share what I've learned.)

comment:3 Changed 5 years ago by karsten

Thanks, wfn. I moved this discussion to a separate ticket: #11573.

comment:4 Changed 5 years ago by karsten

Resolution: implemented
Status: new → closed

Slept on it, still found it an okay idea, fixed two remaining bugs in the patch, merged, and deployed. Closing.