Opened 8 years ago

Closed 6 years ago

#4440 closed task (wontfix)

Attempt an implementation of the relay-search database using MongoDB or CouchDB

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Our current relay-search function takes far too long to return results. There's #2922 for improving the database schema to better support searching for single relays. That ticket assumes that we'll continue to use PostgreSQL.

Today I looked into MongoDB for the web server log analysis, and I wonder whether MongoDB or CouchDB might be an alternative for implementing the relay-search database.

Every consensus or descriptor could be represented as a document with references to other documents, and indexes could make typical search queries fast. We don't need complicated Map/Reduce functions, because we're only searching and looking up data, not aggregating anything. (That's also why I think this is worth trying out; replacing the metrics database that aggregates statistics with MongoDB/CouchDB may not make as much sense.) Maybe we should run a simple comparison between the new PostgreSQL database that ExoneraTor uses, which is highly optimized for searches, and an implementation using MongoDB or CouchDB.
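
For concreteness, here's a minimal sketch of what that could look like in MongoDB (via pymongo). The collection and field names (consensuses, status_entries, and so on) are made up for illustration, not a proposed schema; the fingerprint and address are placeholders:

{{{
from pymongo import MongoClient, ASCENDING

client = MongoClient("localhost", 27017)
db = client["relay_search"]

# One document per consensus; status entries reference it by _id.
consensus_id = db.consensuses.insert_one(
    {"valid_after": "2011-11-01 00:00:00"}).inserted_id

# One document per relay status entry.  valid_after is duplicated here
# so that searches don't need a join back to the consensus document.
db.status_entries.insert_one({
    "consensus": consensus_id,
    "valid_after": "2011-11-01 00:00:00",
    "nickname": "example",
    "fingerprint": "4242424242424242424242424242424242424242",
    "address": "203.0.113.1",
})

# Indexes that typical relay-search queries would hit.
db.status_entries.create_index(
    [("nickname", ASCENDING), ("valid_after", ASCENDING)])
db.status_entries.create_index([("fingerprint", ASCENDING)])
}}}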

I don't know if such a solution will perform better than a PostgreSQL-based solution. I think we should try to find out.

Child Tickets

Attachments (1)

relay-search-db-import.png (49.7 KB) - added by karsten 8 years ago.
Relay-search database import performance

Change History (3)

Changed 8 years ago by karsten

Attachment: relay-search-db-import.png added

Relay-search database import performance

comment:1 Changed 8 years ago by karsten

I ran a comparison of import performance for PostgreSQL and CouchDB. The task was to import 1 month of network status consensuses and update indexes for relay-search queries after every insert. See the attached graph. Both import times (40 and 90 minutes) are reasonably fast for importing 1 month of data.
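
A rough outline of that import loop, reusing the made-up schema from the description; parse_consensus() is a hypothetical placeholder for a real consensus parser:

{{{
# Rough sketch of the import loop; parse_consensus() is a hypothetical
# placeholder, and the collections match the made-up schema above.
import glob
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["relay_search"]

def import_consensuses(paths):
    for path in sorted(paths):
        # Hypothetical: returns the valid-after time and a list of dicts,
        # one per relay status entry in the consensus.
        valid_after, entries = parse_consensus(path)
        consensus_id = db.consensuses.insert_one(
            {"valid_after": valid_after}).inserted_id
        for entry in entries:
            entry["consensus"] = consensus_id
            entry["valid_after"] = valid_after
            # MongoDB maintains the collection's indexes on every insert,
            # which is the "update indexes after every insert" behavior
            # measured here.
            db.status_entries.insert_one(entry)

import_consensuses(glob.glob("consensuses-2011-11/*"))
}}}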

The next step will be to implement query generators and executors for both databases and compare query performance. If the results are reasonable for both databases, I'm going to import, say, two years of data into both and compare import and query performance again.
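
As a starting point for the query side, here's a sketch of one typical relay-search lookup against the hypothetical schema above, searching a nickname within a valid-after range:

{{{
# Sketch of a relay-search query: all status entries for a given
# nickname in November 2011, newest first.  Schema is hypothetical.
from pymongo import MongoClient, DESCENDING

db = MongoClient("localhost", 27017)["relay_search"]

cursor = db.status_entries.find(
    {"nickname": "example",
     "valid_after": {"$gte": "2011-11-01 00:00:00",
                     "$lt": "2011-12-01 00:00:00"}}
).sort("valid_after", DESCENDING)

for entry in cursor:
    print(entry["valid_after"], entry["fingerprint"], entry["address"])
}}}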

comment:2 Changed 6 years ago by karsten

Resolution: wontfix
Status: new → closed

Ponies! We should either shut down the relay-search service, or isolate it from metrics-web and leave it running until it breaks. See related discussion on tor-talk@.
