Opened 8 years ago

Closed 7 years ago

#2921 closed enhancement (wontfix)

Improve bulk import of relay descriptors into metrics database

Reported by: karsten
Owned by:
Priority: Low
Milestone:
Component: Metrics/Website
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We currently have two ways to import relay descriptors into the metrics database:

  • JDBC import: We have a Java importer that connects to the metrics database via JDBC. We use a few tweaks like committing batches of up to 500 rows, but importing months of data is still a time-consuming task.
  • psql \copy: The Java importer can be configured to parse relay descriptor files and write files for psql's \copy command. The disadvantage is that \copy cannot handle duplicates very well, so we have to pre-process the bulk import files.
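A rough illustration of the two tweaks mentioned above, in Java since the importer is written in Java: committing in batches of up to 500 rows, and dropping duplicate rows before writing \copy files. This is a minimal sketch, not the importer's actual code. The `Row` type, the assumption that a descriptor digest is the unique key, and the method names are all hypothetical. In a JDBC importer, the `flush` callback is where each batch would go through `PreparedStatement.addBatch()`, `executeBatch()`, and `Connection.commit()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class BulkImportSketch {

  /** Hypothetical row: a descriptor digest (assumed unique key) plus its tab-separated \copy line. */
  record Row(String digest, String copyLine) {}

  /**
   * Hands rows to flush in consecutive batches of at most batchSize rows and
   * returns the number of batches. In a JDBC importer, flush would add each
   * row to a PreparedStatement batch, then call executeBatch() and commit().
   */
  static <T> int forEachBatch(List<T> rows, int batchSize, Consumer<List<T>> flush) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    for (int start = 0; start < rows.size(); start += batchSize) {
      int end = Math.min(start + batchSize, rows.size());
      flush.accept(rows.subList(start, end));
      batches++;
    }
    return batches;
  }

  /** Keeps the first row seen per digest, preserving input order, so the \copy file has no duplicate keys. */
  static List<Row> dedupByDigest(List<Row> rows) {
    Map<String, Row> firstSeen = new LinkedHashMap<>();
    for (Row row : rows) {
      firstSeen.putIfAbsent(row.digest(), row);
    }
    return new ArrayList<>(firstSeen.values());
  }

  public static void main(String[] args) {
    List<Row> rows = new ArrayList<>();
    for (int i = 0; i < 1200; i++) {
      // Every digest appears twice, simulating overlapping input files.
      String digest = "d" + (i % 600);
      rows.add(new Row(digest, digest + "\t2011-01-01 00:00:00"));
    }
    List<Row> unique = dedupByDigest(rows);
    List<Integer> sizes = new ArrayList<>();
    int n = forEachBatch(unique, 500, batch -> sizes.add(batch.size()));
    System.out.println(unique.size() + " unique rows in " + n + " batches " + sizes);
  }
}
```

Running `main` prints `600 unique rows in 2 batches [500, 100]`. Deduplicating the files only removes duplicates among the rows being imported. Rows already in the table would still make \copy fail on a unique constraint. One common PostgreSQL workaround is to \copy into a temporary staging table and insert from there with `INSERT ... SELECT ... WHERE NOT EXISTS (...)` or, on PostgreSQL 9.5 and later, `ON CONFLICT DO NOTHING`.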

I wonder if there are better approaches than these two, or if there are improvements to how we implement them. It would be good to compare the performance of these two approaches, and of any improvements to them, for 1, 12, and 24 months of data.


Change History (3)

comment:1 Changed 8 years ago by karsten

Component: Metrics → Metrics Website

comment:2 Changed 8 years ago by karsten

Owner: karsten deleted
Priority: normal → minor
Status: new → assigned

Having better ways to import descriptors would be nice, but we're doing okay with the current code. Reducing priority to minor.

Also re-assigning to the None person, as I'm not working on this.

comment:3 Changed 7 years ago by karsten

Resolution: wontfix
Status: assigned → closed

The real solution is not to import raw descriptors and all kinds of details that we don't need for statistics into the database. Closing as wontfix.
