Improve materialized views in the metrics database
The metrics database schema uses periodically updated tables similar to materialized views for aggregating statistics. When inserting data into the database, we write the dates that have changed to a separate updates table. Every three hours, we delete the aggregates for these days and recompute them, which takes a few minutes.
The recompute step that takes most of the time is refresh_user_stats(), which is no surprise given the complexity of that function. We should try to simplify it, possibly by pre-computing partial results that can be reused for other statistics. Ideally, recomputing aggregates should run in under one minute, given that we want to add more materialized views for more aggregate statistics in the future. In particular, I'd like to know which SQL constructs slow us down so that we can avoid them in the future.
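As a starting point for finding the slow parts, one option is to time each refresh step separately and rank them. The helper below is only a sketch: time_steps is a hypothetical name, and the steps passed to it would be whatever callables trigger the individual refresh functions or sub-queries.

```python
import time


def time_steps(steps):
    """Run each (name, callable) pair once and return (name, seconds) pairs, slowest first."""
    timings = []
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

This only shows which step dominates. To see which part of a single statement is expensive, the database's own plan output is the better tool; if the database is PostgreSQL (an assumption here), running EXPLAIN ANALYZE on the individual queries inside refresh_user_stats() reports actual time per plan node.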