Opened 7 years ago

Closed 7 years ago

#6064 closed defect (fixed)

Bridge usage statistics on metrics website are broken

Reported by: karsten
Owned by: karsten
Priority: High
Milestone:
Component: Metrics/Website
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The graph on bridge users from all countries recently went up from 10,000 to 50,000. There was no event that could explain this increase, so I looked for a possible bug.

Here's the bug: when we aggregate bridge users per day, we write single observations to a file with lines like this:

bridge,date,time,??,a1,a2,...,all
0007BC3A0CFC768DB2FA1E3EB6FB4ABF4EBE2D13,2012-05-24,07:12:18,NA,1.12,NA,...,30.55

In the next step we aggregate these lines by summing up all observations of a given day.
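The aggregation step can be sketched roughly as follows. This is only an illustration, not the actual metrics-web code: the class name `DailyBridgeUsers`, the method `sumAllPerDay`, and the sample lines in `main` are made up, and it assumes the last column is always `all` and that `NA` marks a missing value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not taken from metrics-web.
class DailyBridgeUsers {

    /** Sums the last column ("all") per date, skipping the header and "NA" values. */
    static Map<String, Double> sumAllPerDay(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            // Skip the header line and anything too short to hold bridge,date,time,value.
            if (parts.length < 4 || parts[0].equals("bridge")) {
                continue;
            }
            String date = parts[1];
            String all = parts[parts.length - 1];
            if (all.equals("NA")) {
                continue; // no observation for this bridge and day
            }
            totals.merge(date, Double.parseDouble(all), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample lines with shortened fingerprints and only three country columns.
        List<String> lines = Arrays.asList(
            "bridge,date,time,??,a1,a2,all",
            "0007BC3A,2012-05-24,07:12:18,NA,1.12,NA,30.55",
            "1A2B3C4D,2012-05-24,09:30:00,NA,NA,NA,12.00");
        sumAllPerDay(lines).forEach((date, sum) -> System.out.println(date + " " + sum));
    }
}
```

Because the daily value is a plain sum over whatever lines are in the file, lines that go missing from the file change the result without raising any error.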

Turns out the file with single observations was truncated and we didn't notice. When adding lines to that file, the whole file is read into memory, new observations are added, and the file is written back to disk. The file is always kept ordered by bridge fingerprint. Here's the distribution of bridge fingerprints in the file, grouped by first hex digit:

0 24567
1 24623
2 11687
3  1526
4  1124
5   825
6  1352
7  1422
8  1271
9  1287
A  1336
B  1048
C  1525
D  1227
E  1497
F   994

We would expect roughly the same number of bridges in each bucket. Looks like the file was truncated after writing half of the fingerprints starting with 2. This could have happened due to Java running out of memory, the server being restarted while writing the file, etc.
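A check like the one above could be scripted along these lines. This is a minimal sketch with made-up names (`FingerprintHistogram`, `countByFirstHexDigit`), assuming each data line starts with the hex fingerprint and the header line starts with `bridge,`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not taken from metrics-web.
class FingerprintHistogram {

    /** Counts lines per first hex digit of the fingerprint in the first column. */
    static Map<Character, Integer> countByFirstHexDigit(List<String> lines) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // The header must be skipped explicitly: "bridge" also starts with a hex digit.
            if (line.isEmpty() || line.startsWith("bridge,")) {
                continue;
            }
            char first = Character.toUpperCase(line.charAt(0));
            if (Character.digit(first, 16) < 0) {
                continue; // not a hex digit, so not a fingerprint line
            }
            counts.merge(first, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines with shortened fingerprints.
        List<String> lines = Arrays.asList(
            "bridge,date,time,??,a1,a2,all",
            "0007BC3A,2012-05-24,07:12:18,NA,1.12,NA,30.55",
            "2F00AA11,2012-05-24,08:00:00,NA,NA,NA,3.00");
        countByFirstHexDigit(lines).forEach((digit, n) -> System.out.println(digit + " " + n));
    }
}
```

Since fingerprints are hashes, the buckets should come out roughly even, so a strongly skewed result is a cheap hint that part of the file is missing.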

The quick fix is to aggregate bridge usage statistics again and replace the single-observations file on yatei. I'm going to do that now.

The next fix is to avoid truncating the file by writing to a temp file and replacing the original file with it once we're done writing. I'll look into that next.
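The temp-file approach could look roughly like this in Java. It is a sketch of the general technique using `java.nio.file`, not the change that was actually deployed; the class name `SafeFileWriter` and the method `writeAtomically` are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from metrics-web.
class SafeFileWriter {

    /** Writes all lines to a temp file next to target, then moves it over target. */
    static void writeAtomically(Path target, List<String> lines) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file lives in the same directory so the move stays on one file system.
        Path temp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(temp, lines, StandardCharsets.UTF_8);
            // ATOMIC_MOVE asks for a single rename-style operation. Whether an existing
            // target is replaced is implementation specific; on POSIX systems it is.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // Only has an effect if writing or moving failed and the temp file is left over.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bridge-stats-demo");
        Path target = dir.resolve("observations.csv");
        writeAtomically(target, Arrays.asList("bridge,date,time,??,a1,a2,all"));
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

If the process dies while writing, only the temp file is incomplete and the original file keeps its previous contents.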

The real fix is to stop using flat files for something that requires a database. That's going to take me quite a bit longer.

Attachments (2)

bridge-users-2012-03-08-72-2012-06-06-all-OLD.png (9.5 KB) - added by karsten 7 years ago.
bridge-users-2012-03-08-72-2012-06-06-all-NEW.png (9.7 KB) - added by karsten 7 years ago.

Change History (4)

comment:1 Changed 7 years ago by karsten

The "quick fix" from above is now implemented. Graphs on the metrics website are correct again. For later reference, compare the attached broken graph (bridge-users-2012-03-08-72-2012-06-06-all-OLD.png) with the fixed graph (bridge-users-2012-03-08-72-2012-06-06-all-NEW.png).

comment:2 Changed 7 years ago by karsten

Resolution: fixed
Status: new → closed

The "next fix" is now implemented and deployed.

The "real fix" is something longer-term that requires rewriting large parts of metrics-web. Calling this specific issue fixed. Closing.
