Opened 9 months ago

Last modified 6 months ago

#27980 needs_information defect

Missing server descriptors in recent/ but not in out/

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Yesterday I noticed the reference checker reporting unusually high numbers of missing descriptors.

After investigating this issue for too many hours, I believe that we're only missing descriptors in the recent/ directory, not in the out/ directory. This means that the tarballs are going to be complete but that applications fetching recent descriptors only will be missing descriptors.

Recent files from October 6 (with unusually small files marked with <-):

2018-10-06-00-05-00-server-descriptors 2018-10-06 00:05  1.2M  
2018-10-06-01-05-00-server-descriptors 2018-10-06 01:07  1.2M  
2018-10-06-02-05-00-server-descriptors 2018-10-06 02:07  1.4M  
2018-10-06-03-05-00-server-descriptors 2018-10-06 03:07  1.3M  
2018-10-06-04-05-00-server-descriptors 2018-10-06 04:09  1.2M  
2018-10-06-05-05-00-server-descriptors 2018-10-06 05:07  1.3M  
2018-10-06-06-05-00-server-descriptors 2018-10-06 06:07  1.4M  
2018-10-06-07-05-00-server-descriptors 2018-10-06 07:07  1.2M  
2018-10-06-08-05-00-server-descriptors 2018-10-06 08:09  1.3M  
2018-10-06-09-05-00-server-descriptors 2018-10-06 09:05  1.2M  
2018-10-06-10-05-00-server-descriptors 2018-10-06 10:09  1.4M  
2018-10-06-11-05-00-server-descriptors 2018-10-06 11:07  3.6K  <-
2018-10-06-12-05-00-server-descriptors 2018-10-06 12:07  1.2M  
2018-10-06-13-05-00-server-descriptors 2018-10-06 13:05  1.2M  
2018-10-06-14-05-00-server-descriptors 2018-10-06 14:09  1.2M  
2018-10-06-15-05-00-server-descriptors 2018-10-06 15:07  1.4M  
2018-10-06-16-05-00-server-descriptors 2018-10-06 16:07  1.3M  
2018-10-06-17-05-00-server-descriptors 2018-10-06 17:09  5.9K  <-
2018-10-06-18-05-00-server-descriptors 2018-10-06 18:05  1.4M  
2018-10-06-19-05-00-server-descriptors 2018-10-06 19:07  1.5M  
2018-10-06-20-05-00-server-descriptors 2018-10-06 20:05  1.6M  
2018-10-06-21-05-00-server-descriptors 2018-10-06 21:07  1.3M  
2018-10-06-22-05-00-server-descriptors 2018-10-06 22:05  1.2M  
2018-10-06-23-05-00-server-descriptors 2018-10-06 23:07  8.0K  <-
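The small files in the listing above could also be flagged automatically. The following is a minimal sketch, not part of CollecTor: it assumes the listing format shown above (name, date, time, human-readable size), and the function names and the 10% threshold are hypothetical choices for illustration.

```python
import re
from statistics import median

# Multipliers for the size suffixes used in the directory listing.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a human-readable size such as '3.6K' or '1.2M' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def flag_small_files(listing, ratio=0.1):
    """Return names of files smaller than ratio times the median size.

    The ratio is an assumed threshold, not a value taken from CollecTor.
    """
    entries = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Fields: file name, date, time, size, optional "<-" marker.
        entries.append((fields[0], parse_size(fields[3])))
    if not entries:
        return []
    threshold = ratio * median(size for _, size in entries)
    return [name for name, size in entries if size < threshold]
```

Comparing against the median rather than the mean keeps a few tiny files from dragging the threshold down.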

In contrast, here are the numbers of files in the out/ directory, grouped by last-modified hour:

    534 Oct  6 00
    530 Oct  6 01
    618 Oct  6 02
    590 Oct  6 03
    511 Oct  6 04
    562 Oct  6 05
    559 Oct  6 06
    527 Oct  6 07
    576 Oct  6 08
    570 Oct  6 09
    600 Oct  6 10
    587 Oct  6 11
    536 Oct  6 12
    529 Oct  6 13
    540 Oct  6 14
    615 Oct  6 15
    580 Oct  6 16
    634 Oct  6 17
    600 Oct  6 18
    657 Oct  6 19
    717 Oct  6 20
    606 Oct  6 21
    543 Oct  6 22
    585 Oct  6 23

Note how the numbers stay roughly the same at 11:00, 17:00, and 23:00. It doesn't look like we're missing descriptors here.
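Per-hour counts like the ones above could be reproduced with something along these lines. This is only a sketch under assumptions: the function name is hypothetical, the directory is walked recursively, and modification times are interpreted as UTC, which may not match how the numbers in this ticket were produced.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def count_files_by_hour(root):
    """Count files under root, grouped by last-modified hour (UTC)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            # Assumption: bucket by UTC hour, e.g. "2018-10-06 11".
            stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
            counts[stamp.strftime("%Y-%m-%d %H")] += 1
    return counts
```

An hour whose count in out/ looks normal while the matching recent/ file is tiny would point at the step that writes recent/, not at the download itself.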

So, after looking at too many logs and code, I'm giving up. I can't find the bug, at least not yet.

I'm going to provide a patch that improves logging in this area of the code, in particular with respect to unchecked return values when creating directories, renaming files, etc. Maybe we'll learn something from those logs.

Change History (4)

comment:1 Changed 9 months ago by karsten

Status: assigned → needs_review

comment:2 Changed 9 months ago by irl

Status: needs_review → merge_ready

I do hope we're able to find the problem with the extra logging. The changes look good to me.

comment:3 Changed 9 months ago by karsten

Status: merge_ready → needs_information

Great! Pushed to master. We should put out a new release soon. Leaving this ticket open in needs_information just in case the issue comes back. After all, we probably didn't fix anything; we just added more logging.

comment:4 Changed 6 months ago by gaba

Priority: High → Medium