Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#12676 closed defect (fixed)

Bridge descriptors in CollecTor's recent/ directory contain many duplicates

Reported by: karsten
Owned by:
Priority: Low
Milestone:
Component: Metrics/CollecTor
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The recent/ directory should only contain new descriptors, and ideally no duplicates. I just found that the latter is not the case:

$ grep -c "@type" recent/bridge-descriptors/server-descriptors/2014-07-22-07-04-02-server-descriptors 
18175
$ grep -c "@type" recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos 
9723

Compare this to relay descriptors:

$ grep -c "@type" recent/relay-descriptors/server-descriptors/2014-07-22-07-05-52-server-descriptors 
931
$ grep -c "@type" recent/relay-descriptors/extra-infos/2014-07-22-07-05-52-extra-infos 
930
$ grep -c "@type" recent/relay-descriptors/microdescs/micro/2014-07-22-07-05-52-micro 
30

The reason is that only novel relay descriptors are downloaded and stored to disk, whereas the bridge descriptor tarballs we parse are full snapshots of Tonga's cached descriptor files. We need to add a check for whether we already have a given sanitized bridge descriptor, and store it only if we don't.
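
A minimal sketch of such a check, in Python for illustration only. It is not CollecTor's actual code. The function name `store_if_new`, the use of a SHA-1 digest of the sanitized bytes as the descriptor's identity, and the on-disk layout are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def store_if_new(sanitized: bytes, out_dir: str) -> bool:
    """Write a sanitized descriptor only if it has not been stored before.

    Returns True if the descriptor was new and has been written,
    False if a file for the same digest already existed.
    """
    # Assumption: the SHA-1 of the sanitized bytes identifies the descriptor.
    digest = hashlib.sha1(sanitized).hexdigest()
    # Hypothetical layout: <out_dir>/<1st hex char>/<2nd hex char>/<digest>
    path = os.path.join(out_dir, digest[0], digest[1], digest)
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(sanitized)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        descriptor = b"@type bridge-server-descriptor 1.0\nrouter example\n"
        print(store_if_new(descriptor, out_dir))  # first sighting
        print(store_if_new(descriptor, out_dir))  # same descriptor again
```

With a check like this, only descriptors for which the function returns True would also be appended to the file in recent/.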

Priority is minor, because the only effect is some additional load on clients that parse the same descriptors more than once. Other than that it's mostly harmless.

Change History (4)

comment:1 Changed 5 years ago by karsten

Fixed here, I think. Deployed on yatei now. Will resolve in a few hours if nothing breaks horribly.

comment:2 Changed 5 years ago by karsten

Resolution: fixed
Status: new → closed

comment:3 Changed 5 years ago by isis

Perhaps related to #15707?

comment:4 in reply to:  3 Changed 5 years ago by karsten

Replying to isis:

> Perhaps related to #15707?

That might have amplified the problem, but the problem I fixed was duplication between hourly runs. Even if Tonga didn't duplicate any descriptors in the files it provides, we'd still duplicate them between runs. That's the part that I fixed.
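
One way to observe duplication between runs is to compare the recent/ files written by two consecutive runs. The following Python sketch is only an illustration, not part of CollecTor. It assumes that every descriptor in such a file starts with an "@type" annotation line (as the grep counts in the description suggest) and that byte-identical descriptors count as duplicates. The function names are hypothetical.

```python
import hashlib


def split_descriptors(text):
    """Split a concatenated descriptor file into individual descriptors.

    Assumption: each descriptor begins with a line starting with "@type".
    Anything before the first such line is ignored.
    """
    descriptors, current = [], None
    for line in text.splitlines(keepends=True):
        if line.startswith("@type"):
            if current is not None:
                descriptors.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        descriptors.append("".join(current))
    return descriptors


def digest(descriptor):
    """Identify a descriptor by the SHA-1 of its text (an assumption)."""
    return hashlib.sha1(descriptor.encode("utf-8")).hexdigest()


def count_repeats(previous_text, current_text):
    """Return (total, repeats) for the current run's file.

    `repeats` is the number of descriptors in the current file that
    already appeared in the previous run's file.
    """
    seen = {digest(d) for d in split_descriptors(previous_text)}
    current = split_descriptors(current_text)
    repeats = sum(1 for d in current if digest(d) in seen)
    return len(current), repeats
```

Calling `count_repeats` on the contents of two consecutive files returns how many descriptors the later file holds and how many of them were already in the earlier one. If deduplication between runs works, the second number should be zero or close to it.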
