Part 1: Analysis of Referenced Descriptor Completeness
This page summarizes the current findings. The discussion and questions can be found here.
Log Entries
The archiving component of CollecTor logs the missing descriptors of various types in a special format.
The following log entry explanation was extracted from Karsten's description in ticket 18798.
    M-2016-04-11T22:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
    M-2016-04-11T23:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
    M-2016-04-11T23:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0279 -> 0.0558)
    M-2016-04-12T00:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0280 -> 0.0558)
    M-2016-04-12T00:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0280 -> 0.0558)
- The first line means that there's a microdescriptor with digest 38F2.. missing from the microdescriptor consensus with valid-after time 2016-04-11 22:00:00. That missing microdescriptor adds a value of 0.0279 to the total missing descriptor count, which is then 0.0279. The idea is to only warn if that total value passes 1.0.
- The second line says that the same missing microdescriptor is also referenced from the microdescriptor consensus with valid-after time 2016-04-11 23:00:00. Given that we shouldn't double-count that missing descriptor, we're not increasing the total count there.
- The third line mentions another microdescriptor with digest 597C.. that is missing, and in this case it's referenced from the microdescriptor consensus with valid-after time 2016-04-11 23:00:00. That one raises the total count by another 0.0279 to 0.0558.
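The deduplicated running total described above can be replayed with a short Python sketch. It assumes every entry has exactly the shape shown in the example, takes each entry's weight (the first number in parentheses) as given rather than recomputing it, and uses hypothetical names (`replay_total`, `WARN_THRESHOLD`); it is an illustration, not CollecTor's own code.

```python
import re

# One entry: "<referrer> -> <referenced> (<weight> -> <running total>)"
ENTRY = re.compile(r"(\S+) -> (\S+) \((\d+\.\d+) -> (\d+\.\d+)\)")

# Hypothetical constant: only warn once the total passes 1.0.
WARN_THRESHOLD = 1.0


def replay_total(lines):
    """Recompute the running total, counting each missing descriptor once."""
    seen = set()
    total = 0.0
    totals = []
    for line in lines:
        m = ENTRY.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"unrecognized entry: {line!r}")
        referenced, weight = m.group(2), float(m.group(3))
        if referenced not in seen:  # never double-count a missing descriptor
            seen.add(referenced)
            total += weight
        totals.append(round(total, 4))
    return totals, total > WARN_THRESHOLD


D1 = "D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D"
D2 = "D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95"
LOG = [
    f"M-2016-04-11T22:00:00Z -> {D1} (0.0279 -> 0.0279)",
    f"M-2016-04-11T23:00:00Z -> {D1} (0.0279 -> 0.0279)",
    f"M-2016-04-11T23:00:00Z -> {D2} (0.0279 -> 0.0558)",
    f"M-2016-04-12T00:00:00Z -> {D1} (0.0280 -> 0.0558)",
    f"M-2016-04-12T00:00:00Z -> {D2} (0.0280 -> 0.0558)",
]

totals, warn = replay_total(LOG)
print(totals, warn)  # [0.0279, 0.0279, 0.0558, 0.0558, 0.0558] False
```

The replayed totals agree with the right-hand numbers in the example entries, and the total stays well below the warning threshold.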
Other examples of log entries listing missing descriptors:
    C-2016-03-19T07:00:00Z -> S-BD9E2444C8416A29467463F6B228CEB75B1216B7 (0.0281 -> 0.0281)
    S-000A13E991700CB0A356CD08DDC0CDAB022F8B7E -> E-8A8DB3818A2CEE9D2844F8A9AD6FB89E04CFA7D1 (0.0100 -> 8.6512)
    V-2016-03-19T09:00:00Z-14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4 -> S-010612B70E18CB3E0CCA72A464E8FD683FDF029B (0.0254 -> 15.5266)
The short explanation for all four referrer types:
- S-: a server descriptor references an extra-info descriptor that is missing,
- V-: a vote references a server descriptor that we're missing,
- C-: a consensus references a server descriptor that we're missing, and
- M-: a microdescriptor consensus references a microdescriptor that is missing (see above).
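To group entries by these types, each line can be split into its parts. The following is a minimal Python sketch, assuming the entry layout of the examples above. The mapping of the referenced-side prefixes (`S-` server descriptor, `E-` extra-info descriptor, `D-` microdescriptor) is inferred from those examples rather than documented on this page, and `parse_entry` and `Entry` are hypothetical names.

```python
import re
from typing import NamedTuple

REFERRER_TYPES = {
    "C": "consensus",
    "V": "vote",
    "S": "server descriptor",
    "M": "microdescriptor consensus",
}
# Inferred from the example entries, not documented on this page.
REFERENCED_TYPES = {
    "S": "server descriptor",
    "E": "extra-info descriptor",
    "D": "microdescriptor",
}

ENTRY = re.compile(
    r"(?P<rtype>[CVSM])-(?P<rid>\S+) -> (?P<dtype>[SED])-(?P<digest>[0-9A-F]+)"
    r" \((?P<weight>\d+\.\d+) -> (?P<total>\d+\.\d+)\)"
)


class Entry(NamedTuple):
    referrer_type: str
    referrer_id: str  # valid-after time, optionally followed by a digest
    referenced_type: str
    digest: str
    weight: float
    total: float


def parse_entry(line: str) -> Entry:
    """Split one missing-descriptor log entry into typed fields."""
    m = ENTRY.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    return Entry(
        REFERRER_TYPES[m["rtype"]],
        m["rid"],
        REFERENCED_TYPES[m["dtype"]],
        m["digest"],
        float(m["weight"]),
        float(m["total"]),
    )


entry = parse_entry(
    "S-000A13E991700CB0A356CD08DDC0CDAB022F8B7E -> "
    "E-8A8DB3818A2CEE9D2844F8A9AD6FB89E04CFA7D1 (0.0100 -> 8.6512)"
)
print(entry.referrer_type, "->", entry.referenced_type)
# server descriptor -> extra-info descriptor
```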
Method
The missing descriptor log entries are parsed and collected in sets according to the time-stamp of the log entry and the referrer type. Using sets avoids counting a missing descriptor more than once when it is referenced by multiple entities (e.g., different votes or different microdescriptor consensuses). Missing server descriptors are counted separately for votes and consensuses, i.e., a missing server descriptor referenced by both votes and a consensus increases the count for both referrer types.
From these sets, two numbers are calculated for each time-stamp and referrer type:
- the number of currently missing descriptors referenced by that referrer type, and
- the number of newly missing descriptors, i.e., those that were not already missing in the previous run.
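The two per-run numbers can be computed with plain Python sets. This sketch assumes the log has already been split into runs, given as a chronological list of `(run_timestamp, entries)` pairs (the log-line time-stamps are not part of the excerpts above), and keys results by the one-letter referrer prefix. `missing_counts` and the run labels are hypothetical; the usage splits the example entries into two made-up runs.

```python
from collections import defaultdict

# Referrer prefixes: consensus, vote, server descriptor, microdescriptor consensus.
REFERRER_PREFIXES = "CVSM"


def missing_counts(runs):
    """Map (run_timestamp, referrer_prefix) -> (currently missing, newly missing).

    runs: chronological list of (run_timestamp, [log entry, ...]) pairs.
    """
    result = {}
    previous = defaultdict(set)  # referrer prefix -> descriptors missing last run
    for timestamp, entries in runs:
        current = defaultdict(set)
        for entry in entries:
            referrer, rest = entry.split(" -> ", 1)
            referenced = rest.split(" ", 1)[0]
            # Sets drop duplicates, so a descriptor referenced by several
            # votes or microdescriptor consensuses is counted once per type.
            current[referrer[0]].add(referenced)
        for prefix in REFERRER_PREFIXES:
            missing = current[prefix]
            new = missing - previous[prefix]
            result[(timestamp, prefix)] = (len(missing), len(new))
        previous = current
    return result


D1 = "D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D"
D2 = "D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95"
runs = [
    ("run-1", [
        f"M-2016-04-11T22:00:00Z -> {D1} (0.0279 -> 0.0279)",
        f"M-2016-04-11T23:00:00Z -> {D1} (0.0279 -> 0.0279)",
    ]),
    ("run-2", [
        f"M-2016-04-11T23:00:00Z -> {D2} (0.0279 -> 0.0558)",
        f"M-2016-04-12T00:00:00Z -> {D1} (0.0280 -> 0.0558)",
        f"M-2016-04-12T00:00:00Z -> {D2} (0.0280 -> 0.0558)",
    ]),
]
counts = missing_counts(runs)
print(counts[("run-1", "M")], counts[("run-2", "M")])  # (1, 1) (2, 1)
```

Because results are keyed by referrer prefix, a server descriptor missing from both votes (`V`) and a consensus (`C`) is counted under both keys, as the method requires.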
Data
The log files cover 2016-03-08 to 2016-04-13, with gaps from 2016-03-09 to 2016-03-18 and from 2016-03-24 to 2016-03-31.
There was one known incident of a full server hard drive that prevented storing descriptors around 2016-03-19.
Another peak in missing descriptors is visible around 2016-04-01, which is also explained by a full hard drive.
Deciles
The following deciles of the number of missing descriptors per time-stamp are calculated without excluding the peaks:
referenced by | 0% | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
consensus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1339 |
votes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 2375 |
server | 0 | 2 | 2 | 3 | 7 | 11 | 16 | 19 | 26 | 35 | 55 |
microconsensus | 0 | 3 | 4 | 5 | 7 | 8 | 12 | 15 | 26 | 56 | 798 |
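The page does not say which percentile definition was used for these deciles. One common choice, which always reports the minimum at 0% and the maximum at 100%, is the nearest-rank method; the following Python sketch assumes it and would be applied to the list of per-time-stamp counts for one referrer type (`deciles` is a hypothetical name).

```python
def deciles(values):
    """Return the 0%, 10%, ..., 100% nearest-rank percentiles of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    result = []
    for p in range(0, 101, 10):
        # Nearest rank: ceil(p / 100 * n) in integer arithmetic; 0% maps to rank 1.
        rank = max(1, (p * n + 99) // 100)
        result.append(ordered[rank - 1])
    return result


print(deciles(range(1, 11)))  # [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```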
Graphs
Each of the following diagrams shows the total number of missing descriptors in lighter colors and the number of newly encountered missing descriptors in darker colors.
The y-axis depicts the count, the x-axis the time of measurement.
Counts are discrete, so the lines connecting the data points only make the diagrams easier to read; they do not interpolate values for the time between measurements.
Total Picture
April 1st Closeup
April 2nd to April 13th