Opened 2 weeks ago
Last modified 31 hours ago
#31071 assigned enhancement
Add a notice if we're missing data for a lookup
Reported by: | karsten | Owned by: | metrics-team |
---|---|---|---|
Priority: | Medium | Milestone: | |
Component: | Metrics/ExoneraTor | Version: | |
Severity: | Normal | Keywords: | |
Cc: | metrics-team | Actual Points: | |
Parent ID: | | Points: | |
Reviewer: | irl | Sponsor: | |
Description
It turns out that the exit scanner had an issue between April 25 and 29, 2019. If somebody looks up their exit IP address for that time, the address won't be listed in the results. I know of one case where this is now potentially an issue.
Let's think about adding a notice if we're missing data for part of a lookup period, including exit lists and maybe also consensuses. This is different from having no data at all: it's about missing only some of the data.
First step will be to refine the (already quite complex) query to return whether we have sufficient or insufficient data, possibly but not necessarily with exact timestamps of available data.
Second step will be to include the notice on the website, first in English and then in translated languages.
Third step will be to release and deploy all this.
I'll work on this, but I'm putting it into needs_review to discuss the idea first.
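As a rough illustration of the first step, here is a minimal Python sketch of a sufficient/insufficient check. It is not ExoneraTor's actual query: the function name `check_coverage`, the `KNOWN_GAPS` list, and the day-level boundaries chosen for the April 2019 outage are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical list of known data gaps as (start, end) pairs.
# The boundaries below are an assumption: the ticket only says
# "between April 25 and 29, 2019", so whole days are used here.
KNOWN_GAPS = [(datetime(2019, 4, 25), datetime(2019, 4, 30))]


def check_coverage(lookup_start, lookup_end, known_gaps=KNOWN_GAPS):
    """Report whether a lookup period overlaps any known data gap.

    Returns a (status, gaps) tuple where status is "sufficient" or
    "insufficient" and gaps lists the overlapping (start, end) pairs.
    """
    overlapping = [
        (gap_start, gap_end)
        for gap_start, gap_end in known_gaps
        # Two half-open intervals overlap if each starts before the other ends.
        if gap_start < lookup_end and gap_end > lookup_start
    ]
    status = "insufficient" if overlapping else "sufficient"
    return status, overlapping


if __name__ == "__main__":
    print(check_coverage(datetime(2019, 4, 26), datetime(2019, 4, 29)))
    print(check_coverage(datetime(2019, 5, 10), datetime(2019, 5, 13)))
```

Returning the overlapping intervals alongside the status keeps the option open to show exact timestamps in the notice, without requiring them.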
Child Tickets
Change History (5)
comment:1 Changed 2 weeks ago by
Cc: | metrics-team added |
---|---|
Reviewer: | → irl |
Status: | assigned → needs_review |
comment:2 Changed 2 weeks ago by
comment:3 follow-up: 4 Changed 2 weeks ago by
How do we plan to detect incomplete data? Perhaps the simplest option is to have a table that keeps track of events, and to return those events along with the result instead of making the query more complex. That is, do two SQL queries: one for the data and a second one for any events that might give context.
When the importer fails to find a new exit list, it can just add an entry to the events table. If there are not enough addresses in it, or no address was recently checked, it could do the same.
Understanding how often this has happened would be a good start, maybe we don't need to have the importer do this if it's not happening that often, we can just detect it and manually add rows to the table.
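The two-query idea could look roughly like the following Python sketch. It uses an in-memory SQLite database only to stay self-contained; the table names `exit_scans` and `data_events`, their columns, the example address, and the day-level gap boundaries are all hypothetical and do not reflect ExoneraTor's real schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one table for scan results, one for events.
    # Timestamps are ISO 8601 strings, which sort correctly as text.
    conn.executescript("""
        CREATE TABLE exit_scans (address TEXT, scanned TEXT);
        CREATE TABLE data_events (gap_start TEXT, gap_end TEXT, description TEXT);
    """)


def record_event(conn, gap_start, gap_end, description):
    # What an importer (or a person, manually) would do on detecting a gap.
    conn.execute(
        "INSERT INTO data_events VALUES (?, ?, ?)",
        (gap_start, gap_end, description),
    )


def lookup(conn, address, period_start, period_end):
    # Query 1: the data itself, unchanged by the event logic.
    results = conn.execute(
        "SELECT address, scanned FROM exit_scans "
        "WHERE address = ? AND scanned >= ? AND scanned < ? ORDER BY scanned",
        (address, period_start, period_end),
    ).fetchall()
    # Query 2: any events overlapping the lookup period, for context.
    events = conn.execute(
        "SELECT gap_start, gap_end, description FROM data_events "
        "WHERE gap_start < ? AND gap_end > ? ORDER BY gap_start",
        (period_end, period_start),
    ).fetchall()
    return results, events


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_event(conn, "2019-04-25T00:00:00", "2019-04-30T00:00:00",
                 "exit scanner issue")
    print(lookup(conn, "203.0.113.5",
                 "2019-04-26T00:00:00", "2019-04-29T00:00:00"))
```

The data query stays as it is, and the events query is a plain interval-overlap test, so manually added rows and importer-generated rows are handled the same way.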
comment:4 Changed 2 weeks ago by
Replying to irl:
> Understanding how often this has happened would be a good start, maybe we don't need to have the importer do this if it's not happening that often, we can just detect it and manually add rows to the table.
Here are the gaps of 4 hours or more that I found in existing data:
Gap of 19 hours between 2011-09-10T00:05 and 2011-09-10T19:28:46.
Gap of 10 hours between 2011-09-10T22:30:22 and 2011-09-11T09:27:05.
Gap of 23 hours between 2011-12-21T01:21:19 and 2011-12-22T00:23:36.
Gap of 4 hours between 2012-01-10T03:16:24 and 2012-01-10T07:20:03.
Gap of 111 hours between 2012-02-07T02:33:16 and 2012-02-11T18:07:47.
Gap of 12 hours between 2012-11-09T05:06:32 and 2012-11-09T17:44:08.
Gap of 6 hours between 2013-03-07T06:14:23 and 2013-03-07T13:03:51.
Gap of 26 hours between 2013-03-07T15:16:50 and 2013-03-08T17:17:55.
Gap of 156 hours between 2013-03-14T09:29:25 and 2013-03-20T22:03:41.
Gap of 6 hours between 2013-08-08T20:07:23 and 2013-08-09T02:57:34.
Gap of 7 hours between 2013-09-29T01:01:11 and 2013-09-29T08:04:56.
Gap of 12 hours between 2013-10-05T14:13:06 and 2013-10-06T03:11:26.
Gap of 11 hours between 2013-11-03T15:06:38 and 2013-11-04T02:31:19.
Gap of 4 hours between 2013-12-24T08:33:57 and 2013-12-24T13:04:42.
Gap of 7 hours between 2014-01-21T10:38:42 and 2014-01-21T18:14:20.
Gap of 19 hours between 2015-10-09T14:13:37 and 2015-10-10T09:57:26.
Gap of 8 hours between 2016-09-18T03:16:31 and 2016-09-18T12:11:46.
Gap of 14 hours between 2017-11-19T17:12:04 and 2017-11-20T07:54:34.
Gap of 9 hours between 2018-01-21T22:12:34 and 2018-01-22T08:10:15.
Gap of 6 hours between 2018-01-26T16:11:36 and 2018-01-26T23:02:35.
Gap of 5 hours between 2018-01-27T04:21:42 and 2018-01-27T09:28:47.
Gap of 5 hours between 2018-01-27T14:53:18 and 2018-01-27T20:24:21.
Gap of 18 hours between 2018-02-02T18:27:54 and 2018-02-03T13:06:09.
Gap of 9 hours between 2018-02-25T00:16:21 and 2018-02-25T10:12:04.
Gap of 5 hours between 2018-03-03T15:54:17 and 2018-03-03T21:09:07.
Gap of 5 hours between 2018-09-24T15:11:07 and 2018-09-24T21:09:27.
Gap of 9 hours between 2018-12-30T23:22:06 and 2018-12-31T08:57:55.
Gap of 13 hours between 2019-01-12T18:30:24 and 2019-01-13T08:18:33.
Gap of 122 hours between 2019-04-25T13:13:19 and 2019-04-30T15:40:21.
Gap of 21 hours between 2019-05-25T19:04:43 and 2019-05-26T16:09:23.
Gap of 9 hours between 2019-06-21T21:14:21 and 2019-06-22T07:06:06.
Note that a 4-hour downtime wouldn't be an issue for ExoneraTor, because it considers a previously scanned exit IP address valid for 24 hours. We would probably be looking for gaps of 18 hours or longer.
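For reference, a gap search of this kind can be sketched in a few lines of Python. This is not the script that produced the list above. The function names are hypothetical, the input is assumed to be a list of exit list timestamps, and the two surrounding timestamps in the example are invented to frame the April 2019 gap.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, min_gap):
    """Return (earlier, later) pairs of consecutive timestamps that are
    at least min_gap apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier >= min_gap
    ]


def describe(gap):
    """Format a gap like the lines above, with hours rounded down."""
    earlier, later = gap
    hours = int((later - earlier).total_seconds() // 3600)
    return (f"Gap of {hours} hours between "
            f"{earlier.isoformat()} and {later.isoformat()}.")


if __name__ == "__main__":
    # The middle two timestamps are from the list above; the first and
    # last are made up for illustration.
    published = [
        datetime(2019, 4, 25, 12, 13, 19),
        datetime(2019, 4, 25, 13, 13, 19),
        datetime(2019, 4, 30, 15, 40, 21),
        datetime(2019, 4, 30, 16, 40, 0),
    ]
    for gap in find_gaps(published, timedelta(hours=18)):
        print(describe(gap))
```

Passing `timedelta(hours=18)` instead of `timedelta(hours=4)` applies the stricter threshold suggested above.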
comment:5 Changed 31 hours ago by
Owner: | changed from karsten to metrics-team |
---|---|
Status: | needs_review → assigned |
I'm not working on this ticket at the moment. Re-assigning to metrics-team.
And let's add a step zero: look through the archives for when this situation has happened before, and post all those time intervals to tor-relays@ with an explanation of how this affects ExoneraTor results. I'll start with this now.