Opened 3 months ago

Last modified 8 weeks ago

#31071 assigned enhancement

Add a notice if we're missing data for a lookup

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/ExoneraTor Version:
Severity: Normal Keywords:
Cc: metrics-team Actual Points:
Parent ID: Points:
Reviewer: irl Sponsor:


Turns out the the exit scanner had an issue between April 25 and 29, 2019. If somebody looks up their exit IP address during that time, they won't be listed in the results. I know of one case where this is now potentially an issue.

Let's think about adding a notice if we're missing data for part of a lookup period, including exit lists and maybe also consensuses. This is different from having no data at all, it's about missing some data only.

First step will be to refine the (already quite complex) query to return whether we have sufficient or insufficient data, possibly but not necessarily with exact timestamps of available data.

Second step will be to include the notice in the website, first in English and then in translated languages.

Third step will be to release and deploy all this.

I'll work on this, but I'm putting it into needs_review to discuss the idea first.

Child Tickets

Change History (6)

comment:1 Changed 3 months ago by karsten

Cc: metrics-team added
Reviewer: irl
Status: assignedneeds_review

comment:2 Changed 3 months ago by karsten

And let's add a step zero: look through the archives when this situation has happened before and post all those time intervals to tor-relays@ with an explanation how this affects ExoneraTor results. I'll start with this now.

comment:3 Changed 3 months ago by irl

How do we plan to detect incomplete data? Perhaps the simplest option is to have a table that keeps track of events, and return those events along with the result instead of making the query more complex. i.e. do two SQL queries, one for the data and a second one for any events that might give context.

When the importer fails to find a new exit list, it can just add an entry to the events table. If there are not enough addresses in it, or no address was recently checked, it could do the same.

Understanding how often this has happened would be a good start, maybe we don't need to have the importer do this if it's not happening that often, we can just detect it and manually add rows to the table.

comment:4 in reply to:  3 Changed 3 months ago by karsten

Replying to irl:

Understanding how often this has happened would be a good start, maybe we don't need to have the importer do this if it's not happening that often, we can just detect it and manually add rows to the table.

Here are the gaps of 4 hours or more that I found in existing data:

Gap of  19 hours between 2011-09-10T00:05 and 2011-09-10T19:28:46.
Gap of  10 hours between 2011-09-10T22:30:22 and 2011-09-11T09:27:05.
Gap of  23 hours between 2011-12-21T01:21:19 and 2011-12-22T00:23:36.
Gap of   4 hours between 2012-01-10T03:16:24 and 2012-01-10T07:20:03.
Gap of 111 hours between 2012-02-07T02:33:16 and 2012-02-11T18:07:47.
Gap of  12 hours between 2012-11-09T05:06:32 and 2012-11-09T17:44:08.
Gap of   6 hours between 2013-03-07T06:14:23 and 2013-03-07T13:03:51.
Gap of  26 hours between 2013-03-07T15:16:50 and 2013-03-08T17:17:55.
Gap of 156 hours between 2013-03-14T09:29:25 and 2013-03-20T22:03:41.
Gap of   6 hours between 2013-08-08T20:07:23 and 2013-08-09T02:57:34.
Gap of   7 hours between 2013-09-29T01:01:11 and 2013-09-29T08:04:56.
Gap of  12 hours between 2013-10-05T14:13:06 and 2013-10-06T03:11:26.
Gap of  11 hours between 2013-11-03T15:06:38 and 2013-11-04T02:31:19.
Gap of   4 hours between 2013-12-24T08:33:57 and 2013-12-24T13:04:42.
Gap of   7 hours between 2014-01-21T10:38:42 and 2014-01-21T18:14:20.
Gap of  19 hours between 2015-10-09T14:13:37 and 2015-10-10T09:57:26.
Gap of   8 hours between 2016-09-18T03:16:31 and 2016-09-18T12:11:46.
Gap of  14 hours between 2017-11-19T17:12:04 and 2017-11-20T07:54:34.
Gap of   9 hours between 2018-01-21T22:12:34 and 2018-01-22T08:10:15.
Gap of   6 hours between 2018-01-26T16:11:36 and 2018-01-26T23:02:35.
Gap of   5 hours between 2018-01-27T04:21:42 and 2018-01-27T09:28:47.
Gap of   5 hours between 2018-01-27T14:53:18 and 2018-01-27T20:24:21.
Gap of  18 hours between 2018-02-02T18:27:54 and 2018-02-03T13:06:09.
Gap of   9 hours between 2018-02-25T00:16:21 and 2018-02-25T10:12:04.
Gap of   5 hours between 2018-03-03T15:54:17 and 2018-03-03T21:09:07.
Gap of   5 hours between 2018-09-24T15:11:07 and 2018-09-24T21:09:27.
Gap of   9 hours between 2018-12-30T23:22:06 and 2018-12-31T08:57:55.
Gap of  13 hours between 2019-01-12T18:30:24 and 2019-01-13T08:18:33.
Gap of 122 hours between 2019-04-25T13:13:19 and 2019-04-30T15:40:21.
Gap of  21 hours between 2019-05-25T19:04:43 and 2019-05-26T16:09:23.
Gap of   9 hours between 2019-06-21T21:14:21 and 2019-06-22T07:06:06.

Note that a 4 hour downtime wouldn't be an issue for ExoneraTor. It considers a previously scanned exit IP address valid for 24 hours. We would probably be looking for gaps 18 hours or longer.

comment:5 Changed 2 months ago by karsten

Owner: changed from karsten to metrics-team
Status: needs_reviewassigned

I'm not working on this ticket at the moment. Re-assigning to metrics-team.

comment:6 Changed 8 weeks ago by karsten

I started looking into implementing this enhancement.

First thing is that we'll have to define criteria for putting out a warning that we might be lacking exit scanner information. I think that experiencing an 18+ hour gap of exit scan results would be reason enough to emit a warning. This would apply to the requested day minus 2 days and plus 1 day.

What we could do is extend search_by_date_address24() by another UNION to include a query like this:

SELECT DATE(scanned) AS date, NULL AS fingerprint_base64,
      DATE_TRUNC('hour', scanned) AS scanned, NULL AS exitaddress,
      NULL AS validafter, NULL AS nickname, NULL AS exit, NULL AS oraddress
  FROM exitlistentry_exitaddress
  WHERE DATE(scanned)
  BETWEEN '2013-03-12' AND '2013-03-15' -- requested date in this example: '2013-03-14'
  GROUP BY 1, 3;

The response would contain up to 4 * 24 = 96 rows for each hour of exit scan results during the four days around the requested date. We could then check for gaps that are 18 hours or longer and print out a warning if we found something.

This query is really fast, because it uses the exitlistentry_exitaddress_date_scanned_fingerprint_id index. I don't expect major performance drawbacks from adding this query to search_by_date_address24().

Next steps:

  • Provide a new schema file src/main/sql/exonerator3.sql with the updated search_by_date_address24 function.
  • Update to parse the new query reponse by checking whether there were exit scanner results for all four days in question.
  • Extend to include this information.
  • Update to display this information.
  • Prepare the new warning message for translation.

Still leaving this assigned to metrics-team, as I'm not planning to implement this in the next couple days.

Feedback appreciated!

Note: See TracTickets for help on using tickets.