Opened 8 weeks ago

Closed 12 days ago

#24470 closed enhancement (fixed)

Distinguish point events from ongoing events in metrics timeline

Reported by: dcf
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords: metrics-timeline
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The metrics timeline does not distinguish entries that represent a single point in time from those that have started but not yet finished. Both are represented by a non-empty start date column and an empty end date column. For example (some columns omitted):

start date    end date      description
2017-10-05                  geoip database updated.
2017-10-07                  Increase of users in Romania.

We need to disambiguate these two cases for the sake of uses like #24260, which wants to present date ranges as text.

Change History (6)

comment:1 in reply to:  description Changed 8 weeks ago by dcf

I've been brainstorming ways to represent the distinction.

Idea 1

Put a mark (like x, /, .) in the end date column of point events. Leave the "end date" blank for ongoing events.

start date    end date      description
2017-10-05    /             geoip database updated.
2017-10-07                  Increase of users in Romania.

Idea 2

Put a keyword like ? or ongoing in the "end date" column of ongoing events. Leave the "end date" of point events blank.

start date    end date      description
2017-10-05                  geoip database updated.
2017-10-07    ongoing       Increase of users in Romania.

Idea 3

Combination of Ideas 1 and 2. Could help to prevent errors. Entries with a blank "end date" would have an undefined meaning.

start date    end date      description
2017-10-05    /             geoip database updated.
2017-10-07    ongoing       Increase of users in Romania.

Idea 4

Have the date of point events span both columns. Looks nice(?), but greatly complicates downstream parsing.

start date    end date      description
      2017-10-05            geoip database updated.
2017-10-07                  Increase of users in Romania.

Idea 5

Point events simply have "start date" = "end date". This doesn't allow distinguishing point events from timespans that happened to start and finish on the same day. (Not currently a problem, as the only such timespans we have so far are additionally disambiguated by timestamps.)

start date    end date      description
2017-10-05    2017-10-05    geoip database updated.
2017-10-07                  Increase of users in Romania.

comment:2 Changed 8 weeks ago by karsten

From a parsing POV, your second Idea 4 would be easiest. I don't really see a problem with not being able to distinguish point events from single-day events, because 1 day is the smallest amount of time we're processing on Tor Metrics anyway. (We're discarding timestamps as part of the parsing process.)

But any of the others will do. I'd say just pick whatever you like most.

Thanks!
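
The day-granularity point can be illustrated with a small sketch. The timestamp format, the sample times, and the `to_day` helper below are assumptions made for illustration; this is not the actual Tor Metrics parser.

```python
from datetime import datetime

def to_day(timestamp):
    """Drop the time of day, keeping only the calendar date (assumed format)."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()

# Hypothetical (start, end) pairs: a point event and a span within one day.
point_event = ("2017-10-05 12:00:00", "2017-10-05 12:00:00")
same_day_span = ("2017-10-05 08:00:00", "2017-10-05 20:00:00")

point_days = tuple(to_day(t) for t in point_event)
span_days = tuple(to_day(t) for t in same_day_span)

# Once timestamps are discarded, the two entries carry identical dates.
print(point_days == span_days)
```

If a consumer only ever works with whole days, the two kinds of entry are equivalent for its purposes, which is why the ambiguity in Idea 5 may not matter there.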

comment:3 in reply to:  2 Changed 8 weeks ago by dcf

Replying to karsten:

your second Idea 4

Oops, meant that to be "Idea 5" 😀 I just edited comment:1 to fix it.

comment:4 Changed 2 weeks ago by dcf

Status: new → needs_review

In doc/MetricsTimeline?action=diff&version=215 I went with Idea 2; that is, point events have a blank end date, and ongoing events have an end date of "ongoing".

I initially intended to implement Idea 1 (mark the point events instead of the ongoing events). In fact I had it all implemented and then changed my mind. The main reason was that almost all the entries whose end date is blank are point events, not ongoing events. I went through the whole timeline and at most 10 were ongoing. One of the things we're going to have to do is periodically check which ongoing events have ended, and it's easier to ctrl-F for "ongoing" than it is to search for the absence of a mark. In fact, I suspect some of the entries I marked "ongoing" have already ended. The "ongoing" marker therefore serves as a todo of sorts.
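
As a rough sketch of how the Idea 2 convention reads to a consumer, and how the "todo" review could be automated instead of using ctrl-F: the tuple layout and the names `classify_end_date` and `ongoing_entries` are hypothetical and not part of metrics-timeline-tools.

```python
def classify_end_date(cell):
    """Classify a row by its "end date" cell under the Idea 2 convention."""
    cell = cell.strip()
    if cell == "":
        return "point"      # blank end date: a single point in time
    if cell == "ongoing":
        return "ongoing"    # started, not known to have ended
    return "span"           # anything else is taken to be an end date

def ongoing_entries(rows):
    """Return the rows that still need a periodic "has this ended?" check."""
    return [row for row in rows if classify_end_date(row[1]) == "ongoing"]

# Rows as (start date, end date, description); the layout is an assumption.
rows = [
    ("2017-10-05", "", "geoip database updated."),
    ("2017-10-07", "ongoing", "Increase of users in Romania."),
]

for start, _end, description in ongoing_entries(rows):
    print(start, description)
```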

The necessary code changes were, for me, pretty small. Here is the diff from the tidy script in metrics-timeline-tools:

@@ -28,20 +28,22 @@ class Entry(object):
     def __init__(self):
         self.start_date = None
         self.start_date_approx = None
         self.end_date = None
         self.end_date_approx = False
+        self.is_ongoing = False
         self.places = set()
         self.protocols = set()
         self.description = None
         self.links = []
 
     @staticmethod
     def from_table_row(row):
         entry = Entry()
         entry.start_date, entry.start_date_approx = parse_datetime(row.start_date)
         entry.end_date, entry.end_date_approx = parse_datetime(row.end_date)
+        entry.is_ongoing = row.end_date == "ongoing"
         entry.places = set(row.places.split())
         entry.protocols = set(row.protocols.split())
         entry.description = parse_wikitext(row.description)
         entry.links = parse_links(row.links)
         return entry
@@ -57,20 +59,20 @@ class Entry(object):
         )
 
     def to_wikitext_row(self):
         cells = (
             format_datetime_approx(self.start_date, self.start_date_approx),
-            format_datetime_approx(self.end_date, self.end_date_approx),
+            self.is_ongoing and "ongoing" or format_datetime_approx(self.end_date, self.end_date_approx),
             " ".join(sorted(self.places)),
             " ".join(sorted(self.protocols)),
             self.description.to_wikitext(),
             " ".join(link.to_wikitext() for link in self.links),
         )
         return format_table_row(cells)
 
 def parse_datetime(s):
-    if s == "":
+    if s == "" or s == "ongoing":
         return None, False
 
     approx = False
     if s.startswith("~"):
         approx = True

And in my metrics-country.html page, I treat ongoing entries as having an infinite end date for the purpose of filtering, and treat them specially when formatting the date field:

@@ -6789,7 +6789,7 @@ function accept_entry(entry, start, end, country) {
                return false;
        }
        if (entry.start_date && !entry.end_date && entry.start_date < start) {
-               return false;
+               return entry.is_ongoing;
        }
        if (entry.end_date && !entry.start_date && entry.end_date > end) {
                return false;
@@ -6891,7 +6891,11 @@ function format_timeline_entry_dates(entry) {
        if (entry.start_date && entry.end_date) {
                return format_date(entry.start_date, entry.start_date_approx) + " – " + format_date(entry.end_date, entry.end_date_approx);
        } else if (entry.start_date) {
-               return format_date(entry.start_date, entry.start_date_approx);
+               if (entry.is_ongoing) {
+                       return format_date(entry.start_date, entry.start_date_approx) + " – present";
+               } else {
+                       return format_date(entry.start_date, entry.start_date_approx);
+               }
        } else if (entry.end_date) {
                return "? – " + format_date(entry.end_date, entry.end_date_approx);
        } else {
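
For readers who prefer Python, the "infinite end date" idea can be sketched roughly as follows. This is not a port of the JavaScript above, which is only partially shown; the function names are hypothetical, and the sketch assumes every entry has a start date.

```python
from datetime import date

def effective_end(start, end, is_ongoing):
    """End date to use for filtering; ongoing entries never end."""
    if end is not None:
        return end
    if is_ongoing:
        return date.max   # stands in for an infinite end date
    return start          # point event: occupies only its start date

def overlaps(start, end, is_ongoing, window_start, window_end):
    """True if the entry intersects the window [window_start, window_end]."""
    return (start <= window_end
            and effective_end(start, end, is_ongoing) >= window_start)

def format_dates(start, end, is_ongoing):
    """Render the date field, showing ongoing entries as "start – present"."""
    if end is not None:
        return f"{start.isoformat()} – {end.isoformat()}"
    if is_ongoing:
        return f"{start.isoformat()} – present"
    return start.isoformat()

# An ongoing entry that started before the window is still shown.
print(overlaps(date(2017, 10, 7), None, True, date(2018, 1, 1), date(2018, 1, 31)))
print(format_dates(date(2017, 10, 7), None, True))
```

Using `date.max` as the sentinel keeps the comparison logic uniform, at the cost of needing the separate `is_ongoing` flag when formatting.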

comment:5 Changed 12 days ago by karsten

Works for me. I just updated Tor Metrics, which now shows some entries as "$date to present".

I guess that concludes this ticket. Feel free to keep it open if there's more to do, but don't keep it open for me if you think everything's done. Thanks!

comment:6 Changed 12 days ago by dcf

Resolution: fixed
Status: needs_review → closed