Opened 10 months ago

Closed 9 months ago

Last modified 9 months ago

#25264 closed enhancement (fixed)

Decide what graph to display when there's no data to graph

Reported by: karsten Owned by: iwakeh
Priority: High Milestone:
Component: Metrics/Website Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

While looking through the Rserve logs, I found that we sometimes attempt to graph data that cannot be graphed in a reasonable way. Examples:

  • Data from the CSV file ends on 2018-02-13. We try to graph 2018-02-13 to 2018-02-14. A line needs at least 2 points, so we're showing an almost empty graph.
  • Data from the CSV file ends on 2018-02-13. We try to graph 2018-02-14 to 2018-02-15. We don't have a single data point to graph, so we're not showing anything at all.

I could imagine we do the following things, if we can't display a single data point or line segment:

  • Display a "No Data Available" placeholder like Relay Search.
    • Note that we might need this for other cases, too, when users pass parameters that we cannot process.
    • We might as well not display anything, which is what we do right now. Might be less usable, though.
  • Display an empty graph with all requested dates on the x axis and no data points.
    • Note that if we want to do this, we need to decide when to "trim" the graph to the available dates and when to show all requested dates from start to end. Example: Try to draw a graph from 2000-01-01 to 2001-01-01, and then try to draw another graph from 2000-01-01 to 2018-01-01. Should both graphs display the full requested time period, or should just the second graph be trimmed to available data?

Thoughts?

Child Tickets

Attachments (5)

no-data-available.png (35.6 KB) - added by karsten 9 months ago.
no-data-available.pdf (4.2 KB) - added by karsten 9 months ago.
nodata.png (20.0 KB) - added by irl 9 months ago.
Style guide compliant no data available image
nodata.2.png (24.9 KB) - added by irl 9 months ago.
Style guide compliant no data available image - grey and resized
no-data-available.tar.xz (82.3 KB) - added by irl 9 months ago.
XCF, PNG and PDF, resized to the correct dimensions this time

Download all attachments as: .zip

Change History (34)

comment:1 Changed 10 months ago by karsten

Priority: MediumLow

This issue isn't new, and the current situation is not as bad. Setting priority to low.

comment:2 Changed 9 months ago by iwakeh

For avoiding the 500 server error when trying to access the png (as reported in detail in #25468) an empty graph/png stating 'no data available for this parameter choice' should be generated. Currently, the corresponding csv only contains the header. Maybe, also add a comment here too in case no data is available. It should be avoided to call R if there is no data available.

comment:3 in reply to:  2 ; Changed 9 months ago by karsten

Replying to iwakeh:

For avoiding the 500 server error when trying to access the png (as reported in detail in #25468) an empty graph/png stating 'no data available for this parameter choice' should be generated.

Sounds good. We'll also need such a static file in the PDF format. I'll create placeholders using R/ggplot2 in a bit. We just need to write the code to put them in.

Currently, the corresponding csv only contains the header. Maybe, also add a comment here too in case no data is available.

Sounds good, too. Want to suggest a text? I can put it in then.

It should be avoided to call R if there is no data available.

Well, we need to call R to find out whether there's data available for the requested period of time. Or what did you mean?

Changed 9 months ago by karsten

Attachment: no-data-available.png added

Changed 9 months ago by karsten

Attachment: no-data-available.pdf added

comment:4 in reply to:  3 Changed 9 months ago by karsten

Replying to karsten:

Replying to iwakeh:

For avoiding the 500 server error when trying to access the png (as reported in detail in #25468) an empty graph/png stating 'no data available for this parameter choice' should be generated.

Sounds good. We'll also need such a static file in the PDF format. I'll create placeholders using R/ggplot2 in a bit. We just need to write the code to put them in.

Attached. I generated these two files with the following code:

library(ggplot2)
library(dplyr)
library(tidyr)

data.frame(x = c(0, NA), y = c(0, NA)) %>%
  filter(x != 0) %>%
  ggplot(aes(x = x, y = y)) + 
  scale_x_continuous("") + 
  scale_y_continuous("") + 
  labs(caption = "The Tor Project - https://metrics.torproject.org/") + 
  theme(plot.margin = margin(5.5, 11, 5.5, 5.5),
    plot.title = element_text(hjust = 0.5, margin = margin(b = 11), size = 36),
    panel.background = element_rect(fill = NA)) + 
  ggtitle("\n\n\n\nNo Data Available")
ggsave(filename = "no-data-available.png", width = 8, height = 5, dpi = 150)
ggsave(filename = "no-data-available.pdf", width = 8, height = 5, dpi = 150)

I can also write the code that returns these files, unless you'd like to do that.

comment:5 in reply to:  3 Changed 9 months ago by iwakeh

Replying to karsten:

Replying to iwakeh:

For avoiding the 500 server error when trying to access the png (as reported in detail in #25468) an empty graph/png stating 'no data available for this parameter choice' should be generated.

Sounds good. We'll also need such a static file in the PDF format. I'll create placeholders using R/ggplot2 in a bit. We just need to write the code to put them in.

Currently, the corresponding csv only contains the header. Maybe, also add a comment here too in case no data is available.

Sounds good, too. Want to suggest a text? I can put it in then.

It should be avoided to call R if there is no data available.

Well, we need to call R to find out whether there's data available for the requested period of time. Or what did you mean?

The java code currently returns the SC_INTERNAL_SERVER_ERROR. This is decided by the properties of the RObject received. As RObject is our code it should always be available in order to avoid the null check.
In case, the R code doesn't give a result or fails the RObject should 'know', i.e., a method needs to be added, e.g., boolean error() for any unforeseeable error w/o any file returned (triggers error message) otherwise (in case of error()==false) it returns the wanted file or the 'no-data' version.

Regarding the text:

Unfortunately there is no data available for the chosen parameters.

Is fine for all and we should include the parameters chosen into the response, e.g., in the pdf, csv, png.


comment:6 Changed 9 months ago by karsten

Huh, I'm afraid I don't follow. Do you mind writing this patch?

I'm also not clear where you'd want to include parameters in the pnd or pdf. Do you mean we should generate a custom "No Data Available" graph that displays the chosen parameters? If so, why? (Maybe this will be clearer with a patch, too.)

comment:7 Changed 9 months ago by iwakeh

Owner: changed from metrics-team to iwakeh
Status: newaccepted

The use case I'm thinking of:
Some people retrieve only the pdf/png/csv using the links provided. If such a file is downloaded a URL in there with all parameters and the no-data-info makes clear what happened.
On the web page we wouldn't need to display a graph w/o dots just the text, b/c the parameters are visible still.

Other than that code might be easier to discuss. I'll provide a patch.

comment:8 Changed 9 months ago by karsten

I see. It's a possible use case. But I'd say let's focus on our main use case first, which is users looking at the website. And if somebody links to our PDF/PNG/CSV directly, and these return "No Data Available", it should be up to them to also provide details to find out why that is the case.

comment:9 Changed 9 months ago by irl

"No Data Available" should be consistent. This should follow the Style Guide too.

Relay Search currently does not follow the Style Guide:
https://metrics.torproject.org/rs/img/no-data-available.png

Changed 9 months ago by irl

Attachment: nodata.png added

Style guide compliant no data available image

comment:10 Changed 9 months ago by karsten

Ah, you mean the image should be consistent across Tor Metrics pages? Good point.

But does it have to be purple? Or can we use the same image in some style-guide-compatible gray? And can we have it in 728 x 455?

Changed 9 months ago by irl

Attachment: nodata.2.png added

Style guide compliant no data available image - grey and resized

comment:11 Changed 9 months ago by irl

https://styleguide.torproject.org/visuals/#colors has the approved colors. I could also make SVG, PDF, etc. versions if that is useful. If we do include a new image to be used, we should also update Relay Search at the same time, I can make a patch for that quite easily and then both could be deployed together.

The white version of the Tor logo comes from https://github.com/TheTorProject/tor-media/tree/master/Tor%20Logo if you'd like to have a go at making your own version.

comment:12 Changed 9 months ago by karsten

Yes, I think it looks better in gray.

Do you mind cutting off a few pixels at the top to make it 728 by 455 (not 728 by 479)? That's the dimensions we're using for all graphs.

And can you provide a PDF file with the same dimensions?

If you want to grab this ticket and write the code, feel free to do it.

Changed 9 months ago by irl

Attachment: no-data-available.tar.xz added

XCF, PNG and PDF, resized to the correct dimensions this time

comment:13 Changed 9 months ago by karsten

New PNG and PDF look good! Thanks!

Next step: write the code to include these files if no graph can be generated.

comment:14 Changed 9 months ago by karsten

Owner: changed from iwakeh to karsten

I'll write some code.

comment:15 Changed 9 months ago by iwakeh

Hmm, ok, had this down on the list, b/c of the low prio.
I can add some suggestions for the ideas in comments 5 to 7 to the branch, when reviewing.

Last edited 9 months ago by iwakeh (previous) (diff)

comment:16 Changed 9 months ago by karsten

Owner: changed from karsten to iwakeh
Status: acceptedassigned

Oh! No, I can give it back. I didn't know you were still planning to write this code.

Please find temp commit fe160e2 in my task-25264 branch with the code I wrote so far.

comment:17 Changed 9 months ago by iwakeh

Status: assignedaccepted

comment:18 Changed 9 months ago by iwakeh

Status: acceptedneeds_review

I used the attached pictures and pdf.

Please review this patch branch, which contains one commit fixing the usage of StringBuilder.append and a second one implementing the wanted functionality.
The implementation actually catches R errors and makes the respective no-data file available instead. Java can only guess if an R problem occurred. Thus, every call for R-graph generation is now wrapped in an R function robust_call that creates the no-data objects in case of an error. (String function = ... was moved down in order to avoid a checkstyle complaint).

comment:19 Changed 9 months ago by karsten

Resolution: fixed
Status: needs_reviewclosed

Looks good. Merged and deployed. Closing. Thanks!

comment:20 Changed 9 months ago by karsten

Priority: LowHigh
Resolution: fixed
Status: closedreopened

Hmm, looks like we broke the CSV links. Any link that I open now says "No data available for the given parameters." The code looks right to me. Any idea what's wrong?

comment:21 Changed 9 months ago by iwakeh

These files don't seem to be on the server. Could there be a reason that they don't get written? Permissions (R runs as metrics) and space seem fine.

comment:22 Changed 9 months ago by iwakeh

Errr, no, they only contain the "No data message".

comment:23 Changed 9 months ago by iwakeh

There are errors in rserve.log, unfortunately w/o timestamp. It is not clear from the log, if R was restarted successfully and when ...
Maybe, try to restart making sure all R servers are down first?

comment:24 Changed 9 months ago by karsten

I just restarted Rserve. No change, I think. Hmm.

comment:25 Changed 9 months ago by iwakeh

Status: reopenedaccepted

comment:26 Changed 9 months ago by iwakeh

Status: acceptedneeds_review

Please review this patch commit.

That was a strange error situation. The call list created in java wasn't proper, but worked for graphs and failed for csvs. That's why there was only no-data in csv files, because there was an R error.
Now, the behavior for graphs is the same as before, but csv files always contain their column headers, i.e., the no-data message will not appear. It only shows up when there is an error.

Last edited 9 months ago by iwakeh (previous) (diff)

comment:27 Changed 9 months ago by karsten

Resolution: fixed
Status: needs_reviewclosed

Patch commit looks good. Merged and deployed. This solves the issue. Good bug hunt! Closing. Thanks!

comment:28 Changed 9 months ago by iwakeh

Could you remove the cached csvs from the server? metrics.torproject.org/userstats-relay-country.csv?start=2018-01-04&end=2018-04-04&country=all&events=off is still 'no-data'.

comment:29 Changed 9 months ago by karsten

Oops. Done!

Note: See TracTickets for help on using tickets.