Opened 9 years ago

Closed 8 years ago

#1841 closed task (wontfix)

Implement node churn and uptime statistics

Reported by: kjbbb
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Website
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I spent some time this summer designing a schema to support tracking of relay uptime and churn statistics. The relay churn statistic should be split up by platform, version, and guard/exit status to give finer-grained insight into the network. The uptime statistic should also be split by guard/exit status and version, since it currently distinguishes only individual platforms. The data the query currently returns suits a time graph (similar to Karsten's Windows relay uptime graph), but it could also be shown as a box plot.

Relay churn is calculated from the unique routers from one week/month/year that appear in the following week/month/year, and is relatively straightforward to calculate. However, this query could use some optimization because it takes a very long time to group individual routers by the times they appear.
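As a rough illustration of the churn calculation described above, here is a minimal Python sketch. It assumes churn means the fraction of relays seen in one period that do not reappear in the following period, and that relays are identified by fingerprint strings; the function names and the input layout (a dict mapping a sortable period label to a set of fingerprints) are hypothetical and not taken from the actual schema.

```python
def churn_rate(previous, following):
    """Fraction of relays in `previous` that are absent from `following`."""
    previous, following = set(previous), set(following)
    if not previous:
        return 0.0
    return len(previous - following) / len(previous)


def churn_by_period(relays_by_period):
    """Map each period to its churn rate relative to the next period.

    `relays_by_period` maps a sortable period label (hypothetical, e.g.
    "2010-W01") to the set of relay fingerprints seen in that period.
    The last period has no successor and is therefore omitted.
    """
    periods = sorted(relays_by_period)
    return {
        current: churn_rate(relays_by_period[current], relays_by_period[nxt])
        for current, nxt in zip(periods, periods[1:])
    }
```

Doing the set difference outside the database sidesteps the expensive grouping step, at the cost of first loading the distinct fingerprints for each period.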

Relay uptime is more difficult to calculate with a database query because "uptime sessions" need to be calculated in order to get a correct average. This is nearly impossible to do with a single database query, and must be done programmatically (with cursors in PL/pgSQL or elsewhere).
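To make the idea of "uptime sessions" concrete, here is a minimal Python sketch of the grouping done outside the database. It assumes each relay comes with a list of timestamps at which it was observed running, and that observations are expected once per fixed interval (one hour here, which is an assumption); a gap longer than that interval starts a new session. The function names are hypothetical.

```python
from datetime import timedelta


def uptime_sessions(seen_at, interval=timedelta(hours=1)):
    """Group observation times into (start, end) uptime sessions.

    Consecutive observations at most `interval` apart belong to the
    same session; a larger gap starts a new one.
    """
    sessions = []
    for t in sorted(seen_at):
        if sessions and t - sessions[-1][1] <= interval:
            sessions[-1][1] = t  # extend the current session
        else:
            sessions.append([t, t])  # start a new session
    return [(start, end) for start, end in sessions]


def mean_session_length(seen_at, interval=timedelta(hours=1)):
    """Average session length, counting each observation as one interval."""
    sessions = uptime_sessions(seen_at, interval)
    if not sessions:
        return timedelta(0)
    total = sum((end - start + interval for start, end in sessions),
                timedelta(0))
    return total / len(sessions)
```

Counting each observation as covering one full interval is a design choice: it gives a session with a single observation a length of one interval rather than zero.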

Change History (3)

comment:1 Changed 9 years ago by karsten

Status: new → assigned
Summary: Node churn and uptime statistics → Implement node churn and uptime statistics

We should merge this code into metrics-db. As discussed on #tor-dev today, Kevin is going to cherry-pick/interactively rebase his development branch to come up with a branch that implements just the churn and uptime statistics.

Here's also an idea for graphing churn and uptime: I found it useful to visualize uptime sessions using empirical cumulative distribution functions, like in this graph on uptime sessions by platforms. ECDFs are not built into ggplot2 directly, so one has to transform the data manually and draw a line plot of them. Here's an example:

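library(ggplot2)
# Stand-in sample; replace with the observed uptime session lengths
t <- rnorm(1000)
# After sorting, the i-th value has cumulative probability i/n
ts <- sort(t)
data <- data.frame(x = ts, y = seq_along(ts) / length(ts))
# Draw the ECDF as a line plot
ggplot(data, aes(x = x, y = y)) + geom_line()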

comment:2 Changed 9 years ago by karsten

Component: Metrics → Metrics Website

comment:3 Changed 8 years ago by karsten

Cc: karsten.loesing@… removed
Resolution: wontfix
Status: assigned → closed

There's a tech report about node stability which is related to this ticket. I think it's more useful to discuss node churn based on that report than to add new graphs to the metrics website.

Closing.
