The graph of packages requested from GetTor was broken for six weeks between early June and mid-July, and hardly anybody noticed. The problem (#6275) is fixed as of today (thanks kaner!), but this bug makes me wonder whether we need these statistics at all.
Graphing requested GetTor packages involves quite a bit of code in different places, including GetTor, metrics-db, metrics-lib, and metrics-web. Each of these code bases needs periodic maintenance: there are bugs such as #6275, new code like metrics-lib or stem needs parsing support if it wants to handle all metrics data, and so on. We added these statistics a few years back at a sponsor's request, but that sponsor hasn't paid for maintenance since. GetTor data is of little value for learning interesting things about the Tor network as a whole, and for maintenance reasons we wouldn't add statistics for similar services to metrics today. The mere fact that we already have GetTor statistics is a rather bad reason for keeping them.
I suggest archiving the GetTor stats file and graphs we have and removing all stats generating and processing code from the various code bases.
I just removed GetTor statistics from the metrics website. The data is still there, so if we change our minds about removing the statistics, we can easily undo this step. If nobody cares in the next, say, two weeks, I'll nuke everything in metrics-* that is labeled "GetTor".
Can you be more specific about what we use GetTor stats for?
Please note that we simply cannot maintain everything. If I have to spend time on maintaining GetTor stats, I cannot spend that time on other things. I'd rather focus on the important stuff, and I don't consider GetTor stats that important.
I use gettor stats as part of a funding pitch to explain how people acquire tor. All I really need, however, is to answer the question "what percent of downloads is from https, smtp, xmpp, or bittorrent?" If we can answer this question, I'm fine with dropping the gettor graphs themselves.
I also don't understand how this is too much to maintain. It sounds like gettor was broken, not the metrics part. The current gettor installation is a mess and needs to be moved to a new server. Gettor probably needs to be rewritten and to provide stats via some other method than http.
I use the stats to look for peaks in email fetching - this usually happens when another blocking event occurs. It seems like this should just accumulate and graph; I don't know why you're stuck holding the bag when gettor itself breaks.
I also don't understand how this is too much to maintain. It sounds like gettor was broken, not the metrics part. The current gettor installation is a mess and needs to be moved to a new server. Gettor probably needs to be rewritten and to provide stats via some other method than http.
It would be helpful for me if you could be more precise on the points "is a mess" and "needs to be rewritten". In IRC, you said easy install/remove was an issue. Have you read the installation instructions/tried an installation and found it too complicated? If you could give me pointers on how to improve, that would be great.
I use gettor stats as part of a funding pitch to explain how people acquire tor. All I really need, however, is to answer the question "what percent of downloads is from https, smtp, xmpp, or bittorrent?" If we can answer this question, I'm fine with dropping the gettor graphs themselves.
We can't answer that question.
I also don't understand how this is too much to maintain. It sounds like gettor was broken, not the metrics part.
Metrics was affected, because GetTor provided ill-formatted data that made metrics choke and send me hourly error emails, so I had to disable parsing temporarily. kaner and I exchanged a few (Trac) emails to track down the problem. When GetTor worked again I had to manually merge its old and new data and re-enable parsing on metrics. This incident cost me a few hours overall.
Apart from that, every piece of code of course needs maintenance. The quickly written GetTor parsing code in metrics-* is no exception. If it breaks, I need to fix it. Also, this code makes it harder for me to refactor metrics-*, which is desperately needed to make it easier to "automate graphing xy" as Roger asks me to do every few months. And if I want to provide a library that parses all metrics data, or document all metrics data formats on a website, that always includes GetTor statistics, too. If I can throw out parts of metrics-* that are mostly unneeded, I'd really prefer to do that.
The current gettor installation is a mess and needs to be moved to a new server. Gettor probably needs to be rewritten and to provide stats via some other method than http.
How exactly is a GetTor rewrite not going to generate work on my side to adapt the metrics side of things?
What makes me so sad about this thread is that maintenance is taken for granted and that we don't have any process for removing less-used features. "Oh, but it's working, why remove it" is not a helpful reply. No, we cannot keep every feature we ever built.
So, how do we proceed here? I'd really, really prefer if GetTor statistics on metrics went away. If I write a small Python script that graphs GetTor statistics and we add that to GetTor's repository to be maintained by the GetTor maintainer, can we then keep it out of metrics?
So, how do we proceed here? I'd really, really prefer if GetTor statistics on metrics went away. If I write a small Python script that graphs GetTor statistics and we add that to GetTor's repository to be maintained by the GetTor maintainer, can we then keep it out of metrics?
I'm fine with having a script inside the GetTor repository that I maintain and that does the graphs.
Karsten, I'm confused - why not simply define an expected format and fail to graph data that doesn't match it (while still storing it), so that graphs keep being generated regularly?
The easy thing is to just ensure that the parser is implemented in GetTor and that GetTor attempts to parse the data before sending it onward. In the event that it can't parse the data, we send kaner or me or someone an email about it (see the sketch after this comment). The graphs might get choppy from time to time, but never in a time-sensitive way, never with lost data, etc.
Most of that can happen on the GetTor side - I feel like the rest is implemented on your side already. It seems a bit much to maintain a totally different site for an area of the metrics page that already exists.
I think that we should package GetTor in Debian and do proper releases. We should also ensure that GetTor doesn't send malformed data, while also ensuring that Metrics won't barf, wasting Karsten's time, if malformed data should slip through.
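A minimal sketch of that validation idea, assuming a hypothetical one-line-per-day stats format ("YYYY-MM-DD channel:count ..."); the format, addresses, and function names here are illustrative, not GetTor's actual interface:

```python
import re
import smtplib
from email.mime.text import MIMEText

# Hypothetical stats-line format: "2012-06-01 https:120 smtp:34 xmpp:5"
STATS_LINE = re.compile(r'^\d{4}-\d{2}-\d{2}( [a-z]+:\d+)+$')

def validate_stats(lines):
    """Split lines into those matching the expected format and the rest."""
    good, bad = [], []
    for line in lines:
        (good if STATS_LINE.match(line.strip()) else bad).append(line)
    return good, bad

def publish_stats(lines, maintainer='gettor-maintainer@example.org'):
    """Pass well-formed lines onward; alert a human about malformed ones."""
    good, bad = validate_stats(lines)
    if bad:
        # Email the maintainer instead of letting bad data reach metrics.
        msg = MIMEText('Malformed GetTor stats lines:\n' + '\n'.join(bad))
        msg['Subject'] = 'GetTor stats validation failure'
        msg['From'] = 'gettor@example.org'
        msg['To'] = maintainer
        smtplib.SMTP('localhost').send_message(msg)
    return good  # only lines metrics can safely parse
```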
Karsten, I'm confused - why not simply define an expected format and fail to graph data that doesn't match it (while still storing it), so that graphs keep being generated regularly?
The easy thing is to just ensure that the parser is implemented in GetTor and that GetTor attempts to parse the data before sending it onward. In the event that it can't parse the data, we send kaner or me or someone an email about it. The graphs might get choppy from time to time, but never in a time-sensitive way, never with lost data, etc.
What happened in June/early July was just one way for the GetTor-metrics connection to break. There are plenty of other problems that maintaining GetTor stats on the metrics side involves. I mentioned some of them in my earlier post: bugs in the metrics-* code, more work to refactor metrics-*, additional code in the parsing library, the need to document the GetTor stats format, etc.
Most of that can happen on the GetTor side - I feel like the rest is implemented on your side already. It seems a bit much to maintain a totally different site for an area of the metrics page that already exists.
The idea was to write a command-line tool that downloads GetTor's existing stats export (which covers months of data, or possibly the entire aggregate history of GetTor operation if we want to) and plots a graph locally. No website involved. If you or Andrew want a graph, you go to your ~/src/gettor/ and run ./plot-stats.py or maybe even ./plot-stats.py --from 2012-06-01 --to 2012-06-30. I would write/document/test that code, and kaner would maintain it in the GetTor repo (thanks for that!). Would that work for you?
Karsten, I'm confused - why not simply define an expected format and fail to graph data that doesn't match it (while still storing it), so that graphs keep being generated regularly?
The easy thing is to just ensure that the parser is implemented in GetTor and that GetTor attempts to parse the data before sending it onward. In the event that it can't parse the data, we send kaner or me or someone an email about it. The graphs might get choppy from time to time, but never in a time-sensitive way, never with lost data, etc.
What happened in June/early July was just one way for the GetTor-metrics connection to break. There are plenty of other problems that maintaining GetTor stats on the metrics side involves. I mentioned some of them in my earlier post: bugs in the metrics-* code, more work to refactor metrics-*, additional code in the parsing library, the need to document the GetTor stats format, etc.
I suppose so. I guess I don't see one breakage as a sign of systemic failure that indicates we should scrap it all.
Most of that can happen on the GetTor side - I feel like the rest is implemented on your side already. It seems a bit much to maintain a totally different site for an area of the metrics page that already exists.
The idea was to write a command-line tool that downloads GetTor's existing stats export (which covers months of data, or possibly the entire aggregate history of GetTor operation if we want to) and plots a graph locally. No website involved. If you or Andrew want a graph, you go to your ~/src/gettor/ and run ./plot-stats.py or maybe even ./plot-stats.py --from 2012-06-01 --to 2012-06-30. I would write/document/test that code, and kaner would maintain it in the GetTor repo (thanks for that!). Would that work for you?
That seems like a bunch of work to do what we're already doing, only with a less generic interface, and it would only be useful for about two people.
I suppose so. I guess I don't see one breakage as a sign of systemic failure that indicates we should scrap it all.
This one breakage, and the fact that nobody noticed for a month, merely got me thinking about removing GetTor stats from metrics. The more general reason is that GetTor stats are out of scope for metrics. We shouldn't have added them in the first place. GetTor is just one service, and it is not vital for understanding the Tor network, which is the goal of the metrics project. That's different from services like BridgeDB or Torperf, which I consider in scope. But we wouldn't want to add usage data of other Tor services to metrics. Or rather, I wouldn't want that. There's a huge overhead in specifying a statistics file format, writing a metrics-db plugin to collect the data, extending metrics-lib to parse it, and making metrics-web import it into its database and graph it. I can understand how that's hard to grasp for people who haven't seen the code behind all this.
That seems like a bunch of work to do what we're already doing, only with a less generic interface, and it would only be useful for about two people.
Writing a graphing script is trivial if it reduces the future maintenance overhead of three other products.
So, you say it would be useful for you and Andrew? Great, I'll write that graphing script today or tomorrow.
I'm not going to stop you from writing something or changing how these things are done, obviously. :)
Regarding no one noticing - I admit, I don't know how to check whether such things are broken. :(
I really do believe that these statistics have been very useful for understanding the deployment of client software at key points in time.
I do think that it might make sense to just specify a generic graphing interface where, given a dataset for "foo", we count the daily instances of "bar" occurring. That is a rather minimal thing which should never change - basically just a string and an integer updated on a daily basis (a toy sketch follows this comment). I'd understand if you don't want to do this, but I guess removing code that already runs seems weird, if only because, well, we all noticed (late) and some of us care.
Clearly, we're going to follow your lead on this either way. :)
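As a toy sketch of that generic interface, counting daily instances of "bar" really can be this small; the record layout and function names here are invented for illustration, not anything metrics currently implements:

```python
from collections import defaultdict

def count_daily(records):
    """Aggregate ("YYYY-MM-DD", "bar") pairs into daily counts.

    'records' could be one tuple per GetTor request, with the delivery
    channel (https, smtp, xmpp, ...) as the second element.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for date, key in records:
        daily[date][key] += 1
    return daily

def dump_stats(daily):
    """Write one line per day: 'YYYY-MM-DD key:count key:count ...'."""
    for date in sorted(daily):
        counts = ' '.join('%s:%d' % (k, v)
                          for k, v in sorted(daily[date].items()))
        print('%s %s' % (date, counts))

# Example: two https requests and one smtp request on the same day.
dump_stats(count_daily([('2012-06-01', 'https'),
                        ('2012-06-01', 'https'),
                        ('2012-06-01', 'smtp')]))
# -> 2012-06-01 https:2 smtp:1
```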
I'm not going to stop you from writing something or changing how these things are done, obviously. :)
Okay.
Regarding no one noticing - I admit, I don't know how to check whether such things are broken. :(
Well, the graphed line suddenly stopped in the middle of the graph, indicating that no newer values were available. You can't overlook that. I guess nobody looked at the graph at all; that's why nobody noticed. And if nobody looks at the graph, it can't be that important. For other graphs, I hear from people within a few days if they break.
I really do believe that these statistics have been very useful for understanding the deployment of client software at key points in time.
I do think that it might make sense to just specify a generic graphing interface where, given a dataset for "foo", we count the daily instances of "bar" occurring. That is a rather minimal thing which should never change - basically just a string and an integer updated on a daily basis. I'd understand if you don't want to do this, but I guess removing code that already runs seems weird, if only because, well, we all noticed (late) and some of us care.
Again, there's much more code involved in metrics-* than graphing a line. metrics-db grabs and archives GetTor's stats file, metrics-lib provides a simple parser, and metrics-web holds the format specification, imports the stats file into a database, contains the graphing code written in R, and makes a web form available for customizing the graph.
Clearly, we're going to follow your lead on this either way. :)
Glad to hear it. How do you like the attached Python script? It requires matplotlib, which is apt-get install python-matplotlib on Debian. Run the script with -h to see what options you have. The result should be quite similar to what you know from metrics, but with a lot less code. I can tweak it if you tell me in what direction.
kaner, can you add the script to GetTor's repo? And can you merge past GetTor stats into the current stats and stop truncating old values? The file won't be downloaded as often anymore, so that should be fine from a bandwidth perspective. Thanks!
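The attached script itself is not preserved in this thread; as a rough reconstruction under stated assumptions, a plotting tool along the lines described might look like the sketch below. The stats URL and the assumed line format ("YYYY-MM-DD channel:count ...") are placeholders, not GetTor's actual export:

```python
#!/usr/bin/env python
"""Plot GetTor package statistics from the published stats export."""
import argparse
import urllib.request
from datetime import datetime

import matplotlib.pyplot as plt

# Placeholder; the real export location would live in GetTor's repo/config.
STATS_URL = 'https://gettor.torproject.org/gettor_stats.txt'

def parse_stats(text):
    """Parse lines assumed to look like '2012-06-01 https:120 smtp:34'."""
    totals = {}  # date string -> total packages requested that day
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        totals[fields[0]] = sum(int(f.split(':')[1]) for f in fields[1:])
    return totals

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--from', dest='start', help='first day, YYYY-MM-DD')
    parser.add_argument('--to', dest='end', help='last day, YYYY-MM-DD')
    args = parser.parse_args()
    text = urllib.request.urlopen(STATS_URL).read().decode('utf-8')
    totals = parse_stats(text)
    days = sorted(d for d in totals
                  if (not args.start or d >= args.start)
                  and (not args.end or d <= args.end))
    plt.plot([datetime.strptime(d, '%Y-%m-%d') for d in days],
             [totals[d] for d in days])
    plt.ylabel('packages requested per day')
    plt.title('GetTor package requests')
    plt.gcf().autofmt_xdate()  # slant the date labels so they stay readable
    plt.show()

if __name__ == '__main__':
    main()
```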
Any progress here? Thanks!
I've created a ticket for the missing matplotlib package to be installed on getulum; then I'll test the script.