I need to watch my #1976 code for a bit, then merge my pid_control branch down, then generate some graphs and watch the results closely.
Also, is there a magic parameter to say "most-recent-consensus" for those three URLs? I'd like to script this and log graphs for every consensus, so I can watch the changes as I try tweaking various things.
Ok, I think I just mispasted; I have the URLs working now. Still wondering about a "latest-consensus" shortcut for scripting. If it doesn't exist, I guess I could just hack date to give me the last full UTC hour? Is that what I need?
Ok, with the example consensus in the README, I get 2447 lines of output like:
"We're missing descriptor 5514060a9697a4bd52c206a08c262abb9bf1b66d. Please make sure that all referenced server descriptors are available. Continuing anyway."
I assume it is not finding the descriptors file? I have them here:
metrics-tasks.git/task-2394# ls descriptors/
2011-07-13-05-00-00-consensus  2011-07-13-05-00-00-descriptors  2011-07-13-05-00-00-votes
That filename matches the one you have in the README.
> Still wondering about a "latest-consensus" shortcut for scripting. If it doesn't exist, I guess I could just hack date to give me the last full UTC hour?
No shortcuts. Something like this should work: date -u +%Y-%m-%d-%H-00-00.
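If you want to script it, a minimal sketch (the variable names are mine; the filename pattern is the one from the README and the directory listing above) would be:

  TS=$(date -u +%Y-%m-%d-%H-00-00)            # e.g. 2011-07-13-05-00-00
  CONSENSUS="descriptors/${TS}-consensus"
  SERVERDESC="descriptors/${TS}-descriptors"
  VOTES="descriptors/${TS}-votes"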
> Ok, with the example consensus in the README, I get 2447 lines of output like:
> "We're missing descriptor 5514060a9697a4bd52c206a08c262abb9bf1b66d. Please make sure that all referenced server descriptors are available. Continuing anyway."
Can you do a git pull? That warning hasn't been in the code since July.
Also, I tweaked the README a little bit by renaming 2011-07-13-05-00-00-descriptors to 2011-07-13-05-00-00-serverdesc so that it looks more like the URL. Nothing to worry about, but maybe it removes one potential error source.
I do like plot.sh. Thanks for that. Though on line 5 you don't use the $COMMONS variable in your file check. My commons-codec-1.4 is a system package.
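Something like this for that check would follow the variable instead of a hardcoded jar path (just a sketch, assuming $COMMONS is set earlier in the script):

  # fail early if the commons-codec jar isn't where $COMMONS points
  if [ ! -f "$COMMONS" ]; then
    echo "commons-codec jar not found at $COMMONS" >&2
    exit 1
  fi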
I'm also getting 500 server errors for the URLs. Is there an issue with metrics.tp.o having a delay before current consensuses are available? I tried some from yesterday, though, and plot.sh still gave me the same 500 error.
Please run git pull again. I fixed line 5. I also subtracted 30 minutes from the current system time and rounded that time to the last full UTC hour. That should give the metrics host enough time to fetch the consensus we're requesting.
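Roughly speaking (assuming GNU date; this is just the equivalent one-liner, not necessarily how the script computes it), that amounts to:

  # 30 minutes ago, truncated to the last full UTC hour
  date -u -d '30 minutes ago' +%Y-%m-%d-%H-00-00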
The 500 server errors come from the metrics database being overloaded. I just kicked Tomcat, and it works again. If the problem happens again, ask me or weasel to restart Tomcat on yatei. It's always safe to do that, and in this case it even helps. (I'm working on an improved metrics database that doesn't run into these problems, but don't hold your breath.)
Karsten: The votes for urras in these two graphs look odd to me, so I want to just confirm I'm reading them right.
On the relay-votes graph, it looks like 60% of the Guard nodes have a ratio of < 0.1 (I assume 0)? Similarly, 40-50% of the Exits also have a ratio of ~0? That seems like a lot of nodes with no capacity, yet urras's measurements look fine when not in PID mode. Definitely a bug somewhere, it seems.
On the measured-votes graph, it looks like urras hates Guards and Exits so much that their total measured bandwidth is way way below what the consensus has, and so their CDF only goes up to like 10-20% of the consensus total? In other words, you did not scale these graphs (which is fine). Is that also right?
Ok, it looks like that is what is happening: a lot of Guards are getting 0 as a bandwidth vote. Still not sure why yet.
Also not sure why it hates Exits. That might be a different bug, as I don't actually see any Exits with 0 bw in the vote file. It does look like it's hating them less in today's consensuses, though.
I solved the Guard 0 bw bug as well as the Exit issues and some other problems, and the PID feedback code is now running on 4 of our 5 bw auths. We now have a ton of knobs and options to tune, to find out which values provide sufficient feedback without breaking the network. I've created #4596 for that.
Trac: Status: new to closed; Resolution: N/A to fixed; Actualpoints: N/A to 20