That sounds kind of worthwhile. I have wondered whether we should
make Tor log heartbeat kind of messages every hour or maybe every
day.
Roughly saying: Dear operator, I am still alive. I have pushed this
much traffic. I am still in the consensus. Do not worry, all is cool.
Here are some more stats for you to stare at:
[Automatically added by flyspray2trac: Operating System: All]
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
Hm. If it's at info, nobody will read it necessarily. If it's at notice, I worry that it might be too verbose
and clog the rest of the notice log if we do it hourly. Maybe make its interval configurable?
This is definitely a candidate for going into its own log domain.
Nice idea Sebastian. It's asking a lot of Bridge Ops to run a service but not provide them with any feedback on its usage. Many will seek their answers from network monitoring solutions that probably disclose more info about their sources than is optimal for an anonymity system.
I'm excited to see this happen. Bumping up its priority, and setting a milestone.
Trac: Description: That sounds kind of worthwhile. I have wondered whether we should
make Tor log heartbeat kind of messages every hour or maybe every
day.
Roughly saying: Dear operator, I am still alive. I have pushed this
much traffic. I am still in the consensus. Do not worry, all is cool.
Here are some more stats for you to stare at:
[Automatically added by flyspray2trac: Operating System: All]
to
That sounds kind of worthwhile. I have wondered whether we should
make Tor log heartbeat kind of messages every hour or maybe every
day.
Roughly saying: Dear operator, I am still alive. I have pushed this
much traffic. I am still in the consensus. Do not worry, all is cool.
Here are some more stats for you to stare at:
[Automatically added by flyspray2trac: Operating System: All] Milestone: N/Ato Tor: 0.2.3.x-final Parent: N/AtoN/A Priority: minor to major
let's ponder on what a relay operator would want to see in a heartbeat log message.
traffic pushed
whether it's present on the consensus
number of connections
average traffic speed
relay flags (or change to the relay flags since last heartbeat)
uptime
Some of the above (like the average traffic speed) may belong more to the role of a controller (yes, arm does a great job displaying such stuff).
We also wouldn't want to polute the logs with a ten-liner even if the interval is big.
Any ideas on what would actually be useful here?
PS: I created a repo in gitorious [http://gitorious.org/logheartbeat/logheartbeat] and I'll try to toy with this. For now, I'm just using Sebastian's code (from his heartbeat branch) but I'm planning on writing something as well :)
I guess I'll add a status update here so that you guys can actually sleep at night.
Heartbeat messages are currently looking like this:
Nov 29 02:13:23.000 [notice] Heartbeat [1290989603]: Tor's uptime is 0:18 , with 5 circuits active.
Nov 29 02:13:23.000 [notice] Since last heartbeat I've read 152330 bytes and pushed 126986 bytes.
Nov 29 02:13:23.000 [notice] Unfortunately, it seems like we are not in the cached consensus. If I were you I'd check this out, there is something fishy going on!
As you can see, I'm tracking the uptime, number of circuit, bandwidth and whether we are in the consensus.
The whole things seems to be working well. Unfortunately, I'm not sure if the code is Windows portable but I've tried not to be really careless.
This is - of course - not ready for production use yet, but if you guys could briefly check out the code it would be great. Any feature requests are welcome as well.
From what I gathered from #tor-dev, people would also like:
Relay flags displayed
"Heartbeat issued" control event
Uptime sent to the control port (although I guess this is not heartbeat related)
and a couple of others I haven't grasped :(
The transfer rates should be human friendly values (5.3 MB, 7.4 GB, etc).
There's an extra space between the uptime and comma.
Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.
The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).
The heartbeat message is three notice level events? That would be pretty noisy. My preference would be something short and sweet like:
Heartbeat [185]: Tor's uptime is 3 days, with 57 circuits. Since the last heartbeat 87 MB has been pushed.
Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).
We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).
If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.
Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.
Saying "y months, x days (started at )" isn't so hard.
The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).
This was actually what triggered this report. We should definitely include this information, traffic has normal fluctuations etc and if traffic drops we can indicate that this might be because you dropped from the consensus. If you didn't drop, we can rule out one cause quickly.
Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).
We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).
If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.
This is the other reason I wanted to do this; so I don't need to run a controller with all the additional burdens and security concerns this imposes. Controller users can simply turn off heartbeat log messages, or their controllers can even do it for them.
Gotcha. However, I still think that if this is a common gotcha then a separate warning message that we're occasionally dropping from the consensus is the better way to go.
This is the other reason I wanted to do this
That's fine, but if that's the goal then I don't think it's appropriate for this to be the default. Users should opt-in to a highly verbose log message, and if they want detailed information then controllers already do that in a better fashion (in my not-so-humble opinion).
A short, concise heartbeat message is generally useful. Dumping a load of relay stats in an effort to make controllers irrelevant is both annoying and (I think) misguided.
so I don't need to run a controller with all the additional burdens and security concerns this imposes
We obviously disagree on this controllers-are-bad viewpoint, but that's fine. Again, this is something users should opt-into rather than making it the default. I'm happy for them to have this option but I'm not so thrilled with defaulting to an annoying log dump based on this opinion.
As a compromise how about a short, succinct log message (like the one I proposed earlier) by default, then giving the option for a verbose version if the user wants it?
The transfer rates should be human friendly values (5.3 MB, 7.4 GB, etc).
Yep, this is yet TODO.
Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.
Right now, it's based on the uptime(1) format. For example, it will show something like "9 days and 12:34:02". Initially I had added months in there, but I took them off, shouldn't be too hard to implement though.
The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).
I understand both parties' arguments on this.
At the moment, I'm only mentioning the consensus entry if our router is not in the consensus.
This is again quite easily configurable.
The heartbeat message is three notice level events? That would be pretty noisy. My preference would be something short and sweet like:
Heartbeat [185]: Tor's uptime is 3 days, with 57 circuits. Since the last heartbeat 87 MB has been pushed.
Yeah, I agree with this. I'll probably do it like that.
Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).
We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).
I appreciate controllers, but Sebastian's arguments are valid as well.
I think I'll leave the heartbeat interval as is and leave the Tor dev. team figure what should be done about this - FOSS etcetera.
If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.
Cheers! -Damian
In my router I have the ControlPort open but I know that I would have appreciated heartbeat messages as well.
What I haven't done yet and should be done, is add a way to disable the heartbeat.
As a compromise how about a short, succinct log message (like the one I proposed earlier) by default, then giving the option for a verbose version if the user wants it?
Yes, that could be done. Raising the options to two for a heartbeat event is a bit of an overkill (for the typical user who doesn't really care), but we can try it.
Yeah, I agree with this. I'll probably do it like that.
If the event is short and sweet then I have no issue with it being on by default. In that case it's useful for both controller users and non-controller alike. :)
Raising the options to two for a heartbeat event is a bit of an overkill
Agreed, though we don't need two options. There could be one flag (say, "HeartbeatVerbosity") that could be set to:
0 = off
1 = single, short sentence (default)
2 = dump of the flags, both upload/download rates, average rates, and whatever else a controller substitute might want
Raising the options to two for a heartbeat event is a bit of an overkill
Agreed, though we don't need two options. There could be one flag (say, "HeartbeatVerbosity") that could be set to:
0 = off
1 = single, short sentence (default)
2 = dump of the flags, both upload/download rates, average rates, and whatever else a controller substitute might want
Cheers! -Damian
Oh sorry, forgot to mention it. There is already a HeartbeatPeriod flag that configures the heartbeat interval :)
On a good day the heartbit message looks like this:
Dec 01 01:35:58.000 [notice] Heartbeat [1291167358]: Tor's uptime is 6:11 , with 98 circuits active, I've pushed 975.89 MB.
In the documentation for the new functions in status.c, you don't explain what most of the functions actually return. Instead of "count circuits", say "return the number of circuits"; instead of "Transform X into Y", say "Return a new string Y containing..." and so on
Let's never display the exact number of bytes; let's always round to the nearest 1k.
Our code style is K&$; we never do
if (a) { b } /* The "else" should be on this line */ else { c; }
Don't cast to "long long unsigned int" before printf on a uint64_t, and don't assume that "%llu" will work for the format. Use the U64_PRINTF_ARG and U64_FORMAT definitions in src/common/compat.h instead.
"If I were you, I'd check this out, something there is something fishy going on!" is not helpful to users; they will just get confused and send us email or file bug reports. If the log message is going to suggest action, it should suggest a concrete course of action that users can follow.
Functions that return things are usually called "get_X()", not "export_X". Also, there is no need to document that a function is "exported"; we assume that the reader knows C and knows what a lack of "static" means.
Be sure to run "make check-spaces"
The manpage should say "Tor server", not "Tor instance", since clients don't have a heartbeat.
Design issues:
What is the point of logging the time as a long anyway? This is meant to be a human-readable thing, and humans can just look at the time of the log message, right?
What else do we envision moving to "status.c" / "status.h" ?
the 1b97bd51 commit in the gitorious repo 1 is an attempt to fix the above mentioned issues.
Apart from that, after a talk with Sebastian in #tor-dev we decided to ship the heartbeat patch with a customizable message.
Basically, the user will be able to select which information should appear in the message.
I'll try to come with a blueprint for this soon. If anyone has any suggestions on how this should be done, do say :)
Oh by the way, the heartbeat message will also appear to clients.
It looks like the libtor_a_SOURCES and noinst_HEADERS lists are now out-of-order.
make check-spaces doesn't pass.
The style still isn't K&R.
commit messages that don't say anything. (e.g., "Various fixes".)
By convention, we try to use "foo_out" as the name for an argument that gets changed by a function. get_traffic_stats's arguments should probably then be called "n_read_out" and "n_written_out".
secs_to_uptime shouldn't take uptime_string as an argument; it makes no sense to do so. (See asprintf documentatino).
In bytes_to_bandwidth you're going to asprintf a uint64_t, you need to use the U64_PRINTF_ARG to make sure it's cast to the right type.
In log_heartbeat, nothing frees any of the allocated strings.
It looks like the libtor_a_SOURCES and noinst_HEADERS lists are now out-of-order.
make check-spaces doesn't pass.
The style still isn't K&R.
commit messages that don't say anything. (e.g., "Various fixes".)
By convention, we try to use "foo_out" as the name for an argument that gets changed by a function. get_traffic_stats's arguments should probably then be called "n_read_out" and "n_written_out".
secs_to_uptime shouldn't take uptime_string as an argument; it makes no sense to do so. (See asprintf documentatino).
In bytes_to_bandwidth you're going to asprintf a uint64_t, you need to use the U64_PRINTF_ARG to make sure it's cast to the right type.
In log_heartbeat, nothing frees any of the allocated strings.
Latest commit addresses those issues 1.
Thanks for the review.
I'm removing the needs_review label, since next step is to implement the custom message feature and not code review.
Trac: Resolution: None to not a bug Status: needs_review to closed
The comment on count_circuits() looks wrong: it returns the total number of circuits, not just the number of open circuits.
The comment on secs_to_uptime() is incomplete: by convention we mention when a function returns a newly allocated string so that the caller knows to free it.
The whitespace in bytes_to_bandwidth is a little wonky.