Opened 10 years ago

Closed 9 years ago

Last modified 7 years ago

#1368 closed enhancement (implemented)

Implement a heartbeat log message

Reported by: Sebastian Owned by:
Priority: High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version: 0.2.0.31
Severity: Keywords:
Cc: Sebastian, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by arma)

That sounds kind of worthwhile. I have wondered whether we should
make Tor log heartbeat kind of messages every hour or maybe every
day.

Roughly saying: Dear operator, I am still alive. I have pushed this
much traffic. I am still in the consensus. Do not worry, all is cool.
Here are some more stats for you to stare at:

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (25)

comment:1 Changed 10 years ago by nickm

Hm. If it's at info, nobody will read it necessarily. If it's at notice, I worry that it might be too verbose
and clog the rest of the notice log if we do it hourly. Maybe make its interval configurable?

This is definitely a candidate for going into its own log domain.

comment:2 Changed 10 years ago by Sebastian

I did think about notice, yes, so that normal people will read it. Configurable interval
sounds good.

comment:3 Changed 9 years ago by zax

Nice idea Sebastian. It's asking a lot of Bridge Ops to run a service but not provide them with any feedback on its usage. Many will seek their answers from network monitoring solutions that probably disclose more info about their sources than is optimal for an anonymity system.

comment:4 Changed 9 years ago by arma

Description: modified (diff)
Milestone: Tor: 0.2.3.x-final
Priority: minormajor

I'm excited to see this happen. Bumping up its priority, and setting a milestone.

comment:5 Changed 9 years ago by asn

Hello,

let's ponder on what a relay operator would want to see in a heartbeat log message.

  • traffic pushed
  • whether it's present on the consensus
  • number of connections
  • average traffic speed
  • relay flags (or change to the relay flags since last heartbeat)
  • uptime

Some of the above (like the average traffic speed) may belong more to the role of a controller (yes, arm does a great job displaying such stuff).
We also wouldn't want to polute the logs with a ten-liner even if the interval is big.

Any ideas on what would actually be useful here?

PS: I created a repo in gitorious http://gitorious.org/logheartbeat/logheartbeat and I'll try to toy with this. For now, I'm just using Sebastian's code (from his heartbeat branch) but I'm planning on writing something as well :)

comment:6 Changed 9 years ago by asn

Greetings,

I guess I'll add a status update here so that you guys can actually sleep at night.

Heartbeat messages are currently looking like this:
Nov 29 02:13:23.000 [notice] Heartbeat [1290989603]: Tor's uptime is 0:18 , with 5 circuits active.
Nov 29 02:13:23.000 [notice] Since last heartbeat I've read 152330 bytes and pushed 126986 bytes.
Nov 29 02:13:23.000 [notice] Unfortunately, it seems like we are not in the cached consensus. If I were you I'd check this out, there is something fishy going on!

As you can see, I'm tracking the uptime, number of circuit, bandwidth and whether we are in the consensus.
The whole things seems to be working well. Unfortunately, I'm not sure if the code is Windows portable but I've tried not to be really careless.

This is - of course - not ready for production use yet, but if you guys could briefly check out the code it would be great. Any feature requests are welcome as well.

From what I gathered from #tor-dev, people would also like:

  • Relay flags displayed
  • "Heartbeat issued" control event
  • Uptime sent to the control port (although I guess this is not heartbeat related)

and a couple of others I haven't grasped :(

comment:7 Changed 9 years ago by atagar

Hi asn, here's my thoughts:

  • The transfer rates should be human friendly values (5.3 MB, 7.4 GB, etc).
  • There's an extra space between the uptime and comma.
  • Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.
  • The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).
  • The heartbeat message is three notice level events? That would be pretty noisy. My preference would be something short and sweet like:

Heartbeat [185]: Tor's uptime is 3 days, with 57 circuits. Since the last heartbeat 87 MB has been pushed.

Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).

We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).

If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.

Cheers! -Damian

comment:8 in reply to:  7 Changed 9 years ago by Sebastian

Replying to atagar:

  • Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.

Saying "y months, x days (started at <date>)" isn't so hard.

  • The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).

This was actually what triggered this report. We should definitely include this information, traffic has normal fluctuations etc and if traffic drops we can indicate that this might be because you dropped from the consensus. If you didn't drop, we can rule out one cause quickly.

Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).

We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).

If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.

This is the other reason I wanted to do this; so I don't need to run a controller with all the additional burdens and security concerns this imposes. Controller users can simply turn off heartbeat log messages, or their controllers can even do it for them.

comment:9 Changed 9 years ago by atagar

This was actually what triggered this report...

Gotcha. However, I still think that if this is a common gotcha then a separate warning message that we're occasionally dropping from the consensus is the better way to go.

This is the other reason I wanted to do this

That's fine, but if that's the goal then I don't think it's appropriate for this to be the default. Users should opt-in to a highly verbose log message, and if they want detailed information then controllers already do that in a better fashion (in my not-so-humble opinion).

A short, concise heartbeat message is generally useful. Dumping a load of relay stats in an effort to make controllers irrelevant is both annoying and (I think) misguided.

so I don't need to run a controller with all the additional burdens and security concerns this imposes

We obviously disagree on this controllers-are-bad viewpoint, but that's fine. Again, this is something users should opt-into rather than making it the default. I'm happy for them to have this option but I'm not so thrilled with defaulting to an annoying log dump based on this opinion.

As a compromise how about a short, succinct log message (like the one I proposed earlier) by default, then giving the option for a verbose version if the user wants it?

comment:10 in reply to:  7 Changed 9 years ago by asn

Replying to atagar:

Hi asn, here's my thoughts:

  • The transfer rates should be human friendly values (5.3 MB, 7.4 GB, etc).

Yep, this is yet TODO.

  • Will the uptime be human friendly values as well (ex, "4 days")? What is it going to look like once it's in the realm of months? The uptime stat (in seconds since starting tor) would be nice to have via the control port but this is part of another proposal, so don't worry about it.

Right now, it's based on the uptime(1) format. For example, it will show something like "9 days and 12:34:02". Initially I had added months in there, but I took them off, shouldn't be too hard to implement though.

  • The indicator for if you're in the consensus is both verbose and, imho, unnecessary. That status isn't likely to change (so saying "Congratulations, you're in the consensus!" every day would be pointless) and it's kinda obvious based on the throughput attribute (you're reachable if you're pushing traffic). Maybe just give this status (with the user friendly advice "something seems to be wrong") once, when it's first detected? This could be separate from the heartbeat (maybe at the warning runlevel).

I understand both parties' arguments on this.
At the moment, I'm only mentioning the consensus entry if our router is not in the consensus.
This is again quite easily configurable.

  • The heartbeat message is three notice level events? That would be pretty noisy. My preference would be something short and sweet like:

Heartbeat [185]: Tor's uptime is 3 days, with 57 circuits. Since the last heartbeat 87 MB has been pushed.

Yeah, I agree with this. I'll probably do it like that.

Trying to turn the heartbeat into a controller substitute packed with every interesting stat there is (including the consensus status, flags, both upload/download rates, etc) goes pretty deeply into the territory of what controllers provide and also risks making this more of an annoyance than help (particularly for controller users who already have this information).

We default to logging at the NOTICE runlevel so doing a daily event that's more than a short sentence means that after a couple weeks the log will be full of these events (drowning out any interesting alerts for things that happened - not so helpful). If we're intent on doing that then I think this should be muted by default (enabled via a flag in the torrc).

I appreciate controllers, but Sebastian's arguments are valid as well.
I think I'll leave the heartbeat interval as is and leave the Tor dev. team figure what should be done about this - FOSS etcetera.

If we really want this to be both highly verbose and on by default then I'd prefer if the heartbeat defaults to being muted when users have a ControlPort set (unless overwritten in the torrc) so it avoids bothering controller users.

Cheers! -Damian

In my router I have the ControlPort open but I know that I would have appreciated heartbeat messages as well.
What I haven't done yet and should be done, is add a way to disable the heartbeat.

As a compromise how about a short, succinct log message (like the one I proposed earlier) by default, then giving the option for a verbose version if the user wants it?

Yes, that could be done. Raising the options to two for a heartbeat event is a bit of an overkill (for the typical user who doesn't really care), but we can try it.

Thanks for the comments!

comment:11 Changed 9 years ago by atagar

Yeah, I agree with this. I'll probably do it like that.

If the event is short and sweet then I have no issue with it being on by default. In that case it's useful for both controller users and non-controller alike. :)

Raising the options to two for a heartbeat event is a bit of an overkill

Agreed, though we don't need two options. There could be one flag (say, "HeartbeatVerbosity") that could be set to:
0 = off
1 = single, short sentence (default)
2 = dump of the flags, both upload/download rates, average rates, and whatever else a controller substitute might want

Cheers! -Damian

comment:12 in reply to:  11 Changed 9 years ago by asn

Raising the options to two for a heartbeat event is a bit of an overkill

Agreed, though we don't need two options. There could be one flag (say, "HeartbeatVerbosity") that could be set to:
0 = off
1 = single, short sentence (default)
2 = dump of the flags, both upload/download rates, average rates, and whatever else a controller substitute might want

Cheers! -Damian

Oh sorry, forgot to mention it. There is already a HeartbeatPeriod flag that configures the heartbeat interval :)

comment:13 in reply to:  description Changed 9 years ago by asn

Greetings all!

I think the heartbeat patch is ready for review.

You can find it in the following gitorious repository:
http://gitorious.org/heartbeat

On a good day the heartbit message looks like this:
Dec 01 01:35:58.000 [notice] Heartbeat [1291167358]: Tor's uptime is 6:11 , with 98 circuits active, I've pushed 975.89 MB.

comment:14 Changed 9 years ago by asn

Status: newneeds_review

comment:15 Changed 9 years ago by Sebastian

Looks like the heartbeat branch isn't yet squashed as you had intended to do?

comment:16 Changed 9 years ago by nickm

Minor style issues:

  • In the documentation for the new functions in status.c, you don't explain what most of the functions actually return. Instead of "count circuits", say "return the number of circuits"; instead of "Transform X into Y", say "Return a new string Y containing..." and so on
  • Let's never display the exact number of bytes; let's always round to the nearest 1k.
  • Our code style is K&$; we never do
       if (a) {
          b
       } /* The "else" should be on this line */
       else {
           c;
       }
    
  • Don't cast to "long long unsigned int" before printf on a uint64_t, and don't assume that "%llu" will work for the format. Use the U64_PRINTF_ARG and U64_FORMAT definitions in src/common/compat.h instead.
  • "If I were you, I'd check this out, something there is something fishy going on!" is not helpful to users; they will just get confused and send us email or file bug reports. If the log message is going to suggest action, it should suggest a concrete course of action that users can follow.
  • Functions that return things are usually called "get_X()", not "export_X". Also, there is no need to document that a function is "exported"; we assume that the reader knows C and knows what a lack of "static" means.
  • Be sure to run "make check-spaces"
  • The manpage should say "Tor server", not "Tor instance", since clients don't have a heartbeat.

Design issues:

  • What is the point of logging the time as a long anyway? This is meant to be a human-readable thing, and humans can just look at the time of the log message, right?
  • What else do we envision moving to "status.c" / "status.h" ?

comment:17 Changed 9 years ago by nickm

Version: 0.2.1.250.2.0.31

(Actually, some of the "minor style issues" above are really correctness issues.)

comment:18 Changed 9 years ago by asn

Greetings,

the 1b97bd51 commit in the gitorious repo [1] is an attempt to fix the above mentioned issues.

Apart from that, after a talk with Sebastian in #tor-dev we decided to ship the heartbeat patch with a customizable message.
Basically, the user will be able to select which information should appear in the message.
I'll try to come with a blueprint for this soon. If anyone has any suggestions on how this should be done, do say :)

Oh by the way, the heartbeat message will also appear to clients.

[1]: http://gitorious.org/heartbeat

comment:19 Changed 9 years ago by nickm

Re-reading now:

  • It looks like the libtor_a_SOURCES and noinst_HEADERS lists are now out-of-order.
  • make check-spaces doesn't pass.
  • The style still isn't K&R.
  • commit messages that don't say anything. (e.g., "Various fixes".)
  • By convention, we try to use "foo_out" as the name for an argument that gets changed by a function. get_traffic_stats's arguments should probably then be called "n_read_out" and "n_written_out".
  • secs_to_uptime shouldn't take uptime_string as an argument; it makes no sense to do so. (See asprintf documentatino).
  • In bytes_to_bandwidth you're going to asprintf a uint64_t, you need to use the U64_PRINTF_ARG to make sure it's cast to the right type.
  • In log_heartbeat, nothing frees any of the allocated strings.

comment:20 in reply to:  19 Changed 9 years ago by asn

Resolution: Nonenot a bug
Status: needs_reviewclosed

Replying to nickm:

Re-reading now:

  • It looks like the libtor_a_SOURCES and noinst_HEADERS lists are now out-of-order.
  • make check-spaces doesn't pass.
  • The style still isn't K&R.
  • commit messages that don't say anything. (e.g., "Various fixes".)
  • By convention, we try to use "foo_out" as the name for an argument that gets changed by a function. get_traffic_stats's arguments should probably then be called "n_read_out" and "n_written_out".
  • secs_to_uptime shouldn't take uptime_string as an argument; it makes no sense to do so. (See asprintf documentatino).
  • In bytes_to_bandwidth you're going to asprintf a uint64_t, you need to use the U64_PRINTF_ARG to make sure it's cast to the right type.
  • In log_heartbeat, nothing frees any of the allocated strings.

Latest commit addresses those issues [1].
Thanks for the review.

I'm removing the needs_review label, since next step is to implement the custom message feature and not code review.

[1]: http://gitorious.org/heartbeat/heartbeat/commit/5098c0b700a027fcedd4cfd8b08b3f59ac5fe91c

comment:21 Changed 9 years ago by asn

Resolution: not a bug
Status: closedreopened

(bleh, I accidentally closed the bug entry as well. me is not so familiar with trac.)

comment:22 Changed 9 years ago by asn

Status: reopenedneeds_review

Latest version on gitorious supports client heartbeat messages as well.
I guess I'll put this on needs_review.

Have fun.

comment:23 Changed 9 years ago by nickm

Almost done!

The comment on count_circuits() looks wrong: it returns the total number of circuits, not just the number of _open_ circuits.

The comment on secs_to_uptime() is incomplete: by convention we mention when a function returns a _newly allocated_ string so that the caller knows to free it.

The whitespace in bytes_to_bandwidth is a little wonky.

I'll clean those up and merge to master.

comment:24 Changed 9 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed

Boom. Tweaked, rebased, squashed, merged. Thanks, everybody!

comment:25 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.