I envision a stats module that is responsible for processing all statistics gathered anywhere in Tor. This module would be responsible for making decisions on where they get published (ControlPort, DirAuth) and how much scrubbing/aggregation is required for those destinations.
If we like that approach, implementation likely involves porting currently collected stats to use the new "stats module knows best".
Should we be taking the approach of sending events to the ControlPort and letting an external process turn them into statistics for anything that we're not sending up to daddy? Or are we OK with doing the stats inside of Tor itself? I'm personally in favor of the latter, as it minimizes the effort required to get useful stats and requires less duplication of code in the long run.
Letting Tor do it sounds convenient, but only up to the point where we want to change something. Turning observations into statistics may depend on what we want to do with the statistics. If we change our mind there, we'll have to fix that in Tor. It's probably quite inconvenient for simulations if older Tor versions report different statistics than newer versions. I'd rather favor a variant where control port events contain the raw data and where it's up to the application, here Shadow, how to aggregate these data points to get something useful. I bet there's good support for processing control port events in stem that we can use here. (That's just my 2 cents.)
In general, having a common stats module that either prepares control port events, or aggregates data for being sent to the directory authorities, or both, makes sense. This is a major refactoring effort, so I'd be careful mixing that with adding new stats.
In #7358 (moved), I asked for which statistics you were planning to emit asynchronous control port events whenever there's a change, and for which you planned to emit events periodically. You asked me to move that question here. So, for example, the CIRC, STREAM, ORCONN, BUILDTIMEOUT_SET, and CIRC_MINOR events (Sections 4.1.1, .2, .3, .16, and .19) are emitted whenever there's a change, whereas BW and STREAM_BW (Sections 4.1.4 and .13) are emitted every second. In fact, the mentioned events already cover some of the statistics you were looking for. We'll probably want to design the remaining control port events along similar lines.
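For reference, the two periodic events look roughly like this on the control connection (values invented; see control-spec.txt for the exact syntax):

    650 BW 1600 1200
    650 STREAM_BW 42 4096 8192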
Should we be taking the approach of sending events to the ControlPort and letting an external process turn them into statistics for anything that we're not sending up to daddy? Or are we OK with doing the stats inside of Tor itself? I'm personally in favor of the latter, as it minimizes the effort required to get useful stats and requires less duplication of code in the long run.
Letting Tor do it sounds convenient, but only up to the point where we want to change something. Turning observations into statistics may depend on what we want to do with the statistics. If we change our mind there, we'll have to fix that in Tor. It's probably quite inconvenient for simulations if older Tor versions report different statistics than newer versions. I'd rather favor a variant where control port events contain the raw data and where it's up to the application, here Shadow, how to aggregate these data points to get something useful. I bet there's good support for processing control port events in stem that we can use here. (That's just my 2 cents.)
I've thought about this and reviewed the Tor controller spec. I agree. Thanks for your insight!
In general, having a common stats module that either prepares control port events, or aggregates data for being sent to the directory authorities, or both, makes sense. This is a major refactoring effort, so I'd be careful mixing that with adding new stats.
Yeah, good point. I'll focus on minimizing changes and only adding the desired event types to collect the new stats.
In #7358 (moved), I asked for which statistics you were planning to emit asynchronous control port events whenever there's a change, and for which you planned to emit events periodically. You asked me to move that question here. So, for example, the CIRC, STREAM, ORCONN, BUILDTIMEOUT_SET, and CIRC_MINOR events (Sections 4.1.1, .2, .3, .16, and .19) are emitted whenever there's a change, whereas BW and STREAM_BW (Sections 4.1.4 and .13) are emitted every second. In fact, the mentioned events already cover some of the statistics you were looking for. We'll probably want to design the remaining control port events along similar lines.
We made some progress here. Proposal 218 defines four new controller events that should help us understand connection and circuit usage. Branch morestats3 in my public tor repository (which is based on Rob's morestats branch) has a tested implementation that could use some review before merging it into 0.2.5.x.
Trac: Milestone: Tor: unspecified → Tor: 0.2.5.x-final; Status: new → needs_review
This needs to go along with a patch to control-spec.txt .
e54d664f7bb1205162c1df3495f8ebc30c23d867
The last time we added K=V arguments in the middle of a controller event, it created trouble for controllers, even though it wasn't supposed to, since the spec says we can. Since nobody's actually hacking on Vidalia, would it be possible to just put the new K=V arguments at the end, to avoid possible trouble?
8d1f78c556abb570bb80ea84261c954d9746cf33
Instead of being controlled by TestingTorNetwork, should there be a separate option for EVENT_CONN_BW events?
The documentation for n_read and n_written should say that they're only used for EVENT_CONN_BW events, and only turned on conditionally. They should probably also have names that imply they are only for stats.
The new code in control_update_global_event_mask should get extracted into a function.
c386d2d6ce4c4f58163acb385c7a5de1da8c5e30
"extern circuit_t *global_circuitlist" is really not a great way to mediate access to this thing; we should add an accessor function.
cell_command_to_string doesn't really belong in control.c
append_cell_stats_by_command doesn't seem to document how long the arrays are. I might be a bit more comfortable if it took a structure instead. This
Accessing num_used in smartlist_t is not how we do it. Call smartlist_len() like every other place in the code.
include_if_positive is a weird name for an unsigned type. It can't be negative, after all.
control_event_circuit_cell_stats seems to be really huge and full of duplicated code. Can it be refactored into smaller functions?
Is it "exitward" or "exit_ward"? I prefer the former.
Prefer smartlist_add_asprintf() to tor_asprintf();smartlist_add().
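To illustrate the preferred pattern (format string and arguments invented, not from the patch):

    /* Two steps: allocate, then append. */
    char *buf = NULL;
    tor_asprintf(&buf, "ID=%lu CELLS=%u", (unsigned long)id, n_cells);
    smartlist_add(sl, buf);

    /* One step: smartlist_add_asprintf() allocates and appends. */
    smartlist_add_asprintf(sl, "ID=%lu CELLS=%u", (unsigned long)id, n_cells);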
Is this really slow in practice? It seems like it's likely to be really slow.
Ditto about having another controlling option. We don't want TestingTorNetwork to do everything at once.
CELL_COMMAND_MAX would be a better name than CELL_MAX
Prefer TOR_SIMPLEQ to manual list management for new queues.
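A minimal sketch of the tor_queue.h macros, with hypothetical entry and field names:

    /* Element type; "next" is the link field. The queue head must be
     * set up with TOR_SIMPLEQ_INIT() before use. */
    typedef struct ins_entry_t {
      uint8_t command;
      TOR_SIMPLEQ_ENTRY(ins_entry_t) next;
    } ins_entry_t;
    TOR_SIMPLEQ_HEAD(ins_queue_t, ins_entry_t);

    static void
    drain_example(struct ins_queue_t *q, ins_entry_t *e)
    {
      TOR_SIMPLEQ_INSERT_TAIL(q, e, next);  /* enqueue */
      while (!TOR_SIMPLEQ_EMPTY(q)) {       /* dequeue and free */
        ins_entry_t *head = TOR_SIMPLEQ_FIRST(q);
        TOR_SIMPLEQ_REMOVE_HEAD(q, next);
        tor_free(head);
      }
    }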
This data structure ... how does it work out in practice? It looks like it's going to be hideously inefficient.
This code needs unit tests and functional abstraction badly.
dd5ce2157d8c47ffa3686b0579813e7b1aae8069
The documentation for the new global_*_emptied thing doesn't really say what they are. Do you mean "Last time at which the ... buckets were emptied in msec since midnight", or something else?
"connection_buckets_empty_ts" -- I can't read that name as a verb phrase. Our practice is to try to make function that do things have an active verb in their names.
The int cast in that function doesn't give me a great feeling.
Here too, TestingTorNetwork should not mean everything.
The code to convert time to "msec since midnight" is duplicated code. Let's try not to duplicate code.
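A factored-out helper might look like this sketch (name hypothetical; note it also follows the const-pointer convention mentioned below):

    /* Milliseconds elapsed since the preceding midnight. */
    static uint32_t
    msec_since_midnight(const struct timeval *tv)
    {
      return (uint32_t)(((tv->tv_sec % 86400L) * 1000L) +
                        ((long)tv->tv_usec / 1000L));
    }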
Holy moly, it's a 13-argument function!!!!! Please, let's be better than that.
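The usual fix is to bundle related parameters into a struct (all names hypothetical):

    /* One pointer instead of a dozen positional arguments. */
    typedef struct bucket_stats_t {
      uint32_t global_read_empty_time;
      uint32_t global_write_empty_time;
      uint32_t relay_read_empty_time;
      uint32_t relay_write_empty_time;
      int milliseconds_elapsed;
    } bucket_stats_t;

    void note_buckets_empty(const bucket_stats_t *stats);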
2925e2fe786dfd9a27c2dff503996712cd180de4
See above about the name and documentation for n_read and n_written.
The new code in control_update_global_event_mask should get extracted into a function.
The last time we added K=V arguments in the middle of a controller event, it created trouble for controllers, even though it wasn't supposed to, since the spec says we can. Since nobody's actually hacking on Vidalia, would it be possible to just put the new K=V arguments at the end, to avoid possible trouble?
It only created trouble for the PyTorCtl-based Python crap. Vidalia handled the new keyword arguments fine.
Started looking at some of the fixes when I meant to look at something else. Thought I'd review anyway, since why not.
a0c5080a48ed7612c72099c8c96dceef00be117e
Aren't we going to need a space before the ID= ? I don't see anything to put it there; am I missing it?
Oops. Yes, I should put back the space where it belongs. (I didn't really test this branch yet, other than compiling it, because I still need to make more changes.)
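(For context, a well-formed event line would look roughly like this, with invented values; every K=V argument needs to be separated by a space:

    650 CONN_BW ID=14 TYPE=OR READ=512 WRITTEN=1024
)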
I finally made it through the list! Please see my updated morestats4 branch. It compiles, makes check-spaces happy, passes --enable-gcc-warnings, and runs in Shadow, but still I expect we'll need another iteration or two. Below I'm replying to all comments that require more than just: "done."
General:
This needs to go along with a patch to control-spec.txt .
Sure. I converted proposal 218 into a control-spec.txt patch here:
The last time we added K=V arguments in the middle of a controller event, it created trouble for controllers, even though it wasn't supposed to, since the spec says we can. Since nobody's actually hacking on Vidalia, would it be possible to just put the new K=V arguments at the end, to avoid possible trouble?
rransom, I agree that sane controllers should handle this, but I also agree that we should change this, just to be safe.
8d1f78c556abb570bb80ea84261c954d9746cf33
Instead of being controlled by TestingTorNetwork, should there be a separate option for EVENT_CONN_BW events?
Added a new option TestingEnableConnBwEvent that depends on TestingTorNetwork being set, is 0 by default, and is changed to 1 if TestingTorNetwork is set. I also added similar options for the other new events that are specified for test networks only.
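A minimal torrc sketch of how that is meant to be used (assuming the option behaves as described above):

    # Only valid in test networks:
    TestingTorNetwork 1
    TestingEnableConnBwEvent 1
    # ...and likewise for the similar options covering the other new
    # event types.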
c386d2d6ce4c4f58163acb385c7a5de1da8c5e30
cell_command_to_string doesn't really belong in control.c
Moved to command.c, though I'm not entirely sure it belongs there. If not, where would you want it to be?
append_cell_stats_by_command doesn't seem to document how long the arrays are. I might be a bit more comfortable if it took a structure instead. This
Changed to a structure. How does the last sentence end?
Is this really slow in practice? It seems like it's likely to be really slow.
Do you mean slow or fast in your second sentence?
If you're referring to the comment explaining why we're re-using arrays for all circuits: maybe I didn't just mean slow, but also bad for memory fragmentation.
Prefer TOR_SIMPLEQ to manual list management for new queues.
This data structure ... how does it work out in practice? It looks like it's going to be hideously inefficient.
I can't really think of a more efficient data structure. What do you have in mind? (I'll hold off changing the code to TOR_SIMPLEQ until I'm sure we want to use a queue there.)
This code needs unit tests and functional abstraction badly.
I think functional abstraction is improved and code is now easier to test. I didn't write tests yet, because I'm not quite sure where to start. Would you mind writing one example test that I could use as a start for these functions?
connection_buckets_note_empty_ts in connection.c
bucket_millis_empty in connection.c
sum_up_cell_stats_by_command in control.c
append_cell_stats_by_command in control.c
format_cell_stats in control.c
Are there any other new functions that need testing badly?
I finally made it through the list! Please see my updated morestats4 branch. It compiles, makes check-spaces happy, passes --enable-gcc-warnings, and runs in Shadow, but still I expect we'll need another iteration or two. Below I'm replying to all comments that require more than just: "done."
Okay. It's looking better. Thanks there!
append_cell_stats_by_command doesn't seem to document how long the arrays are. I might be a bit more comfortable if it took a structure instead. This
Changed to a structure. How does the last sentence end?
Probably explaining why a structure would be a good idea, I guess.
Is this really slow in practice? It seems like it's likely to be really slow.
Do you mean slow or fast in your second sentence?
If you refer to the comment explaining why we're re-using arrays for all circuits, maybe I didn't just mean slow, but also bad for memory fragmentation.
I'm asking here (and again below) what kind of a performance hit gets taken for turning this feature on.
Prefer TOR_SIMPLEQ to manual list management for new queues.
This data structure ... how does it work out in practice? It looks like it's going to be hideously inefficient.
I can't really think of a more efficient data structure. What do you have in mind? (I'll hold off changing the code to TOR_SIMPLEQ until I'm sure we want to use a queue there.)
Hm. Assuming a 64-bit architecture, this is going to eat an 8 byte pointer for the "next" element, plus 8 bytes for chunk element overhead, for every 5-byte entry we need to hold. We could do better by using some kind of dynamic array structure, which wouldn't have a per-element overhead...
but before we go and do that, we should know whether that actually matters. Hence my asking about how it actually works out in practice. If the memory consumption on a busy relay isn't much comparatively, then there's not much need to re-do this.
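To make the alternative concrete, a dynamic-array variant might look like this sketch (hypothetical names; packing assumed to be a 4-byte timestamp plus a 1-byte command, matching the 5-byte entries above; uses tor_realloc() and memcpy()):

    /* Packed 5-byte records in one growable buffer: no per-element
     * pointers or allocator chunk headers. */
    typedef struct ins_array_t {
      uint8_t *buf;     /* n_alloc * 5 bytes */
      size_t n_used;
      size_t n_alloc;
    } ins_array_t;

    static void
    ins_array_add(ins_array_t *a, uint32_t msec, uint8_t command)
    {
      if (a->n_used == a->n_alloc) {
        a->n_alloc = a->n_alloc ? 2 * a->n_alloc : 16;
        a->buf = tor_realloc(a->buf, a->n_alloc * 5);
      }
      memcpy(a->buf + 5 * a->n_used, &msec, 4); /* host byte order */
      a->buf[5 * a->n_used + 4] = command;
      ++a->n_used;
    }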
This code needs unit tests and functional abstraction badly.
I think functional abstraction is improved and code is now easier to test. I didn't write tests yet, because I'm not quite sure where to start. Would you mind writing one example test that I could use as a start for these functions?
connection_buckets_note_empty_ts in connection.c
bucket_millis_empty in connection.c
sum_up_cell_stats_by_command in control.c
append_cell_stats_by_command in control.c
format_cell_stats in control.c
Well, format_cell_stats is pretty easy: you would make one or more circuit_t* objects with the cell statistics you want set on them, and then pass them to format_cell_stats, and then check whether the output is as you expect and whether the circuit is munged. You could stick this in an existing test_*.c module in src/test/, or add a new module as appropriate. For information on writing actual tinytest tests, see src/ext/tinytest.
Is there more you need to know there? I'm not clear on what aspect of writing unit tests isn't making sense.
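Along those lines, a skeleton of such a test might look like this (the format_cell_stats signature, the setup, and the expected string are placeholders to be filled in from the branch):

    static void
    test_format_cell_stats(void *arg)
    {
      char *event_string = NULL;
      circuit_t *circ = NULL;
      cell_stats_t *cell_stats = NULL;
      (void) arg;
      /* ... allocate circ and cell_stats, set the counters under
       * test ... */
      format_cell_stats(&event_string, circ, cell_stats);
      tt_str_op(event_string, ==, "ID=2 ...");
     done:
      tor_free(event_string);
    }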
Are there any other new functions that need testing badly?
Our default should not be "let's test the functions that need it badly". Our default should be "test whatever new and changed functions we can."
Prefer TOR_SIMPLEQ to manual list management for new queues.
This data structure ... how does it work out in practice? It looks like it's going to be hideously inefficient.
I can't really think of a more efficient data structure. What do you have in mind? (I'll hold off changing the code to TOR_SIMPLEQ until I'm sure we want to use a queue there.)
Hm. Assuming a 64-bit architecture, this is going to eat an 8 byte pointer for the "next" element, plus 8 bytes for chunk element overhead, for every 5-byte entry we need to hold. We could do better by using some kind of dynamic array structure, which wouldn't have a per-element overhead...
I'm currently running somewhat bigger Shadow simulations to compare the overhead, but that's going to take a while.
but before we go and do that, we should know whether that actually matters. Hence my asking about how it actually works out in practice. If the memory consumption on a busy relay isn't much comparatively, then there's not much need to re-do this.
I wonder, if insertion_command_queue_t is an inefficient data structure, wouldn't that also apply to insertion_time_queue_t? Should we make both of them more efficient? If so, can you explain your dynamic array structure idea in some more detail? Maybe I can give that a try next week.
This code needs unit tests and functional abstraction badly.
I think functional abstraction is improved and code is now easier to test. I didn't write tests yet, because I'm not quite sure where to start. Would you mind writing one example test that I could use as a start for these functions?
connection_buckets_note_empty_ts in connection.c
bucket_millis_empty in connection.c
sum_up_cell_stats_by_command in control.c
append_cell_stats_by_command in control.c
format_cell_stats in control.c
Well, format_cell_stats is pretty easy: you would make one or more circuit_t* objects with the cell statistics you want set on them, and then pass them to format_cell_stats, and then check whether the output is as you expect and whether the circuit is munged. You could stick this in an existing test_*.c module in src/test/, or add a new module as appropriate. For information on writing actual tinytest tests, see src/ext/tinytest.
Is there more you need to know there? I'm not clear on what aspect of writing unit tests isn't making sense.
The part where these tests should go was a bit unclear to me, but now it's clearer, I think. Please see my updated morestats4 branch for a first attempt.
Are there any other new functions that need testing badly?
Our default should not be "let's test the functions that need it badly". Our default should be "test whatever new and changed functions we can."
So, my questions about efficiency and performance are really questions, and not rhetorical. I'm asking: in practice, when you turn this feature on and run a reasonably busy simulation, is the performance/memory hit acceptable or not?
I'm asking that because I suspect that it's going to consume a whole lot of memory. And if it does, we can make it more efficient. But if it doesn't (and you've tested it out, so I'm hoping you will know), then there is not any point in putting the effort into making it more efficient.
Does that make sense?
More issues on review:
You're passing struct timeval on the stack; our convention is to pass around a "const struct timeval *" instead.
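That is, roughly (hypothetical declaration):

    /* Before: void note_bucket_time(struct timeval now); which
     * copies the whole struct. Preferred, per convention: */
    void note_bucket_time(const struct timeval *now);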
What keeps the various bandwidth events from getting enabled when not in Testing mode? I'm seeing that for some but not all of the other new event types.
The uint64_t * pointer arguments to append_cell_stats_by_command should be const. That function's documentation should document the lengths of those arrays.
Something to do AFTER merge:
Go through all the core functions that grew new if blocks here, and move the bodies of those if blocks into new functions. We have lots of places in Tor where ancillary functionality is crammed into core functions, but we shouldn't perpetuate that.
For the next branch like this:
Please learn the --fixup and --squash arguments to git commit.
So, my questions about efficiency and performance are really questions, and not rhetorical. I'm asking: in practice, when you turn this feature on and run a reasonably busy simulation, is the performance/memory hit acceptable or not?
I'm asking that because I suspect that it's going to consume a whole lot of memory. And if it does, we can make it more efficient. But if it doesn't (and you've tested it out, so I'm hoping you will know), then there is not any point in putting the effort into making it more efficient.
Of interest here is recent functionality I've added to Shadow. It now has the ability to track memory usage of simulated nodes over time (see this issue on GitHub). In the XML file, you include the 'heartbeatloginfo' attribute.
For this ticket, you probably only need heartbeatloginfo="ram", and only on the nodes that you want to profile because Shadow will consume significantly more RAM to track the malloced pointers and sizes.
Once your simulation is complete, the contrib/analyze.py script will parse allocated, deallocated, and running total RAM usage over time from the scallion.log file, and can also be used to graph the results.