What I would like to see are the full isolation parameters like SOCKS listen IP address / port / username / password (if the circuit was bound) and traffic stats (received / sent bytes).
Those traffic stats would also be really nice to have in both circuit and stream CLOSED events.
I'm willing to work on this task, as it seems a good first issue on the project.
If I understand well, that's a control socket that will only be used for monitoring or debug,
so the size of the output may not be a problem?
I looked at the origin_circuit_st struct and wanted to be sure that I understood the meaning and
importance of the different attributes in the call result.
So I made a listing of the printed attributes, as well as the one I intend to add on the output
and the one remaining, so you can assert that I did not misunderstood some of them.
The following attributes seem to be used in the controller output:
global_identifier
build_state
cpath
hs_ident
rend_data
socks_username_len
socks_password_len
socks_username
socks_password
The following attributes seem to be interesting to add with the proposed keyword:
This is an interesting design question: do we want to expose everything here, or just stuff where we have a specific use-case for having it? Ideally, we should only expose things where any implementation of the Tor protocol might also want to expose them.
I'd suggest ignoring isolation stuff for now, since there is work on progress for that stuff with ticket #19859 (moved). That includes all the stuff isolate_*, nym_epoch, and the things between client_proto_type and associated_isolated_stream_global_id in the source code. If we expose those, we'll want to do so in some way that's compatible with how we expose the same information for streams.
For implementation strategy, I suggest the following:
Let's pick an fields set of options to add. Probably some of the above fields will be easier and
Then, let's get documentation for control-spec.txt that explains the new fields, so we can make sure that we are explaining them right. This is spec documentation, so it needs to explain what the fields are guaranteed to mean, not what they actually mean in today's Tor. If there are any fields that might go away in the future, let's document that too.
Once that's settled, let's figure out how we do tests for these. One option is a unit test that constructs a dummy origin_circuit_t object and calls the appropriate function to serialize it for a controller. Another option would be a new test in Stem.
Once we have a specification and a testing strategy, let's start merging the new fields. Let's do a branch that implements just one or two of them at first, so that we can make sure we're on the sampe page about implementation and testing strategy. After that's done, we can figure out our best approach for doing the rest.
(Next I'll look at each of the options you mentioned above and make some suggestions.)
I looked at the origin_circuit_st struct and wanted to be sure that I understood the meaning and
importance of the different attributes in the call result.
That's a good place to start! I'd suggest also looking at the members of circuit_t, since every origin_circuit_t contains a circuit_t object. If you want to do an initial breakdown of those, I can look at that list too.
The following attributes seem to be interesting to add with the proposed keyword:
These will be a little tricky to document: they all have somewhat complicated logic. They were all originally used to implement code of the form "Remember when X has happened, so we can do Y", and that's what their documentation says. But it's possible that the conditions when we want to do Y, or the ways in which we respond to X, will change over time. So we should document which condition we actually want each of the controller flags to represent: does it represent "this is a field where X has happened" or "this is a field where we Y"? We should carefully document what each one actually means in practice.
This makes the following attributes remaining non-printed:
p_streams
half_streams
guard_state
These might be worth potentially exposing some information about in the future, but it doesn't have to be now.
relay_early_cells_sent
This is worth exposing, and easy to expose.
global_origin_circuit_list_idx
This is an implementation detail and should not be exposed.
pathbias_probe_id
pathbias_probe_nonce
hs_service_side_rend_circ_has_been_relaunched
I don't think there's any reason to expose these, but I could be wrong.
relay_early_commands
I think we should remove this field; it doesn't seem to be doing anything useful any more, and nobody has encountered bug #878 (moved) for a long time.
next_stream_id
We could expose this, but I don't see any reason to do so.