improve docs for config var abstraction

Trac:
Parent Ticket: #29211 (moved)

added 043-should 044-deferred component::core tor/tor doc milestone::Tor: unspecified owner::nickm parent::29211 points::2 priority::medium severity::normal status::assigned type::enhancement labels

Trac:
Parent: N/A to #29211 (moved)

Trac:
Cc: N/A to teor

These changes need to be done (or at least triaged) before we continue work on #29211 (moved).

Trac:
Status: new to assigned
Milestone: Tor: unspecified to Tor: 0.4.2.x-final
Owner: N/A to nickm

Mark a number of current 0.4.2.x "defects" as "enhancements."

Trac:
Type: defect to enhancement

Mark some assigned tickets as 042-should.

Trac:
Keywords: N/A deleted, 042-should added

I've been trying to figure out which concrete steps would help the most here. I think that clarity on the final API, and clarity on the words "type" and "variable" are the biggest requests.

The variables here might also be called "options" or "fields" or "settables" or "configurables". Each one is a C value that maps to a named option in a configuration or state file. I'm okay renaming these from "variables" to one of the other things if we have a good "other thing" to rename them to.

In the C code, each "variable" is a member of some configuration object. Each module can have its own configuration object. These objects are registered at startup with a central "configuration manager", which is responsible for parsing configurations, telling modules about new configurations, and so on.

The implementation for these variables comes in 4 layers:

1. The lowest level is the "typed_var" layer, which views the C value as a void *, and views the configuration value as a string. This is the layer that knows how to encode, decode, copy, etc. The set of functions that does the encoding/decoding/copying/etc defines the type of the variable. "Codec" or "manipulator" might be another good name for this.
2. One level higher is the "struct_member" layer, which views the C value as stored at a given offset within a structure.
3. One level higher is the "config_var" layer. This layer knows the names of different configuration values, and knows that some values may be obsolete or deprecated.
4. At the highest level is the "managed_var" layer. It is an internal object used by the configmgr code to keep track of which variables correspond to which objects.

Each layer is consumed by the layer above it. Additionally, layer 1 is the layer at which you can declare new "types" (codecs? manipulators?). Layer 3 is the layer at which modules declare their variables.
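To make the layering concrete, here is a minimal sketch of how the four layers might stack. Every `sketch_` and `demo_` name, the `DemoCount` option, and the struct layouts are hypothetical illustrations, not the actual declarations in the Tor tree; the real types carry more fields and functions than shown.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Layer 1 ("typed_var"): the C value is a void *, the configuration value
 * is a string. The function table is what defines the "type". */
typedef struct sketch_type_def {
  const char *name;
  int (*parse)(void *target, const char *value);  /* 0 on success, -1 on error */
  int (*encode)(const void *target, char *buf, size_t buflen);
} sketch_type_def;

static int int_parse(void *target, const char *value) {
  char *end = NULL;
  errno = 0;
  long v = strtol(value, &end, 10);
  if (end == value || *end != '\0' || errno == ERANGE ||
      v < INT_MIN || v > INT_MAX)
    return -1;                 /* leave *target untouched on bad input */
  *(int *)target = (int)v;
  return 0;
}

static int int_encode(const void *target, char *buf, size_t buflen) {
  int n = snprintf(buf, buflen, "%d", *(const int *)target);
  return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

static const sketch_type_def SKETCH_INT = { "int", int_parse, int_encode };

/* Layer 2 ("struct_member"): the value lives at an offset in a struct. */
typedef struct sketch_member {
  const sketch_type_def *type;
  size_t offset;
} sketch_member;

static void *member_ptr(void *obj, const sketch_member *m) {
  assert(obj && m);
  return (char *)obj + m->offset;
}

/* Layer 3 ("config_var"): adds the option name and flags such as obsolete. */
typedef struct sketch_config_var {
  const char *name;
  sketch_member member;
  int is_obsolete;
} sketch_config_var;

/* Layer 4 ("managed_var"): the manager's record of which registered
 * object a variable belongs to. */
typedef struct sketch_managed_var {
  const sketch_config_var *cvar;
  int object_idx;
} sketch_managed_var;

/* A module's configuration object and its variable table (hypothetical). */
typedef struct demo_options { int count; } demo_options;

static const sketch_config_var DEMO_VARS[] = {
  { "DemoCount", { &SKETCH_INT, offsetof(demo_options, count) }, 0 },
};

/* The manager walks down the layers: object -> member -> typed parse. */
static int sketch_managed_set(void **objects, const sketch_managed_var *mv,
                              const char *value) {
  if (mv->cvar->is_obsolete)
    return -1;
  void *obj = objects[mv->object_idx];
  const sketch_member *m = &mv->cvar->member;
  return m->type->parse(member_ptr(obj, m), value);
}
```

In this sketch a new "type" only needs to provide a `sketch_type_def`, and a module only needs to provide a struct plus a table like `DEMO_VARS`; neither has to know about the layer above it.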

Here's what I have in mind for the final architecture.

There are three main users of the configuration system.

There is type code, which wants to declare new "types" (codecs? manipulators?) that modules can use for their data. (This kind of user #includes var_type_def_st.h, and defines a new var_type_def_t.)
There are modules which want to declare configuration or state variables, and learn what their values are, and learn when those values change. They declare a structure for their configuration and/or state, and a table of config_var_t mapping configuration/state variables to the fields of that structure. They expose this information via the subsystem API. (There is not yet a separate example of this; all users of get_options() are currently taking this role, as is the variable-declaration part of config.c.)
There is the application code, which wants to load, reload, change, or dump configuration values, and make sure that the right modules find out about it. (This kind of user uses the "confmgr.h" API to combine the mapping tables from multiple modules, and to manipulate the correct fields in their configuration/state objects.)
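The three roles could be sketched roughly as follows. This is only an illustration under stated assumptions: `sketch_mgr`, `sketch_module`, `relay_cfg`, the `DemoRelayEnabled` option, and the change callback are all hypothetical names, and the real "confmgr.h" API is not reproduced here. Option names are matched case-sensitively for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Role 1: type code defines how a string becomes a C value. */
typedef int (*sketch_parse_fn)(void *target, const char *value);

static int parse_bool(void *target, const char *value) {
  if (strcmp(value, "1") == 0) { *(int *)target = 1; return 0; }
  if (strcmp(value, "0") == 0) { *(int *)target = 0; return 0; }
  return -1;
}

/* Role 2: a module declares a structure, a table mapping option names to
 * fields of that structure, and a way to learn about changes. */
typedef struct sketch_var {
  const char *name;
  sketch_parse_fn parse;
  size_t offset;
} sketch_var;

typedef struct sketch_module {
  const char *name;
  const sketch_var *vars;
  size_t n_vars;
  void *config;                      /* the module's configuration object */
  void (*on_change)(void *config);   /* called after a successful set */
} sketch_module;

typedef struct relay_cfg { int enabled; } relay_cfg;

static int relay_change_count = 0;   /* demo-only: counts notifications */
static void relay_changed(void *config) {
  (void)config;
  relay_change_count++;
}

static const sketch_var RELAY_VARS[] = {
  { "DemoRelayEnabled", parse_bool, offsetof(relay_cfg, enabled) },
};

/* Role 3: application code combines the tables from every module and
 * routes each assignment to the right field of the right object. */
#define SKETCH_MAX_MODULES 8

typedef struct sketch_mgr {
  sketch_module modules[SKETCH_MAX_MODULES];
  size_t n_modules;
} sketch_mgr;

static int sketch_mgr_register(sketch_mgr *mgr, sketch_module mod) {
  if (mgr->n_modules >= SKETCH_MAX_MODULES)
    return -1;
  mgr->modules[mgr->n_modules++] = mod;
  return 0;
}

static int sketch_mgr_set(sketch_mgr *mgr, const char *name,
                          const char *value) {
  for (size_t m = 0; m < mgr->n_modules; ++m) {
    sketch_module *mod = &mgr->modules[m];
    for (size_t i = 0; i < mod->n_vars; ++i) {
      if (strcmp(mod->vars[i].name, name) != 0)
        continue;
      void *target = (char *)mod->config + mod->vars[i].offset;
      if (mod->vars[i].parse(target, value) < 0)
        return -1;                   /* bad value: no notification */
      if (mod->on_change)
        mod->on_change(mod->config);
      return 0;
    }
  }
  return -1;                         /* unknown option */
}
```

The point of the split is that the module never parses strings itself and the application never needs to know the layout of `relay_cfg`; the table is the only thing they share.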

On renaming:

typed_var_t could be "c data", a "c object", an "encodeable", a "manipulatable".

struct_member_t could probably become an implementation detail of config_var_t.

config_var_t could be an "option", "field", "setting", "member", "entry".

var_type_def_t could be "encoder", "encoding", "codec", "manipulator", "manip".

Do any of these sound like good changes?

Edited to add: left to my own devices, I would rename typed_var to c_data, make struct_member more hidden, leave var_type_def alone or rename it to c_cfg_codec, and leave config_var alone or rename it to cfg_option.

On documentation:

I think the best place to document all this is probably in the top-level doxygen comments in lib/conf/conftypes.h and lib/confparse/confmgt.h (to be renamed from confparse.h), and that the right way to do so is probably by copying/adapting the text above.

Does that sound like a good way to do this?

Trac:
Cc: teor to teor, catalyst

I am happy moving forward with the architecture and documentation as you describe.

I have some opinions about naming. Naming is hard. There are lots of good options. And different people may prefer different options.

We are parsing a config. I wonder if it would help to use standard parsing jargon.

Replying to nickm:

> On renaming:
>
> typed_var_t could be "c data", a "c object", an "encodeable", a "manipulatable".

This isn't quite a token, because it has a type. But it also doesn't quite feel like a variable, because it can be part of a config option's value (rather than the entire value). It's implemented as a string and its (possibly binary) equivalent.

I'd like to consider "field" or "element" or some other term here. If there's another appropriate word from parsing, codecs, or protocol design, I'd happily use that.

> struct_member_t could probably become an implementation detail of config_var_t.

+1

> config_var_t could be an "option", "field", "setting", "member", "entry".

"option" is the term we use in the rest of the code, so let's stick with that, unless you think it will cause confusion.

> var_type_def_t could be "encoder", "encoding", "codec", "manipulator", "manip".

I think "codec" is close, but it's usually used for binary. We're doing serialisation and deserialisation to text - is there a more precise term? How about "format"?

> Do any of these sound like good changes?
>
> Edited to add: left to my own devices, I would rename typed_var to c_data, make struct_member more hidden, leave var_type_def alone or rename it to c_cfg_codec, and leave config_var alone or rename it to cfg_option.

If we do this, it will happen in 0.4.3.

(Should I go ahead and do it?)

Trac:
Milestone: Tor: 0.4.2.x-final to Tor: 0.4.3.x-final

Trac:
Keywords: 042-should deleted, 043-should doc added

No more sponsor 31. All these tickets remained open after sponsor 31 ended.

Trac:
Sponsor: Sponsor31-can to N/A

Ping on this ticket: I can make the changes that teor and I discussed above, but I'd like to know whether you think they make a good start, Catalyst.

Current plan from network team meeting is to go ahead with the changes above, amended by Teor's suggested names above. We can open tickets for more docs and renaming in the future, and treat this as an incremental improvement.

0.4.3 was released: Move non merge-ready 0.4.3 tickets to 044.

Trac:
Milestone: Tor: 0.4.3.x-final to Tor: 0.4.4.x-final

Bulk-remove tickets from 0.4.4. Add the 044-deferred label to them.

Trac:
Milestone: Tor: 0.4.4.x-final to Tor: unspecified
Keywords: 043-should doc deleted, 044-deferred, 043-should, doc added

changed time estimate to 16h

mentioned in issue #31516 (moved)

mentioned in issue #29211 (moved)

moved to tpo/core/tor#31078 (closed)
