Opened 3 months ago

Last modified 13 days ago

#31078 assigned enhancement

improve docs for config var abstraction

Reported by: catalyst Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 042-should
Cc: teor, catalyst Actual Points:
Parent ID: #29211 Points: 2
Reviewer: Sponsor: Sponsor31-can

Description

In ticket:30864#comment:11, I commented on some vagueness in the code comments that made it difficult for me to understand some of what's going on.

We should revise those comments to use improved terminology to help other developers understand what's going on. This might need to wait until the other refactoring on that branch is done.

Child Tickets

Change History (11)

comment:1 Changed 3 months ago by nickm

Parent ID: #29211

comment:2 Changed 3 months ago by teor

Cc: teor added

comment:3 Changed 7 weeks ago by teor

Milestone: Tor: unspecifiedTor: 0.4.2.x-final
Owner: set to nickm
Status: newassigned

These changes need to be done (or at least triaged) before we continue work on #29211.

comment:4 Changed 6 weeks ago by nickm

Type: defectenhancement

Mark a number of current 0.4.2.x "defects" as "enhancements."

comment:5 Changed 5 weeks ago by nickm

Mark some assigned tickets as 042-should.

comment:6 Changed 5 weeks ago by nickm

Keywords: 042-should added

comment:7 Changed 13 days ago by nickm

I've been trying to figure out which concrete steps would help the most here. I think that clarity on the final API, and clarity on the words "type" and "variable" are the biggest requests.


The variables here might also be called "options" or "fields" or "settables" or "configurables". Each one is a C value that maps to a named option in a configuration or state file. I'm okay renaming these from "variables" to one of the other things if we have a good "other thing" to rename them to.

In the C code, each "variable" is a member of a some configuration object. Each module can have its own configuration object. These objects are registered at startup with a central "configuration manager", which is responsible for parsing configurations, telling modules about new configurations, and so on.

The implementation for these variables comes in 4 layers:

  1. The lowest level is the "typed_var" layer, which views the C value as a void *, and views the configuration value as a string. This is the layer that knows how to encode, decode, copy, etc. The set of functions that does the encoding/decoding/copying/etc defines the type of the variable. "Codec" or "manipulator" might be another good name for this.
  1. One level higher is the "struct_member" layer, which views the C value as stored at a given offset within a structure.
  1. One level higher is the "config_var" layer. This layer knows the _names_ of different configuration values, and knows that some values may be obsolete, deprecated.
  1. At the highest level is the "managed_var" layer. It is an internal object used by the configmgr code to keep track of which variables correspond to which objects.

Each layer is consumed by the layer above it. Additionally, layer 1 is the layer at which you can declare new "types" (codecs? manipulators?). Layer 3 is the layer at which modules declare their variables.


Here's what I have in mind for the final artchitecture.

There are three main users of the configuration system.

  1. There is type code, which wants to declare new "types" (codecs? manipulators?) that modules can use for their data. (This kind of user #includes var_type_def_st.h, and defines a new var_type_def_t.)
  1. There are modules which want to declare configuration or state variables, and learn what their values are, and learn when those values change. They declare a structure for their configuration and/or state, and a table of config_var_t mapping configuration/state variables to the fields of that structure. They expose this information via the subsystem API. (There is not yet a separate example of this; all users of get_options() are currently taking this role, as is the variable-declaration part of config.c.)
  1. There is the application code, which wants to load, reload, change, or dump configuration values, and make sure that the right modules find out about it. (This kind of user uses the "confmgr.h" API to combine the mapping tables from multiple modules, and to manipulate the correct fields in their configuration/state objects.)

comment:8 Changed 13 days ago by nickm

On renaming:

typed_var_t could be "c data", a "c object", an "encodeable", a "manipulatable".

struct_member_t could probably become an implementation detail of config_var_t.

config_var_t could be an "option", "field", "setting", "member", "entry".

var_type_def_t could be "encoder", "encoding", "codec", "manipulator", "manip".

Do any of these sound like good changes?

Edited to add: left to my own devices, I would rename typed_var to c_data, make struct_member more hidden, leave var_type_def alone or rename it to c_cfg_codec, and leave config_var alone or rename it to cfg_option.

Last edited 13 days ago by nickm (previous) (diff)

comment:9 Changed 13 days ago by nickm

On documentation:

I think the best place to document all this is probably in the top-level doxygen comments in lib/conf/conftypes.h and lib/confparse/confmgt.h (to be renamed from confparse.h), and that the right way to do so is probably by copying/adapting the text above.

Does that sound like a good way to do this?

comment:10 Changed 13 days ago by nickm

Cc: catalyst added

comment:11 in reply to:  8 Changed 13 days ago by teor

I am happy moving forward with the architecture and documentation as you describe.

I have some opinions about naming. Naming is hard. There are lots of good options. And different people may prefer different options.

We are parsing a config. I wonder if it would help to use standard parsing jargon.

Replying to nickm:

On renaming:

typed_var_t could be "c data", a "c object", an "encodeable", a "manipulatable".

This isn't quite a token, because it has a type.
But it's also doesn't quite feel like a variable, because it can be part of a config option's value (rather than the entire value).
It's implemented as a string and its (possibly binary) equivalent.

I'd like to consider "field" or "element" or some other term here.
If there's another appropriate word from parsing, codecs, or protocol design, I'd happily use that.

struct_member_t could probably become an implementation detail of config_var_t.

+1

config_var_t could be an "option", "field", "setting", "member", "entry".

option is the term we use in the rest of the code, so let's stick with that, unless you think it will cause confusion.

var_type_def_t could be "encoder", "encoding", "codec", "manipulator", "manip".

I think "codec" is close, but it's usually used for binary.
We're doing serialisation and deserialisation to text - is there a more precise term?
How about "format"?

Do any of these sound like good changes?

Edited to add: left to my own devices, I would rename typed_var to c_data, make struct_member more hidden, leave var_type_def alone or rename it to c_cfg_codec, and leave config_var alone or rename it to cfg_option.

Note: See TracTickets for help on using tickets.