Opened 5 months ago

Last modified 5 weeks ago

#29209 new task

Reduce visibility of more data type internals

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: technical-debt refactoring
Cc: Actual Points: 3.5
Parent ID: Points: 15
Reviewer: nickm Sponsor: Sponsor31-can

Description

Many of our data types -- particularly the ones in or.h -- have their internals visible across modules, in a way that harms modularity and leads to weird code couplings. We would do well to reduce this visibility and refactor as needed.

Child Tickets

TicketStatusOwnerSummaryComponent
#30236closedasnEncapsulate details about crypt_path_t data structureCore Tor/Tor
#30349newDocument member-hiding conventions for structsCore Tor/Tor

Change History (21)

comment:1 Changed 2 months ago by asn

Keywords: technical-debt refactoring added
Milestone: Tor: 0.4.1.x-final
Points: 15

comment:2 Changed 2 months ago by asn

With Nick we decided that a low-hanging fruit here is encapsulating crypt_path_t.

comment:3 Changed 2 months ago by asn

Actual Points: 3.5
Status: newneeds_review

OK, I finally have something to show here: https://github.com/torproject/tor/pull/929

So basically this branch does the following:

  • Moves crypt_path_t related functions to its own file src/core/or/crypt_path.c. Previously they were all over the place, even tho they were cpath-specific.
  • Starts splitting crypt_path_t into a public struct and a private struct (using an opaque pointer).
  • Moves crypt_path_t.crypto to the private part of the struct by defining appropriate functions for doing circuit crypto.

This took much longer than anticipated because I first experimented with encapsulating others parts of crypt_path_t before deciding to go with crypto. In particular, I first started hiding deliver_window and package_window but I quickly realized that this would cause lots of conflicts with #26288. I think after #26288 gets merged, these two fields might be the next candidates for hiding.

Now in terms of general notes, this project took me more than 3 points of careful refactoring work, and I've only hided like 10% of the cpath structure. Hiding the whole structure is far far away since there are quite complicated structures involved like extend_info_t.

In terms of future goals here, I think hiding extend_info_t into its own interface would be quite convenient since that structure is used and poked in weird ways all over the codebase. Furthermore, that would mean we can eventually encapsulate crypt_path_t even better. Other plausible medium-term targets would be structures like tor_cert_t which are mostly hidden, but not completely.

I also added some TODO notes in the top of crypt_path.c with more cpath-specific things we can move without too much trouble.

Finally, I also tried to find any static analysis tools that will automatically output widely-used structures, or structures that could be hidden easily, but I didn't manage to find such a project for C. include-what-you-see was kinda related, but not exactly.

Last edited 2 months ago by asn (previous) (diff)

comment:4 Changed 2 months ago by asn

Reviewer: nickm

comment:5 Changed 2 months ago by nickm

Okay, I've read through the code. This looks like a good start; I've got some suggestions before we merge. First the minor ones:

  1. Let's open a sub-ticket for crypt_path_t, so that this ticket can still be about
  2. I think you're right that a bottom-up approach is better, where we start with low-level types like extend_info_t whose info is complex and exposed. Your patch here does seem to demonstrate that to me.

Now the major issue I have:

I think that this "private struct" approach is not actually the best for the case where we want to make fields private one at a time. It introduces an extra objects and an extra pointer, increasing both fragmentation and memory pressure. Here are some other approaches that we could use that I think would create fewer code changes:

// Example

struct foobar_t {
  int a;
  char *b; // let's say that we want to make this one private.
  long c;
};
// Option 1:

struct foobar_t {
  int a;
  long c;
};

struct foobar_private_t {
  struct foobar_t base;
  char *b;
};

static inline foobar_private_t *
to_private(foobar_t *obj)
{
  char *ptr = (char *)obj;
  return (foobar_private_t *)(ptr - (OFFSET_OF(foobar_private_t, base)));
}

static inline foobar_t *
to_public(foobar_private_t *obj)
{
  return &obj->base;
}
// Option 2

// (the foobar_private macro should only be defined in C modules that are
// allowed to see the internals.)

#ifdef FOOBAR_T_PRIVATE
#define PRIV(x) x
#else
#define PRIV(x) foobar_private_member__ ## x ## __
#endif

struct foobar_t {
  int a;
  char *PRIV(b);
  long c;
};
// option 3

#define PRIV(x) foobar_private_member__ ## x ## __

struct foobar_t {
  int a;
  char *PRIV(b);
  long c;
};

#ifdef FOOBAR_T_PRIVATE
#define b PRIV(b)
#endif
// option 4

struct foobar_t {
  int a;
  long c;
  struct {
    char *b;
  } foobar_private_members__;
};

#ifdef FOOBAR_T_PRIVATE
#define pvt foobar_private_members__
#else
#define foobar_private_members__ "you got a syntax error because you used this field outside foobar."
#endif
Last edited 2 months ago by nickm (previous) (diff)

comment:6 Changed 2 months ago by asn

I opened #30236 for the crypt_path case.

I agree that the private approach I took is suboptimal. Here are some thoughts about the options above:

  • Option1: I find usin the downcast approach a bit of an overkill here, but it might be the right way forward.
  • Option2: I think I like this, it seems quite simple. The only negative is that there is no warning if someone uses the private field outside of a private file, but I don't think people will attempt to do that with that name and macro (and appropriate docs).
  • Option3: I don't think I like this because #define b PRIV(b) might hijack lots of stuff. So for example, in the crypt_path_t.crypto case we would have to do #define crypto PRIV(crypto) and that would hijack all the cryptos in the file. Also it would be bad form to not capitalize crypto in that case, and ugly if we did.
  • Option4: This also seems like a plausible one. I'm not sure what pvt is there tho. I like the fact that it throws a message.

I think right now I'm leaning more towards option2 for being the most lightweight one and easy to understand. What do you think?

comment:7 Changed 8 weeks ago by nickm

I'm not sure what pvt is there tho.

Sorry; it's meant to be an abbreviation for private.

I also like option 2, but I think we need to make sure that it's valid C.

comment:8 Changed 8 weeks ago by nickm

Reading the C99 spec (section 6.2.7) I don't think options 2 is good C:

  1. Two types have compatible type if their types are the same... Moreover, two structure,union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: ... there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types, and such that if one member of a corresponding pair is declared with a name, the other member is declared with the same name.

2.All declarations that refer to the same object or function shall have compatible type;otherwise, the behavior is undefined.

comment:9 Changed 8 weeks ago by nickm

Here is an alternative approach I talked about with Catalyst:

// Option 5
struct foobar_t {
  int a;
  char *_modulename__private__b;
  long c;
};

#ifdef MODULENAME_PRIVATE
#define my_b _modulename__private_b
#endif

Instead of my_b we might use private_b, pvt_b, local_b, or something.

comment:10 in reply to:  9 ; Changed 8 weeks ago by asn

Replying to nickm:

Here is an alternative approach I talked about with Catalyst:

// Option 5
struct foobar_t {
  int a;
  char *_modulename__private__b;
  long c;
};

#ifdef MODULENAME_PRIVATE
#define my_b _modulename__private_b
#endif

Instead of my_b we might use private_b, pvt_b, local_b, or something.

That seems similar to option #3 from above.

So to understand this better. How would that look for the patch in #30236? Would we have to turn crypto into my_crypto when PRIVATE is set? I think turning it into pvt_crypto or local_crypto might even be ambiguous in terms of cryptography. my_crypto can also be a bit ambiguous but let's ignore this for now.

Also, is it fine to have a macro definition be named in lowercase? Could this be confusing?

If we are fine with the above, that seems like a plausible way to go forward and I can change my patch for #30236.

(BTW, I lost your discussion in #tor-dev (i guess) about this because my irc host rebooted).

Last edited 8 weeks ago by asn (previous) (diff)

comment:11 in reply to:  10 ; Changed 8 weeks ago by catalyst

Replying to asn:

So to understand this better. How would that look for the patch in #30236? Would we have to turn crypto into my_crypto when PRIVATE is set? I think turning it into pvt_crypto or local_crypto might even be ambiguous in terms of cryptography. my_crypto can also be a bit ambiguous but let's ignore this for now.

Also, is it fine to have a macro definition be named in lowercase? Could this be confusing?

I think yes, it's ok to have a lowercase macro as long as its not widely exposed and has a reasonable prefix to control the namespace. The purpose of such a member-renaming macro is to be mostly invisible to the developer.

If we are fine with the above, that seems like a plausible way to go forward and I can change my patch for #30236.

(BTW, I lost your discussion in #tor-dev (i guess) about this because my irc host rebooted).

I think the way this tended to work in historical Unix is that struct members all have very short prefixes (1-3 characters and an underscore) to their names, which made the namespacing of the macros more manageable. (e.g., sa_ for struct sockaddr or sin_ for struct sockaddr_in.) The macro expansion and actual member name, of course, could be longer because we don't expect developers to manually write it.

Example, using f_ as a prefix (let's not actually do that particular prefix, because POSIX reserves it for some header)

struct foobar_t {
  int f_count;
  char *_f__private_thing;
  long f_id;
};
#ifdef FOOBAR_PRIVATE
#define f_thing _f__private_thing
#endif

or maybe with a helper macro like

#define F_PRIV(x) _f__private_ ## x

or

#define F_PRIV(x) x ## _ ## _private
struct foobar_t {
  int f_count;
  char *F_PRIV(f_thing);
  long f_id;
};
#ifdef FOOBAR_PRIVATE
#define f_thing F_PRIV(f_thing)
#endif

For more background, there's http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_02_02 and http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_02_04 from the POSIX standard.

comment:12 in reply to:  11 Changed 8 weeks ago by catalyst

Replying to catalyst:

or

#define F_PRIV(x) x ## _ ## _private
struct foobar_t {
  int f_count;
  char *F_PRIV(f_thing);
  long f_id;
};
#ifdef FOOBAR_PRIVATE
#define f_thing F_PRIV(f_thing)
#endif

oops sorry that should probably be

/* f_stuff => _f_stuff__private */
/* use x ## _ ## _private instead of x ## __private to avoid reserved namespace */ 
#define F_PRIV(x) _ ## x ## _ ## _private
struct foobar_t {
  int f_count;
  char *F_PRIV(f_thing);
  long f_id;
};
#ifdef FOOBAR_PRIVATE
#define f_thing F_PRIV(f_thing)
#endif

(I forgot to have the macro add the underscore prefix)

Note the underscore-prefixed reserved names space rules are a little tricky -- identifiers starting with two underscores or a single underscore followed by a capital letter are reserved for any use, which means they shouldn't appear anywhere except as allowed by a documented C (or OS) interface or extension. (stuff like __attribute__ in GCC) The single underscore prefix unless followed by a capital letter is reserved for file-scope identifiers, which a struct member is not. (C99 §7.1.3)

comment:13 Changed 8 weeks ago by asn

Thanks for all the info!

Here is an attempt to accomplish the hiding of crypto using the macro concept from above:
https://github.com/torproject/tor/pull/982

Things that might need improvement but I would like feedback on:

  • Right now, I named the privatization macro as TOR_PRIV() so that it's generic and can be used by other modules, however it's currently hidden in crypt_path_st.h. Do you like the name? And where should I put it?
  • I named the access macro pvt_crypto. I went for the agnostic pvt, but perhaps according to comment:11 I should have gone with some module-specific name? Like cp_crypto for crypt_path? Any other preferences?

comment:14 Changed 7 weeks ago by nickm

This strategy looks good!

Also, let's change the identifier so that it's more clearly not reserved; I'm not sure that the ## _ ## _private trick is actually any more legal than ##__private. Instead let's use something like x ## _MODULE_NAME_private_field maybe?

The pvt version is fine with me.

If this is going to be a shared macro, lib/cc seems like a decent place for it, but I'm not sure we want to use this same macro everywhere: I think we'd like to mangle member names differently depending on which module owns them.

comment:15 in reply to:  14 ; Changed 7 weeks ago by asn

Replying to nickm:

This strategy looks good!

Also, let's change the identifier so that it's more clearly not reserved; I'm not sure that the ## _ ## _private trick is actually any more legal than ##__private. Instead let's use something like x ## _MODULE_NAME_private_field maybe?

The pvt version is fine with me.

If this is going to be a shared macro, lib/cc seems like a decent place for it, but I'm not sure we want to use this same macro everywhere: I think we'd like to mangle member names differently depending on which module owns them.

OK, pushed a fixup to address the above. I kept the macro private, so that each module makes their own. After we have a few of those, we can see if we can turn it into a common one.

comment:16 Changed 7 weeks ago by nickm

Looks good. I'm going to continue the discussion on #30236

comment:17 Changed 7 weeks ago by nickm

Status: needs_reviewnew

comment:18 in reply to:  15 Changed 7 weeks ago by catalyst

Replying to asn:

Replying to nickm:

This strategy looks good!

Also, let's change the identifier so that it's more clearly not reserved; I'm not sure that the ## _ ## _private trick is actually any more legal than ##__private. Instead let's use something like x ## _MODULE_NAME_private_field maybe?

The pvt version is fine with me.

If this is going to be a shared macro, lib/cc seems like a decent place for it, but I'm not sure we want to use this same macro everywhere: I think we'd like to mangle member names differently depending on which module owns them.

OK, pushed a fixup to address the above. I kept the macro private, so that each module makes their own. After we have a few of those, we can see if we can turn it into a common one.

It looks like the resulting member name no longer begins with an underscore? That'll make it harder to search for improper uses. I think Nick's concern is that ## __private might still impinge on the "reserved for any purpose" name space? I think _ ## x ## _module_name_private_field is better because it adds a leading underscore.

comment:19 Changed 7 weeks ago by nickm

My understanding is that the leading underscore is also reserved for external identifiers; even if it's safe for struct members, it's something we don't want to encourage.

Maybe a trailing underscore? In any case, _foobar_private_field is also easy to grep for...

comment:20 in reply to:  19 Changed 7 weeks ago by catalyst

Replying to nickm:

My understanding is that the leading underscore is also reserved for external identifiers; even if it's safe for struct members, it's something we don't want to encourage.

We chatted a bit about this on IRC. I still like leading underscores for hiding struct members, because I think it makes it easier to search for inappropriate uses. (I also think it's easier to search for leading underscore than for some specific long suffix(es).)

I think generally speaking we should avoid using identifiers with leading underscores in .c files, even when we use them for hiding struct members. Any .c file that uses the private members should only access them using helper macros defined in headers. These macros will hide the use of the leading underscores.

It's fairly easy to write small Coccinelle scripts for matching

  • uses of identifiers starting with an underscore
  • uses of struct members starting with an underscore

There are quite a few places where we already use identifiers with leading underscores for parameter names and local variables. The C standard allows these uses. I think there isn't much harm for leading underscores for parameters and locals.

Nick suggests waiting until after we merge #30236 to update that patch to use _ ## x _ ## _module_name_private_field. I'll open a new ticket for that.

comment:21 Changed 5 weeks ago by nickm

Milestone: Tor: 0.4.1.x-finalTor: 0.4.2.x-final

Move various s31 refactoring tasks to 0.4.2

Note: See TracTickets for help on using tickets.