Opened 3 months ago

Last modified 2 days ago

#29209 needs_review task

Reduce visibility of more data type internals

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: technical-debt refactoring
Cc: Actual Points: 3.5
Parent ID: Points: 15
Reviewer: nickm Sponsor: Sponsor31-can

Description

Many of our data types -- particularly the ones in or.h -- have their internals visible across modules, in a way that harms modularity and leads to weird code couplings. We would do well to reduce this visibility and refactor as needed.

Child Tickets

TicketStatusOwnerSummaryComponent
#30236newEncapsulate details about crypt_path_t data structureCore Tor/Tor

Change History (6)

comment:1 Changed 12 days ago by asn

Keywords: technical-debt refactoring added
Milestone: Tor: 0.4.1.x-final
Points: 15

comment:2 Changed 12 days ago by asn

With Nick we decided that a low-hanging fruit here is encapsulating crypt_path_t.

comment:3 Changed 10 days ago by asn

Actual Points: 3.5
Status: newneeds_review

OK, I finally have something to show here: https://github.com/torproject/tor/pull/929

So basically this branch does the following:

  • Moves crypt_path_t related functions to its own file src/core/or/crypt_path.c. Previously they were all over the place, even tho they were cpath-specific.
  • Starts splitting crypt_path_t into a public struct and a private struct (using an opaque pointer).
  • Moves crypt_path_t.crypto to the private part of the struct by defining appropriate functions for doing circuit crypto.

This took much longer than anticipated because I first experimented with encapsulating others parts of crypt_path_t before deciding to go with crypto. In particular, I first started hiding deliver_window and package_window but I quickly realized that this would cause lots of conflicts with #26288. I think after #26288 gets merged, these two fields might be the next candidates for hiding.

Now in terms of general notes, this project took me more than 3 points of careful refactoring work, and I've only hided like 10% of the cpath structure. Hiding the whole structure is far far away since there are quite complicated structures involved like extend_info_t.

In terms of future goals here, I think hiding extend_info_t into its own interface would be quite convenient since that structure is used and poked in weird ways all over the codebase. Furthermore, that would mean we can eventually encapsulate crypt_path_t even better. Other plausible medium-term targets would be structures like tor_cert_t which are mostly hidden, but not completely.

I also added some TODO notes in the top of crypt_path.c with more cpath-specific things we can move without too much trouble.

Finally, I also tried to find any static analysis tools that will automatically output widely-used structures, or structures that could be hidden easily, but I didn't manage to find such a project for C. include-what-you-see was kinda related, but not exactly.

Last edited 10 days ago by asn (previous) (diff)

comment:4 Changed 5 days ago by asn

Reviewer: nickm

comment:5 Changed 4 days ago by nickm

Okay, I've read through the code. This looks like a good start; I've got some suggestions before we merge. First the minor ones:

  1. Let's open a sub-ticket for crypt_path_t, so that this ticket can still be about
  2. I think you're right that a bottom-up approach is better, where we start with low-level types like extend_info_t whose info is complex and exposed. Your patch here does seem to demonstrate that to me.

Now the major issue I have:

I think that this "private struct" approach is not actually the best for the case where we want to make fields private one at a time. It introduces an extra objects and an extra pointer, increasing both fragmentation and memory pressure. Here are some other approaches that we could use that I think would create fewer code changes:

// Example

struct foobar_t {
  int a;
  char *b; // let's say that we want to make this one private.
  long c;
};
// Option 1:

struct foobar_t {
  int a;
  long c;
};

struct foobar_private_t {
  struct foobar_t base;
  char *b;
};

static inline foobar_private_t *
to_private(foobar_t *obj)
{
  char *ptr = (char *)obj;
  return (foobar_private_t *)(ptr - (OFFSET_OF(foobar_private_t, base)));
}

static inline foobar_t *
to_public(foobar_private_t *obj)
{
  return &obj->base;
}
// Option 2

// (the foobar_private macro should only be defined in C modules that are
// allowed to see the internals.)

#ifdef FOOBAR_T_PRIVATE
#define PRIV(x) x
#else
#define PRIV(x) foobar_private_member__ ## x ## __
#endif

struct foobar_t {
  int a;
  char *PRIV(b);
  long c;
};
// option 3

#define PRIV(x) foobar_private_member__ ## x ## __

struct foobar_t {
  int a;
  char *PRIV(b);
  long c;
};

#ifdef FOOBAR_T_PRIVATE
#define b PRIV(b)
#endif
// option 4

struct foobar_t {
  int a;
  long c;
  struct {
    char *b;
  } foobar_private_members__;
};

#ifdef FOOBAR_T_PRIVATE
#define pvt foobar_private_members__
#else
#define foobar_private_members__ "you got a syntax error because you used this field outside foobar."
#endif
Last edited 4 days ago by nickm (previous) (diff)

comment:6 Changed 2 days ago by asn

I opened #30236 for the crypt_path case.

I agree that the private approach I took is suboptimal. Here are some thoughts about the options above:

  • Option1: I find usin the downcast approach a bit of an overkill here, but it might be the right way forward.
  • Option2: I think I like this, it seems quite simple. The only negative is that there is no warning if someone uses the private field outside of a private file, but I don't think people will attempt to do that with that name and macro (and appropriate docs).
  • Option3: I don't think I like this because #define b PRIV(b) might hijack lots of stuff. So for example, in the crypt_path_t.crypto case we would have to do #define crypto PRIV(crypto) and that would hijack all the cryptos in the file. Also it would be bad form to not capitalize crypto in that case, and ugly if we did.
  • Option4: This also seems like a plausible one. I'm not sure what pvt is there tho. I like the fact that it throws a message.

I think right now I'm leaning more towards option2 for being the most lightweight one and easy to understand. What do you think?

Note: See TracTickets for help on using tickets.