wiki:org/meetings/2018NetworkTeamHackfestSeattle/Modularisation

Version 5 (modified by ahf, 5 months ago)

--

Modularisation Session 2018/05/31 - 10:00

We start by walking over what has been done:

  • We now have a directory authority module that can be disabled at compile time.
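As an illustration of what "can be disabled at compile time" can look like in C, here is a minimal sketch. The macro EXAMPLE_HAVE_MODULE_DIRAUTH and the functions dirauth_is_enabled(), dirauth_cast_vote() and run_scheduled_vote() are hypothetical names invented for this example, not the names used in the tree. When the macro is not defined, callers still compile against stub functions, but no module code is built.

```c
#include <assert.h>

/* EXAMPLE_HAVE_MODULE_DIRAUTH is a hypothetical build flag: pass
 * -DEXAMPLE_HAVE_MODULE_DIRAUTH to compile the module in. */
#ifdef EXAMPLE_HAVE_MODULE_DIRAUTH
/* "Real" implementation; in a real tree this would live in the
 * module's own .c file. */
static int dirauth_is_enabled(void) { return 1; }
static int dirauth_cast_vote(int round) { (void)round; return 1; }
#else
/* Stubs: the module is compiled out, callers need no #ifdefs. */
static int dirauth_is_enabled(void) { return 0; }
static int dirauth_cast_vote(int round) { (void)round; return 0; }
#endif

/* Caller outside the module. Returns 1 if a vote was cast and 0 if
 * the module is compiled out. */
int run_scheduled_vote(int round)
{
  assert(round >= 0);
  if (!dirauth_is_enabled())
    return 0;
  return dirauth_cast_vote(round);
}
```

The point of the stub branch is that the rest of the code base keeps a single call site, so the compile-time switch stays confined to the module's header.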

Nick explains that the inter-module dependencies are still a bit of a mess.

Large offenders:

  • The main module is especially problematic.
  • The control module is called by everything that triggers an event, and it might query out to other modules.
  • The config module, which is called by many other modules.

The crypto module is getting some attention to be split up.

Nick explains how Git's new code-movement highlighting makes it easier to review changes that move large chunks of code.

Roger asks if there's currently a plan for doing multi-process architecture with protocols used for talking between them. There's currently no such plan.

Tim explains how the Rust changes are positive because they also force us to carefully define interfaces across the Rust/C boundary.

We talk about how or.h should be sliced up -- this could be done on a per-module basis. Nick explains that this would be good since it makes it more visible which modules depend upon each other.

Could the new dirauth directory be moved back one directory?

Isis explains an experiment with a tool that scans for unused headers, but it also looks at system headers, which we should probably be more careful about. This could be modified to help us.

We should think about how and where our data structures are defined, and not just about the call-graph. Using forward declarations in or.h could be useful here, with the concrete structure defined in each module's respective header.
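The forward-declaration idea can be sketched as follows. The type example_conn_t and its accessor functions are hypothetical names for illustration only. The first part stands in for what a shared header such as or.h would carry, the second part for what the owning module's own header and .c file would carry.

```c
#include <assert.h>
#include <stdlib.h>

/* --- Shared header (the role or.h would play): names only. --- */
typedef struct example_conn_t example_conn_t; /* forward declaration */
example_conn_t *example_conn_new(int fd);
int example_conn_get_fd(const example_conn_t *conn);
void example_conn_free(example_conn_t *conn);

/* --- Owning module: the concrete layout is visible only here. --- */
struct example_conn_t {
  int fd;
  unsigned n_bytes_read;
};

example_conn_t *example_conn_new(int fd)
{
  example_conn_t *conn = calloc(1, sizeof(*conn));
  if (conn)
    conn->fd = fd;
  return conn;
}

int example_conn_get_fd(const example_conn_t *conn)
{
  assert(conn != NULL);
  return conn->fd;
}

void example_conn_free(example_conn_t *conn)
{
  free(conn);
}
```

Code that only includes the shared part can pass example_conn_t pointers around but cannot touch the fields, so a change to the structure layout only affects the owning module.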

David mentions that we have different roles now: client, relay, etc.

Onion services might be a candidate for splitting into a module.

It's important to address that things in or can call things in common, but not the other way around.

When we have a function that seems misplaced in a module, because it might only be used by another module, it might make more sense to move it to its own .c file so we at least have the split. This is more problematic for Rust because Rust works on crates.

Should we go for modules, roles, or some third option as the next step? Nick thinks we should start extracting modules as we can. Assign C files to the modules we *think* they belong to. The C files that are in multiple modules should be split up.

Nick mentions that we have a call-graph tool that works nicely.

Taylor suggests running the call-graph tool as part of Travis.

Nick suggests splitting up src/or before an LTS release. Backporting across that boundary might be problematic, but the problems should be smaller than at other times.

Taylor mentions concerns contributors might have about global source tree changes: downstream projects might have patches that we will break.

Patterns

We talked about minimizing structure usage and about layer violations. One thing we are missing is a way of handling "events" (for example, when a puppy is received we must call on_puppy_received). This can be fixed by using the event loop more via libevent.

Nick explains the handle-object in C and how it can be used for a more message oriented architecture.

Removing needless abstractions is probably worth it.

Callbacks

Callbacks can make our call-graphs much more complicated. Using function pointers can be problematic. qsort() is given as an example.
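To make the qsort() example concrete, here is a small sketch. The helper names compare_ints() and sort_ints() are made up for illustration. The comparison function is only ever reached through a function pointer held inside the C library, so a static call-graph shows sort_ints() calling qsort() but has no direct edge from sort_ints() to compare_ints().

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback: invoked by qsort(), never called directly. */
static int compare_ints(const void *a, const void *b)
{
  int x = *(const int *)a;
  int y = *(const int *)b;
  return (x > y) - (x < y); /* avoids overflow from x - y */
}

/* Sorts values[0..n-1] in ascending order. */
void sort_ints(int *values, size_t n)
{
  assert(values != NULL || n == 0);
  if (n > 0)
    qsort(values, n, sizeof(values[0]), compare_ints);
}
```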

We talk about whether pub/sub patterns might make sense, and in which places. Nick explains that it is important that the handling happens at the top of the call-stack and not at the bottom of the stack.
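One way to read "handling at the top of the call-stack" is that low-level code only records an event, and subscribers run later from the main loop. The sketch below illustrates that idea in plain C under stated assumptions: all names (puppy_subscribe, puppy_publish, puppy_dispatch_pending) and the fixed-size arrays are hypothetical, and no libevent API is used.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16
#define MAX_SUBSCRIBERS 4

typedef void (*puppy_handler_fn)(int puppy_id);

static puppy_handler_fn subscribers[MAX_SUBSCRIBERS];
static size_t n_subscribers;
static int pending[MAX_PENDING];
static size_t n_pending;

/* Register a handler. Returns 0 on success, -1 if the table is full. */
int puppy_subscribe(puppy_handler_fn fn)
{
  assert(fn != NULL);
  if (n_subscribers == MAX_SUBSCRIBERS)
    return -1;
  subscribers[n_subscribers++] = fn;
  return 0;
}

/* Called from deep in the stack: only records the event, runs no
 * handler. Returns 0 on success, -1 if the queue is full. */
int puppy_publish(int puppy_id)
{
  if (n_pending == MAX_PENDING)
    return -1;
  pending[n_pending++] = puppy_id;
  return 0;
}

/* Called from the top of the stack (for example once per main-loop
 * iteration): delivers every queued event to every subscriber and
 * returns the number of events delivered. */
size_t puppy_dispatch_pending(void)
{
  size_t delivered = 0;
  for (size_t i = 0; i < n_pending; ++i) {
    for (size_t j = 0; j < n_subscribers; ++j)
      subscribers[j](pending[i]);
    ++delivered;
  }
  n_pending = 0;
  return delivered;
}

/* Example subscriber: remembers the last puppy it was told about. */
static int last_puppy_seen = -1;
static void on_puppy_received(int puppy_id)
{
  last_puppy_seen = puppy_id;
}
```

With this split, the publisher never has the subscriber on its stack, which is the property the call-graph discussion is after.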

Tim mentions that in the distant future it might make sense to split directory code from other parts.

Taylor mentions that the bootstrap process might benefit from a pub/sub pattern for emitting data signals from the very low-level components that need to notify the higher-level layers.

Tim mentions that our control event system is a good example because it is also async; the log system is sync.

Nick mentions we need to be careful when we refactor this, because right now it is sync, but in the async world we need to watch out for places where ordered dependencies are hidden away from us.

Tim mentions that we should be able to flag an event such that it gets handled "later".

The rule is that a message passed to the message handler is entirely owned by the queue. Everything outside the queue that the message refers to must be wrapped in a Handle.
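The ownership rule above can be sketched like this. It is only an illustration under assumptions: the handle here is a slot index plus a generation counter, which is one common way to build a weak reference and is not claimed to be how the existing handle code works. All names (circ_handle_t, circ_register, circ_release, circ_lookup, message_t) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_SLOTS 8

/* A handle names an object without holding a pointer to it. */
typedef struct { unsigned slot; unsigned generation; } circ_handle_t;

typedef struct { int in_use; unsigned generation; int circ_id; } circ_slot_t;
static circ_slot_t slots[N_SLOTS];

/* Register an object; writes its handle to *out. Returns 0 or -1. */
int circ_register(int circ_id, circ_handle_t *out)
{
  assert(out != NULL);
  for (unsigned i = 0; i < N_SLOTS; ++i) {
    if (!slots[i].in_use) {
      slots[i].in_use = 1;
      slots[i].circ_id = circ_id;
      out->slot = i;
      out->generation = slots[i].generation;
      return 0;
    }
  }
  return -1;
}

/* Returns the object behind a handle, or NULL if the handle is stale. */
const int *circ_lookup(circ_handle_t h)
{
  if (h.slot >= N_SLOTS)
    return NULL;
  if (!slots[h.slot].in_use || slots[h.slot].generation != h.generation)
    return NULL;
  return &slots[h.slot].circ_id;
}

/* Free the object; outstanding handles to it become stale. */
void circ_release(circ_handle_t h)
{
  if (circ_lookup(h) != NULL) {
    slots[h.slot].in_use = 0;
    slots[h.slot].generation++;
  }
}

/* A queued message: it owns a copy of its text and refers to the
 * outside object only through a handle. */
typedef struct {
  char text[32];
  circ_handle_t circ;
} message_t;

void message_init(message_t *msg, const char *text, circ_handle_t circ)
{
  strncpy(msg->text, text, sizeof(msg->text) - 1);
  msg->text[sizeof(msg->text) - 1] = '\0';
  msg->circ = circ;
}
```

If the object is freed while the message is still queued, the handler sees NULL from the lookup instead of a dangling pointer, which matters once messages are handled "later".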

Ideas of what layers we could have for tor:

  • Lowest level: functionality that isn't specific to tor -- everything you can build without the network
  • Middle level: sending cells, packaging data on streams -- network core
  • Highest level: cause circuits to get built, streams to get attached to circuits - path selection, etc
  • Tor roles may or may not use this highest level (for example, directory authorities would not)

Steps that we can take next (actions!)

  • We move the src/or/dirauth module to src/dirauth to avoid deeply nested directories. See #26270
  • (maybe?) We should create a ticket to figure out if the Rust code needs to be split so it has a dirauth "crate"(?).
  • We should look into using Travis for uploading call-graph information (and maybe Doxygen while we are at it?).
  • Do we want to do something about the Channel abstraction?
  • Get the abstractions that we want to use into Tor when we have a need for it -- not before.
  • Do code movements in a near time-frame: directory / dirauth code?
  • The consensus-changed event is used a lot in the HS code and might be a candidate too.
  • Do a refactor week the week before a merge window opens to do code movement. Nick is going to do it in the week up to June 15:
    • David wants to help with it.
    • Nick will post what he plans on doing beforehand.
    • Collect big patches before (Isis + haxpopp + PT implementation after new spec patches?)