wiki:org/meetings/2018MexicoCity/Notes/TechnicalDebt

Net team technical debt session, 10am, Oct 3.

Notes, v1.

What's frustrating:

  • spaghetti dependencies
  • spaghetti callgraph
  • testability! It's hard to write tests for a function: you have to construct a big pile of objects before anything actually runs, mock too many things, and there aren't enough reusable test helpers.
  • It's hard to see where we might block.

Solutions for new code:

  • design for test, design for debuggability
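
One way to read "design for test" in practice, as a rough C sketch (the function names here are invented, not existing Tor code): keep the decision logic in a pure function with explicit inputs, so a unit test never has to build connections or touch the clock.

    #include <stdbool.h>
    #include <time.h>

    /* Pure and easily testable: no globals, no I/O, no direct clock access. */
    static bool
    retry_is_due(time_t last_attempt, time_t now, int delay_seconds)
    {
      return now - last_attempt >= delay_seconds;
    }

    /* Thin wrapper that gathers the messy inputs (current time, connection
     * state) and delegates the actual decision to the pure function. */
    void
    maybe_retry_connection(void *conn, time_t last_attempt, int delay_seconds)
    {
      if (retry_is_due(last_attempt, time(NULL), delay_seconds)) {
        /* ... schedule the retry on conn ... */
        (void)conn;
      }
    }

The wrapper is only exercised by integration-style tests; everything worth asserting about lives in the pure function.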

Considerations:

  • New code vs old code
  • Hard to say whether we have a real test for something even when it's covered.
  • Should we reorganize our tests into bottom-up order, so that we test the lower-level parts before we test the code built on top of them?
  • Can we have a quality-of-coverage metric, i.e., how well is this code actually covered?

TODO:

  • Write a checklist for reviews; include "Are there tests?" and "Does this bugfix have a regression test?"
  • Remove obsolete options more aggressively.
  • Add a rule: require an explanation for upward includes? Identify and review upward includes?
  • Have a counter for upward includes, and advertise it in PR status?
  • Add rule/reminder: patch against oldest version it might need to be backported to.
  • Add CI to check RAM usage during chutney/stem, make sure it doesn't go up a lot?
  • Make a test harness for "run inside tor" -- run tests inside a bootstrapped client on a Chutney net? Or inside a fake-bootstrapped Tor?
  • Maybe identify and clean up or remove bogus tests? Preemptively minimize them, maybe?
  • Can we have a mode to only count coverage from the tests that are meant to test that particular code?
  • Checklist: Do you check the various valid range elements, and values just off the ends? Do you check success AND failure?
  • Checklist: Do you check that the function actually does what it is supposed to do? (e.g., check that sqrt(x) * sqrt(x) == x; see the sketch after this list)
  • Goal: make fuzz-static-testcases actually work without BUG or ERR.
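
A minimal sketch of what the two testing checklist items above could look like for a toy function (plain assert() rather than Tor's test macros; parse_port() is made up for illustration):

    #include <assert.h>
    #include <stdlib.h>

    /* Toy function under test: parse a TCP port, return -1 on error. */
    static long
    parse_port(const char *s)
    {
      char *end = NULL;
      long v;
      if (!s || !*s)
        return -1;
      v = strtol(s, &end, 10);
      if (*end != '\0' || v < 1 || v > 65535)
        return -1;
      return v;
    }

    int
    main(void)
    {
      /* Success cases, including both ends of the valid range. */
      assert(parse_port("1") == 1);
      assert(parse_port("65535") == 65535);
      /* Failure cases: just off both ends of the range, and garbage input. */
      assert(parse_port("0") == -1);
      assert(parse_port("65536") == -1);
      assert(parse_port("") == -1);
      assert(parse_port("80x") == -1);
      /* "Does it actually do what it's supposed to do?": check the value,
       * not just that the call didn't error out. */
      assert(parse_port("9001") == 9001);
      return 0;
    }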

Top stuff to simplify:

  • Delete all the code about IPv6 DirPorts. (We don't have them.)
  • Split control and config? Maybe we can do this with some kind of subsystem-based approach (see the sketch after this list). (x2)
    • They are the biggest files.
    • They keep accumulating more code.
  • Separate and contain our crypto and networking state so that they can be in separate threads, not know about each other.
  • Would be nice to reduce C-macro usage at the C/Rust boundary, and reduce the amount of C/Rust coupling overall.
    • Automate bindings? Look at the ticket.
    • We have an automated test from Alex C.
  • Re-design extend_info_t so the reachability checks are done at a more logical point.
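
A very rough sketch of what "subsystem-based" configuration could mean, to make the idea concrete; every name here (the structs, the registration call) is hypothetical, not existing code:

    /* Each subsystem owns a table of its options plus a validation hook,
     * so config.c shrinks to a registry that iterates over subsystems. */
    typedef struct config_var_stub_t {
      const char *name;
      const char *type;           /* e.g. "Boolean", "Port" */
      const char *default_value;  /* NULL means no default */
    } config_var_stub_t;

    typedef struct subsys_config_t {
      const char *subsystem_name;
      const config_var_stub_t *vars;  /* terminated by a NULL name */
      int (*validate)(void);          /* check only this subsystem's options */
    } subsys_config_t;

    /* Example: a "relay" subsystem declaring its own options
     * (ORPort and ExitRelay are real option names, used as examples). */
    static const config_var_stub_t relay_vars[] = {
      { "ORPort", "Port", NULL },
      { "ExitRelay", "Boolean", "auto" },
      { NULL, NULL, NULL },
    };
    static int relay_validate(void) { return 0; }
    static const subsys_config_t relay_config = {
      "relay", relay_vars, relay_validate,
    };

    /* Central registry, owned by a much smaller config.c:
     *   subsys_config_register(&relay_config);   (hypothetical call) */
    void subsys_config_register(const subsys_config_t *cfg);

A similar registry could carry control-port command handlers, which is what would let control.c split along the same lines.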

Notes, v2

Testability

  • the amount of stuff you need to put together: to test cell handling you'd need a full circuit and a lot of set-up of test state
  • lack of test helper functions / good abstractions; we're slowly adding these, but maybe not enough? (see the sketch after this list)
  • solutions for new code versus solutions for existing code
  • new code is easier to design for testing/debuggability
  • tests are not carefully structured to invoke only the code they are testing
  • example: a function that checks something about a router ends up invoking router initialization, but that initialization isn't actually being tested thoroughly
  • working out what a test failure means: it's not easy to tell which specific issue caused it. Would executing tests in order help with this?
  • optimizing tests- reducing blocking operations
  • DNS queries block
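
A sketch of the kind of reusable helper that's missing (the struct and its fields are invented stand-ins for something circuit- or channel-shaped): one call builds a plausible, fully initialized object, so each test only tweaks the fields it cares about instead of repeating the whole setup pile.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for a complicated object tests keep needing. */
    typedef struct dummy_circuit_t {
      int n_hops;
      int is_open;
      char nickname[32];
    } dummy_circuit_t;

    /* Reusable test helper: returns a sensible default object. */
    static dummy_circuit_t *
    testhelper_new_dummy_circuit(void)
    {
      dummy_circuit_t *circ = calloc(1, sizeof(*circ));
      if (!circ)
        return NULL;
      circ->n_hops = 3;
      circ->is_open = 1;
      strncpy(circ->nickname, "testcirc", sizeof(circ->nickname) - 1);
      return circ;
    }

    /* In a test:
     *   dummy_circuit_t *circ = testhelper_new_dummy_circuit();
     *   circ->is_open = 0;   // the one thing this test cares about
     *   ... exercise the code under test ...
     *   free(circ);
     */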

How to make progress here over the next 6 months?

  • success with big code movement
  • introducing new requirements for merging
  • we should make it a requirement that all new code has tests; use checklists for review?

Code review checklist:

  1. Is there a visible changes file?
  2. If this is a new feature, are there unit tests?
  3. Identify and review upward includes (and whether the functionality belongs in another layer; see the sketch after this list).
    • Downward layers shouldn't call into upward layers.
    • The core/features layering isn't actually enforced here.
    • A "no new upward includes" rule? We can't actually do this right now, as we have to include config.c.
    • We should feel free to move code around between layers right now.
    • Should we have something that counts dependency violations?
  4. Look at the _quality_ of the tests. Are they actually meaningful?
  5. Test extreme inputs as well as the normal expected inputs; also test the error cases.
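
One common way to remove an upward include, sketched with made-up names: instead of the lower layer including an upper-layer header and calling it directly, the upper layer registers a callback at init time, so the compile-time dependency only points downward.

    /* lower_layer.h -- knows nothing about any upper layer. */
    typedef void (*event_notify_fn_t)(int event_code);
    void lower_layer_set_notify_cb(event_notify_fn_t cb);

    /* lower_layer.c */
    static event_notify_fn_t notify_cb = NULL;
    void
    lower_layer_set_notify_cb(event_notify_fn_t cb)
    {
      notify_cb = cb;
    }
    static void
    lower_layer_something_happened(int event_code)
    {
      if (notify_cb)   /* no upper-layer #include needed here */
        notify_cb(event_code);
    }

    /* upper_layer.c -- including downward is fine. */
    static void
    upper_layer_on_event(int event_code)
    {
      (void)event_code;  /* e.g. emit a control-port event */
    }
    void
    upper_layer_init(void)
    {
      lower_layer_set_notify_cb(upper_layer_on_event);
    }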

Improve test vectors?

  • test that the function is doing what it is meant to do (see the sketch below)
  • but is it also OK to use arbitrary values? (unclear)
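
A sketch of the difference, using a made-up hex encoder as the function under test: a fixed test vector pins down the exact expected output, while an arbitrary-value check only confirms internal consistency.

    #include <assert.h>
    #include <string.h>

    /* Toy function under test: hex-encode a buffer, lowercase. */
    static void
    hex_encode(const unsigned char *in, size_t inlen, char *out)
    {
      static const char digits[] = "0123456789abcdef";
      size_t i;
      for (i = 0; i < inlen; ++i) {
        out[2*i]   = digits[in[i] >> 4];
        out[2*i+1] = digits[in[i] & 0x0f];
      }
      out[2*inlen] = '\0';
    }

    int
    main(void)
    {
      char buf[64];
      const unsigned char arbitrary[3] = { 0x00, 0x7f, 0xff };
      /* Known-answer test vector: proves the function does what it is
       * *meant* to do ("AB" is 0x41 0x42). */
      hex_encode((const unsigned char *)"AB", 2, buf);
      assert(!strcmp(buf, "4142"));
      /* Arbitrary-value check: still useful, but it would also pass for a
       * broken implementation that, say, swapped the two nibbles. */
      hex_encode(arbitrary, sizeof(arbitrary), buf);
      assert(strlen(buf) == 2 * sizeof(arbitrary));
      return 0;
    }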

Should we have a new metric for test coverage?

  • only code that is directly tested (as opposed to indirectly exercised)
  • A static call graph might help with this?

Automated PR checks:

  • layer violations
  • RAM usage

How to make tests more writable?

  • run tests inside a bootstrapped Tor: bootstrap a small Chutney network and then run the tests? This is more like an integration test.

Should we do any work to clean up existing tests?

  • options tests (don't test what they are meant to be testing)
  • Tor TLS tests were overfitted
  • Config tests are overfitted as well? (They expect configs to be parsed in a particular order.) When we rewrite config, we should rewrite the tests.

Should we remove bogus tests? We should at least minimize them so they don't pretend to test more than they actually do (reduce their scope to only what they actually test).

git blame with move/copy detection (-M / -C) helps distinguish code that merely moved from genuinely new code.

git diff --color-moved will make your life much better (you can set diff.colorMoved as a default in your git config).

What is everyone's top "this should be simplified" wishes?

  • delete all IPv6 dirports
  • go for the biggest .c files
  • config.c (has bad tests): move toward subsystem-specific configuration
  • control.c (well tested by stem)
  • crypto state and network code - abstract these so that we can move to multithreading
  • things should be able to register for events (publish/subscribe); good abstractions should help with this (what would the right abstraction for inter-module communication be? see the sketch below)
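
A minimal publish/subscribe sketch (all names are invented for illustration, not an existing API): modules register handlers for an event type, and the publisher never needs to know who is listening, so no upward includes are created.

    /* Hypothetical event types shared across modules. */
    typedef enum {
      EV_CIRCUIT_BUILT,
      EV_DESCRIPTOR_FETCHED,
      EV_N_EVENTS
    } event_type_t;

    typedef void (*event_handler_fn_t)(const void *data);

    #define MAX_SUBSCRIBERS 8
    static event_handler_fn_t subscribers[EV_N_EVENTS][MAX_SUBSCRIBERS];
    static int n_subscribers[EV_N_EVENTS];

    /* A module registers interest in an event type. */
    int
    event_subscribe(event_type_t ev, event_handler_fn_t fn)
    {
      if (n_subscribers[ev] >= MAX_SUBSCRIBERS)
        return -1;
      subscribers[ev][n_subscribers[ev]++] = fn;
      return 0;
    }

    /* The publisher fires the event without knowing its listeners. */
    void
    event_publish(event_type_t ev, const void *data)
    {
      int i;
      for (i = 0; i < n_subscribers[ev]; ++i)
        subscribers[ev][i](data);
    }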

TODO:

  • make progress by deleting code and deprecating old options