Tooling to track code-style/best-practices violations

changed milestone to %Tor: 0.4.1.x-final

added actualpoints::4 component::core tor/tor milestone::Tor: 0.4.1.x-final network-team-roadmap-2019-Q1Q2 points::3 priority::medium resolution::implemented reviewer::nickm severity::normal sponsor::31-can status::closed teor-merge type::enhancement labels

Trac:
Component: - Select a component to Core Tor/Tor

Trac:
Keywords: N/A deleted, network-team-roadmap-2019-Q1Q2 added
Points: 5 to 3

OK I'm getting started with this ticket:

Nick what do you mean that some of these measures already have code for them? We have code in Tor to count function and file length? I could not find that in scripts/ but we do have the #include checking script.

On this note, and since we are chasing for low-hanging fruit here, I played a bit with static analysis tools today. The thing that worked out best because it's open source and does some of the things that we want it do is GNU complexity (...): https://www.gnu.org/software/complexity/

Based on the best practices list of #29219 (moved), complexity can calculate: file length, function length, nesting level count, and function-per-file count. As far as I know we don't have any of these right now so being able to do these would be an improvement. Also it's a GNU tool so it's FOSS and something we can use as an optional parts of our scripts toolkit.

As an example here is complexity's output for scheduler_kist.c:

$ complexity --histogram --score --thresh=1 ./src/core/or/scheduler_kist.c
NOTE: proc kist_scheduler_run in file ./src/core/or/scheduler_kist.c line 544
	nesting depth reached level 5
Complexity Scores
Score | ln-ct | nc-lns| file-name(line): proc-name
    1       2       2   ./src/core/or/scheduler_kist.c(140): free_all_socket_info
    1       4       2   ./src/core/or/scheduler_kist.c(420): MOCK_IMPL
    1       4       2   ./src/core/or/scheduler_kist.c(473): kist_scheduler_on_new_options
    1       3       3   ./src/core/or/scheduler_kist.c(109): each_channel_write_to_kernel
    1       4       4   ./src/core/or/scheduler_kist.c(147): socket_table_search
    1       4       4   ./src/core/or/scheduler_kist.c(773): scheduler_kist_set_lite_mode
    1       4       4   ./src/core/or/scheduler_kist.c(783): scheduler_kist_set_full_mode
    1       5       5   ./src/core/or/scheduler_kist.c(118): free_outbuf_info_by_ent
    1       5       5   ./src/core/or/scheduler_kist.c(129): free_socket_info_by_ent
    1       5       5   ./src/core/or/scheduler_kist.c(430): MOCK_IMPL
    1       5       5   ./src/core/or/scheduler_kist.c(441): have_work
    1       8       5   ./src/core/or/scheduler_kist.c(95): channel_outbuf_length
    1       7       7   ./src/core/or/scheduler_kist.c(333): outbuf_table_remove
    1       7       7   ./src/core/or/scheduler_kist.c(346): set_scheduler_run_interval
    1       8       8   ./src/core/or/scheduler_kist.c(157): free_socket_info_by_chan
    1      10       8   ./src/core/or/scheduler_kist.c(396): update_socket_written
    1      12       8   ./src/core/or/scheduler_kist.c(794): scheduler_can_use_kist
    1      10       9   ./src/core/or/scheduler_kist.c(483): kist_scheduler_init
    1      14       9   ./src/core/or/scheduler_kist.c(359): socket_can_write
    1      10      10   ./src/core/or/scheduler_kist.c(301): init_socket_info
    1      10      10   ./src/core/or/scheduler_kist.c(318): outbuf_table_add
    1      11      11   ./src/core/or/scheduler_kist.c(379): update_socket_info
    1      16      11   ./src/core/or/scheduler_kist.c(751): kist_scheduler_run_interval
    2      31      20   ./src/core/or/scheduler_kist.c(507): kist_scheduler_schedule
    5     120      54   ./src/core/or/scheduler_kist.c(172): MOCK_IMPL
   17     168      89   ./src/core/or/scheduler_kist.c(544): kist_scheduler_run

Complexity Histogram
Score-Range  Lin-Ct
    0-9         218 ************************************************************
   10-19         89 ************************

Scored procedure ct:       26
Non-comment line ct:      307
Average line score:         6
25%-ile score:              1 (75% in higher score procs)
50%-ile score:              2 (half in higher score procs)
75%-ile score:             17 (25% in higher score procs)
Highest score:             17 (kist_scheduler_run() in ./src/core/or/scheduler_kist.c)

In the CLI I set the score threshold to 1 so that it ignores truly trivial functions, but that can be set to 0 if wanted.

You can see the following:

It tells us that there is a function with a bad nesting depth of 5.
It tells us how big each function is (nc-lns counts only code).
It tells us how big the file is (307 LoC).
It tells us that the file contains 26 functions.
It has a weird but configurable algorithm to calculate function score.

The only weird thing is that it miscalculates at which line a function is, I think this has something to do with the way we comment our code but I haven't figured it out yet (does not seem related to macros). And it also does not work really well with MOCK_IMPL etc.

I can also pass it all the .c files of Tor and it will give me a project wide summary which I have attached to this ticket.

Does this seem like a plausible way forward? Like, having a script tha runs this in a useful way and parses the output to give some controlled information? Or do we car eabout other best-practices from #29219 (moved) more? This likely seems to be the lowest hanging fruit, but I'm open to more.

Also what do we want to do? Throw error if we have a nesting score above 7 (example)? Or just give out information?

Trac:
tor_complexity.log

Hi, George!

I think that the function/file length thing I mentioned was just ad hoc one-liners, like find src/ -name '*.[ch]' | xargs wc -l |sort -n and like whatever I used for cyclomatic complexity in #6313 (moved).

I think you should feel free to go with whatever fruit seems lowest-hanging here. complexity seems like a good start for measurement, but I don't see immediately how to use it for enforcing rules. What do you think there?

Initial PoC posted in https://github.com/asn-d6/tor/tree/bug29221_draft

Needs further work based on the TODO in practracker.py.

Would like some feedback from Nick on how this looks.

Trac:
Status: new to needs_review

OK. Something that looks like a plausible MVP here can be found in my bug29221_draft above. Or: https://github.com/torproject/tor/pull/742

Let me know how this looks like to you.

Trac:
Reviewer: N/A to nickm

OK fixed the CI issues and also added a squash commit that slightly improves the error message in case of problems. We can always do better but this might be OK for a minimum-viable-product here.

I've made a few comments in the PR. This is generally looking very solid.

Trac:
Status: needs_review to needs_revision

Replying to nickm:

I've made a few comments in the PR. This is generally looking very solid.

OK I fixed the easy stuff in a new commit ac05cc7437. I made a ticket #29746 (moved) for the rest of good-to-have stuff asked on the PR.

In general, some of the good-to-have stuff were not too hard to add, but IMO there should be tests to make sure that they don't influence results.

Trac:
Status: needs_revision to needs_review

LGTM, except that I think one of the requested changes is important for correctness. I've added a commit for that, plus another commit for the message change, in a new PR: https://github.com/torproject/tor/pull/782

If you like that, let's squash and merge.

Looks good but perhaps we should use the "FAILED" keyword when the program fails. After that, let's mark as merge-ready.

Trac:
Status: needs_review to needs_revision

Done; thanks!

Trac:
Status: needs_revision to needs_review

Trac:
Status: needs_review to merge_ready

Trac:
Keywords: network-team-roadmap-2019-Q1Q2 deleted, network-team-roadmap-2019-Q1Q2 teor-merge added
Milestone: N/A to Tor: 0.4.1.x-final

I did a squashed branch, but the CI for the original pull request failed, so I will only merge if the CI for the squashed branch works: https://github.com/torproject/tor/pull/786

Trac:
Type: defect to enhancement

I added two more exceptions to the branch to try to make the PR not fail post-merge.

Sorry, that didn't seem to fix CI.

Trac:
Status: merge_ready to needs_revision

Replying to teor:

Sorry, that didn't seem to fix CI.

Almost there. The problem with your fix was that you added a problem line for hs_cell.c:hs_cell_parse_introduce2 but there was already one in the exceptions file. Since the second one was more down it had more priority (having two same problems in the exception file is currently UB).

~~Perhaps we can soft-solve this problem by editing the right exception line, and then as part of #29746 (moved) we can figure out what to do about double exception lines?~~

Please see my branch bug29221_more for a fix to the issue and special handling of duplicate exception lines: https://github.com/torproject/tor/pull/787

Trac:
Status: needs_revision to merge_ready

I'm going to assume that small fixes like this don't need another review before merging?

Replying to teor:

I'm going to assume that small fixes like this don't need another review before merging?

I have document this exception in our merge policy: https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/MergePolicy?action=diff&version=7

CI passes. One Appveyor build failed but it's unrelated (it's a stochastic prob distr fail which I noted down at #29767 (moved)).

I've merged this to master, since we've all signed off on it in various ways, and since it doesn't actually touch any C code.

Trac:
Status: merge_ready to closed
Resolution: N/A to implemented
Actualpoints: N/A to 4

This script breaks jenkins on master due to a python error:

23:19:11 python ../tor/scripts/maint/practracker/practracker.py ../tor
23:19:11 Traceback (most recent call last):
23:19:11   File "../tor/scripts/maint/practracker/practracker.py", line 25, in <module>
23:19:11     import problem
23:19:11   File "/srv/jenkins-workspace/workspace/tor-ci-linux-master/ARCHITECTURE/amd64/SUITE/stretch/tor/scripts/maint/practracker/problem.py", line 27
23:19:11     print("No exception file provided", file=sys.stderr)
23:19:11                                             ^
23:19:11 SyntaxError: invalid syntax
23:19:11 Makefile:19273: recipe for target 'check-best-practices' failed
23:19:11 make[1]: *** [check-best-practices] Error 1

https://jenkins.torproject.org/job/tor-ci-linux-master/ARCHITECTURE=amd64,SUITE=stretch/3830/consoleFull#188818640921ba4c2d-fbf2-4975-9955-c2fe5cc665f5

Trac:
Resolution: implemented to N/A
Status: closed to reopened

Trac:
Status: reopened to needs_revision

I'm guessing you probably wanted to say "python3"?

Hmm, no, travis runs with python 2.7.6 by default, so we should fix the script so it works with python3.

Replying to teor:

Hmm, no, travis runs with python 2.7.6 by default, so we should fix the script so it works with python3.

Thanks for catching that.

Seems like checkIncludes.py is also a python3 script but it also runs with python2 in jenkins/travis because of the from __future__ import print_function trick.

Please see https://github.com/torproject/tor/pull/796 for a branch that does exactly that.

Marking as merge_ready but feel free to push down to needs_rev if needed.

Trac:
Status: needs_revision to merge_ready

Seems simple enough, I'll merge after CI finishes chewing on the last lot of merges.

I can't merge this today: my day should end soon, I'm on leave tomorrow, and Appveyor still has 3.5 hours to go. I'll be back Monday, or anyone can merge in the meantime?