Opened 10 months ago

Closed 9 months ago

#29221 closed enhancement (implemented)

Tooling to track code-style/best-practices violations

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: network-team-roadmap-2019-Q1Q2 teor-merge
Cc: Actual Points: 4
Parent ID: Points: 3
Reviewer: nickm Sponsor: Sponsor31-can

Description

From the list of best practices defined in #29219, when we can write tooling to do so, we should write tooling to find and measure violations of these practices

Some of these measures (function length, file length, layer violations via includes) already have code for them; others will be harder. We should probably aim for the lower hanging fruit here.

Child Tickets

Attachments (1)

tor_complexity.log (502.5 KB) - added by asn 9 months ago.

Download all attachments as: .zip

Change History (34)

comment:1 Changed 10 months ago by nickm

Component: - Select a componentCore Tor/Tor

comment:2 Changed 10 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added
Points: 53

comment:3 Changed 9 months ago by asn

OK I'm getting started with this ticket:

Nick what do you mean that some of these measures already have code for them? We have code in Tor to count function and file length? I could not find that in scripts/ but we do have the #include checking script.

On this note, and since we are chasing for low-hanging fruit here, I played a bit with static analysis tools today. The thing that worked out best because it's open source and does some of the things that we want it do is GNU complexity (...): https://www.gnu.org/software/complexity/

Based on the best practices list of #29219, complexity can calculate: file length, function length, nesting level count, and function-per-file count. As far as I know we don't have any of these right now so being able to do these would be an improvement. Also it's a GNU tool so it's FOSS and something we can use as an optional parts of our scripts toolkit.

As an example here is complexity's output for scheduler_kist.c:

$ complexity --histogram --score --thresh=1 ./src/core/or/scheduler_kist.c
NOTE: proc kist_scheduler_run in file ./src/core/or/scheduler_kist.c line 544
	nesting depth reached level 5
Complexity Scores
Score | ln-ct | nc-lns| file-name(line): proc-name
    1       2       2   ./src/core/or/scheduler_kist.c(140): free_all_socket_info
    1       4       2   ./src/core/or/scheduler_kist.c(420): MOCK_IMPL
    1       4       2   ./src/core/or/scheduler_kist.c(473): kist_scheduler_on_new_options
    1       3       3   ./src/core/or/scheduler_kist.c(109): each_channel_write_to_kernel
    1       4       4   ./src/core/or/scheduler_kist.c(147): socket_table_search
    1       4       4   ./src/core/or/scheduler_kist.c(773): scheduler_kist_set_lite_mode
    1       4       4   ./src/core/or/scheduler_kist.c(783): scheduler_kist_set_full_mode
    1       5       5   ./src/core/or/scheduler_kist.c(118): free_outbuf_info_by_ent
    1       5       5   ./src/core/or/scheduler_kist.c(129): free_socket_info_by_ent
    1       5       5   ./src/core/or/scheduler_kist.c(430): MOCK_IMPL
    1       5       5   ./src/core/or/scheduler_kist.c(441): have_work
    1       8       5   ./src/core/or/scheduler_kist.c(95): channel_outbuf_length
    1       7       7   ./src/core/or/scheduler_kist.c(333): outbuf_table_remove
    1       7       7   ./src/core/or/scheduler_kist.c(346): set_scheduler_run_interval
    1       8       8   ./src/core/or/scheduler_kist.c(157): free_socket_info_by_chan
    1      10       8   ./src/core/or/scheduler_kist.c(396): update_socket_written
    1      12       8   ./src/core/or/scheduler_kist.c(794): scheduler_can_use_kist
    1      10       9   ./src/core/or/scheduler_kist.c(483): kist_scheduler_init
    1      14       9   ./src/core/or/scheduler_kist.c(359): socket_can_write
    1      10      10   ./src/core/or/scheduler_kist.c(301): init_socket_info
    1      10      10   ./src/core/or/scheduler_kist.c(318): outbuf_table_add
    1      11      11   ./src/core/or/scheduler_kist.c(379): update_socket_info
    1      16      11   ./src/core/or/scheduler_kist.c(751): kist_scheduler_run_interval
    2      31      20   ./src/core/or/scheduler_kist.c(507): kist_scheduler_schedule
    5     120      54   ./src/core/or/scheduler_kist.c(172): MOCK_IMPL
   17     168      89   ./src/core/or/scheduler_kist.c(544): kist_scheduler_run

Complexity Histogram
Score-Range  Lin-Ct
    0-9         218 ************************************************************
   10-19         89 ************************

Scored procedure ct:       26
Non-comment line ct:      307
Average line score:         6
25%-ile score:              1 (75% in higher score procs)
50%-ile score:              2 (half in higher score procs)
75%-ile score:             17 (25% in higher score procs)
Highest score:             17 (kist_scheduler_run() in ./src/core/or/scheduler_kist.c)

In the CLI I set the score threshold to 1 so that it ignores truly trivial functions, but that can be set to 0 if wanted.

You can see the following:

  • It tells us that there is a function with a bad nesting depth of 5.
  • It tells us how big each function is (nc-lns counts only code).
  • It tells us how big the file is (307 LoC).
  • It tells us that the file contains 26 functions.
  • It has a weird but configurable algorithm to calculate function score.

The only weird thing is that it miscalculates at which line a function is, I think this has something to do with the way we comment our code but I haven't figured it out yet (does not seem related to macros). And it also does not work really well with MOCK_IMPL etc.

I can also pass it all the .c files of Tor and it will give me a project wide summary which I have attached to this ticket.

Does this seem like a plausible way forward? Like, having a script tha runs this in a useful way and parses the output to give some controlled information? Or do we car eabout other best-practices from #29219 more? This likely seems to be the lowest hanging fruit, but I'm open to more.

Also what do we want to do? Throw error if we have a nesting score above 7 (example)? Or just give out information?

Last edited 9 months ago by asn (previous) (diff)

Changed 9 months ago by asn

Attachment: tor_complexity.log added

comment:4 in reply to:  3 Changed 9 months ago by nickm

Hi, George!

I think that the function/file length thing I mentioned was just ad hoc one-liners, like find src/ -name '*.[ch]' | xargs wc -l |sort -n and like whatever I used for cyclomatic complexity in #6313.

I think you should feel free to go with whatever fruit seems lowest-hanging here. complexity seems like a good start for measurement, but I don't see immediately how to use it for enforcing rules. What do you think there?

comment:5 Changed 9 months ago by asn

Status: newneeds_review

Initial PoC posted in https://github.com/asn-d6/tor/tree/bug29221_draft

Needs further work based on the TODO in practracker.py.

Would like some feedback from Nick on how this looks.

comment:6 Changed 9 months ago by asn

OK. Something that looks like a plausible MVP here can be found in my bug29221_draft above. Or: https://github.com/torproject/tor/pull/742

Let me know how this looks like to you.

comment:7 Changed 9 months ago by asn

Reviewer: nickm

comment:8 Changed 9 months ago by asn

OK fixed the CI issues and also added a squash commit that slightly improves the error message in case of problems. We can always do better but this might be OK for a minimum-viable-product here.

comment:9 Changed 9 months ago by nickm

Status: needs_reviewneeds_revision

I've made a few comments in the PR. This is generally looking very solid.

comment:10 in reply to:  9 Changed 9 months ago by asn

Status: needs_revisionneeds_review

Replying to nickm:

I've made a few comments in the PR. This is generally looking very solid.

OK I fixed the easy stuff in a new commit ac05cc7437. I made a ticket #29746 for the rest of good-to-have stuff asked on the PR.

In general, some of the good-to-have stuff were not too hard to add, but IMO there should be tests to make sure that they don't influence results.

comment:11 Changed 9 months ago by nickm

LGTM, except that I think one of the requested changes is important for correctness. I've added a commit for that, plus another commit for the message change, in a new PR: https://github.com/torproject/tor/pull/782

If you like that, let's squash and merge.

comment:12 Changed 9 months ago by asn

Status: needs_reviewneeds_revision

Looks good but perhaps we should use the "FAILED" keyword when the program fails.
After that, let's mark as merge-ready.

comment:13 Changed 9 months ago by nickm

Status: needs_revisionneeds_review

Done; thanks!

comment:14 Changed 9 months ago by nickm

Status: needs_reviewmerge_ready

comment:15 Changed 9 months ago by nickm

Keywords: teor-merge added
Milestone: Tor: 0.4.1.x-final

comment:16 Changed 9 months ago by teor

Type: defectenhancement

I did a squashed branch, but the CI for the original pull request failed, so I will only merge if the CI for the squashed branch works:
https://github.com/torproject/tor/pull/786

comment:17 Changed 9 months ago by nickm

I added two more exceptions to the branch to try to make the PR not fail post-merge.

comment:18 Changed 9 months ago by teor

Status: merge_readyneeds_revision

Sorry, that didn't seem to fix CI.

comment:19 in reply to:  18 Changed 9 months ago by asn

Replying to teor:

Sorry, that didn't seem to fix CI.

Almost there. The problem with your fix was that you added a problem line for hs_cell.c:hs_cell_parse_introduce2 but there was already one in the exceptions file. Since the second one was more down it had more priority (having two same problems in the exception file is currently UB).

Perhaps we can soft-solve this problem by editing the right exception line, and then as part of #29746 we can figure out what to do about double exception lines?

Please see my branch bug29221_more for a fix to the issue and special handling of duplicate exception lines: https://github.com/torproject/tor/pull/787

Last edited 9 months ago by asn (previous) (diff)

comment:20 Changed 9 months ago by asn

Status: needs_revisionmerge_ready

comment:21 Changed 9 months ago by teor

I'm going to assume that small fixes like this don't need another review before merging?

comment:22 in reply to:  21 Changed 9 months ago by teor

Replying to teor:

I'm going to assume that small fixes like this don't need another review before merging?

I have document this exception in our merge policy:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/MergePolicy?action=diff&version=7

comment:23 Changed 9 months ago by asn

CI passes. One Appveyor build failed but it's unrelated (it's a stochastic prob distr fail which I noted down at #29767).

Last edited 9 months ago by asn (previous) (diff)

comment:24 Changed 9 months ago by nickm

Actual Points: 4
Resolution: implemented
Status: merge_readyclosed

I've merged this to master, since we've all signed off on it in various ways, and since it doesn't actually touch any C code.

comment:25 Changed 9 months ago by teor

Resolution: implemented
Status: closedreopened

This script breaks jenkins on master due to a python error:

23:19:11 python ../tor/scripts/maint/practracker/practracker.py ../tor
23:19:11 Traceback (most recent call last):
23:19:11   File "../tor/scripts/maint/practracker/practracker.py", line 25, in <module>
23:19:11     import problem
23:19:11   File "/srv/jenkins-workspace/workspace/tor-ci-linux-master/ARCHITECTURE/amd64/SUITE/stretch/tor/scripts/maint/practracker/problem.py", line 27
23:19:11     print("No exception file provided", file=sys.stderr)
23:19:11                                             ^
23:19:11 SyntaxError: invalid syntax
23:19:11 Makefile:19273: recipe for target 'check-best-practices' failed
23:19:11 make[1]: *** [check-best-practices] Error 1

https://jenkins.torproject.org/job/tor-ci-linux-master/ARCHITECTURE=amd64,SUITE=stretch/3830/consoleFull#188818640921ba4c2d-fbf2-4975-9955-c2fe5cc665f5

comment:26 Changed 9 months ago by teor

Status: reopenedneeds_revision

comment:27 Changed 9 months ago by teor

I'm guessing you probably wanted to say "python3"?

comment:28 Changed 9 months ago by teor

Hmm, no, travis runs with python 2.7.6 by default, so we should fix the script so it works with python3.

comment:29 in reply to:  28 Changed 9 months ago by asn

Status: needs_revisionmerge_ready

Replying to teor:

Hmm, no, travis runs with python 2.7.6 by default, so we should fix the script so it works with python3.

Thanks for catching that.

Seems like checkIncludes.py is also a python3 script but it also runs with python2 in jenkins/travis because of the from __future__ import print_function trick.

Please see https://github.com/torproject/tor/pull/796 for a branch that does exactly that.

Marking as merge_ready but feel free to push down to needs_rev if needed.

comment:30 Changed 9 months ago by teor

Seems simple enough, I'll merge after CI finishes chewing on the last lot of merges.

comment:31 Changed 9 months ago by teor

I can't merge this today: my day should end soon, I'm on leave tomorrow, and Appveyor still has 3.5 hours to go. I'll be back Monday, or anyone can merge in the meantime?

comment:32 Changed 9 months ago by nickm

okay, I'll do the merging, now that the CI has passed.

comment:33 Changed 9 months ago by nickm

Resolution: implemented
Status: merge_readyclosed
Note: See TracTickets for help on using tickets.