Opened 5 months ago

Closed 3 months ago

#32921 closed enhancement (implemented)

Code and script changes to run clang-format without breaking checkSpaces or coccinelle

Reported by: nickm Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: style, 043-can
Cc: teor, catalyst Actual Points: 1.5
Parent ID: #29226 Points:
Reviewer: catalyst Sponsor:

Description (last modified by nickm)

I've been working to make changes to our code and our scripts to improve our clang-format output. I think they are mature enough that we can merge them now.

I also think it may be time to merge a .clang-format file and a script to run it. We'll want to tweak it a bunch before we actually run it on our code, but getting it into our version control will help us refine our way towards a reasonable target.

Edited to clarify: Neither the .clang-format file, the script, or the post-processing tool are meant to be a final version. This branch does not mean that our style choices are final. The goal here is just to land initial versions that we can start experimenting with.

Child Tickets

Change History (27)

comment:1 Changed 5 months ago by nickm

Actual Points: 1.5
Keywords: style added
Status: assignedneeds_review

See branch clang_format_prep_2 with PR at https://github.com/torproject/tor/pull/1650 . This is the branch to review and merge.

It makes the following kinds of changes:

  • It adds a .clang-format file.
  • It adds an extensible postprocessing tool that works with clang-format to do the right thing for our FOREACH_END and MOCK_IMPL macros.
  • It adds a shell script that applies both of the above to our code.
  • It makes checkSpace.pl tolerate a bunch of things that clang-format wants to do in its output.
  • It tweaks our code a little so that, after clang-format is run:
    • The unit tests all pass
    • checkSpace.pl still passes
    • Our coccinelle tests still pass
    • columnar tables are preserved
    • macro stringifications at the start of a line (e.g. #name) are no longer mangled.

To see the result of running this formatting on our current code, see demo_clang_foramt_20190110 with PR at https://github.com/torproject/tor/pull/1648

Last edited 5 months ago by nickm (previous) (diff)

comment:2 Changed 4 months ago by dgoulet

Reviewer: catalyst

comment:3 Changed 4 months ago by nickm

Keywords: 043-can added

tag all currently needs_review tickets with 043-can. (Since there's code before the feature freeze, maybe we can take it.)

comment:4 Changed 4 months ago by catalyst

What is the minimum clang-format version required for the provided config file? I couldn't get it to work with the default clang-format (3.8) in Xenial.

comment:5 Changed 4 months ago by nickm

I know that it works with clang-format 9.0.0, and I believe it works with 8.x too, but I haven't double-checked this version of it.

comment:6 in reply to:  5 Changed 4 months ago by catalyst

Replying to nickm:

I know that it works with clang-format 9.0.0, and I believe it works with 8.x too, but I haven't double-checked this version of it.

Thanks. The latest that's easily installable for me on Xenial is 6.0, which doesn't support StatementMacros or SpaceAfterLogicalNot. (I stopped trying to comment out settings after that point.)

comment:7 Changed 4 months ago by teor

We're using Ubuntu bionic in Travis:
https://gitweb.torproject.org/tor.git/tree/.travis.yml#n104

And it has clang-format-6.0:
https://packages.ubuntu.com/bionic/clang-format

So if we want to use clang-format in our CI, to check that code has been formatted before merging to master, we can't require a clang-format version higher than 6.0.

If we don't want to use clang-format in CI, here are our options:

Ubuntu releases have:

  • xenial: 6.0
  • bionic: 6.0
  • disco: 8.0
  • eoan: 9.0

https://packages.ubuntu.com/search?keywords=clang-format&searchon=names&suite=all&section=all

Debian has:

  • stretch: 3.8
  • buster: 7.0

https://packages.debian.org/search?keywords=clang-format&searchon=names&suite=all&section=all

macOS has:

  • Xcode doesn't install clang-format
  • Homebrew: 9.0

https://formulae.brew.sh/formula/clang-format

The ideal version to require for developers would be 6.0.

The highest reasonable version we could require is 7.0, and even then, we'd be forcing Ubuntu users to upgrade to Ubuntu disco.

comment:8 Changed 4 months ago by catalyst

I think there are packages in Ubuntu that have specific clang version numbers, like clang-format-9 on Bionic? That's how I was getting 6.0 on Xenial (which otherwise defaults to 3.8).

Ubuntu installs them with executable names like clang-format-6.0, so we'd have to parameterize how our tools invoke clang-format.

comment:9 Changed 4 months ago by catalyst

Comments on the .clang-format file:

  • IndentCaseLabels: true isn't consistent with KNF, nor is it consistent with how we seem to indent goto labels.
  • I'm not sure I like AlignEscapedNewlines: Left. Our existing practice seems to somewhat follow what what emacs does, which tries to line up the newline escaping backslashes of a macro definition so they are at a tab stop. I'm not sure there's an easy way to get clang-format to do that. AlignEscapedNewlines: Right seems more wrong, though.
  • KeepEmptyLinesAtTheStartOfBlocks: false isn't consistent with KNF: functions without local variables are supposed to have an empty line at the start of the block. But we don't seem to do that consistently if at all in our existing code, so I guess it's ok.
  • SpaceAfterLogicalNot: true seems fairly uncommon in the C code I've seen, which is also consistent with it only being supported in fairly recent clang-format. I think other tools may expect SpaceAfterLogicalNot: false.

comment:10 Changed 4 months ago by nickm

Comments on the .clang-format file:

Thank you! I am not attached to IndentCaseLabels or SpaceAfterLogicalNot, though I think others may feel more strongly, and we should probably do a quick straw poll.

I can probably hack something together with codetool.py to replace the output of AlignEscapedNewlines with something closer to what emacs does; do you think that would be worthwhile?

I would like to keep KeepEmptyLinesAtTheStartOfBlocks: false if we can.

comment:11 Changed 4 months ago by catalyst

Proposed changes based on mailing list responses and meetings:

  • change to IndentCaseLabels: false due to lack of objections
  • catalyst will investigate whether Emacs can be easily convinced to produce the results of AlignEscapedNewlines: Left (maybe we can also decide to leave existing ones alone with AlignEscapedNewlines: DontAlign? but maybe instead that puts space-backslash at the end of the each line's visible text. I haven't checked.)
  • retain KeepEmptyLinesAtTheStartOfBlocks: false (no other opinions about changing it, and my preference isn't very strong)
  • change to SpaceAfterLogicalNot: false (it removes the need for at least one other change on the PR that compensates for the introduced space, and brings us closer to KNF)

comment:12 Changed 4 months ago by catalyst

Also, we still haven't decided on the minimum required version of clang-format. If the minimum required version isn't the default clang-format on our CI platforms, we might need to parameterize the clang-format executable name in the helper script to deal with that.

comment:13 Changed 4 months ago by nickm

I don't object to any of those changes. Would you prefer I wait till you've reviewed the rest of the branch for me to make them, or go ahead and update the branch?

comment:14 in reply to:  13 Changed 4 months ago by catalyst

Replying to nickm:

I don't object to any of those changes. Would you prefer I wait till you've reviewed the rest of the branch for me to make them, or go ahead and update the branch?

Please go ahead and update the branch. Force-pushing the branch is ok with me.

comment:15 Changed 4 months ago by nickm

I've pushed a new rebased clang_format_prep_3, which is the branch I am hoping for review on here. This branch forces no formatting per se, and only prepares for formatting. https://github.com/torproject/tor/pull/1709 is the pull request. There's a CI failure, but it looks like a download failure rather than a real bug.

https://github.com/torproject/tor/pull/1710 is the result of running clang-format.sh on that branch. The CI failures are real there due to some functions getting longer and making practracker upset.

comment:16 Changed 4 months ago by catalyst

Doing a little digging through documentation, it seems like StatementMacros is new in clang-format-8.0, and SpaceAfterLogicalNot is new in clang-format-9.0. The latest branch removes SpaceAfterLogicalNot, which helps with the latter. (If we want to go as far back as clang-format-3.8 for Debian stretch, we might have to do a lot more digging.)

Once I comment out those directives, clang-format-6.0 on Ubuntu Xenial seems to accept the .clang-format file. (We'll have to parameterize the clang-format executable name in the helper script if we want this to run easily on Xenial, though, because 3.8 is the default there.)

We can remove StatementMacros by changing the ht.h macros to use the "redeclare incomplete struct" trick for eating semicolons, and adjusting uses of those macros. (The examples in the comments in ht.h use semicolons after those macros anyway, even though it might cause compiler errors/warnings.) We might want to do that as a child ticket of this ticket.

(Note that we seem to have two readily findable instances of this technique with different struct tags: struct dummy_semicolon_eater__ in src/lib/cc/compat_compiler.h and struct tor_semicolon_eater in src/lib/conf/conftesting.h. We might want to consolidate them. I prefer the tor_ prefix for future library-friendliness.)

comment:17 Changed 4 months ago by nickm

I agree with combining dummy_semicolon_eater__ and tor_semicolon_eater. The trouble with ht.h is that it's in src/ext, so we don't technically "own" it: we try to keep it usable outside of Tor. That said, we can create an ht_semicolon_eater for ht.h.

Let me know if you'd like me to do the StatementMacros removal in this branch or open a new ticket.

comment:18 in reply to:  17 Changed 4 months ago by catalyst

Replying to nickm:

I agree with combining dummy_semicolon_eater__ and tor_semicolon_eater. The trouble with ht.h is that it's in src/ext, so we don't technically "own" it: we try to keep it usable outside of Tor. That said, we can create an ht_semicolon_eater for ht.h.

Let's open an independent ticket for combining the two existing dummy struct tags. I agree we can make a ht_semicolon_eater specifically for ht.h.

Let me know if you'd like me to do the StatementMacros removal in this branch or open a new ticket.

I think both ht_semicolon_eater and removing StatementMacros can be in this branch.

comment:19 Changed 4 months ago by nickm

Okay; I've updated the clang_format_prep_3 branch.

comment:20 Changed 4 months ago by catalyst

Remaining things that I think are excessive changes by clang-format:

  • clang-format is too eager to move line breaks around so that a long argument list wraps with the longest-possible lines. This can make code more confusing if arguments have a useful semantic grouping. I'm not sure how to turn this off. I think it might do a similar thing to long expressions in the controlling expressions of conditional statements? Maybe we can work around this by adding redundant parentheses to make clang-format more reluctant to break lines at certain places. What do people think about that?
  • extra indentation for array initializers, e.g. clang produces
    struct foo a[] = {
        V(x),
        V(y),
    };
    
    while existing code tends to have
    struct foo a[] = {
      V(x),
      V(y),
    };
    
    I think we can fix this with ContinuationIndentWidth: 2, but I'm not sure. (This would change to 4 once we change to a 4-column indent.)
  • clang-format doesn't retain spaces inside curly braces in one-line initializer items, e.g. clang produces
    struct foo a[] = {
      {0, 0, 0},
    };
    
    while existing code tends to have
    struct foo a[] = {
      { 0, 0, 0 },
    };
    
    I find the style with the internal spaces more readable. This is not a strong perference.
  • clang-format doesn't have a way to maintain tabular alignment in initializer lists. We have a lot of these. The branch doesn't turn off clang-format in all of them. Do we want to maintain tabular alignment? (I think it makes large tables more readable, but I'd like to hear other opinions.)
  • I don't like the way AlignTrailingComments: false looks. I like alignment for stuff like Doxygen in-line comments for structures. It's not a strong preference, though. (Among other things, I think Doxygen might do its own alignment in its HTML output.)

comment:21 in reply to:  20 Changed 4 months ago by teor

Replying to catalyst:

Remaining things that I think are excessive changes by clang-format:

  • clang-format is too eager to move line breaks around so that a long argument list wraps with the longest-possible lines. This can make code more confusing if arguments have a useful semantic grouping. I'm not sure how to turn this off. I think it might do a similar thing to long expressions in the controlling expressions of conditional statements? Maybe we can work around this by adding redundant parentheses to make clang-format more reluctant to break lines at certain places. What do people think about that?

I don't think using whitespace as implicit documentation is ideal.

Instead of giving whitespace an implicit meaning, we should explicitly document that meaning. If we just use whitespace:

  • it's easy for other developers to miss subtle formatting, and
  • any changes to the code can easily destroy the grouping.

So I suggest that we add comments to these kinds of argument lists.

If necessary, we can add whitespace or punctuation to the comments, to achieve a particular argument alignment.

  • extra indentation for array initializers, e.g. clang produces
    struct foo a[] = {
        V(x),
        V(y),
    };
    
    while existing code tends to have
    struct foo a[] = {
      V(x),
      V(y),
    };
    
    I think we can fix this with ContinuationIndentWidth: 2, but I'm not sure. (This would change to 4 once we change to a 4-column indent.)

I don't see a problem here. I don't mind either way.

  • clang-format doesn't retain spaces inside curly braces in one-line initializer items, e.g. clang produces
    struct foo a[] = {
      {0, 0, 0},
    };
    
    while existing code tends to have
    struct foo a[] = {
      { 0, 0, 0 },
    };
    
    I find the style with the internal spaces more readable. This is not a strong perference.

I find the internal spaces slightly more readable, particularly with single digits. I don't think it matters as much with variable names.

But I don't mind either way.

  • clang-format doesn't have a way to maintain tabular alignment in initializer lists. We have a lot of these. The branch doesn't turn off clang-format in all of them. Do we want to maintain tabular alignment? (I think it makes large tables more readable, but I'd like to hear other opinions.)

I think tabular alignment is useful, particularly for large tables.

We use *.inc files for some large tables. We could do the same thing with all the tables we want to have specific alignment?

(The config code is in the process of migrating to *.inc files for its config variable tables.)

  • I don't like the way AlignTrailingComments: false looks. I like alignment for stuff like Doxygen in-line comments for structures. It's not a strong preference, though. (Among other things, I think Doxygen might do its own alignment in its HTML output.)

Can you point to a before and after example?

comment:22 Changed 3 months ago by nickm

Description: modified (diff)

comment:23 Changed 3 months ago by catalyst

Status: needs_reviewneeds_revision

Following up from last week's C style meeting: I think this is almost ready, given what we agreed on as the scope of this ticket.

There should be disclaimers in the .clang-format and the script about how the style isn't official yet, and that people shouldn't commit or merge the results yet. Maybe also a comment about the minimum required clang-format version, which I think is 6.0 but might be earlier?

We should open a separate ticket to parameterize the clang-format executable name to make it easier for developers to run the minimum required clang-format version even when it's not called clang-format (e.g., clang-format-6.0 on Xenial). nickm, would you like to do that? I can also open it if you like.

comment:24 Changed 3 months ago by nickm

I've opened #33415 for the parameterization issue.

comment:25 Changed 3 months ago by nickm

Status: needs_revisionneeds_review

Oops! I forgot to push my commit from last week that added the warnings about not running the scripts on master yet. I've added it to the branch, along with a note about what we know so far about required versions.

comment:26 in reply to:  25 Changed 3 months ago by catalyst

Status: needs_reviewmerge_ready

Replying to nickm:

Oops! I forgot to push my commit from last week that added the warnings about not running the scripts on master yet. I've added it to the branch, along with a note about what we know so far about required versions.

Thanks! Looks good.

comment:27 Changed 3 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

Woo! Merged this to master.

Note: See TracTickets for help on using tickets.