Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite indentation code #31

Draft
wants to merge 49 commits into
base: master
Choose a base branch
from
Draft

Rewrite indentation code #31

wants to merge 49 commits into from

Conversation

axvr
Copy link
Member

@axvr axvr commented Apr 27, 2023

This PR contains a rewrite of the whole Clojure indentation code with several aims:

  1. Improve the (currently abysmal) indent performance (Improve indent performance #6),
  2. Make indentation rules more configurable (such that Question: is there a way to configure Tonsky's indentation rule? #21 and more are possible),
  3. Change the defaults to be more similar to the Clojure Style Guide, Emacs and VS Code defaults,
  4. Simplify the indentation code as much as possible (currently so complex, it is near impossible to work on).

Along with these aims, this PR makes several further quality-of-life improvements to the indentation system, such as making it such that = does not alter the indentation of multi-line strings.

Note
Currently the "list" indentation algorithm and new clojure_indent_rules options have not been implemented, but will be soon. The goal is to replace the many previous indentation options with a unified option similar to what is found in Emacs' clojure-mode and cljfmt.

As of the latest change, the indentation is over twice as fast as before and implemented in about half the code.

To do

  • Core indentation algorithm.
  • Core string indentation rules.
  • Add the no searchpairpos fallback.
  • Add option for traditional Lisp-style multi-line string indentation.
  • Don't alter indentation of multi-line strings when the = operator is used.
  • Fix incorrect indentation caused by syntax glitches.
  • Build a custom pair analysis algorithm that eliminates all syntax lookups and hacks.
  • Add function operand alignment.
  • Fix bugs. (Especially the comment bugs.)
  • Indentation rules for special syntax (e.g. ~, ~@, ', @, #', #_, etc.).
  • Add clojure_indent_rules option and implement it.
  • Update and add more indentation tests (Add unit tests #1) + even test the different config options (as Rewrite indentation testing code #26 made that possible).
  • Update documentation.
  • Resolve most of the remaining TODOs.

axvr added 15 commits April 21, 2023 01:29
The objectives are:

    1. Simplify the indentation code; previous implementation has become
       so complex it is impossible to maintain,

    2. Significantly improve performance; previous indentation code was
       painfully slow, (see issue #6)

    3. Maximum configurability; should be configured similarly to cljfmt
       and make previously impossible things possible (e.g. issue #21).

As of this commit, objectives 1 and 2 have been met, but work on
objective 3 has not yet begun.  There will continue to be further
improvements, particularly around performance and the "what if syntax
highlighting is disabled?" scenario.

These changes will unfortunately be backwards incompatible, but
hopefully the improved performance and API will make up for it.
By utilising some ugly mutable state, this change more than doubles the
performance of the indentation code.  You can expect even further
benefits in larger code blocks.  This should completely eliminate any
need for the `clojure_maxlines` configuration option.
Since maps are far more likely to across multiple lines than vectors
(and when they are they tend to be much larger, e.g. EDN files!), we can
further optimise indentation by checking if the line is inside a map
*before* checking it is inside a vector.

(This happens because of the performance improvement made with the
addition of the optimised `CheckPair` function.)
The `=` operator will no longer alter the indent of lines within
a Clojure multi-line string or regular expression.

The previous behaviour was annoying for writing detailed doc-strings, as
it made reformatting the file with `gg=G` not possible as it would screw
up the indentation within the doc-strings.

Now the behaviour matches that of VS Code and Emacs.
By setting the `clojure_align_multiline_strings` option to `-1` it will
switch to traditional Lisp multi-line string indentation, of no indent.
Sometimes it seems that when `=`ing over a block, Vim will sometimes
not re-highlight strings correctly until we have already ran
`searchpairpos`.  In this case, we implement a hacky workaround which
detects the false-positive and recovers from it.
@axvr
Copy link
Member Author

axvr commented Apr 30, 2023

I've been experimenting with a custom algorithm as per the check list item above. This version eliminates all syntax highlight checks (and therefore no longer requires the various hacks to make those checks work) plus it is approx the same speed as my first rewrite attempt and uses about the same amount of code.

However the core benefits of this version is that it works when syntax highlighting is switched off, and eliminates the performance bottle neck of syntax highlight checks. This means that it will be possible to write a version of the algorithm in Vim9 script, to get another significant performance improvement! (Maintaining both a legacy Vim script and Vim9 script versions should be feasible due to how much simpler this is than the original implementation.)

TL;DR: I'll push a new version to this branch soon which will be:

  1. Simpler with no hacks,
  2. Can be become even faster (via Vim9 script),
  3. Works even when syntax highlighting is switched off.

Once this branch is feature complete, I'll share benchmarks comparing it to the original as is on master.

axvr added 4 commits May 1, 2023 00:58
Some refactoring should be possible here and further optimisations.
Once all optimisations I can think of have been implemented, I'll try
writing an alternate Vim9 script version.

(The syntax highlight group checks used in previous implementations of
the indentation code was the core bottleneck, so a Vim9 script version
would not have been much faster.)
The performance of the new reader-like indentation algorithm has now
surpassed the performance of my previous syntax highlighting approach
with better accuracy and no hacky code required.
@axvr
Copy link
Member Author

axvr commented May 1, 2023

Performance benchmarks as of current change (+ unpushed Vim9 version). This is measuring the time to reindent a whole file on a Macbook Pro 2021, so expect much more noticable performance gains on slower hardware. 🚀

Code version 200 lines of Clojure 600 lines of EDN
Original 0.96s 4.6s (clojure_maxlines = 0), 3.5s (clojure_maxlines = 300)
New (legacy) 0.33s 2.6s
New (Vim9) 0.06s 0.46s

(I'll share new benchmarks (and the full reports generated by Vim) once this branch is feature complete, although I don't expect it to change much.)

Even with the Vim9 implementation in the code too (not pushed yet, still figuring out how best to integrate it in a maintainable way), we can expect the file size of indent/clojure.vim to be ~150 LOC less than the original!

Regarding Neovim, until Vim9 script support is added, the "New (legacy)" code will be used. If anyone would like to volunteer to write a Lua implementation of this code for Neovim please open an issue and we can discuss how to integrate it.

Previously backslashes were accidentally detected as tokens by the
indentation tokeniser.  This meant that character literals, would break
indentation of everything after them.
@NoahTheDuke
Copy link

Any updates on this?

@axvr
Copy link
Member Author

axvr commented Aug 28, 2023

@NoahTheDuke yes, slow steady progress during the limited time I have to work on it at the moment. GitHub doesn't show it well, but there have been a bunch of bug fixes over the last few weeks. Currently my focus in on getting the majority of the tests to pass.

If you wanted to use it, this branch is actually usable if you don't mind a few minor formatting issues.

@NoahTheDuke
Copy link

Cool. I mistakenly trusted Github's "added 11 commits 2 months ago", which it seems bundles recent stuff too. I don't mean to pressure, just saw a bunch of activity then looked like nothing. Glad to hear you're still poking at it. I'll give it a whirl locally!

@NoahTheDuke
Copy link

Just tried it out and it's great so far. Thanks so much!

See: neovim/neovim#23576

Unintentional multiline string indentation will no longer occur in
future Neovim versions.
Previously indenting code containing inline comments like this:

    {:foo 1  ; Hello [
     :bar 2}

Would have resulted in this incorrect indentation:

    {:foo 1  ; Hello [
                      :bar 2}
In the original indent script, code like this:

    ((foo) 1
               3)

Would be indented as this:

    ((foo) 1
     3)

Now it is *correctly* indented as this:

    ((foo) 1
           3)
@axvr
Copy link
Member Author

axvr commented Dec 25, 2023

Update: nearing readiness! 🎉

This branch has now surpassed the original indentation code in pretty much every aspect (~2–3x faster—with potential to get ~10x faster with a little bit of Lua and Vim9 script—, less and simpler code, more accurate indentation, better defaults, more customisable, not dependent on syntax highlighting, etc.). There are a few small known bugs left to fix, the documentation needs updating and a little bit of code clean up will be required.

(The greatly expanded test suite is also already paying dividends; helping catch accidental regressions! More tests to come soon.)

@axvr axvr linked an issue Dec 26, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
indentation Affects indentation performance Affects performance
Projects
None yet
2 participants