Styler is slow #558

rillig · 2019-10-16T17:54:30Z

Formatting a real-life example file takes 17 seconds on my computer. That's really much because the files is just 1215 lines. Styling this file should take at most 1 second.

lorenzwalthert · 2019-10-16T20:43:33Z

Yes, but you get the follwing styled right compared to the fast formatter formatR:

Not all valid R code can be parsed by formatR, e.g. Parse error at comment after condition yihui/formatR#80 (comment) and Parsing Error with try-catch yihui/formatR#44.
x %>% \nc()
call({{ x }})
call(!!x) and call(!!! x).
R6 + roxygen2 comments (formatting a R6 object with roxygen2 documentation fails yihui/formatR#83)
add newline after ggplot2's + (newline after + sign yihui/formatR#100).
line breaks can be formatted incorrectly: Empty lines in function calls results in an error yihui/formatR#94 (comment)
and more

We also have features such as:

formatting of roxygen comments.
formatting of Rnw and Rmd code chunks.
caching.
turn styler off for some lines.
detecting alignment (Detect alignment #258), partly implemented with Detect alignment #537.

We are working on various features such as:

parallel processing (Multicore support #370).

Often there is the trade-off between:

fast code
readable / maintainable code

We choose an approach that is quite elegant, flexible and customizable (read more here). As of v1.3. , we can cache on top-level expression level which should mitigate the problem, at least for repeated styling.

Now the ad block is over :-)

Indeed you are right, styler is not fast. Although it used to be twice as slow (#78), it's also bothering me. To identify more bottlenecks, I think we'd need to:

make use of profiling (which is quite difficult because of recursive function calls, but I think @krlmlr found a way to do it).
move some code to C or C++, but I am not an expert on that so I'd leave that to others.
identify other shortcuts from looking at logic in code (for people like me who are very familiar with the codebase). This includes moving certain rules into one transformer function, so the visitor has less function calls to make.
import more namespaces and avoid ::. No idea what the speed implications are, but we can spare the .onLoad() call (compare Benchmark study: Implicit imports #685).
use [['x']] instead of $ [[ subsetting much slower than $ tidyverse/tibble#780 (comment).
general tibble speed things: https://github.com/tidyverse/tibble/issues?q=speed.
remove as.character(transformers) since we have transformer name and version (Speed improvement in caching #679).
In parse_transform_serialize_r_block() or parse_transform_serialize_r(): only pass a subset of all transformers to apply_transformers(), namely remove those who we know wont ever be applied (e.g. force_assignment_op() must not be applied to all nests if we know there is not a single instance of EQ_ASSIGN. This will have some speed-up for transformers that are expensive to run but hardly change any nest. Note that we should use a function to subset the transformers (e.g. subset_tranformers(), and that function should be packed in the style guide and not be hardcoded in styler (Remove unused transformers before processing #711).

We need to distinguish:

changes that will make styler pareto faster because it involves operations that are always faster when used an alternative to the current implementation. This can be tested easily.
changes that involve a trade off, e.g. will make it faster on cached code and make it slower on uncached code, or ignored code or aligned code etc. Or will make it faster on cached code if this code has structure x and slower when it has structure y. In particular changing the way caching works (e.g. when we introduced shallow parse tables, see 2921c7e and similar.

I can't say much more about that right now, unfortunately. If you want to contribute to any of the above or you have other suggestions, please let me know.

~~I think we should also establish benchmarking with CI services to understand the speed implications of new features more holistically.~~

We use {touchstone} to benchmark every PR: https://github.com/lorenzwalthert/touchstone

lorenzwalthert · 2019-12-12T09:57:31Z

Anyone interested in doing speed profiling with the new proffr package?

lorenzwalthert · 2020-02-04T23:28:29Z

#578 is now ready for testing (and looking for feedback). Styling your real world file form above took my notebook 40s (with some other tasks going on) in the first round. Then, I made 14 changes to the file (in like 10 different top-level expressions, you have about 190 top-level expressions, including 32 comments) and restyled. It took 10s. For all untouched top-level expressions, the cache was used. Assuming you run styler and change only a few top level expressions yields significant speed boosts. I agree 10s is still a lot but it's 4x faster than before. 🎉

Edit: We released v1.3.2 on CRAN that contains #578 and some fixes required later.

y9c · 2020-03-23T04:00:50Z

@lorenzwalthert

Thanks for the improvement, but the latest version of styler is still quite slow...

$ R --version
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"

$ R -e 'packageVersion("styler")'
[1] ‘1.3.2’

$ wc -l test.R
1243

$ time cat test.R | R --slave --no-restore --no-save -e "con <- file(\"stdin\");styler::style_text(readLines(con));close(con)" > /dev/null
cat test.R  0.00s user 0.00s system 75% cpu 0.001 total
R --slave --no-restore --no-save -e  > /dev/null  16.71s user 0.08s system 99% cpu 16.817 total

It is very similar with this issue, I still take ~17s for a single file.

lorenzwalthert · 2020-03-23T06:25:26Z

Yes. This is why this issue is still open. Caching only improves speed on repeated styling.

y9c · 2020-03-23T09:13:09Z

I don't know much about the principles of styler, but I wonder if adding asynchronous programming can help, which is widely used in the formatter of other languages.

lorenzwalthert · 2020-03-23T09:15:13Z

Can you elaborate a bit on that?

y9c · 2020-03-23T09:26:22Z

I suppose most of the functions in styler don't need to run sequentially, thus wrapping those functions by promises package in R might bring some improvement. Just in the same way as the improved shiny app.
This may not fit the styler, I may be wrong...

lorenzwalthert · 2020-03-23T09:29:34Z

We can parallelize on different levels. The most outer way is over files, as in #617. Indeed we could also parallelize over expressions at some stage but this was not a priority because I worked on caching.

rillig · 2020-03-23T19:40:02Z

Instead of tweaking the implementation details, I wonder whether the algorithms that are used here are appropriate. Using parallelism will not reduce the overall CPU time, choosing a different algorithm may.

Is it possible to use a simpler formatting algorithm for the easy cases and resort to the current algorithm in the really complicated cases? In other programming languages, formatting can be done in O(n).

lorenzwalthert · 2020-03-23T19:59:11Z

What do you mean exactly by "the algorithm"? How the core of styler works is probably best documented in this vignette. I think it would be a lot of work to change the core of styler, because that would be a fundamental change and for me, the advantages of the approach taken are significant (described above). But if you have suggestions, I am open to discuss them. I don't even know if styler is O(n) (depending on how you measure n here I think it might be), I guess we could do a little empirical investigation to figure that out...

I don't see why parallelism would not reduce overall CPU time in any case, #370 certainly in many cases because additional start-up time is made up for with parallelization. Maybe it does not help when styling a single file, I agree.

lorenzwalthert added Complexity: High Priority: Medium Type: Enhancement Status: Unassigned labels Oct 16, 2019

renkun-ken mentioned this issue Dec 19, 2019

styler too slow for large script and blocks lsp REditorSupport/languageserver#141

Closed

lorenzwalthert pinned this issue Aug 8, 2020

lorenzwalthert mentioned this issue Jan 3, 2021

Remove unused transformers before processing #711

Merged

8 tasks

krlmlr mentioned this issue Mar 25, 2021

Understand where styler spends most of the time #759

Open

lorenzwalthert mentioned this issue Jun 11, 2021

add styler package jdblischak/reproducible-research-ctv#32

Closed

lorenzwalthert mentioned this issue Aug 9, 2021

ARROW-13562: [R] Styler followups apache/arrow#10879

Closed

lorenzwalthert mentioned this issue Apr 27, 2022

Styling of R6 Classes feels a little slow #942

Closed

This was referenced Aug 21, 2022

Optimize visiting code #979

Merged

Refactor: better stack traces for profiling #980

Merged

IndrajeetPatil mentioned this issue Sep 13, 2022

WIP: Check for performance improvements with C++ #997

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Styler is slow #558

Styler is slow #558

rillig commented Oct 16, 2019

lorenzwalthert commented Oct 16, 2019 •

edited

lorenzwalthert commented Dec 12, 2019 •

edited

lorenzwalthert commented Feb 4, 2020 •

edited

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

rillig commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

Styler is slow #558

Styler is slow #558

Comments

rillig commented Oct 16, 2019

lorenzwalthert commented Oct 16, 2019 • edited

lorenzwalthert commented Dec 12, 2019 • edited

lorenzwalthert commented Feb 4, 2020 • edited

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

y9c commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

rillig commented Mar 23, 2020

lorenzwalthert commented Mar 23, 2020

lorenzwalthert commented Oct 16, 2019 •

edited

lorenzwalthert commented Dec 12, 2019 •

edited

lorenzwalthert commented Feb 4, 2020 •

edited