Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of strip_trailing_whitespace() #63

Open
iainh opened this issue Jul 23, 2022 · 8 comments
Open

Performance of strip_trailing_whitespace() #63

iainh opened this issue Jul 23, 2022 · 8 comments

Comments

@iainh
Copy link
Contributor

iainh commented Jul 23, 2022

At the moment, I am considering possible solutions to the Colour Panic bug, see #42, and came across the strip_trailing_whitespace() function, as well.

I found this implementation working identically compared to the current one:

pub fn strip_trailing_whitespace(text: Rope) -> Rope {
    text.lines()
        .map(|x| x.to_string().trim_end().to_string() + "\n")
        .collect()
}

Does this version raise the performance when changing the implementation in zee-edit/src/graphemes.rs, line 122, or is it even worse? Maybe the performance boost suffices to save the file before the exit?

Originally posted by @kevinmatthes in #58 (comment)

@iainh iainh changed the title At the moment, I am considering possible solutions to the Colour Panic bug, see #42, and came across the strip_trailing_whitespace() function, as well. Performance of strip_trailing_whitespace() Jul 23, 2022
@iainh
Copy link
Contributor Author

iainh commented Jul 23, 2022

Moving discussion of the performance of strip_trailing_whitespace() here since while it aggravates the behaviour discussed in #58, the two are really separate issues.

@iainh
Copy link
Contributor Author

iainh commented Jul 23, 2022

Here are some benchmark results the current method (bench_strip_trailing_whitespace below) and @kevinmatthes's iterator based approach (bench_strip_trailing_whitespace_iterator below).

test tests::bench_strip_trailing_whitespace                  ... bench:  29,072,004 ns/iter (+/- 11,509,890)
test tests::bench_strip_trailing_whitespace_iterator         ... bench:  45,511,129 ns/iter (+/- 2,071,287)

The current implementation is still faster, though with a wider variance between runs.

@mcobzarenco
Copy link
Collaborator

mcobzarenco commented Jul 23, 2022

The current implementation is careful in not allocating any memory -- which is precisely why it's so convoluted -- where as

pub fn strip_trailing_whitespace(text: Rope) -> Rope {
    text.lines()
        .map(|x| x.to_string().trim_end().to_string() + "\n")
        .collect()
}

Allocates memory twice for every line (the calls to .to_string()). For a pathological file, e.g. minified Javascript, this allocates a string with the same file length twice.

@mcobzarenco
Copy link
Collaborator

mcobzarenco commented Jul 23, 2022

Just to say, I'm sure there's a better way of expressing the existing strip_trailing_whitespace in terms of Rope operations that still doesn't allocate memory, but it's also easier to read / comprehend.

@kevinmatthes
Copy link
Contributor

I agree that the string conversion is not optimal but I still think that an iterator approach should at least be tested since I expect the Rope queries to be performance bottlenecks while the iterator runs just once over the buffer. The question for a Rope-based iterator approach is how to emulate the string operations best without having to convert the data.

An iterator approach is usually also very well readable, so researching in this direction might have benefits also for other parts of the implementation.

@mcobzarenco
Copy link
Collaborator

I still think that an iterator approach should at least be tested

More than @iainh's benchmarks?

since I expect the Rope queries to be performance bottlenecks while the iterator runs just once over the buffer

Why? I would be very surprised if this is the case -- I wasn't exactly sure if you mean using String instead of Rope

  • Ropes as data structures are precisely optimised for this kind of non-contiguous, random edits without having to copy / allocate. As we do here when stripping white-space. Using something else would make it hard to avoid allocation / memcpy-ing. It would also be a fundamental change to how the editor works.
  • you may be right that we could be using ropey::Rope more efficiently -- the queries are O(log(n)) but we could be making fewer of them.
  • ropey (the rope library used), in my experience with it is fast, correct and has a pretty nice API.

@kevinmatthes
Copy link
Contributor

Okay, then just not applying the iterator pattern.

@kevinmatthes
Copy link
Contributor

#82 outlines another point: CRLF. At the moment, Zee replaces them by simple line feeds. I have absolutely no problem with that but maybe some people would like to keep their CRLF for some reasons. When working on the whitespace cropping function, it should be kept in mind to enable a simple opt-out of the CRLF removal depending on the respective settings of the configuration file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants