Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve memory efficency of rope diffing #1284

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

TDecking
Copy link
Collaborator

@TDecking TDecking commented Jun 26, 2020

Fixes #946 . I've used the first approach using a RopeSlice-struct. I've tried to stay as close as possible to the original code,
and my measurements confirm a reduced memory consumption for test_larger_diff.

  • I have responded to reviews and made changes where appropriate.
  • I have tested the code with cargo test --all / ./rust/run_all_checks.
  • I have updated comments / documentation related to the changes I made.
  • I have rebased my PR branch onto xi-editor/master.

Comment on lines 293 to 307
let mut looping = true;

while looping {
if let Some(c1) = it1.next() {
if let Some(c2) = it2.next() {
looping = c1 == c2;
} else {
return false;
}
} else {
return it2.next().is_none();
}
}

false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it is possible to avoid resetting looping variable.

loop {
    let c1 = it1.next();
    let c2 = it2.next();

    if c1 != c2 {
        return false;
    }

    if c1.is_none() {
        return true;
    }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

looping = false;
}
} else {
looping = false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break would do the job.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. This loop also used while let now.

Copy link
Member

@cmyr cmyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just spent a while with this, and I'm actually a bit confused; the issue referenced in #946 was actually addressed in #1137, a little over a year ago.

So: I'm not totally sure where this patch fits. Running the benchmarks, it appears that we're looking at a significant slowdown (~2x) over the existing implementation, and I think memory savings would have to be pretty significant for us to want to take that tradeoff. If you can included any measurements that would be helpful; as-is I'm not sure this is buying us much 😞

I've reviewed this as I would if we were going to merge it, and there are some little bugs; it's quite possible that performance will be improved significantly when those are addressed.

rust/rope/src/diff.rs Show resolved Hide resolved
@@ -74,10 +87,13 @@ impl Diff<RopeInfo> for LineHashDiff {
let mut prev_base = 0;

let mut needs_subseq = false;
for line in target.lines_raw(start_offset..target_end) {
for line in SliceIter::new(base, start_offset) {
let len = line.range.end - line.range.start;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just add a len() method to this type.

if let Some(base_off) = line_hashes.get(&line[non_ws..]) {
if len - non_ws >= MIN_SIZE {
if let Some(base_off) = line_hashes
.get(&RopeSlice { rope: base, range: line.range.end + non_ws..line.range.end })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be line.range.start+non_ws? When I make this change though I notice that the deltas that are generated are different from what were previously generated, and it looks like there are a lot of empty Copy operations getting included; something else to investigate.

} else {
break;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performance matters here, and doing codepoint calculations adds a bunch of expense; since we're only looking for acsii characters it's much cheaper to work with bytes.

impl PartialEq for RopeSlice<'_> {
fn eq(&self, other: &Self) -> bool {
let mut it1 =
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here, and chars; we want to use bytes.

I would just add a method to RopeSlice, like

impl<'a> RopeSlice<'a> {
    fn iter_bytes(&'a self) -> impl Iterator<Item=u8> + 'a {
            self.rope.iter_chunks(self.range.clone()).flat_map(|chunk| chunk.bytes())
        }
}

let mut it1 =
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());
let mut it2 =
other.rope.iter_chunks(Interval::from(other.range.clone())).flat_map(|x| x.chars());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we can omit the Interval::from here and just pass a Range.

impl Hash for RopeSlice<'_> {
fn hash<H: Hasher>(&self, state: &mut H) {
let iter =
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto chars being slow here.

It might be worth looking into what the contract is around the hash impls of std types; I suspect we can actually just hash the chunks returned by iter_chunks one by one and get the same hash for a given string, regardless of how it is broken up.

@TDecking TDecking mentioned this pull request Sep 17, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve memory efficiency of rope diffing
4 participants