Improve memory efficiency of rope diffing #1284

TDecking · 2020-06-26T21:07:49Z

Fixes #946. I've used the first approach, based on a RopeSlice struct. I've tried to stay as close as possible to the original code,
and my measurements confirm reduced memory consumption for test_larger_diff.

I have responded to reviews and made changes where appropriate.
I have tested the code with cargo test --all / ./rust/run_all_checks.
I have updated comments / documentation related to the changes I made.
I have rebased my PR branch onto xi-editor/master.

Menschenkindlein · 2020-07-03T08:07:05Z

rust/rope/src/diff.rs

+        let mut looping = true;
+
+        while looping {
+            if let Some(c1) = it1.next() {
+                if let Some(c2) = it2.next() {
+                    looping = c1 == c2;
+                } else {
+                    return false;
+                }
+            } else {
+                return it2.next().is_none();
+            }
+        }
+
+        false


This way it is possible to avoid resetting the looping variable:

loop { let c1 = it1.next(); let c2 = it2.next(); if c1 != c2 { return false; } if c1.is_none() { return true; } }
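
Spelled out as a standalone function, the suggested loop might look like the sketch below. The function name iters_equal and its generic signature are illustrative assumptions, not part of the patch. The standard library's Iterator::eq performs the same element-wise comparison, so it could replace the hand-written loop entirely.

```rust
/// Returns true when both iterators yield equal items in the same order
/// and finish at the same time. (Hypothetical helper, not from the PR.)
fn iters_equal<T: PartialEq>(
    mut it1: impl Iterator<Item = T>,
    mut it2: impl Iterator<Item = T>,
) -> bool {
    loop {
        let c1 = it1.next();
        let c2 = it2.next();
        // Covers both "items differ" and "one iterator ended early".
        if c1 != c2 {
            return false;
        }
        // Both are None here, so both iterators are exhausted.
        if c1.is_none() {
            return true;
        }
    }
}
```

Comparing the two Option values directly handles the length mismatch case without a separate flag or nested if let.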

Menschenkindlein · 2020-07-03T08:09:47Z

rust/rope/src/diff.rs

+                looping = false;
+            }
+        } else {
+            looping = false;


break would do the job.

Fixed. This loop also uses while let now.

cmyr

I just spent a while with this, and I'm actually a bit confused; the issue referenced in #946 was actually addressed in #1137, a little over a year ago.

So: I'm not totally sure where this patch fits. Running the benchmarks, it appears that we're looking at a significant slowdown (~2x) over the existing implementation, and I think memory savings would have to be pretty significant for us to want to take that tradeoff. If you can include any measurements that would be helpful; as-is I'm not sure this is buying us much 😞

I've reviewed this as I would if we were going to merge it, and there are some little bugs; it's quite possible that performance will be improved significantly when those are addressed.

cmyr · 2020-09-16T17:14:49Z

rust/rope/src/diff.rs

@@ -74,10 +87,13 @@ impl Diff<RopeInfo> for LineHashDiff {
        let mut prev_base = 0;

        let mut needs_subseq = false;
-        for line in target.lines_raw(start_offset..target_end) {
+        for line in SliceIter::new(base, start_offset) {
+            let len = line.range.end - line.range.start;


I would probably just add a len() method to this type.
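
A minimal sketch of what such a method could look like. The struct here is a stand-in: the real RopeSlice in the patch holds a reference to a Rope, which is replaced by a plain &str field (named text, an assumption) so the example compiles on its own.

```rust
use std::ops::Range;

/// Stand-in for the patch's RopeSlice. The `text` field is an assumption
/// that replaces the real `&Rope` so the sketch is self-contained.
struct RopeSlice<'a> {
    text: &'a str,
    range: Range<usize>,
}

impl<'a> RopeSlice<'a> {
    /// Length of the slice in bytes.
    fn len(&self) -> usize {
        self.range.end - self.range.start
    }

    /// True when the slice covers no bytes.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// The text covered by the slice (only possible with the &str stand-in).
    fn as_str(&self) -> &'a str {
        &self.text[self.range.clone()]
    }
}
```

With this in place the loop body could call line.len() rather than repeating the subtraction at each call site.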

cmyr · 2020-09-16T17:23:31Z

rust/rope/src/diff.rs

-                if let Some(base_off) = line_hashes.get(&line[non_ws..]) {
+            if len - non_ws >= MIN_SIZE {
+                if let Some(base_off) = line_hashes
+                    .get(&RopeSlice { rope: base, range: line.range.end + non_ws..line.range.end })


this should be line.range.start + non_ws? When I make this change, though, I notice that the deltas that are generated are different from what was previously generated, and it looks like there are a lot of empty Copy operations getting included; something else to investigate.
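
To illustrate the range problem in isolation, here is a small sketch with two hypothetical helpers (neither is from the patch). One builds the range as written in the hunk, the other builds it as suggested in this comment. It does not address the separate question of the changed deltas.

```rust
use std::ops::Range;

/// Range of a line with its first `non_ws` bytes (leading whitespace)
/// skipped, as suggested in the review comment. (Hypothetical helper.)
fn trimmed_range(line: &Range<usize>, non_ws: usize) -> Range<usize> {
    line.start + non_ws..line.end
}

/// Range as written in the hunk, kept only for comparison: its start is
/// never before its end, so it cannot cover any text. (Hypothetical helper.)
fn range_as_written(line: &Range<usize>, non_ws: usize) -> Range<usize> {
    line.end + non_ws..line.end
}
```

Because the range as written never covers any bytes, every lookup key built from it would be an empty slice, whatever the line contains.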

cmyr · 2020-09-16T17:24:29Z

rust/rope/src/diff.rs

+        } else {
+            break;
+        }
+    }


performance matters here, and doing codepoint calculations adds a bunch of expense; since we're only looking for ASCII characters it's much cheaper to work with bytes.
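
A byte-based version of the whitespace scan could look like the sketch below. The function name leading_ws_len and the choice of space and tab as the whitespace set are assumptions; the chunks are modelled as plain &str items rather than real rope chunks. Scanning bytes is safe here because ASCII byte values never occur inside a multi-byte UTF-8 sequence.

```rust
/// Counts leading spaces and tabs across a sequence of chunks, looking at
/// raw bytes instead of decoding chars. (Hypothetical helper; the
/// whitespace set of b' ' and b'\t' is an assumption.)
fn leading_ws_len<'a>(chunks: impl Iterator<Item = &'a str>) -> usize {
    let mut count = 0;
    for chunk in chunks {
        for &b in chunk.as_bytes() {
            if b == b' ' || b == b'\t' {
                count += 1;
            } else {
                // First non-whitespace byte ends the scan.
                return count;
            }
        }
    }
    // Every byte in every chunk was whitespace.
    count
}
```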

cmyr · 2020-09-16T17:26:33Z

rust/rope/src/diff.rs

+impl PartialEq for RopeSlice<'_> {
+    fn eq(&self, other: &Self) -> bool {
+        let mut it1 =
+            self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());


ditto here, and chars; we want to use bytes.

I would just add a method to RopeSlice, like

impl<'a> RopeSlice<'a> { fn iter_bytes(&'a self) -> impl Iterator<Item=u8> + 'a { self.rope.iter_chunks(self.range.clone()).flat_map(|chunk| chunk.bytes()) } }
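
The following sketch expands that suggestion into something that compiles on its own. ChunkedText is a hypothetical stand-in for the rope (a list of string chunks), and its iter_chunks only imitates the shape of the real method; both are assumptions. It also assumes the range falls on chunk-internal UTF-8 boundaries, since string slicing panics otherwise.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a rope: text stored as a list of chunks.
struct ChunkedText {
    chunks: Vec<String>,
}

impl ChunkedText {
    /// Yields the part of each chunk that falls inside `range` (byte offsets).
    fn iter_chunks(&self, range: Range<usize>) -> impl Iterator<Item = &str> + '_ {
        let mut offset = 0;
        self.chunks.iter().filter_map(move |chunk| {
            let start = offset;
            offset += chunk.len();
            // Clamp the requested range to this chunk's extent.
            let lo = range.start.max(start);
            let hi = range.end.min(offset);
            if lo < hi {
                Some(&chunk[lo - start..hi - start])
            } else {
                None
            }
        })
    }
}

/// Stand-in for the patch's RopeSlice, borrowing the chunked text.
struct RopeSlice<'a> {
    rope: &'a ChunkedText,
    range: Range<usize>,
}

impl<'a> RopeSlice<'a> {
    /// Iterates over the raw bytes of the slice, chunk by chunk.
    fn iter_bytes(&self) -> impl Iterator<Item = u8> + '_ {
        self.rope.iter_chunks(self.range.clone()).flat_map(|chunk| chunk.bytes())
    }
}

impl PartialEq for RopeSlice<'_> {
    /// Two slices are equal when they cover the same bytes, however the
    /// underlying text happens to be chunked.
    fn eq(&self, other: &Self) -> bool {
        self.iter_bytes().eq(other.iter_bytes())
    }
}
```

Since valid UTF-8 strings are equal exactly when their bytes are equal, comparing bytes gives the same answer as comparing chars without the decoding cost.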

cmyr · 2020-09-16T17:28:06Z

rust/rope/src/diff.rs

+        let mut it1 =
+            self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());
+        let mut it2 =
+            other.rope.iter_chunks(Interval::from(other.range.clone())).flat_map(|x| x.chars());


also we can omit the Interval::from here and just pass a Range.

cmyr · 2020-09-16T17:29:37Z

rust/rope/src/diff.rs

+impl Hash for RopeSlice<'_> {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        let iter =
+            self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars());


ditto chars being slow here.

It might be worth looking into what the contract is around the hash impls of std types; I suspect we can actually just hash the chunks returned by iter_chunks one by one and get the same hash for a given string, regardless of how it is broken up.
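
On the contract question: the std::hash::Hasher documentation states that adjacent write calls are not guaranteed to be merged, so hashing whole chunks with write may give different results for different chunkings, depending on the hasher. A conservative sketch that avoids relying on that is to feed bytes one at a time, so the sequence of hasher calls is identical for any chunking. The helper names below are hypothetical, and the per-byte calls may cost some speed compared with chunk-wise writes; that would need measuring.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Feeds every byte of every chunk to the hasher individually, so the
/// sequence of hasher calls does not depend on where chunk boundaries fall.
/// (Hypothetical helper, not from the PR.)
fn hash_chunks<'a, H: Hasher>(chunks: impl Iterator<Item = &'a str>, state: &mut H) {
    for chunk in chunks {
        for &b in chunk.as_bytes() {
            state.write_u8(b);
        }
    }
}

/// Convenience wrapper for experiments: hashes a list of chunks with
/// the standard library's DefaultHasher.
fn hash_of(chunks: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hash_chunks(chunks.iter().copied(), &mut hasher);
    hasher.finish()
}
```

Note that this hash is consistent with a byte-wise PartialEq on the slice, which is what a HashMap key needs; it is not claimed to match the value that hashing an equivalent &str would produce.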

Menschenkindlein reviewed Jul 3, 2020

Tiggilyboo approved these changes Sep 2, 2020

TDecking force-pushed the T946_fix branch from 4d078e3 to d12135e Compare September 8, 2020 15:32

TDecking added 3 commits September 8, 2020 18:25

Improve memory efficiency of rope diffing.

77bc202

apply suggestions.

2bc1c0d

lint

b5318fd

TDecking force-pushed the T946_fix branch from d12135e to b5318fd Compare September 8, 2020 16:26

cmyr requested changes Sep 16, 2020

TDecking mentioned this pull request Sep 17, 2020

More base cases in rope diffing. #1295

Merged

4 tasks
