New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve memory efficency of rope diffing #1284
base: master
Are you sure you want to change the base?
Conversation
rust/rope/src/diff.rs
Outdated
let mut looping = true; | ||
|
||
while looping { | ||
if let Some(c1) = it1.next() { | ||
if let Some(c2) = it2.next() { | ||
looping = c1 == c2; | ||
} else { | ||
return false; | ||
} | ||
} else { | ||
return it2.next().is_none(); | ||
} | ||
} | ||
|
||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way it is possible to avoid resetting looping
variable.
loop {
let c1 = it1.next();
let c2 = it2.next();
if c1 != c2 {
return false;
}
if c1.is_none() {
return true;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
rust/rope/src/diff.rs
Outdated
looping = false; | ||
} | ||
} else { | ||
looping = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break
would do the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. This loop also used while let
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just spent a while with this, and I'm actually a bit confused; the issue referenced in #946 was actually addressed in #1137, a little over a year ago.
So: I'm not totally sure where this patch fits. Running the benchmarks, it appears that we're looking at a significant slowdown (~2x) over the existing implementation, and I think memory savings would have to be pretty significant for us to want to take that tradeoff. If you can included any measurements that would be helpful; as-is I'm not sure this is buying us much 😞
I've reviewed this as I would if we were going to merge it, and there are some little bugs; it's quite possible that performance will be improved significantly when those are addressed.
@@ -74,10 +87,13 @@ impl Diff<RopeInfo> for LineHashDiff { | |||
let mut prev_base = 0; | |||
|
|||
let mut needs_subseq = false; | |||
for line in target.lines_raw(start_offset..target_end) { | |||
for line in SliceIter::new(base, start_offset) { | |||
let len = line.range.end - line.range.start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably just add a len()
method to this type.
if let Some(base_off) = line_hashes.get(&line[non_ws..]) { | ||
if len - non_ws >= MIN_SIZE { | ||
if let Some(base_off) = line_hashes | ||
.get(&RopeSlice { rope: base, range: line.range.end + non_ws..line.range.end }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be line.range.start+non_ws
? When I make this change though I notice that the deltas that are generated are different from what were previously generated, and it looks like there are a lot of empty Copy
operations getting included; something else to investigate.
} else { | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
performance matters here, and doing codepoint calculations adds a bunch of expense; since we're only looking for acsii characters it's much cheaper to work with bytes.
impl PartialEq for RopeSlice<'_> { | ||
fn eq(&self, other: &Self) -> bool { | ||
let mut it1 = | ||
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here, and chars
; we want to use bytes
.
I would just add a method to RopeSlice
, like
impl<'a> RopeSlice<'a> {
fn iter_bytes(&'a self) -> impl Iterator<Item=u8> + 'a {
self.rope.iter_chunks(self.range.clone()).flat_map(|chunk| chunk.bytes())
}
}
let mut it1 = | ||
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars()); | ||
let mut it2 = | ||
other.rope.iter_chunks(Interval::from(other.range.clone())).flat_map(|x| x.chars()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we can omit the Interval::from
here and just pass a Range
.
impl Hash for RopeSlice<'_> { | ||
fn hash<H: Hasher>(&self, state: &mut H) { | ||
let iter = | ||
self.rope.iter_chunks(Interval::from(self.range.clone())).flat_map(|x| x.chars()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto chars being slow here.
It might be worth looking into what the contract is around the hash impls of std
types; I suspect we can actually just hash the chunks returned by iter_chunks
one by one and get the same hash for a given string, regardless of how it is broken up.
Fixes #946 . I've used the first approach using a
RopeSlice
-struct. I've tried to stay as close as possible to the original code,and my measurements confirm a reduced memory consumption for
test_larger_diff
.cargo test --all
/./rust/run_all_checks
.