Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix textLinesMutator when dealing with insertions/deletions at the end of pad #5253

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

webzwo0i
Copy link
Member

@webzwo0i webzwo0i commented Oct 29, 2021

Will clean this up tomorrow (fixing eslint etc). This should fix #5214 #5213 and replace #5215 and #2911

Also adds some comments and additional coverage.

@webzwo0i webzwo0i marked this pull request as draft October 29, 2021 23:03
@webzwo0i
Copy link
Member Author

As a side note: The commits that fixed #2802 should be re-evaluated.

@webzwo0i webzwo0i mentioned this pull request Oct 30, 2021
@webzwo0i webzwo0i marked this pull request as ready for review October 30, 2021 22:39
@webzwo0i
Copy link
Member Author

This should be ready now.

Copy link
Member

@rhansen rhansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The "chore Changeset.js: use template strings" commit missed one: Line 1383 (in outputMutOp()).
  • The "easysync tests: move to separate files" commit is difficult to review, mostly because stuff was reordered. Please split this commit into two: One that splits easysync.js into multiple files but preserves relative order, and one that reorders stuff.
  • The "easysync tests: move to separate files" commit introduces a //TODO: why isn't this |2+2*x|1+4 ? comment to src/tests/frontend/specs/easysync-assembler.js. The comment doesn't belong in that commit.
  • Please keep the first paragraph of each commit message short—ideally less than 50 chars but definitely less than 70. If you need to say more, put it in a separate paragraph (blank line between the paragraphs).
  • Please add the regression tests in the commit that fixes the bug.

src/static/js/Changeset.js Outdated Show resolved Hide resolved
src/tests/frontend/specs/easysync-helper.js Outdated Show resolved Hide resolved
src/tests/frontend/specs/easysync-helper.js Outdated Show resolved Hide resolved
src/tests/frontend/specs/easysync-helper.js Outdated Show resolved Hide resolved
src/tests/frontend/specs/easysync-helper.js Outdated Show resolved Hide resolved
src/tests/frontend/specs/easysync-helper.js Outdated Show resolved Hide resolved
@webzwo0i
Copy link
Member Author

Thanks for the input @rhansen! Will clean up.

Regarding putting the test into the same commit as the fix, I thought we also put the test in a separate commit (before all other commits) as stated somewhere in CONTRIBUTING.md. But in this case, as it's fixing multiple bugs in one PR, I'll definitely include them into the correct commit.

@rhansen
Copy link
Member

rhansen commented Nov 9, 2021

Rebased onto latest develop.

@rhansen
Copy link
Member

rhansen commented Nov 9, 2021

Regarding putting the test into the same commit as the fix, I thought we also put the test in a separate commit (before all other commits) as stated somewhere in CONTRIBUTING.md.

It's OK to add an xit() test before the commit that fixes the bug, then change the xit() to it() in the commit that fixes the bug. I generally don't like that approach for a couple of reasons:

  • It disconnects the addition of the new test from the reason the new test was added (as a regression test for a bugfix). This makes it harder for reviewers to determine that the new test is actually a proper regression test for the bugfix.
  • Mocha doesn't attempt xit() tests. Other test frameworks support the concept of "expected to fail" tests that are still run but don't result in overall failure unless they actually pass (in which case the "expected to fail" should be converted to the usual "expected to pass").

@rhansen
Copy link
Member

rhansen commented Nov 14, 2021

@webzwo0i I rebased onto latest develop and made some changes to address my feedback. The "easysync tests: move to separate files" commit was split into 5 different commits to make it easier to review. I think the only thing left is to add the regression tests to the commits that fix the associated bugs.

@webzwo0i
Copy link
Member Author

webzwo0i commented Nov 29, 2021

Thanks! I added the tests to the appropriate commits, added poolOrArray to easysync-assembler and also refined the commit messages of the last three commits.

Rebased onto latest develop

@webzwo0i webzwo0i force-pushed the fix-textlinesmutator branch 2 times, most recently from 96a398b to 6d96eed Compare November 29, 2021 22:47
@@ -903,20 +907,24 @@ class TextLinesMutator {
let removed = '';
if (this._isCurLineInSplice()) {
if (this._curCol === 0) {
// First line to be removed is in splice.
removed = this._curSplice[this._curSplice.length - 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes the following assumptions:

  • this._curSplice.length > 2
  • this._curLine === this._curSplice[0] + (this._curSplice.length - 3) (in other words, the current line is the last entry in this._curSplice)

It's not immediately obvious to me that either assumption is guaranteed to be true. Examples:

  • If this._curLine < this._curSplice[0] then this._isCurLineInSplice() will return true even if this._curSplice.length === 2.
  • If this._curLine === this._curSplice[0] and this._curSplice.length === 4, then the first removed line should be this._curSplice[this._curSplice.length - 2].

Maybe those assumptions are guaranteed to be true due to (undocumented?) invariants. If so, it would be good to assert them here.

}
} else {
// Nothing that is removed is in splice. Implies curCol === 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to assert that this._curCol === 0.

@@ -903,20 +907,24 @@ class TextLinesMutator {
let removed = '';
if (this._isCurLineInSplice()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this._curCol is greater than or equal to the current line's length? In that case, shouldn't we set this._curCol to 0 and increment this._curLine before this check? Or is that case not possible due to invariants?

If invariants make that case impossible, I think it would be worthwhile to add an assert statement.

Copy link
Member Author

@webzwo0i webzwo0i Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip:

  • if L present, curCol is set to 0 in all branches (in skipLines)
  • if L not present, curCol will be +=N (number of skipped chars) [0]

insert:

  • if L present, curCol is set to 0 after applying L, because of the "multiline end with newline" constraint
  • if L is present, but curLine was not in splice, curCol isn't touched
  • if L is not present and we are processing the last line of lines array, curCol is increased by N chars [1]
  • if L is not present and we are not processing the last line, curCol is increased by N chars [2]

remove:

  • no changes to curCol

In cases [1] and [2] curCol calculation is safe, as a text string is given to the respective function (the term that's added to curCol is derived from from text).

Also after calling close or skipping over multiple L that don't have attributes, the splice is not active anymore and it's possible that curCol is > 0. So there are four possible "start" states:

  • splice not active, curCol is 0 [X1]
  • splice not active, curCol > 0 [X2]
  • splice active, curCol is 0 [X3]
  • splice active, curCol > 0 [X4]

So for all 4 "start" states, I'll add coverage for [0] exceeding the number of chars in the line (ie a text of "test\n" and skipping over 6 chars so we end in the middle of the next line), followed by a call to removeLines.

If we want to assert I think it's better to do it where curCol is changed.

@@ -903,20 +907,24 @@ class TextLinesMutator {
let removed = '';
if (this._isCurLineInSplice()) {
if (this._curCol === 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about line markers? If the first character is a line marker and this._curCol === 1, shouldn't it be treated the same as if this._curCol === 0? Or is that not possible due to invariants? Unfortunately, I don't think it's possible to add an assert here because TextLinesMutator doesn't have access to the attributes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't understand, but the ops that are applied to lines shouldn't care about lineMarker's special meaning in the editor. ie lineMarker is special in the editor, because it's not rendered and it changes the "meaning" of the whole line, but when it comes to Changeset.js whatever attributes apply to the lineMarker are only applied to this single, "real" character. So we should not treat this._curCol === 1 as this._curCol === 0

this._curSplice[1] += 1;
// Is there a line left?
const remaining = this._linesGet(this._curSplice[0] + this._curSplice[1]) || '';
this._curSplice[sline] = this._curSplice[sline].substring(0, this._curCol) + remaining;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If remaining is the empty string, then this._curSplice[sline] will not end with a newline. Don't all entries in this._lines need to end with a newline character?

diff --git a/src/static/js/Changeset.js b/src/static/js/Changeset.js
index 5e0bc34fc..defc08852 100644
--- a/src/static/js/Changeset.js
+++ b/src/static/js/Changeset.js
@@ -918,11 +918,23 @@ class TextLinesMutator {
         removed = nextKLinesText(L - 1);
         this._curSplice[1] += L - 1;
         const sline = this._curSplice.length - 1;
-        removed = this._curSplice[sline].substring(this._curCol) + removed;
-        // Is there a line left?
-        const remaining = this._linesGet(this._curSplice[0] + this._curSplice[1]) || '';
-        this._curSplice[sline] = this._curSplice[sline].substring(0, this._curCol) + remaining;
-        this._curSplice[1] += remaining ? 1 : 0;
+        removed = this._curSplice[sline].slice(this._curCol) + removed;
+        this._curSplice[sline] = this._curSplice[sline].slice(0, this._curCol);
+        // The current line's newline was deleted, which means that the line after the splice (if
+        // there is one) should be joined with what remains of the current line.
+        const n = this._curSplice[0] + this._curSplice[1];
+        if (n < this._linesLength()) {
+          // There is a line that can be joined with the current line.
+          // TODO: What if the joined line starts with a line marker?
+          // TODO: What if the joined line has line attributes that differ from the current line's
+          // line attributes?
+          this._curSplice[sline] += this._linesGet(n);
+          this._curSplice[1] += 1;
+        } else {
+          // There is no line that can be joined with the current line. The current line's newline
+          // was removed, but all lines must end with a newline so add it back.
+          this._curSplice[sline] += '\n';
+        }
       }
     } else {
       // Nothing that is removed is in splice. Implies curCol === 0.

// There are no additional lines. Although the line is put into splice, curLine is not
// increased because there may be more chars in the line (newline is not reached). We are
// inserting at the end of lines. curCol is 0 as curLine is not in splice.
this._curSplice.push(text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right. If my understanding is correct, each entry in this._lines represents a line of the document and must end with a newline character. By pushing to this._curSplice, this code creates a new line at the end of the document, one that line doesn't end with a newline. This code should instead insert the text after column this._curCol (which should be before the line's newline) in line this._curLine.

I think the actual bug is inside this._putCurLineInSplice(): That method should not attempt to read past the end of this._lines or delete lines that don't exist in this._lines. Something like this (untested):

diff --git a/src/static/js/Changeset.js b/src/static/js/Changeset.js
index 5e0bc34fc..930a5e187 100644
--- a/src/static/js/Changeset.js
+++ b/src/static/js/Changeset.js
@@ -823,11 +823,18 @@ class TextLinesMutator {
    * @returns {number} the index of the added line in curSplice
    */
   _putCurLineInSplice() {
-    if (!this._isCurLineInSplice()) {
-      this._curSplice.push(this._linesGet(this._curSplice[0] + this._curSplice[1]));
-      this._curSplice[1]++;
+    assert(this._inSplice, 'not in splice');
+    assert(this._curLine >= this._curSplice[0], 'curLine is before splice start');
+    while (this._curSplice[0] + this._curSplice.length - 2 < this._curLine + 1) {
+      const n = this._curSplice[0] + this._curSplice[1];
+      if (n < this._linesLength()) {
+        this._curSplice.push(this._linesGet(n));
+        this._curSplice[1]++;
+      } else {
+        // No more lines available to add to the splice. Create a new line from scratch.
+        this._curSplice.push('\n');
+      }
     }
-    // TODO should be the same as this._curSplice.length - 1
     return 2 + this._curLine - this._curSplice[0];
   }
 

Copy link
Member Author

@webzwo0i webzwo0i Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing to _putCurLineInSplice. Will take a look.

Let's make sure newlines are present at all times

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if an op removes the last line's newline but that op is immediately followed by another op that restores the newline? For example:

const text = 'x\n';
const lines = [text];
const cs = '=1|1-1|1+1$\n'; // Keep a char, then remove a newline, then add a newline.
Changeset.mutateTextLines(cs, lines);
assert.equal(lines.join(''), 'x\n');
assert.equal(Changeset.applyToText(cs, text), 'x\n');

Similarly, what if an op adds text to the end of the document without a trailing newline, then the next op adds the newline:

const text = 'x\n';
const lines = [text];
const cs = '|1=2+1|1+1$y\n'; // Keep existing text, add a char, then add a newline.
Changeset.mutateTextLines(cs, lines);
assert.equal(lines.join(''), 'x\ny\n');
assert.equal(Changeset.applyToText(cs, text), 'x\ny\n');

If we force newlines to always be present in each entry in this._lines then we're going to end up inserting extra newlines when we don't want them. I think the way to fix this is to strip the newlines from this._lines and add them back before returning.

this._curSplice.push(theLine.substring(lineCol));
this._curCol = 0; // TODO(doc) why is this not set to the length of last line?
const remaining = theLine.substring(lineCol);
if (remaining !== '') this._curSplice.push(remaining);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If remaining === '', then that means that this._curCol points just past the line's newline, correct? If so, I think the better fix would be to set this._curCol to 0 and increment this._curLine before doing anything else.

Also, this._curCol pointing just past the newline feels like a violation of an invariant. Maybe instead of checking if remaining !== '', it should assert that and we should fix whatever logic is causing this._curCol to point past the newline.

Copy link
Member Author

@webzwo0i webzwo0i Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not necessarly a newline at every line. Etherpad uses it with lines that always end with newlines (e.g. rep.lines), except for mutateAttributionLines, but this has other logic.
Also take a look at the test case, this is valid imo.
Update: I do think now, that it should end with a newline now. So when it comes to "valid" ignore the newline for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand what you wrote in the first paragraph, though. If there is a newline, we would never go "beyond" that. Let me test

@webzwo0i
Copy link
Member Author

webzwo0i commented Dec 2, 2021

OK, I think I ensure that newlines are everywhere first. It's possible that I didn't saw missing newlines because it's dealing with edits at the end of the pad.

@rhansen
Copy link
Member

rhansen commented Dec 14, 2021

Any update @webzwo0i?

@webzwo0i
Copy link
Member Author

webzwo0i commented Dec 28, 2021

Will have time in the next few days. I'll probably tackle curCol/curLine behavior first, including asserting the invariants. For newlines behavior I need to take a look at mutateAttributionLines

webzwo0i and others added 6 commits December 31, 2021 22:55
...in case we are in the middle of a line (curCol !== 0) and we remove
everything up to the end of the line.

Before this commit a string 'undefined' could make it into the splice.
The new insertions are directly pushed to curSplice now instead of
trying to incorporate an undefined line into the splice, which would
result in an error: TypeError: Cannot read property 'substring' of
undefined
In such cases the remaining part of the old line is directly pushed to
the splice but we need to ensure it is not an empty string.
Before this commit an empty string was pushed to the splice.
@rhansen
Copy link
Member

rhansen commented Jan 1, 2022

I rebased onto latest develop and pushed a couple of fixup commits.

@stale
Copy link

stale bot commented Apr 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix Wont Fix these things, no hate. label Apr 25, 2022
@LeaChemoul
Copy link

Any update on thoses fixup commits ?

@stale stale bot removed the wontfix Wont Fix these things, no hate. label Apr 28, 2022
@stale
Copy link

stale bot commented Jun 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix Wont Fix these things, no hate. label Jun 27, 2022
@stale stale bot closed this Jul 10, 2022
@rhansen rhansen removed the wontfix Wont Fix these things, no hate. label Jul 16, 2022
@rhansen rhansen reopened this Jul 16, 2022
@JohnMcLear
Copy link
Member

Bump @webzwo0i -- any progress?

@JohnMcLear
Copy link
Member

@dependabot rebase

@github-actions github-actions bot added the Stale No recent activity label Sep 1, 2023
@github-actions github-actions bot removed the Stale No recent activity label Dec 20, 2023
@github-actions github-actions bot added the Stale No recent activity label May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

This .etherpad contents breaks timeslider playback
4 participants