Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) #2107

Closed
SkoricIT opened this issue Mar 14, 2014 · 95 comments

Comments

@SkoricIT

Hey guys. We are using stable and have the problem that some pads randomly stop working and throw an uncaught error in the console.

Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) 

Example:

https://etherpad.tugraz.at/p/l3tsbet

When this happens, the "loading" overlay blocks any action. It's unlikely to be a copy&paste issue because it sometimes happens to entirely handwritten pads.

Interestingly, the timeslider (opened by appending /timeslider to the URL) always works without problems.

https://etherpad.tugraz.at/p/l3tsbet/timeslider

Right now we are manually fixing the pads by exporting and re-importing them as HTML (losing all changesets). Any idea what's wrong?

@JohnMcLear
Member

Try the latest Develop branch and let us know if it still occurs

@SkoricIT
Author

This is not trivial as it occurs by chance on the prod server with many users and I cannot reproduce it.
I can only test if broken pads stay broken on develop, if that helps.

@JohnMcLear
Member

yes please

@SkoricIT
Author

I will get the db copied to dev next week. Will report as soon as possible. Thank you for the fast response.

@SkoricIT
Author

Unfortunately, moving to develop has not repaired the damaged pads. Interestingly, moving servers (simple SQL export/import) has "repaired" one of the pads. It works on the new server (even on 1.3.0), but the other damaged pads don't. Still the same error.

This is a really strange bug and pads seem to sometimes just "self-repair" even on PROD and with no change whatsoever from us.

@JohnMcLear
Member

In theory this problem shouldn't occur at all, as the bad data should never find its way in now..

@JohnMcLear
Member

I can leave this open and if it occurs on fresh data we can try to resolve it..

@SkoricIT
Author

What do you mean by "fresh data"? Do I need to put PROD on develop and try to get new broken pads?

@JohnMcLear
Member

Hrm, that'd be painful potentially.. Maybe wait for 1.4 which should be out in a few weeks max.

@SkoricIT
Author

We might do that. Thanks a lot.

@usabilidoido

I also have that problem, with the same strange randomness. I updated to etherpad-lite 1.4 and it is still there. URL for one of the pads: http://etherpad.usabilidoido.com.br:8080/p/07318a9b2b5666637d870fc50656323620af4df4

@SkoricIT
Author

I am right now in the process of upgrading to 1.4 and hopefully going live with the new version, so I can check if new defects come up in fresh pads after running on the new version.

@usabilidoido you can tell your users to export-import the pad. You can access controls by adding /timeslider to the url like this: http://etherpad.usabilidoido.com.br:8080/p/07318a9b2b5666637d870fc50656323620af4df4/timeslider

Export as html and then import into a new pad.

@ericpedia

I am experiencing this issue in 1.4. On broken pads, command line shows [WARN] client - Uncaught TypeError: Cannot read property 'length' of undefined when browser shows the Failed assertion error.

Hat tip to @Ra1d3n for the /timeslider workaround. Glad to see pad content is still accessible there.

marcelklehr added a commit that referenced this issue Apr 25, 2014
@marcelklehr
Contributor

This is just a hunch, but if you like, try with the experimental try/client-init-remove-checkRep branch I just created. (This is generally a dangerous thing to do, but it's worth a try, I think.)

@SkoricIT
Author

SkoricIT commented May 9, 2014

I have upgraded everything to 1.4, and yesterday we had a broken pad again. It might still be one that broke before the update, but I am not sure. I will keep looking.

@marcelklehr Unfortunately, I can't move production server to a dangerous version. And I can't mirror requests to a secondary server because I don't have the resources. :-/

@marcelklehr
Contributor

Sorry, I wasn't clear: Don't run try/client-init-remove-checkRep in production, but try to access the broken pads with etherpad running on that branch.

I removed checkRep in that branch, because I suspect that normalization may not work correctly in some cases. So, when a broken pad works on that branch without any problems at all, we've got to revisit this method.

@SkoricIT
Author

SkoricIT commented May 9, 2014

@marcelklehr I just did, and unfortunately it did not help.

Process:

git fetch
git checkout try/client-init-remove-checkRep
git status
On branch try/client-init-remove-checkRep
Your branch is up-to-date with 'origin/try/client-init-remove-checkRep'.

I confirmed the change is actually in the file system. (comment and new line are there) Error is still the same.

[screenshot]

I did get the error that @ericpedia mentioned, and it did not occur on pads other than the corrupted one.

console on server:

[2014-05-09 16:55:39.152] [WARN] client - Uncaught TypeError: Cannot read property 'length' of undefined -- { errorId: 'dTtndCRA5gonLZyvMlqw',
  msg: 'Uncaught TypeError: Cannot read property \'length\' of undefined',
  url: 'http://localhost:9001/p/OkTJWMYVNs',
  linenumber: 15,
  userAgent: 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36' }

When I kill the server the message is displayed in the client, also:

[screenshot]

I might be able to give you an SQL import of the broken pad, but you would need to tell me what exactly you need extracted from the database first. :-)

@marcelklehr
Contributor

The actual pad content would be very helpful, so that'd be db key pad:<PADID> (http://etherpad.org/doc/v1.4.0/#index_pad_padid) and perhaps a few of the last revisions.

@marcelklehr
Contributor

I think I'm getting closer:

checkPad says:

$ bin/checkPad <padID>
[WARN] console - DirtyDB is used. This is fine for testing but not recommended for production.
[ERROR] console - Bad changeset at revision 4901 - Failed assertion: mismatched apply: 11636 / 11635
[ERROR] console - Bad changeset at revision 11401 - Failed assertion: mismatched apply: 42094 / 42093
[ERROR] console - Bad changeset at revision 12301 - Failed assertion: mismatched apply: 48875 / 48874
[ERROR] console - Bad changeset at revision 13601 - Failed assertion: mismatched apply: 60227 / 60226
[INFO] console - finished

In all cases, "mismatched apply" means that the changeset in question expected one character fewer than was actually present (so, one unexpected character). After examining the db dump, I found that all of these changesets follow a revision with an atext attached to it. (Not every revision includes the full pad contents; most store just the delta, but every so often the full contents are attached for performance reasons.)

Presumably, something went awry with this meta property either upon storing the revision or upon creating the current atext for a new pad client.
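
The "mismatched apply" assertion quoted above compares the length of the text a changeset is applied to with the old length that the changeset itself declares. Below is a minimal Python sketch of that check. It assumes the packed changeset starts with a `Z:<oldLen><sign><change>` header written in base 36; the function names and the sample changeset are hypothetical, and this is not Etherpad's actual code.

```python
import re

# Assumed header layout: "Z:", the old length, ">" or "<", and the length
# change, with both numbers written in base 36.
HEADER = re.compile(r"Z:([0-9a-z]+)([><])([0-9a-z]+)")


def parse_header(changeset):
    """Return (old_len, new_len) as declared by a packed changeset."""
    match = HEADER.match(changeset)
    if match is None:
        raise ValueError("not a changeset: %r" % changeset)
    old_len = int(match.group(1), 36)
    change = int(match.group(3), 36)
    new_len = old_len + change if match.group(2) == ">" else old_len - change
    return old_len, new_len


def check_apply(changeset, text):
    """Return None if the lengths agree, else a 'mismatched apply' message.

    Note: Python's len() counts code points, while JavaScript counts UTF-16
    code units; the two agree for characters inside the BMP.
    """
    old_len, _ = parse_header(changeset)
    if len(text) != old_len:
        return "mismatched apply: %d / %d" % (len(text), old_len)
    return None
```

With a text that is one character longer than the declared old length, `check_apply` reports the same "actual / expected" pair shape as the checkPad output above.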

@marcelklehr
Contributor

I just rewrote checkPad to use the computed atext instead of the cached one and the pad passes. Even the cached atexts are correct! This makes me think there's a bug in some algorithm that computes the full atext for the client_vars!

@SkoricIT
Author

Nice work so far Marcel. Sounds like you are close to solving this.

@marcelklehr
Contributor

Ah, it's not the client_vars generator. The algorithm responsible for creating the cached atext is broken.

@Ra1d3n As a quickfix, you can revive your broken pads by deleting the atext meta property from all key revisions (where revNo % 100 == 0). "Re-inserting" all records related to a broken pad was reported as a fix for similar problems a long time ago -- now we know why it works :)
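
The quickfix above can be sketched in Python as follows. This is only an illustration on in-memory data: the record shape (a dict with a `meta` dict that may hold an `atext` entry) and the function names are assumptions, and nothing here talks to a real database. Back up the data before trying anything similar on a live instance.

```python
import copy

# From the thread: every 100th revision is a key revision.
KEY_REV_INTERVAL = 100


def is_key_revision(rev_no):
    """True for revisions where revNo % 100 == 0."""
    return rev_no % KEY_REV_INTERVAL == 0


def strip_cached_atext(revisions):
    """Return a copy of {rev_no: record} without meta.atext on key revisions.

    The input mapping is left untouched so the original records can still
    be inspected or restored.
    """
    cleaned = copy.deepcopy(revisions)
    for rev_no, record in cleaned.items():
        if is_key_revision(rev_no):
            # Assumed layout: the cached text lives under record["meta"]["atext"].
            record.get("meta", {}).pop("atext", None)
    return cleaned
```

Working on a copy keeps the original records available for comparison, which matters here because the cause of the corruption was still unknown at this point in the thread.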

@SkoricIT
Author

@marcelklehr Will that make EP rebuild the pad from revision 0 on load? I guess I should not go around deleting all atext properties from all revisions then... :-) Any hope for a "regenerate Pad" script that rebuilds key revisions with a correct algorithm?

@Gared
Member

Gared commented May 13, 2014

I've made some changes to the checkPad-script. See here: #1653
But this is more of a dirty hack. If my script resolves this problem, I will write a script to fix pads.

@SkoricIT
Author

Cool, I look forward to it. Right now we get only about one broken pad a week, so I am inclined to wait for a proper fix. :-) Thanks.

@marcelklehr
Contributor

Yea, I was thinking of a script.

@marcelklehr
Contributor

So, the diff between the computed atext and the cached one shows that: "концертов" somehow got turned into "ко��цертов". In another revision, "з" was turned into "��" as well...

I'm not sure what this is about. These chars are in the Unicode BMP, afaik, so I don't know what happened to them.
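
A diff like the one described above can be reproduced with Python's standard `difflib`. The sketch below lists every span where a computed text and a cached text differ; the function name is hypothetical and this is not the script used in the thread.

```python
import difflib


def changed_spans(computed, cached):
    """List (position, computed_part, cached_part) for each differing span."""
    matcher = difflib.SequenceMatcher(None, computed, cached, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((i1, computed[i1:i2], cached[j1:j2]))
    return spans


# The example from the comment above: one letter became two replacement
# characters (U+FFFD), so the cached text is one character longer.
print(changed_spans("концертов", "ко\ufffd\ufffdцертов"))
```

In this example the cached string ends up one character longer than the computed one, which would be consistent with the off-by-one lengths that checkPad reported.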

@marcelklehr
Contributor

The mutations in @Ra1d3n's pad occur in key revisions 4900, 11400, 12300 and 13600 (every 100th rev is a key rev, meaning it'll cache the full pad contents). Also, the AttributedText stored in the pad record is corrupt, too. All other key revisions are fine. (analyzed with this script)

The chars seem awfully random. Nothing sticks out, really. I don't see a pattern.

I suspect there's something going wrong when storing the AttributedText in the database. Since sometimes the pad recovers in between broken key revisions, I'm guessing that the atext stored in memory is good. Sometimes, when it is stored in the db it somehow gets corrupted, though. If authors continue to edit the document until the next key revision is created and that revision doesn't end up getting corrupted, then nobody will notice anything.
However, if the server shuts down before managing to successfully save the valid AText held in memory to the database, the pad will be broken on the next server start.

@SkoricIT
Author

@marcelklehr We had etherpad crash and recover a few times because of a plugin that was not well coded. Could this be the cause? Strange that it's just one letter that got corrupted every time.

@JohnMcLear
Member

You'd be surprised just how many editors (and very popular ones with developers) have a similar experience to Etherpad tho. Playing around today I had some crazy experiences.

muxator pushed a commit to JohnMcLear/etherpad-lite that referenced this issue Mar 27, 2020
From ether#3717 (comment)

> Afaik I used async / await that's pretty much all, I think I had to do some
> polish because something was broken, remember stuff like pad.getPadAuthors was
> b0rked in 1.7 or so

Fixes ether#2107.
@muxator
Contributor

muxator commented Mar 27, 2020

I included an updated version of Check Pad Deltas that works. If people experiencing this problem can give it a try to see if it helps, I'd appreciate it.

Pulled into the master branch with #3717 (14ae2ee).

@gnd

gnd commented May 4, 2020

Hi, we are having a similar issue with one of our Pads.
@JohnMcLear unfortunately the latest version of checkPadDeltas did not help :/

@JohnMcLear
Member

@gnd do you have a public instance?

Can you hit the padId/export/etherpad url and get the .etherpad file?

Are you running latest develop?

What's your database backend?

So many questions, please provide as many details as possible

@gnd

gnd commented May 5, 2020

@JohnMcLear
Yes, it's a public instance: https://pad.xpub.nl/p/CareCircle
Unfortunately I get a 502 Bad Gateway error when trying to get the .etherpad file.
We are running latest develop (git pull origin) on nodejs 12.16.3-1nodesource1, with the db backend being 10.3.22-MariaDB-0+deb10u1.

I'm available today to help you with any sort of debugging you might want to do. I have already tried the latest version of checkPadDeltas; however, it just hangs for hours after starting. This is the only output it gives:

All relative paths will be interpreted relative to the identified Etherpad base dir: /opt/etherpad
[2020-05-05 00:04:12.330] [DEBUG] AbsolutePaths - Relative path "settings.json" can be rewritten to "/opt/etherpad/settings.json"
[2020-05-05 00:04:12.346] [DEBUG] AbsolutePaths - Relative path "credentials.json" can be rewritten to "/opt/etherpad/credentials.json"
settings loaded from: /opt/etherpad/settings.json
No credentials file found in /opt/etherpad/credentials.json. Ignoring.
[2020-05-05 00:04:12.369] [INFO] console - Using skin "no-skin" in dir: /opt/etherpad/src/static/skins/no-skin
[2020-05-05 00:04:12.371] [INFO] console - Session key loaded from: /opt/etherpad/SESSIONKEY.txt
[2020-05-05 00:04:12.541] [ERROR] console - table is not configured with charset utf8 -- This may lead to crashes when certain characters are pasted in pads
[2020-05-05 00:04:12.543] [INFO] console - RowDataPacket { character_set_name: 'utf8mb4' } utf8

@JohnMcLear
Member

Dude, the error is in your log!

[2020-05-05 00:04:12.541] [ERROR] console - table is not configured with charset utf8 -- This may lead to crashes when certain characters are pasted in pads
[2020-05-05 00:04:12.543] [INFO] console - RowDataPacket { character_set_name: 'utf8mb4' } utf8

See: #3959

@gnd

gnd commented May 5, 2020

@JohnMcLear
our db has

+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8 | utf8_general_ci |
+----------------------------+------------------------+

While the store table has

+--------------------+
| character_set_name |
+--------------------+
| utf8mb4 |
+--------------------+

So should I convert using
ALTER DATABASE etherpad_lite_db CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

?

@gnd

gnd commented May 5, 2020

@JohnMcLear

The misconfiguration was twofold: the database was using utf8 and utf8_general_ci, and the charset for the database in settings.json was also set to "utf8". Changing everything to utf8mb4 still didn't help: the pad in question doesn't load, and checkPadDeltas still hangs:

All relative paths will be interpreted relative to the identified Etherpad base dir: /opt/etherpad
[2020-05-05 13:17:43.443] [DEBUG] AbsolutePaths - Relative path "settings.json" can be rewritten to "/opt/etherpad/settings.json"
[2020-05-05 13:17:43.444] [DEBUG] AbsolutePaths - Relative path "credentials.json" can be rewritten to "/opt/etherpad/credentials.json"
settings loaded from: /opt/etherpad/settings.json
No credentials file found in /opt/etherpad/credentials.json. Ignoring.
[2020-05-05 13:17:43.463] [INFO] console - Using skin "no-skin" in dir: /opt/etherpad/src/static/skins/no-skin
[2020-05-05 13:17:43.464] [INFO] console - Session key loaded from: /opt/etherpad/SESSIONKEY.txt
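
For background on the charset warning: MySQL's `utf8` charset stores at most three bytes per character, while `utf8mb4` stores up to four. The Python sketch below (a hypothetical helper, not part of Etherpad) lists the characters in a text whose UTF-8 encoding needs four bytes, that is, the ones a three-byte charset cannot hold.

```python
def four_byte_chars(text):
    """Return (index, char) for each character that needs 4 bytes in UTF-8."""
    return [
        (index, char)
        for index, char in enumerate(text)
        if len(char.encode("utf-8")) == 4
    ]


# Emoji lie outside the BMP and need four bytes; Cyrillic letters need two.
print(four_byte_chars("pad \U0001F984 text"))
print(four_byte_chars("концертов"))
```

The Cyrillic example comes back empty, so this kind of check would not by itself explain the 2014 corruption discussed earlier in the thread, where the affected characters were inside the BMP.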

@JohnMcLear
Member

@gnd It's a GiGo problem. Once you have garbage in, it can't be changed. All you know now is that the problem won't appear in the future!

@caugner
Contributor

caugner commented May 5, 2020

> @gnd It's a GiGo problem. Once you have garbage in, it can't be changed. Now all you know is the problem wont appear in the future!

Wouldn't repairPad.js be able to fix these broken pads?

@JohnMcLear
Member

Oh hi @caugner - sadly no, repairPad.js generally sucks and doesn't really work. https://github.com/ether/etherpad-lite/blob/develop/bin/repairPad.js#L48

The best thing I can suggest is to pull the atext/text out of the pad and bring it into a new pad.

@gnd I can write you a script to test to try and get the text if you want?

bin/extractPadData.js with a change to output to stdout might be sufficient here.. 2mins I will create an extractPadText.js

@gnd

gnd commented May 5, 2020

@JohnMcLear that would be quite helpful indeed )

@JohnMcLear
Member

JohnMcLear commented May 5, 2020

Extracting

Use node bin/extractPadData.js $padid
Then cat $padid.db | grep \"text\" | grep revNum | tail -1

The text is the val.atext.text item, you could json parse this at cli.. I will do that next if you need it.. For now do these commands making sure you replace $padid with your PadID

Parsing

Install jq with sudo apt-get install jq, then run cat $padid.db | grep \"text\" | grep revNum | tail -1 | jq .val.atext.text to see just the text.

To write the Pad text to a text file cat $padid.db | grep \"text\" | grep revNum | tail -1 | jq .val.atext.text > $padid.txt

Now that you have the pad text, you can just put it in a text file and import it, or use the setText API, or whatever...

Lemme know if extraction fails and I will consider another approach.
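
For readers without jq, the same extraction can be sketched in Python. This assumes, as the commands above suggest, that the `.db` file holds one JSON record per line and that the text sits at `val.atext.text`; the function name is hypothetical. Unlike the grep pipeline, it does not filter on `revNum` and simply keeps the last record that carries a text.

```python
import json


def last_pad_text(lines):
    """Return val.atext.text from the last line that has it, or None."""
    text = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        val = record.get("val")
        if isinstance(val, dict) and isinstance(val.get("atext"), dict):
            candidate = val["atext"].get("text")
            if isinstance(candidate, str):
                text = candidate
    return text
```

It could be used as `with open("mypad.db", encoding="utf-8") as fh: print(last_pad_text(fh))`, where `mypad.db` stands in for the file written by the extraction step.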

@gnd

gnd commented May 5, 2020

The extraction is running; however, it is quite slow. In the file CareCicle.db I see the latest line at revs:80, while the script has already been running for 20 minutes. The pad in question has over 12k revisions.

@JohnMcLear
Member

Oh man, that sucks.. I guess it can't build the pad object after 80 revisions.. It should only take 30 seconds or so for the script to run.

@JohnMcLear
Member

the last suggestion would be a big one, to dump the entire db and send it to me and then I can write a script to parse out what you need. Alternatively I can try to write a script here but there might be some back & forth to get it working that way.

@gnd

gnd commented May 7, 2020

Hi @JohnMcLear, the script has finally finished. I have no idea why it took so long (almost 40 hrs). Anyway, looking into it, it seems to me that the whole exercise can be done by selecting the highest revision divisible by 100 from the store table and extracting the text from it? In the future I'll do this by hand :) Thanks a lot for your help
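
The manual approach described above can be sketched in Python over rows already fetched from the store table. The key layout `pad:<padID>:revs:<n>` and the `meta.atext.text` path are assumptions, and the function name is hypothetical; check both against your own data before relying on this.

```python
import json
import re

# Assumed key layout for revision records: pad:<padID>:revs:<revNo>
REV_KEY = re.compile(r"^pad:(?P<pad>.+):revs:(?P<rev>\d+)$")


def latest_key_revision_text(rows, pad_id):
    """Return (rev_no, text) for the highest key revision with a cached text.

    rows is an iterable of (key, value_json) pairs. Returns None when no
    revision divisible by 100 carries a text.
    """
    best = None
    for key, value in rows:
        match = REV_KEY.match(key)
        if match is None or match.group("pad") != pad_id:
            continue
        rev_no = int(match.group("rev"))
        if rev_no % 100 != 0:
            continue
        record = json.loads(value)
        # Assumed path to the cached text inside a key revision record.
        text = record.get("meta", {}).get("atext", {}).get("text")
        if text is not None and (best is None or rev_no > best[0]):
            best = (rev_no, text)
    return best
```

Since only every 100th revision carries the full text, the result can be up to 99 revisions behind the head of the pad.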

@JohnMcLear
Member

Exactly this, but I often get told off by our users when I make the assumption they can perform database queries so I try to avoid it. I think I know why it took so long btw, are you using MySQL @ Etherpad 1.8.3 ?

@gnd

gnd commented May 7, 2020

I'm using the latest master from git (not sure which version that is)

@JohnMcLear
Member

Assuming MySQL, it's a known bug, and the patch is due to land today.

@gnd

gnd commented May 7, 2020

Yes, sorry, it's the latest MariaDB: 10.3.22-MariaDB

@gnd

gnd commented May 11, 2020

@JohnMcLear I'm sorry to spam this ticket, but do you have an issue open for the MySQL patch you mentioned? I want to see if our performance troubles with etherpad might be resolved by it. Thanks

@JohnMcLear
Member

No but just do npm install ueberdb@0.4.9 to fix

@JohnMcLear
Member

Btw the new logic for storing additional atext is in so this should be closed but if people experience an issue please do create a new issue and refer to this one. I want to deal with each individual cause of problem case-by-case with the main goal to create automated logic to restore a pad upon detected corruption in real time. That's the dream as corruption is inevitable.

@pedro-nonfree

pedro-nonfree commented Sep 22, 2020

This is a message for people getting to this recently (when upgrading from older versions of etherpad).

Today I upgraded an etherpad service from 1.6.3 to 1.8.6 (what a change!!!!! congratulations to all developers)

I had problems with one pad, and the checkers (checkPad, checkAllPads, etc.) failed to detect it (or perhaps I don't know how to run them properly with node).

I verified the charset is utf8mb4 in my settings.json (saw last version in settings.json.template).

  "dbType" : "mysql",
  "dbSettings" : {
    "user":     "etherpaduser",
    "host":     "localhost",
    "port":     3306,
    "password": "PASSWORD",
    "database": "etherpad_lite_db",
    "charset":  "utf8mb4"
  },

for case https://pad.example.com/p/my-broken-pad I did:

mysql
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:my-broken-pad"

and it worked again 🎉 🦄 ✨

this solution was above (I put a +1 on previous messages with the solution to help find it), but I wanted to have it more clear

@JohnMcLear
Member

I guess one thing we could do here is check for ???? in pad contents and provide a warning that includes a suggested solution. @pedro-nonfree please could you submit a patch to checkPad.js or something then I'd happily merge that :)
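
The check suggested above might look roughly like this in Python. It is only a sketch: the function names and the wording of the warning are hypothetical, it is not the patch to checkPad.js, and legitimate text can of course contain "????" too, so a hit is a hint rather than proof of corruption.

```python
import re

# Four or more consecutive question marks are treated as suspicious.
SUSPECT_RUN = re.compile(r"\?{4,}")


def find_question_mark_runs(text):
    """Return (start, length) for each run of four or more '?' characters."""
    return [(m.start(), len(m.group())) for m in SUSPECT_RUN.finditer(text)]


def corruption_warning(pad_id, text):
    """Return a warning string if the text contains suspicious runs, else None."""
    runs = find_question_mark_runs(text)
    if not runs:
        return None
    return (
        "pad %s: found %d run(s) of '????', first at offset %d; "
        "check that the database charset is utf8mb4"
        % (pad_id, len(runs), runs[0][0])
    )
```

Reporting the offset of the first run makes it easier to inspect the affected spot by hand before changing anything in the database.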

@InterFelix

This error occurred with one single pad on an instance that was never upgraded and has been pinned to version 1.8.6 since initial deployment today. I fixed the issue; however, I don't know what actually helped. First I tried the SQL query, which did not seem to help. Then I set the charset as an env variable on my Kubernetes deployment, which redeployed the pod. I can't say if it was the charset or the SQL query in combination with the redeploy, but it's fixed now.
