Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) #2107
Comments
Try the latest develop branch and let us know if it still occurs.
This is not trivial, as it occurs by chance on the prod server with many users and I cannot reproduce it.
yes please
I will get the db copied to dev next week. Will report as soon as possible. Thank you for the fast response.
Unfortunately, moving to develop has not repaired the damaged pads. Interestingly, moving servers (a simple SQL export/import) has "repaired" one of the pads. It works on the new server (even on 1.3.0), but the other damaged pads don't; they still show the same error. This is a really strange bug, and pads sometimes seem to just "self-repair" even on PROD, with no change whatsoever from us.
In theory this problem shouldn't occur at all, as the bad data should never find its way in now.
I can leave this open, and if it occurs on fresh data we can try to resolve it.
What do you mean by "fresh data"? Do I need to put PROD on develop and try to get new broken pads?
Hrm, that'd be painful potentially. Maybe wait for 1.4, which should be out in a few weeks max.
We might do that. Thanks a lot.
I also have that problem, with the same strange randomness. I updated to etherpad-lite 1.4 and it is still there. URL for one of the pads http://etherpad.usabilidoido.com.br:8080/p/07318a9b2b5666637d870fc50656323620af4df4 |
I am right now in the process of upgrading to 1.4 and hopefully going live with the new version, so I can check if new defects come up in fresh pads after running on the new version. @usabilidoido you can tell your users to export-import the pad. You can access controls by adding /timeslider to the url like this: http://etherpad.usabilidoido.com.br:8080/p/07318a9b2b5666637d870fc50656323620af4df4/timeslider Export as html and then import into a new pad. |
I am experiencing this issue in 1.4. On broken pads, command line shows Hat tip to @Ra1d3n for the /timeslider workaround. Glad to see pad content is still accessible there. |
This is just a hunch, but if you like, try with the experimental |
I have upgraded everything to 1.4, and yesterday we had a broken pad again. It might still be one that broke before the update, but I am not sure. I will keep on looking. @marcelklehr Unfortunately, I can't move the production server to a dangerous version, and I can't mirror requests to a secondary server because I don't have the resources. :-/
Sorry, I wasn't clear: Don't run I removed |
@marcelklehr I just did, and unfortunately it did not help. Process:
I confirmed the change is actually in the file system. (comment and new line are there) Error is still the same. I did get the error that @ericpedia mentioned, and it did not occur on pads other than the corrupted one. console on server:
When I kill the server the message is displayed in the client, also: I might be able to give you an SQL import of the broken pad, but you would need to tell me what exactly you need extracted from the database first. :-) |
The actual pad content would be very helpful, so that'd be db key |
I think I'm getting closer: checkPad says:
Mismatched apply in all cases means that the changeset in question expected one character less than actually present (so, one unexpected character). After examining the db dump, I found out that all of these changesets follow a revision with an atext attached to it (not all revisions include the full pad contents, but instead just the delta, however, every so often for performance reasons the full contents are attached). Presumably, something went awry with this meta property either upon storing the revision or upon creating the current atext for a new pad client. |
I just rewrote checkPad to use the computed atext instead of the cached one and the pad passes. Even the cached atexts are correct! This makes me think there's a bug in some algorithm that computes the full atext for the client_vars! |
Nice work so far Marcel. Sounds like you are close to solving this. |
Ah, it's not the client_vars generator. The algorithm responsible for creating the cached atext is broken. @Ra1d3n As a quickfix, you can revive your broken pads by deleting the atext meta property from all key revisions (where |
@marcelklehr Will that make EP rebuild the pad from revision 0 on load? I guess I should not go around deleting all atext properties from all revisions then... :-) Any hope for a "regenerate Pad" script that rebuilds key revisions with a correct algorithm? |
I've made some changes to the checkPad-script. See here: #1653 |
Cool, I look forward to it. Right now we get only about one broken pad a week, so I am inclined to wait for a proper fix. :-) Thanks. |
Yea, I was thinking of a script. |
So, the diff between the computed atext and the cached one shows that: "концертов" somehow got turned into "ко��цертов". In another revision, "з" was turned into "��" as well... I'm not sure what this is about. These chars are in the Unicode BMP, afaik, so I don't know what happened to them. |
The mutations in @Ra1d3n's pad occur in key revisions 4900, 11400, 12300 and 13600 (every 100th rev is a key rev, meaning it caches the full pad contents). The AttributedText stored in the pad record is corrupt, too. All other key revisions are fine. (analyzed with this script) The chars seem awfully random. Nothing sticks out, really; I don't see a pattern. I suspect something goes wrong when the AttributedText is stored in the database. Since the pad sometimes recovers in between broken key revisions, I'm guessing that the atext held in memory is good, and that it sometimes gets corrupted when it is written to the db. If authors continue to edit the document until the next key revision is created, and that revision doesn't end up getting corrupted, then nobody will notice anything.
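The check described above (recompute the text from the deltas and compare it with the cached copy on each key revision) can be sketched as follows. This is a toy model only: the revision shape (`insertAt`, `removeCount`, `insertText`, `cachedText`) and the function names are hypothetical and are not Etherpad's real changeset format or the actual checkPad code.

```javascript
// Toy model, NOT Etherpad's real changeset format (assumption for illustration):
// a revision is { insertAt, removeCount, insertText, cachedText? }.
const KEY_REV_INTERVAL = 100; // from the thread: every 100th rev is a key rev

// Apply one toy delta to the current text.
function applyDelta(text, rev) {
  return (
    text.slice(0, rev.insertAt) +
    rev.insertText +
    text.slice(rev.insertAt + rev.removeCount)
  );
}

// Replay all deltas from revision 0 and report every key revision whose
// cached full text differs from the text computed from the deltas.
function findCorruptKeyRevs(revisions, interval = KEY_REV_INTERVAL) {
  const corrupt = [];
  let text = '';
  revisions.forEach((rev, revNum) => {
    // Always continue from the computed text, never from the cached copy,
    // so one bad cache entry does not poison the rest of the replay.
    text = applyDelta(text, rev);
    const isKeyRev = revNum % interval === 0;
    if (isKeyRev && rev.cachedText !== undefined && rev.cachedText !== text) {
      corrupt.push(revNum);
    }
  });
  return corrupt;
}
```

Because the replay never trusts the cache, a pad whose deltas are intact passes even when some cached copies are damaged, which matches the observation that the pad "recovers" between broken key revisions.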
@marcelklehr We had etherpad crash and recover a few times because of a plugin that was not well coded, could this be the cause? Strange that it's just one letter that got corrupted every time. |
You'd be surprised just how many editors (and very popular ones with developers) have a similar experience to Etherpad tho. Playing around today I had some crazy experiences. |
From ether#3717 (comment):
> Afaik I used async / await that's pretty much all, I think I had to do some polish because something was broken, remember stuff like pad.getPadAuthors was b0rked in 1.7 or so

Fixes ether#2107.
Hi, we are having a similar issue with one of our Pads. |
@gnd do you have a public instance? Can you hit the padId/export/etherpad URL and get the .etherpad file? Are you running the latest develop? What's your database backend? So many questions; please provide as many details as possible.
@JohnMcLear I'm available today to help you with any sort of debugging you might want to do. I have already tried the latest version of checkPadDeltas; however, it just hangs for hours after start. This is the only output it gives:
|
Dude, the error is in your log!
See: #3959 |
@JohnMcLear
While the store table has
So should i convert using ? |
The misconfiguration was twofold: the database was using utf8 and utf8_general_ci, and the charset for the database in settings.json was also set to "utf8". Fixing all of that to utf8mb4 still didn't help: the pad in question doesn't load, and checkPadDeltas still hangs:
|
@gnd It's a GIGO (garbage in, garbage out) problem. Once you have garbage in, it can't be changed. All you know now is that the problem won't appear in the future!
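For context on the utf8 versus utf8mb4 distinction: in MySQL and MariaDB, the `utf8` charset is a 3-byte encoding (utf8mb3), so it cannot store code points above U+FFFF such as most emoji. A minimal sketch of a check for such characters follows; the function name `needsFourByteUtf8` is hypothetical. Note that the Cyrillic letters reported earlier in this thread are inside the BMP, so this check alone would not explain those particular corruptions.

```javascript
// Hypothetical helper: returns true if the string contains any code point
// above U+FFFF, i.e. one that needs 4 bytes in UTF-8 and therefore does
// not fit into MySQL's 3-byte "utf8" (utf8mb3) charset.
function needsFourByteUtf8(str) {
  // for...of iterates by code point, so surrogate pairs are seen as one unit.
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xffff) return true;
  }
  return false;
}
```

Running something like this over pad text before storing it would only flag content that is at risk under a 3-byte charset; it does not repair anything already written.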
Wouldn't |
Oh hi @caugner - sadly no, repairPad.js generally sucks and doesn't really work. https://github.com/ether/etherpad-lite/blob/develop/bin/repairPad.js#L48 The best thing I can suggest is to pull the atext/text out of the pad and bring it into a new pad. @gnd I can write you a script to test to try and get the text if you want?
|
@JohnMcLear that would be quite helpful indeed ) |
ExtractingUse The text is the Parsing
To write the Pad text to a text file Now that you have the pad text, you can put it in a text file and import it, or use the setText API or whatever... Lemme know if extraction fails and I will consider another approach.
The extraction is running; however, it is quite slow. In the file CareCicle.db I see the latest line at revs:80, while the script has already been running for 20 minutes. The pad in question has over 12k revisions.
Oh man, that sucks.. I guess it can't build the |
the last suggestion would be a big one, to dump the entire db and send it to me and then I can write a script to parse out what you need. Alternatively I can try to write a script here but there might be some back & forth to get it working that way. |
Hi @JohnMcLear, the script has finally finished. I have no idea why it took so long (almost 40 hrs). Anyway, looking into it, it seems to me that the whole exercise can be done by selecting the highest revision divisible by 100 from the store table and extracting the text from it? In the future I'll do this by hand :) Thanks a lot for your help.
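The manual approach described above (take the highest revision divisible by 100 and read its text) could look roughly like this once the relevant rows have been fetched from the store table. The key layout `pad:<padId>:revs:<n>` and the `meta.atext.text` field are assumptions inferred from this thread (the `revs:80` line and the "atext meta property"), so verify them against your own database before relying on this. The function name is hypothetical.

```javascript
// Assumed row shape: { key: 'pad:<padId>:revs:<n>', value: '<JSON string>' }.
// Assumed record shape for key revisions: { meta: { atext: { text: '...' } } }.
function latestKeyRevText(rows, padId, interval = 100) {
  const prefix = `pad:${padId}:revs:`;
  let best = null;
  for (const { key, value } of rows) {
    if (!key.startsWith(prefix)) continue;
    const suffix = key.slice(prefix.length);
    if (!/^\d+$/.test(suffix)) continue; // skip anything that is not a rev number
    const revNum = Number(suffix);
    if (revNum % interval !== 0) continue; // only key revisions cache full text
    let record;
    try {
      record = JSON.parse(value);
    } catch (err) {
      continue; // unparsable row, ignore it
    }
    const text =
      record && record.meta && record.meta.atext && record.meta.atext.text;
    if (typeof text !== 'string') continue; // no cached text on this revision
    if (best === null || revNum > best.revNum) best = { revNum, text };
  }
  return best; // null if no usable key revision was found
}
```

Edits made after the returned key revision are not included in the result, so up to `interval - 1` revisions of recent changes may be missing from the recovered text.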
Exactly this, but I often get told off by our users when I make the assumption they can perform database queries so I try to avoid it. I think I know why it took so long btw, are you using MySQL @ Etherpad 1.8.3 ? |
I'm using the latest master from git (not sure which version that is) |
Assuming MySQL, it's a known bug, and we're due to have the patch land today.
Yes, sorry, it's the latest MariaDB: 10.3.22-MariaDB
@JohnMcLear I'm sorry to spam this ticket, but do you have an issue open for the MySQL patch you mentioned? I want to see if our performance troubles with Etherpad might be resolved by it. Thanks.
No, but just do npm install ueberdb@0.4.9 to fix it.
Btw the new logic for storing additional atext is in so this should be closed but if people experience an issue please do create a new issue and refer to this one. I want to deal with each individual cause of problem case-by-case with the main goal to create automated logic to restore a pad upon detected corruption in real time. That's the dream as corruption is inevitable. |
This is a message for people getting to this recently (when upgrading from older versions of etherpad). Today I upgraded an etherpad service from I had problems with one pad, the checkers (checkPad, checkAllPads, etc.) failed to detect it (or I don't know how to run node fine, anyway). I verified the
for case https://pad.example.com/p/my-broken-pad I did:
and it worked again 🎉 🦄 ✨ this solution was above (I put a +1 on previous messages with the solution to help find it), but I wanted to have it more clear |
I guess one thing we could do here is check for ???? in pad contents and provide a warning that includes a suggested solution. @pedro-nonfree please could you submit a patch to checkPad.js or something then I'd happily merge that :) |
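A minimal sketch of such a check, assuming that corruption shows up either as U+FFFD replacement characters (as in the "ко��цертов" example earlier in the thread) or as runs of four or more question marks. The function name and the threshold of four are assumptions for illustration; this is not existing checkPad.js code.

```javascript
// Hypothetical helper: scan pad text for signs of charset corruption.
// Returns the position and content of every suspicious run.
function findSuspectChars(text) {
  const hits = [];
  // One or more U+FFFD replacement characters, or 4+ literal question marks.
  const pattern = /\uFFFD+|\?{4,}/g;
  let m;
  while ((m = pattern.exec(text)) !== null) {
    hits.push({ index: m.index, match: m[0] });
  }
  return hits;
}
```

A checker could print a warning with the pad ID and the hit positions, plus a pointer to the charset fix discussed in this issue. Legitimate text containing "????" would produce false positives, so the result is better treated as a hint than as proof of corruption.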
This error occurred with one single pad on an instance that was never upgraded and has been pinned to version 1.8.6 since initial deployment today. I fixed the issue; however, I don't know what actually helped. First I tried the SQL query, which seemed not to help. Then I set the charset as an env variable on my Kubernetes deployment, which redeployed the pod. I can't say if it was the charset or the SQL query in combination with the redeploy, but it's fixed now.
Hey guys. We are using stable and have the problem that some pads randomly stop working and throw an uncaught error in the console.
Example:
When this happens, the "loading" overlay blocks any action. It's unlikely to be a copy&paste issue because it sometimes happens to entirely handwritten pads.
Interestingly, the timeslider (opened by appending /timeslider to the URL) always works without problems.
Right now we are manually fixing the pads by exporting and importing as HTML (losing all changesets). Any idea what's wrong?