Module:Jpan-sortkey edit war broke module and blocks kaikki.org from regenerating. #238

Open
kristian-clausal opened this issue Apr 25, 2023 · 15 comments

Comments

@kristian-clausal
Collaborator

https://en.wiktionary.org/w/index.php?title=Module:Jpan-sortkey&action=history

This is creating 640k Lua errors when generating kaikki.org, which is way over the normal allowable Lua error threshold.

Seeing as how the module is closed to editing, I assume it will be left like this for a long time. There are some ways to override files on the command-line, but this might need a more permanent solution, like a shadow directory in parallel to the usual pages directory containing "temporary" override files.

@kristian-clausal
Collaborator Author

Committed a possible fix to wiktextract and kaikki's repo.

Changed how the wiktwords --override parameter handles strings: if a path given to it is a directory, it scans that directory for files and adds those. Not that different from the original, but this means we can have a persistent overrides/ directory in wiktextract that can contain module files that cause problems for parsing en.wiktionary by default.
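The directory handling described above could be sketched roughly as follows. This is not the actual wiktextract code: `expand_override_paths` is a hypothetical helper name, and the choice to skip subdirectories and sort the files is an assumption made for the example.

```python
from pathlib import Path
from typing import Iterable, List


def expand_override_paths(paths: Iterable[str]) -> List[Path]:
    """Expand each override argument into a flat list of files.

    Hypothetical helper: a directory contributes every regular file
    directly inside it; any other path is passed through unchanged.
    """
    files: List[Path] = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Assumption: no recursion, and a sorted order so runs are repeatable.
            files.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            files.append(path)
    return files
```

With this shape, passing a single directory such as `overrides/` and passing each file individually produce the same list, which is what lets a persistent override directory live in the repo.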

The change for kaikki's regen script was just to add that --override in there to take advantage of the new "default override directory".

Anyhow, Jpan-sortkey's old version has been copied into the repo and should hopefully work for the next regen.

@kristian-clausal
Collaborator Author

Overriding the module seems to have worked, and Lua errors have now dropped to 12k, under the allowable threshold. Most of the remaining Lua errors also come from Japanese modules edited by the same user as Jpan-sortkey. Closing as "fixed".

@xxyzz
Collaborator

xxyzz commented Apr 28, 2023

I think this error happens because the extensionTag() function doesn't return the same value as MediaWiki. The Lua code mw.getCurrentFrame():extensionTag('nowiki', '') should return a strip marker like this "`UNIQ--nowiki-00000001-QINU`"
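For illustration, here is a minimal Python sketch of how a parser could hand out strip markers and later put the original content back. `StripMarkerStore` and its methods are hypothetical names, not wikitextprocessor's API. The DEL-and-quote wrapping around the `UNIQ--…-QINU` core and the zero-based counter are assumptions about MediaWiki's format; the comment above only shows the inner part.

```python
import re
from typing import Dict


class StripMarkerStore:
    """Hypothetical store that swaps tag content for opaque markers."""

    # Assumed layout: DEL ' " ` UNIQ--<tag>-<8 hex digits>-QINU ` " ' DEL
    PREFIX = "\x7f'\"`UNIQ--"
    SUFFIX = "-QINU`\"'\x7f"

    def __init__(self) -> None:
        self._next_index = 0
        self._content: Dict[str, str] = {}

    def extension_tag(self, name: str, content: str) -> str:
        """Store the content and return a strip marker in its place."""
        marker = f"{self.PREFIX}{name}-{self._next_index:08X}{self.SUFFIX}"
        self._next_index += 1
        self._content[marker] = content
        return marker

    def unstrip(self, text: str) -> str:
        """Replace every known marker in the text with its stored content."""
        pattern = re.compile(
            re.escape(self.PREFIX)
            + r"[^-\x7f]+-[0-9A-F]{8}"
            + re.escape(self.SUFFIX)
        )
        # Unknown markers are left untouched rather than dropped.
        return pattern.sub(
            lambda m: self._content.get(m.group(0), m.group(0)), text
        )
```

The point of the sketch is that Lua code can treat the returned marker as an ordinary string, which only works if the host returns something of this shape instead of the rendered tag.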

xxyzz added a commit to xxyzz/wikitextprocessor that referenced this issue Apr 28, 2023
MediaWiki returns strip
marker(https://www.mediawiki.org/wiki/Strip_marker) in Lua code,
should fix error in Jpan-sortkey model(tatuylonen/wiktextract#238).
@xxyzz
Collaborator

xxyzz commented Apr 28, 2023

I pushed a commit which should fix this error but I got lots of the following error instead:

閣/Japanese/kanji: ERROR: LUA error in #invoke ('ja-kanji-readings', 'show') parent ('Template:ja-readings', {'goon': 'かく', 'kanon': 'かく', 'kun': 'かんぬき, たな-, たかどの-', 'nanori': ''}) at ['閣', 'ja-readings', '#invoke']
[string "Module:ja-translit"]:16: attempt to index a nil value (global 'package')
stack traceback:
        [string "Module:ja-translit"]:16: in upvalue 'get_data'
        [string "Module:ja-translit"]:44: in function 'Module:ja-translit.kana_to_romaji'
        [string "Module:ja-kanji-readings"]:275: in function 'Module:ja-kanji-readings.show'
        [C]: in function 'xpcall'
        [string "_sandbox_phase2"]:219: in function <[string "_sandbox_phase2"]:140>

But I don't think they're related. How come package is nil?

@kristian-clausal
Collaborator Author

These are different errors in different modules created by the same editor (Huhu9001). I wouldn't bother touching these until the edit war ends. These are probably not our fault, seeing as how they popped up just now.

@xxyzz
Collaborator

xxyzz commented Apr 28, 2023

But his code in Module:Jpan-sortkey doesn't raise an error in MediaWiki's environment...

@kristian-clausal
Collaborator Author

Then we've found new errors. I would still wait until the situation calms down.

@kristian-clausal
Collaborator Author

Once you've tested whether generating the strip markers messes anything up, make a pull request and I'll merge it next week.

Prediction: we haven't been generating wikitext strip-markers (we have our own system) and doing so will break something and generate some garbage data, hopefully just strip-marker strings. Thing is, you'd need to test this on the whole corpus to see every place it could conceivably slip through, so it needs to be done on kaikki, or @xxyzz you could try out testing the whole generation process on the dev machine.
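One way to check for this kind of leakage is to scan the generated output for leftover marker strings. The sketch below is only an illustration: `find_leaked_markers` is a hypothetical helper, the input is assumed to be line-oriented text such as a JSON Lines dump, and the regular expression assumes markers follow the `UNIQ--<tag>-<8 hex digits>-QINU` shape quoted earlier in this thread.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed marker core; the surrounding delimiter characters are ignored
# so the scan still matches if they were escaped or stripped on output.
MARKER_RE = re.compile(r"UNIQ--[A-Za-z]+-[0-9A-Fa-f]{8}-QINU")


def find_leaked_markers(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, marker) for each strip marker found in the output."""
    for line_no, line in enumerate(lines, start=1):
        for match in MARKER_RE.finditer(line):
            yield line_no, match.group(0)
```

Because it takes any iterable of strings, the same function can be fed an open file handle for a full regen or a short list of lines in a unit test.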

@kristian-clausal
Collaborator Author

There are some discussions about this module on Wiktionary:

* https://en.wiktionary.org/w/index.php?title=Module_talk%3AJpan-sortkey#What_is_this_supposed_to_do_%3F

* https://en.wiktionary.org/w/index.php?title=User_talk%3AHuhu9001#Please_explain_one_of_your_Module_modification

The situation over there is just devolving more and more. Yeah, I think we will have to wait. Using strip-markers like this is apparently not kosher.

@kristian-clausal
Collaborator Author

Added more old versions of Japanese modules that break kaikki again. Conforming to the hacks done with the strip-markers (though it would be "correct") is such a huge hassle that I am willing to wait until the situation calms down on wiktionary and we're not trying to hit a moving target. Best case scenario, the Scribunto devs shut down this path completely and we don't have to worry about it.

@kristian-clausal
Collaborator Author

Module:Jpan-sortkey has seen some activity removing the stripmarker hack, so I'm removing the files from override/ in the hopes that the 20230520 dump is mostly functional by now.

@kristian-clausal
Collaborator Author

Everything seems OK right now, so closing this issue.

@xxyzz
Collaborator

xxyzz commented Jan 23, 2024

The strip marker Lua error is happening again, this time here: https://en.wiktionary.org/wiki/Module:utilities#L-152

Returning the strip marker from frame:extensionTag only for an empty nowiki tag shouldn't break other tests.

Update: we also need to return a strip marker from frame:preprocess.
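A rough Python sketch of the narrow extensionTag rule is below. It is not the code from the commit: the function name, the `fallback` callback standing in for the existing tag handling, and the fixed marker string (including its delimiter characters and index) are all assumptions made for the example.

```python
from typing import Callable

# Assumed marker text for an empty <nowiki/>; the exact delimiters and
# index used by MediaWiki or by the real fix may differ.
EMPTY_NOWIKI_MARKER = "\x7f'\"`UNIQ--nowiki-00000000-QINU`\"'\x7f"


def extension_tag(
    name: str, content: str, fallback: Callable[[str, str], str]
) -> str:
    """Return a strip marker only for an empty nowiki tag.

    Every other tag, and any nowiki tag with content, goes through the
    existing handling, so current behaviour is unchanged elsewhere.
    """
    if name == "nowiki" and content == "":
        return EMPTY_NOWIKI_MARKER
    return fallback(name, content)
```

Keeping the special case this small limits the change to the one call pattern that the Lua code relies on, which is why it should leave the other tests alone.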

xxyzz added a commit to xxyzz/wikitextprocessor that referenced this issue Jan 23, 2024
Not full implementation of the MediaWiki API, just minimal changes to
fix Lua errors in en edition Module:utilities.

GitHub issue tatuylonen/wiktextract#238
@xxyzz
Collaborator

xxyzz commented Jan 25, 2024

There is no discussion on en.wiktionary.org about using strip markers like there was last time, so it looks like they won't change the Lua code for a while.
