Troubleshooting minecraftwiki_zh_all recipe #1995
Comments
The following description is mostly taken from my comment. In v1.13.0 (I will test git main later), MWoffliner accepts three different APIs:
|
I am currently testing git main. @kelson42 switched to another scraper running git main. However, it failed because the arguments between v1.13.0 and git main differ. To fix this:
The next issue I encountered after fixing this was:
I modified
|
Finally, I found the cause of the issue: it is the same as before. Lines 413 to 428 in ad5dc1d
Since the logic for retrieving the main page remains unchanged, we still have to modify the code to make it work. mwoffliner/src/mwoffliner.lib.ts Lines 203 to 204 in ad5dc1d
mwoffliner/src/mwoffliner.lib.ts Line 609 in ad5dc1d
In regular cases:
However, in this situation, @kelson42 Could you please create a pull request? (The solution is at the end of my first comment.) Update: Checking API capabilities is no longer a problem in git main, since Line 162 in ad5dc1d
|
@TripleCamera Thank you! I will have a look at your analysis in the next few days. |
@kelson42 How is everything going? |
I fixed the main page issue and started a scrape on my machine. Two problems arose:
|
@TripleCamera Sorry for not getting back to you earlier; it was lack of time, not lack of interest. I plan to look at your ticket in detail this weekend. |
Thank you! After fixing the issues mentioned above, the scraper was running smoothly. However, I had to stop it because I don't have a lot of time either. It is estimated to finish in 5 hours (using the config below). Here is a list of things I have done so far:
Could you please apply these changes and relaunch the scraper? Next I have to rely on openZIM's scraper. |
Any progress so far? |
Great, Kelson is back. It seems that this task can move forward a little bit more. 😊 Update: @kelson42 Hello? |
@kelson42 Hi. Have you been busy recently? Maybe you can assign this task to your colleagues (if they are free). |
Hi. I just created a pull request which contains the patch. Can someone review & merge it? @kelson42 |
Note: This has only been tested on MWoffliner v1.13.0 (since all openZIM scrapers are using this version). Both the code and the config differ a lot between v1.13.0 and git main, so this needs to be tested on git main.
The following description is mostly taken from my comment when troubleshooting the scrape for Minecraft Wiki (zh) (openzim/zim-requests#755).
The scraper reports `Unable to find appropriate API end-point to retrieve article HTML` when scraping Minecraft Wiki (zh). Here is a code analysis of MWoffliner v1.13.0.

Before the scrape starts, MWoffliner checks mobile REST API, desktop REST API, and VE REST API capabilities for a specific page (parameter `testArticleId`) in `Downloader.checkCapabilities`:

mwoffliner/src/Downloader.ts
Lines 243 to 263 in e9d4113
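
The capability check described above can be sketched roughly as follows. This is a simplified illustration, not MWoffliner's actual code: the host `wiki.example.org`, the endpoint templates, the `respondsOk` callback (a stand-in for a real HTTP request), and the `pickApi` helper are all assumptions made for the example.

```typescript
// One candidate API and how to build its URL for a given article.
// The URL templates below are illustrative assumptions, not copied from MWoffliner.
interface ApiCandidate {
  name: string;
  buildUrl: (articleId: string) => string;
}

const candidates: ApiCandidate[] = [
  {
    name: 'mobile REST',
    buildUrl: (id) => `https://wiki.example.org/api/rest_v1/page/mobile-sections/${encodeURIComponent(id)}`,
  },
  {
    name: 'desktop REST',
    buildUrl: (id) => `https://wiki.example.org/api/rest_v1/page/html/${encodeURIComponent(id)}`,
  },
  {
    name: 'VE',
    buildUrl: (id) => `https://wiki.example.org/api.php?action=visualeditor&paction=parse&format=json&page=${encodeURIComponent(id)}`,
  },
];

// Probe every candidate with the test article and keep those that answer.
// `respondsOk` is a hypothetical stand-in for "this URL returned a successful response".
function checkCapabilities(
  testArticleId: string,
  respondsOk: (url: string) => boolean,
): string[] {
  return candidates
    .filter((candidate) => respondsOk(candidate.buildUrl(testArticleId)))
    .map((candidate) => candidate.name);
}

// If no candidate answered, the scrape cannot proceed.
function pickApi(available: string[]): string {
  if (available.length === 0) {
    throw new Error('Unable to find appropriate API end-point to retrieve article HTML');
  }
  return available[0];
}
```

The point of the sketch is that the outcome depends entirely on `testArticleId`: if it names a page that does not exist, every probe fails and the error above is raised, even when the APIs themselves work.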
The default value `MediaWiki:Sidebar` is never used because the value of `mwMetaData.mainPage` is passed:

mwoffliner/src/mwoffliner.lib.ts
Line 206 in e9d4113
The value of `mwMetaData.mainPage` comes from the API. The base URL is stripped and its last part is taken. (This is a bad idea because different wikis have different URL rewrites.)

mwoffliner/src/MediaWiki.ts
Lines 290 to 325 in e9d4113

mwoffliner/src/MediaWiki.ts
Lines 235 to 279 in e9d4113
This works for many wikis like English Wikipedia, but not for Chinese Minecraft Wiki. The reason is that MCW-zh uses a URL rewrite:
There are two ways to fix this:

- Set `mwMetaData.mainPage` to `entries.mainpage`, which is already included in the API result. (MediaWiki documentation)
- Modify `Downloader.checkCapabilities`:

I have tested both, and both worked.
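
To illustrate why taking the last segment of the base URL is fragile, and how the first fix sidesteps it, here is a minimal sketch. It assumes the `general` object of a MediaWiki siteinfo response exposes `base` (the main page URL) and `mainpage` (the main page title). The function names and the example wiki values are hypothetical, and this is not the exact MWoffliner code.

```typescript
// Relevant fields of the siteinfo "general" object (other fields omitted).
interface SiteInfoGeneral {
  base: string; // URL of the main page
  mainpage: string; // title of the main page
}

// Approach described in the analysis: strip the URL and take its last part.
function mainPageFromBaseUrl(general: SiteInfoGeneral): string {
  const lastSegment = general.base.split('/').pop() ?? '';
  return decodeURIComponent(lastSegment);
}

// Proposed fix: use the title that the API already returns.
function mainPageFromTitle(general: SiteInfoGeneral): string {
  return general.mainpage;
}

// Hypothetical wiki with the conventional /wiki/<Title> layout.
const conventional: SiteInfoGeneral = {
  base: 'https://wiki.example.org/wiki/Main_Page',
  mainpage: 'Main Page',
};

// Hypothetical wiki whose URL rewrite serves the main page at the site root.
const rewritten: SiteInfoGeneral = {
  base: 'https://wiki.example.org/',
  mainpage: 'Main Page',
};

console.log(mainPageFromBaseUrl(conventional)); // "Main_Page"
console.log(mainPageFromBaseUrl(rewritten)); // "" (empty: no usable title)
console.log(mainPageFromTitle(rewritten)); // "Main Page"
```

With the rewritten layout the URL-based approach yields an empty string, so any later request built from it targets a page that does not exist, while the title-based approach does not depend on how the wiki rewrites its URLs.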