
Introduce MediaWiki Parsoid API to render articles #1899

Draft
wants to merge 3 commits into
base: main
from

Conversation

VadimKovalenkoSNF
Collaborator

@VadimKovalenkoSNF VadimKovalenkoSNF commented Sep 4, 2023

@kelson42
Collaborator

kelson42 commented Sep 6, 2023

@VadimKovalenkoSNF I'm a bit lost, can you please tell me:

  • Which MWOffliner ticket this fixes
  • What the approach to fixing it is

Any chance to get that completed today?

@VadimKovalenkoSNF
Collaborator Author

@kelson42, I haven't noticed a dedicated ticket for mwoffliner. This patch mostly replicates the functionality from #1846, but on top of recent changes. In fact, it solves both problems: it reduces traffic to the MW infrastructure and lets mwoffliner avoid a redundant request to get modules per article if the wiki supports the Action API with the parsoid=1 parameter. Do you want me to open a ticket here as well? This can be fixed today; I need to take a look at some broken tests. Note that I still haven't received an answer to https://phabricator.wikimedia.org/T324866#9139172, but my solution seems to work without useparsoid.

@kelson42
Collaborator

kelson42 commented Sep 6, 2023

No real need to open an issue on our side... glad if I can review the PR soon.

@VadimKovalenkoSNF
Collaborator Author

> Any chance to get that completed today?

Update: I've noticed that the Parsoid API has trouble with media handling, and probably has other issues in the output.
Compare the example of https://en.m.wikipedia.org/wiki/User:Kelson/MWoffliner_CI_reference in the screenshot below.
The left side shows the output of WikimediaDesktop and the right side shows MediaWiki Parsoid. Debugging these differences might take some time; at least I can try to figure out how to enable the missing media in the gallery, etc.

[Screenshot: media-content-user-page]

@kelson42
Collaborator

kelson42 commented Sep 6, 2023

@VadimKovalenkoSNF can you please explain the principle of your PR in the description, because I don't get it. It should be only about adding "parsoid" to a URL... and now we are talking about something very different.

@VadimKovalenkoSNF
Collaborator Author

@kelson42 This PR introduces a new renderer based on parsoid=1 in the MediaWiki Action API. Instead of WikimediaDesktop, which is represented by this example, https://en.wikipedia.org/api/rest_v1/page/html/Foobar, mwoffliner will query this endpoint:
https://en.m.wikipedia.org/w/api.php?action=parse&format=json&prop=text%7Cmodules%7Cjsconfigvars%7Cheadhtml&parsoid=1&page=Foobar

As you can see, the response has a text property with the article HTML, as well as headhtml, modules, modulescripts, modulestyles and jsconfigvars. Having these properties in a single response saves mwoffliner from triggering an additional request to get modules.

The problem I pointed out is that the article HTML from the MediaWiki Action API differs from the WikimediaDesktop response, which will lead to different output (missing media, etc.).
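
To make the single-request idea concrete, here is a minimal sketch of how such a URL could be built and how the article HTML and module lists could be read from one response. It is not the code in this PR: the names `buildParsoidParseUrl`, `extractParseResult` and `ParseResult` are hypothetical, and the handling of the `"*"` wrapper around `text` and `headhtml` is an assumption about the default JSON format of the Action API. Only the query parameters are taken from the URL quoted above.

```typescript
// Hypothetical shape for the data a renderer would need from one response.
interface ParseResult {
  html: string;
  headHtml: string;
  modules: string[];
  moduleScripts: string[];
  moduleStyles: string[];
  jsConfigVars: Record<string, unknown>;
}

// Build the action=parse URL with parsoid=1, using the same parameters
// as the endpoint quoted in the comment above.
function buildParsoidParseUrl(apiUrl: string, title: string): string {
  const url = new URL(apiUrl);
  url.search = new URLSearchParams({
    action: "parse",
    format: "json",
    prop: "text|modules|jsconfigvars|headhtml",
    parsoid: "1",
    page: title,
  }).toString();
  return url.toString();
}

// Assumption: a string-valued property is either a plain string or an
// object of the form { "*": "..." }. Accept both, fall back to "".
function unwrap(value: unknown): string {
  if (typeof value === "string") return value;
  if (value && typeof value === "object" && "*" in value) {
    return String((value as Record<string, unknown>)["*"]);
  }
  return "";
}

// Pull the HTML and the module lists out of a single parsed JSON body,
// so that no second request for modules is needed.
function extractParseResult(body: unknown): ParseResult {
  const parse = (body as { parse?: Record<string, unknown> }).parse;
  if (!parse) throw new Error("Response has no 'parse' property");
  const list = (v: unknown): string[] => (Array.isArray(v) ? v.map(String) : []);
  return {
    html: unwrap(parse.text),
    headHtml: unwrap(parse.headhtml),
    modules: list(parse.modules),
    moduleScripts: list(parse.modulescripts),
    moduleStyles: list(parse.modulestyles),
    jsConfigVars: (parse.jsconfigvars ?? {}) as Record<string, unknown>,
  };
}
```

The JSON body would come from whatever HTTP client is already in use; the sketch deliberately leaves the network call out so that the two helpers can be checked offline.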

@kelson42
Collaborator

@VadimKovalenkoSNF Needs to be rebased
