resolve infobox keys #561

Open
deneb2 opened this issue Oct 13, 2023 · 2 comments
@deneb2
deneb2 commented Oct 13, 2023

I was trying the Toronto Raptors example.

Calling let fields = doc.infobox(0).json() returns JSON whose infobox keys are exactly those in the wikitext (for example, coach: { text: 'Darko Rajaković', links: [ [Object] ] }).
I also tried wtf-plugin-html, and the rendered keys seem to be the same ones, only capitalized and cleaned up.

But on the rendered Wikipedia page, the coach entry is actually labeled Head Coach.
The missing step is the one that looks at the infobox template and adjusts the key names for rendering.

Is this feature missing, or did I do something wrong?

@spencermountain
Owner

You know what, I find this very confusing too.
You can see this library simply grabs whatever's in the wikitext, and the names of the keys are specified in the template documentation.

But yeah, there seems to be a ton of formatting that is done by Wikipedia at render-time. These formatting rules don't appear to be anywhere in Wikipedia-land, and must be in the source code of the Parsoid HTML renderer. I've looked around before and come up with nothing. Please let me know if you can find where this logic is stored, and whether it is available to be re-used in projects like this one.

yeah, as you found, the wtf-plugin-html doesn't do anything clever, but really should.
cheers
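
To make the missing step concrete, here is a minimal sketch of how display labels could be applied to the parsed keys once a key-to-label mapping exists from some source. Both the applyLabels function and the hand-written labelMap are hypothetical and are not part of wtf_wikipedia or wtf-plugin-html; the name key is only illustrative.

```javascript
// Hypothetical helper: NOT part of wtf_wikipedia or wtf-plugin-html.
// Renames infobox keys using a key -> display-label mapping.
// Keys without an entry in the mapping are left unchanged.
function applyLabels(fields, labelMap) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    const hasLabel = Object.prototype.hasOwnProperty.call(labelMap, key);
    out[hasLabel ? labelMap[key] : key] = value;
  }
  return out;
}

// Assumed input shape, modelled on the infobox JSON quoted in the question.
const fields = {
  name: { text: 'Toronto Raptors' },
  coach: { text: 'Darko Rajaković' },
};

// Hand-written mapping, for illustration only.
const labelMap = { coach: 'Head coach' };

console.log(Object.keys(applyLabels(fields, labelMap)));
// prints [ 'name', 'Head coach' ]
```

The open question in this thread is where such a mapping would come from; the sketch only covers the renaming itself.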

@einSelbst
einSelbst commented Nov 5, 2023

Out of curiosity I took a look at this. Being a noob in all of this, it seems to me that Wikipedia is a cascade of templates within templates, and the key in question comes from a "sub-template".

So on this page
https://en.wikipedia.org/w/index.php?title=Toronto_Raptors&action=edit

it says "Pages transcluded onto the current version of this page" and mentions the template for the infobox of basketball clubs:

https://en.wikipedia.org/w/index.php?title=Template:Infobox_basketball_club&action=edit

which is referenced in the 5th line of the template:

{{Infobox basketball club

have a look for

| label19 = Head coach{{#if:{{{coaches|}}}|es}}
| data19 = {{if empty|{{{coaches|}}}|{{{coach|}}}}}

HTH

PS: sorry if this was already clear as the template was already mentioned in the initial question
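
The label19/data19 pair quoted above suggests one possible way to recover display labels: scan the template wikitext for numbered labelN/dataN rows and map every parameter named in dataN to the plain text of labelN. The sketch below is only that, a sketch. The function name labelMapFromTemplate is hypothetical, it assumes each row sits on a single line, and it deliberately ignores conditional suffixes such as the {{#if:...|es}} part, keeping only the text before the first template call.

```javascript
// Hypothetical helper: NOT part of wtf_wikipedia.
// Builds a { parameterName: displayLabel } map from infobox template wikitext
// that uses numbered "labelN" / "dataN" rows.
function labelMapFromTemplate(templateWikitext) {
  const rows = {}; // row number -> { label, data }
  const rowPattern = /^\s*\|\s*(label|data)(\d+)\s*=\s*(.*)$/;
  for (const line of templateWikitext.split(/\r?\n/)) {
    const match = line.match(rowPattern);
    if (!match) continue;
    const [, kind, num, value] = match;
    rows[num] = rows[num] || {};
    rows[num][kind] = value.trim();
  }

  const map = {};
  // Matches template parameters such as {{{coach|}}} and captures "coach".
  const paramPattern = /\{\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}\}/g;
  for (const { label, data } of Object.values(rows)) {
    if (!label || !data) continue;
    // Simplification: keep only the plain text before the first "{{",
    // so conditional suffixes in the label are dropped.
    const plainLabel = label.split('{{')[0].trim();
    for (const [, param] of data.matchAll(paramPattern)) {
      map[param] = plainLabel;
    }
  }
  return map;
}

const sample = [
  '| label19 = Head coach{{#if:{{{coaches|}}}|es}}',
  '| data19 = {{if empty|{{{coaches|}}}|{{{coach|}}}}}',
].join('\n');

console.log(labelMapFromTemplate(sample));
// prints { coaches: 'Head coach', coach: 'Head coach' }
```

A regex pass like this would not handle nested sub-templates or labels built entirely from parser functions, so it should be read as a starting point rather than a faithful reimplementation of how Wikipedia renders the infobox.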
