Parse templates in Wiktionary #88

Open
Nakilon opened this issue Sep 9, 2021 · 1 comment
Comments

@Nakilon
Nakilon commented Sep 9, 2021

I wonder how to use it to get Wiktionary data, for example the Etymology section for "Russian":

Infoboxer.wiktionary.get("Russian").sections.first.sections("Etymology").text

=> "{der:en|ML.|-} (11th century) {m:la|Russiānus}, the adjective of {m:la|Russia}, a Latinization of the {der:en|orv|Русь}. Attested in English (both as a noun and as an adjective) from the 16th century.\n\n"

How do I replace templates like {der:en|ML.|-} with their real meaning, to get:

Medieval Latin (11th century) Russiānus, the adjective of Russia, a Latinization of the Old East Slavic Русь (Rusĭ). Attested in English (both as a noun and as an adjective) from the 16th century.

@zverok
Contributor

zverok commented Sep 14, 2021

There is (kind of) an answer, but you won't like it :(
The thing you want is called "template expansion", and Infoboxer can't do it by itself (it was meant for "template extraction" instead), so you'll need to call the low-level API:

Infoboxer.wiktionary.api.expandtemplates.text('{{m|la|Russiānus}}').prop(:wikitext).response['wikitext']
# => <i class="Latn mention" lang="la">[[Russianus#Latin|Russiānus]]</i>

...unfortunately, to do so, you'll need the template's source, and Infoboxer, somewhat dumbly, doesn't provide a way to get it. The best option is to approximate it by rebuilding the source from the parsed template:

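# Monkey-patch: rebuild an approximate wikitext source for a parsed template.
class Infoboxer::Tree::Template
  def source
    [
      "{{#{name}",
      *unnamed_variables.map(&:text),                      # positional arguments
      *named_variables.map { |v| "#{v.name}=#{v.text}" },  # key=value arguments
    ].join('|') + '}}'
  end
end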

wiktionary = Infoboxer.wiktionary
section = wiktionary.get("Russian").sections.first.sections("Etymology")

section.templates.map(&:source).each { |t|
  puts t
  puts wiktionary.api.expandtemplates.text(t).prop(:wikitext).response['wikitext']
}

...this will print:

{{der|en|ML.|-}}
<span class="etyl">[[w:Medieval Latin|Medieval Latin]][[Category:English terms derived from Medieval Latin|API]]</span>
{{m|la|Russiānus}}
<i class="Latn mention" lang="la">[[Russianus#Latin|Russiānus]]</i>
{{m|la|Russia}}
<i class="Latn mention" lang="la">[[Russia#Latin|Russia]]</i>
{{der|en|orv|Русь}}
<span class="etyl">[[w:Old East Slavic|Old East Slavic]][[Category:English terms derived from Old East Slavic|API]]</span> <i class="Cyrs mention" lang="orv">[[Русь#Old East Slavic|Русь]]</i> <span class="mention-gloss-paren annotation-paren">(</span><span lang="orv-Latn" class="mention-tr tr Latn">Rusĭ</span><span class="mention-gloss-paren annotation-paren">)</span>
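
For a quick offline check of what the source method above builds, here is the same joining logic as a standalone function. This is only a sketch: template_source is a hypothetical name, its inputs are plain strings rather than Infoboxer nodes, and the named argument in the second call is made up purely for illustration. Since the source is rebuilt from parsed values, it may not match the original wikitext byte for byte.

```ruby
# Hypothetical standalone version of the `source` method above (not part of
# Infoboxer): joins the template name, positional arguments and named
# arguments back into {{name|arg|key=value}} form.
def template_source(name, positional = [], named = {})
  parts = ["{{#{name}", *positional, *named.map { |k, v| "#{k}=#{v}" }]
  parts.join('|') + '}}'
end

puts template_source('der', %w[en ML. -])
# => {{der|en|ML.|-}}

# The named argument here is illustrative only.
puts template_source('m', %w[la Russia], 't' => 'example')
# => {{m|la|Russia|t=example}}
```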

...but, unfortunately again, when it comes to extracting readable text from that expanded markup, you are mostly on your own. Infoboxer's parser can help a bit, though:

section.templates.map(&:source).each { |t|
  print "expanding `#{t}`: "
  expanded = wiktionary.api.expandtemplates.text(t).prop(:wikitext).response['wikitext']
  puts Infoboxer::Parser.inline(expanded).text
}

output:

expanding `{{der|en|ML.|-}}`: Medieval LatinAPI
expanding `{{m|la|Russiānus}}`: Russiānus
expanding `{{m|la|Russia}}`: Russia
expanding `{{der|en|orv|Русь}}`: Old East SlavicAPI Русь (Rusĭ)

(yeah, those "API" strings that come from the [[Category:...]] links are weird, but it is what it is)
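
One possible way around the stray "API" text, sketched in plain Ruby: as far as I can tell, it is the part after the pipe in the [[Category:...|API]] links, so dropping those links before parsing should remove it. The second helper shows how the readable pieces could be stitched back into the sentence. Both helper names (strip_category_links, fill_in) are hypothetical and not part of Infoboxer, the sample strings are copied from the outputs above, and matching on the {der:en|ML.|-} style placeholders is an assumption that may be fragile.

```ruby
# Hypothetical helpers, not part of Infoboxer.

# Drop [[Category:...]] links (including the part after the pipe)
# from expanded wikitext.
def strip_category_links(wikitext)
  wikitext.gsub(/\[\[Category:[^\]]*\]\]/, '')
end

# Replace each template placeholder in `text` with its readable expansion.
# `expansions` maps a placeholder string to the text that should replace it.
# The block form of gsub keeps the replacement literal.
def fill_in(text, expansions)
  expansions.reduce(text) do |acc, (placeholder, readable)|
    acc.gsub(placeholder) { readable }
  end
end

expanded = '<span class="etyl">[[w:Medieval Latin|Medieval Latin]]' \
           '[[Category:English terms derived from Medieval Latin|API]]</span>'
puts strip_category_links(expanded)
# => <span class="etyl">[[w:Medieval Latin|Medieval Latin]]</span>

sentence = '{der:en|ML.|-} (11th century) {m:la|Russiānus}, the adjective of {m:la|Russia}'
readable = {
  '{der:en|ML.|-}'   => 'Medieval Latin',
  '{m:la|Russiānus}' => 'Russiānus',
  '{m:la|Russia}'    => 'Russia'
}
puts fill_in(sentence, readable)
# => Medieval Latin (11th century) Russiānus, the adjective of Russia
```

The cleaned string from strip_category_links can then be passed to Infoboxer::Parser.inline(...).text as in the snippet above, and the resulting plain text used as the values of the hash given to fill_in.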
