ValYauw/Scribunto-Parse-Wikitemplates


About

ParseTemplate is a Lua-based module for wikis (powered by MediaWiki) that provides a handy way of parsing wiki templates, variables and parser functions from a given portion of wikitext.

The extension Scribunto is required to use this module.

Written for the Vocaloid Lyrics Wiki to extract information from thousands of pages.

Quick Start Guide

To start using this module, type the following lines of code in a module page:

-- Import module
local module = require('Module:ParseTemplate')

-- Extract the actual wikitext string
local page_contents = mw.title.new("EXAMPLE PAGE"):getContent()

-- Parse the templates, variables and parser functions in the wikitext and organize them into a Lua table
local table_of_templates = module.extractTemplates(page_contents)

You can then index the templates in table_of_templates:

-- Index the first invocation of the template 'Template:Foo' parsed in the wikitext
local obj_template = table_of_templates["Foo"][1]

-- Get the contents of this template invocation
print(obj_template["template_contents"])  -- Example output: {{foo|param1|name=param2}}

-- Get the first unnamed parameter of this template invocation
print(obj_template["template_params"][1])  -- Example output: param1

-- Get the named parameter "name" of this template invocation
print(obj_template["template_params"]["name"])  -- Example output: param2
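
Indexing a template that does not occur on the page yields nil, and indexing into nil raises an error. The following is a minimal sketch of a guarded lookup; get_param is a hypothetical helper, not part of ParseTemplate, and the sample table is hand-built to mirror the structure shown above rather than produced by the module.

```lua
-- Hypothetical helper (not part of ParseTemplate): returns a parameter value
-- from the n-th invocation of a template, or nil if any level is missing.
local function get_param(templates, template_name, n, param_key)
  local group = templates[template_name]
  if not group then return nil end
  local invocation = group[n]
  if not invocation then return nil end
  local params = invocation["template_params"]
  if not params then return nil end
  return params[param_key]
end

-- Hand-built stand-in for the output of extractTemplates (illustration only)
local sample = {
  ["Foo"] = {
    {
      ["template_name"] = "Foo",
      ["template_contents"] = "{{foo|param1|name=param2}}",
      ["template_params"] = { [1] = "param1", ["name"] = "param2" }
    }
  }
}

print(get_param(sample, "Foo", 1, 1))       -- param1
print(get_param(sample, "Foo", 1, "name"))  -- param2
print(get_param(sample, "Bar", 1, "name"))  -- nil (template not present)
```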

To iterate through the templates organized in table_of_templates:

-- Iterate through each group of template invocations
-- (groups are keyed by template name, so pairs is required; ipairs would visit nothing)
for template_group_name, arr_templates in pairs(table_of_templates) do

  -- Iterate through each template invocation in the group (a numbered array)
  for i, obj_template in ipairs(arr_templates) do

    -- Iterate through all parameters, named and unnamed, of each invocation
    for param_name, param_value in pairs(obj_template["template_params"]) do
      ...
    end

  end

end
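
As one possible use of this traversal, the sketch below counts how many times each template is invoked. The outer table is keyed by template name, so it is walked with pairs; each group is a numbered array, so its length can be taken with #. The helper summarize and the sample data are hypothetical and only mirror the table layout described in this guide.

```lua
-- Hypothetical helper (not part of ParseTemplate): returns the group names in
-- alphabetical order together with the number of invocations in each group.
local function summarize(templates)
  local names, counts = {}, {}
  for name, invocations in pairs(templates) do
    names[#names + 1] = name
    counts[name] = #invocations
  end
  table.sort(names)  -- pairs has no defined order, so sort for stable output
  return names, counts
end

-- Hand-built stand-in for the output of extractTemplates (illustration only)
local sample = {
  ["Stub"] = {
    { ["template_name"] = "Stub", ["template_params"] = {} }
  },
  ["Reflist"] = {
    { ["template_name"] = "Reflist", ["template_params"] = {} },
    { ["template_name"] = "Reflist", ["template_params"] = { ["group"] = "note" } }
  }
}

local names, counts = summarize(sample)
for _, name in ipairs(names) do
  print(name, counts[name])
end
```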

You can also use mw.text.jsonEncode to encode table_of_templates into a human-readable JSON string:

print( mw.text.jsonEncode(table_of_templates) )

Example Usage

Consider the following wikitext from a page:

{{Stub}}{{Infobox character
 | title         = Daisy
 | image         = Example.jpg
 | imagecaption  = Daisy, blowing in the wind
 | position      = Supreme flower
 | age           = 2 months
 | status        = Active
 | height        = 5 inches
 | weight        = 20 grams 
}}

lorem ipsum dolor sit amet

==References==
{{Reflist}}

extractTemplates will extract the three templates (Stub, Infobox character, and Reflist) in the form of a Lua table as follows:

table_of_templates = {
  ["Stub"] = {
    [1] = {
      ["start_pos"] = 1,
      ["end_pos"] = 8,
      ["template_contents"] = "{{Stub}}",
      ["template_name"] = "Stub",
      ["template_params"] = { }
    }
  },
  ["Infobox character"] = {
    [1] = {
      ["start_pos"] = 9,
      ["end_pos"] = 277,
      ["template_contents"] = [=[{{Infobox character
 | title         = Daisy
 | image         = Example.jpg
 | imagecaption  = Daisy, blowing in the wind
 | position      = Supreme flower
 | age           = 2 months
 | status        = Active
 | height        = 5 inches
 | weight        = 20 grams 
}}]=],
      ["template_name"] = "Infobox character",
      ["template_params"] = {
        ["title"] = "Daisy",
        ["image"] = "Example.jpg",
        ["imagecaption"] = "Daisy, blowing in the wind",
        ["position"] = "Supreme flower",
        ["age"] = "2 months",
        ["status"] = "Active",
        ["height"] = "5 inches",
        ["weight"] = "20 grams"
      }
    }
  },
  ["Reflist"] = {
    [1] = {
      ["start_pos"] = 323,
      ["end_pos"] = 333,
      ["template_contents"] = "{{Reflist}}",
      ["template_name"] = "Reflist",
      ["template_params"] = { }
    }
  }
}

Which is equivalent to the following JSON data tree:

{
   "Stub":[
      {
         "start_pos":1,
         "end_pos":8,
         "template_contents":"{{Stub}}",
         "template_name":"Stub",
         "template_params":{}
      }
   ],
   "Infobox character":[
      {
         "start_pos":9,
         "end_pos":277,
         "template_contents":"{{Infobox character\n | title         = Daisy\n | image         = Example.jpg\n | imagecaption  = Daisy, blowing in the wind\n | position      = Supreme flower\n | age           = 2 months\n | status        = Active\n | height        = 5 inches\n | weight        = 20 grams \n}}",
         "template_name":"Infobox character",
         "template_params":{
            "title":"Daisy",
            "image":"Example.jpg",
            "imagecaption":"Daisy, blowing in the wind",
            "position":"Supreme flower",
            "age":"2 months",
            "status":"Active",
            "height":"5 inches",
            "weight":"20 grams"
         }
      }
   ],
   "Reflist":[
      {
         "start_pos":323,
         "end_pos":333,
         "template_contents":"{{Reflist}}",
         "template_name":"Reflist",
         "template_params":{}
      }
   ]
}
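
In the example above, start_pos and end_pos behave like 1-based, inclusive offsets into the wikitext, which is the convention used by Lua's string.sub. The sketch below rebuilds the example wikitext and checks the documented offsets against it. It assumes the offsets count bytes, as string.sub does; the source does not say how non-ASCII text is counted, so treat this as an illustration for ASCII input only.

```lua
-- Build one infobox line with the same column padding as the example wikitext
local function field(key, value)
  return string.format(" | %-14s= %s", key, value)
end

-- Rebuild the example wikitext line by line (note the trailing space after "20 grams")
local wikitext = table.concat({
  "{{Stub}}{{Infobox character",
  field("title", "Daisy"),
  field("image", "Example.jpg"),
  field("imagecaption", "Daisy, blowing in the wind"),
  field("position", "Supreme flower"),
  field("age", "2 months"),
  field("status", "Active"),
  field("height", "5 inches"),
  field("weight", "20 grams "),
  "}}",
  "",
  "lorem ipsum dolor sit amet",
  "",
  "==References==",
  "{{Reflist}}"
}, "\n")

-- Slice the wikitext with the offsets reported in the example output
print(wikitext:sub(1, 8))      -- {{Stub}}
print(wikitext:sub(9, 27))     -- {{Infobox character
print(wikitext:sub(276, 277))  -- }}
print(wikitext:sub(323, 333))  -- {{Reflist}}
```

If the assumption holds, page_contents:sub(start_pos, end_pos) should reproduce template_contents for each invocation, which can be handy when replacing or removing a template in the original text.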

Notes

  • Templates are grouped by their base page name. For example, the invocations {{some template}}, {{Some template}}, and {{some_template}} all fall into the same group, named "Some template".
  • Variables and parser functions (such as {{DEFAULTSORT}}) will be grouped based on the base name of the variables/parser functions.
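  • Because Lua tables with string keys have no defined iteration order, the order of template groups and named parameters in the output (for example when traversed with pairs or encoded as JSON) may differ from their order in the wikitext.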
  • This module is able to deal with templates nested within other templates.
  • This module is able to deal with characters escaped using the {{=}} and {{!}} magic words, as well as characters enclosed within <nowiki> tags.
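
Since the grouping discards the order in which templates appear on the page, one way to recover it is to flatten all groups into a single array and sort by start_pos. This is a minimal sketch: in_document_order is a hypothetical helper, not part of ParseTemplate, and it assumes every start_pos is an offset into the same wikitext string.

```lua
-- Hypothetical helper (not part of ParseTemplate): flattens every group into
-- one array and sorts the invocations by their starting offset.
local function in_document_order(templates)
  local flat = {}
  for _, invocations in pairs(templates) do
    for _, invocation in ipairs(invocations) do
      flat[#flat + 1] = invocation
    end
  end
  table.sort(flat, function(a, b)
    return a["start_pos"] < b["start_pos"]
  end)
  return flat
end

-- Hand-built stand-in using the offsets from the example output (illustration only)
local sample = {
  ["Reflist"] = { { ["template_name"] = "Reflist", ["start_pos"] = 323 } },
  ["Stub"] = { { ["template_name"] = "Stub", ["start_pos"] = 1 } },
  ["Infobox character"] = { { ["template_name"] = "Infobox character", ["start_pos"] = 9 } }
}

local ordered = in_document_order(sample)
for _, invocation in ipairs(ordered) do
  print(invocation["start_pos"], invocation["template_name"])
end
```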

About

A Scribunto (Lua) module for use in MediaWiki-powered sites (e.g. Wikipedia, FANDOM wikis, Miraheze wikis). Used to parse wikitext templates into a Lua table.
