Plugin Concept
The plugin concept lets you add features to wtf_wikipedia without touching its source code. The benefit is that wtf_wikipedia and its plugins can evolve independently, with no need to fork wtf_wikipedia and merge changes back into it. A plugin only needs an update when the interface for creating plugins changes, or when the structure of the parse nodes in the Abstract Syntax Tree (AST) changes, e.g. when new node types appear that the plugin must also handle.
Plug-ins are stored in the folder plug-in. Analyse the plug-in for the HTML export and adapt it to, e.g., your preferred output file format. If you want to analyse the structure of your preferred export format, you can try the online converter based on Pandoc by John MacFarlane and explore how format converters operate in general. The concept of an Abstract Syntax Tree (AST) is also helpful for understanding how an AST is processed into an output format. Besides supporting new output formats, plug-ins can also be used to improve the parsing functionality of wtf_wikipedia and to extract structured data into the JSON that the provided version of wtf_wikipedia did not extract before.
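As a rough illustration of how an AST can be processed into an output format, the following sketch walks a small tree and renders it as HTML. The node types (document, heading, paragraph, text, link) and their fields are invented for this example and do not reflect the actual structure used by wtf_wikipedia; the point is only the pattern of one render function per node type.

```javascript
// Hypothetical AST: node types and fields are illustrative only,
// not the actual wtf_wikipedia document structure.
const ast = {
  type: 'document',
  children: [
    { type: 'heading', depth: 1, text: 'Toronto Raptors' },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'The Raptors are a ' },
        { type: 'link', page: 'Basketball', text: 'basketball' },
        { type: 'text', value: ' team.' }
      ]
    }
  ]
};

// Escape characters that have a special meaning in HTML.
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One render function per node type. An exporter for another format
// would replace this table and keep the traversal unchanged.
const renderers = {
  document: (node) => node.children.map((child) => render(child)).join('\n'),
  heading: (node) => `<h${node.depth}>${escapeHtml(node.text)}</h${node.depth}>`,
  paragraph: (node) => `<p>${node.children.map((child) => render(child)).join('')}</p>`,
  text: (node) => escapeHtml(node.value),
  link: (node) => `<a href="./${encodeURIComponent(node.page)}">${escapeHtml(node.text)}</a>`
};

// Dispatch on the node type and fail loudly for unknown nodes.
function render(node) {
  const fn = renderers[node.type];
  if (!fn) {
    throw new Error(`No renderer for node type "${node.type}"`);
  }
  return fn(node);
}

const html = render(ast);
console.log(html);
```

The error thrown for an unknown node type corresponds to the situation described earlier: when the AST gains new node types, an export plug-in has to be updated to handle them.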
Assume you have created a plug-in with the name wtf-my-new-plugin on GitHub. Update your plug-in as usual with the standard git procedures.
After running npm publish for wtf-my-new-plugin, you can install both the module wtf_wikipedia and your plug-in, and then load the plug-in by passing it to the extend() method of wtf. The new plug-in adds, e.g., a new method extract_something() to the doc object (not to the wtf object) in NodeJS:
let wtf = require('wtf_wikipedia');
wtf.extend(require('wtf-my-new-plugin'))
wtf.fetch('Toronto Raptors').then((doc) => doc.extract_something())
// extract_something() returns the specific data you want to extract from the Wikipedia article `Toronto Raptors`.
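To make the shape of such a plug-in more concrete, here is a minimal sketch of what the module wtf-my-new-plugin might export. It assumes that extend() calls the plug-in function with a models object exposing the Doc class, and that a doc offers sections() whose items have a title() method; check both assumptions against the version of wtf_wikipedia you use. The body of extract_something() is purely hypothetical, and StubDoc, StubSection and stubWtf are stand-ins so the sketch runs without network access.

```javascript
// Sketch of what the module wtf-my-new-plugin might export.
// Assumption: wtf.extend() calls the plug-in with a `models` object that
// exposes the Doc class, so new methods can be added to Doc.prototype.
function myNewPlugin(models) {
  models.Doc.prototype.extract_something = function () {
    // Hypothetical extraction: titles of all sections that have one.
    return this.sections()
      .map((section) => section.title())
      .filter((title) => title.length > 0);
  };
}

// In the real plug-in package this would be the export:
// module.exports = myNewPlugin;

// Stand-in objects to try the plug-in offline.
// These are NOT the wtf_wikipedia classes, only minimal stubs.
class StubSection {
  constructor(title) {
    this._title = title;
  }
  title() {
    return this._title;
  }
}

class StubDoc {
  constructor(titles) {
    this._sections = titles.map((t) => new StubSection(t));
  }
  sections() {
    return this._sections;
  }
}

const stubWtf = {
  // Mimics the assumed behaviour of extend(): hand the models to the plug-in.
  extend(plugin) {
    plugin({ Doc: StubDoc });
    return this;
  }
};

stubWtf.extend(myNewPlugin);
const doc = new StubDoc(['', 'History', 'Roster']);
console.log(doc.extract_something());
```

Because the method is attached to the Doc prototype, every doc created after the extend() call carries extract_something(), which is why the method appears on doc rather than on wtf.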
You can find already implemented plug-ins in the plug-in folder of the repository. To understand how plug-ins are loaded, require the existing plug-ins first and study the workflow before you create your own.
wtf.extend(require('wtf-plugin-summary'))
wtf.fetch('Pulp Fiction').then((doc) => doc.summary())
// 'a 1994 American crime film'
wtf.extend(require('wtf-plugin-person'))
wtf.fetch('David Bowie').then((doc) => doc.birthDate())
// {year:1947, date:8, month:1}
wtf.extend(require('wtf-plugin-i18n'))
wtf.fetch('Ziggy Stardust', 'fr').then((doc) => {
  doc.infobox().json()
  // { nom:{text:"Ziggy Stardust"}, oeuvre:{text:"The Rise and Fall of Ziggy Stardust"} }
})
As usual, you can add both packages as dependencies of your new package wtf_wikipedia2:
- wtf_wikipedia with npm install wtf_wikipedia --save
- your own plugin wtf-my-new-plugin with npm install wtf-my-new-plugin --save
Then you extend wtf as mentioned above with the extend(...) method and apply the build process of wtf_wikipedia to your main index.js in wtf_wikipedia2/src/, similar to the build of wtf_wikipedia.
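The wiring inside wtf_wikipedia2/src/index.js could look roughly like the sketch below. The helper name createExtendedWtf is made up for this example, and fakeWtf, pluginA and pluginB are stand-ins so the snippet runs without installing anything. In the real package you would pass require('wtf_wikipedia') and your required plug-ins instead.

```javascript
// Sketch of wtf_wikipedia2/src/index.js.
// The helper takes `wtf` and the plug-ins as arguments so it can be tried
// without installing any package. createExtendedWtf is a hypothetical name.
function createExtendedWtf(wtf, plugins) {
  for (const plugin of plugins) {
    wtf.extend(plugin); // register each plug-in through extend()
  }
  return wtf;
}

// In the real package, the export might look like this:
// module.exports = createExtendedWtf(
//   require('wtf_wikipedia'),
//   [require('wtf-my-new-plugin')]
// );

// Minimal stand-in for wtf, used only to demonstrate the wiring.
const registered = [];
const fakeWtf = {
  extend(plugin) {
    registered.push(plugin.name);
    return this;
  }
};
function pluginA() {}
function pluginB() {}

const wtf2 = createExtendedWtf(fakeWtf, [pluginA, pluginB]);
console.log(registered);
```

Exporting the already extended wtf object means that users of wtf_wikipedia2 get the plug-in methods without calling extend() themselves.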
If you want to share a new plug-in and integrate it into the standard wtf_wikipedia repository, you can create a pull request for this repository and explain in the comments what additional functionality can be expected from the new plug-in.
- Parsing Concepts are based on Parsoid - https://www.mediawiki.org/wiki/Parsoid
- Output: Based on concepts of Pandoc, the swiss-army knife of document conversion developed by John MacFarlane - https://www.pandoc.org