Engelbert Niehaus edited this page Aug 15, 2018 · 19 revisions

Citation Management

The citations/references embedded in the wiki markdown are parsed by wtf_wikipedia, and the aggregated references can be accessed and exported from the generated document doc with doc.json() or doc.toJSON().

wtf_wikipedia does a great job of extracting citations and references from a given wiki markdown source.

Remark on Refactoring

The following description is not implemented in wtf_wikipedia 5.0 and serves as a basis for the software design process. Please feel free to adapt the content of this GitHub wiki and improve the description prior to implementation of the solution. In the sense of Agile Software Development some code is inserted here for checking the basic concept, but the code should be regarded more or less as pseudocode for understanding the proposal. The code is not meant to be a copy-and-paste resource for the implementation.

The parsed result JSON doc contains all parsed citations from the Wiki Markdown article in doc.citations, which is an array of all citations collected during the wtf.parse(...) call.

Basic Concepts of Citation Management

This is a text about Swarm 
Intelligence<ref>Swarm Behaviour, SomeAuthor, 
(2009), SomeEditor, Publisher</ref> and other content. 

Example as Introduction

Parsing the document extracts the content inside the ref-tags and pushes the content into the doc.citations array. The desired output should be in plain text.

This is a text about Swarm Intelligence [1] and other content. 

or depending on the Citation Style

This is a text about Swarm Intelligence [SomeAuthor2009] and other content. 
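
The extraction step described above could be sketched as follows. This is only an illustration of the concept: extractCitations is a hypothetical helper and not part of wtf_wikipedia, and it only handles plain ref-tags without attributes.

```javascript
// Sketch only: extractCitations is a hypothetical helper, not a wtf_wikipedia function.
function extractCitations(wiki) {
  const citations = [];
  // match <ref>...</ref> pairs, including content that spans line breaks
  const text = wiki.replace(/\s*<ref>([\s\S]*?)<\/ref>/g, function (match, body) {
    // collapse line breaks and repeated whitespace inside the reference
    citations.push(body.replace(/\s+/g, " ").trim());
    // numeric citation style: 1-based position in the citations array
    return " [" + citations.length + "]";
  });
  return { text: text, citations: citations };
}

const extracted = extractCitations(
  "This is a text about Swarm\n" +
  "Intelligence<ref>Swarm Behaviour, SomeAuthor,\n" +
  "(2009), SomeEditor, Publisher</ref> and other content."
);
console.log(extracted.text);
console.log(extracted.citations);
```

The array position is used as the visible number here only because the example has a single citation; the sections below discuss why a stable id is preferable.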

References are added to a wiki document in a different way than you might expect from other citation management software like Zotero. Classically, a bibliographic citation is a reference to a database record that defines a book, article, web page, or other published item by attributes that allow other scientists to validate statements and make scientific results reproducible. Authors in Wikipedia, Wikiversity, ... do not create a link to a database entry of a book or article; they add the complete reference at the location in the wiki document where the citation is needed. MediaWiki aggregates these references and lists the bibliography at the end of the document (by default) or at a location where a marker for injecting the bibliography is placed.

  • wtf_wikipedia extracts the references into JSON, which allows citation management within a database.
  • as a consequence, it is necessary to place a kind of unique marker at the place where the ref-tags defined a reference to a book or article.

Citation Markers Injection

In the sense of an Abstract Syntax Tree (AST), a citation marker is a tree node Cite that contains a unique identifier for a reference in the JSON database of citations. Options and constraints for the unique identifier:

  • the array index (not recommended if the citations are sorted alphabetically); it is better to store a counter value that is incremented for every parsed reference, cite.id = cite_counter().
  • the ref-tags are removed by the parser, so a Cite node must be inserted in the AST on the sentence level of parsing (see ContentList proposal).
  • as an interim solution, the citation could be replaced by a marker (e.g. ___CITE_1___ or ___CITE_SomeAuthor2009___) that does not create conflicts with the parsing processes that follow.
  • another alternative is to use the standard citation markers, e.g. in a Wikipedia source. If a citation was already inserted in a wiki article with a ref-tag, wiki authors may need the reference to the same book a second time in the article. To avoid listing one book or article multiple times in the reference list at the end of the document, a named ref-tag (<ref name="..."/>) is used in the wiki source.

The citation marker injection was not implemented in wtf_wikipedia release 5.0, and its software design and parsing issues can be defined in this GitHub wiki prior to implementation. A solid software design could reduce the implementation workload, especially for Spencer Kelly, who implemented most of it.

The result of the citation marker injection of the example mentioned above would look like this:

This is a text about Swarm Intelligence ___CITE_1___ and other content. 

or

This is a text about Swarm Intelligence ___CITE_SomeAuthor2009___ and other content. 

A citation marker would look like this: ___CITE_label___, with a label consisting of the characters A-Za-z0-9 and -, so that the citation markers can be replaced later in the output by a reference to a book or article in the appropriate citation style (see PanDoc Citation Management).
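
A later replacement of the markers in the output could be sketched as follows. The function renderMarkers, the lookup object keyed by label, and the number attribute are assumptions made for this illustration; none of them exist in wtf_wikipedia.

```javascript
// Sketch only: renderMarkers and the structure of "citations" are hypothetical.
function renderMarkers(text, citations, style) {
  // the label consists of A-Za-z0-9 and "-", as described above
  return text.replace(/___CITE_([A-Za-z0-9\-]+)___/g, function (marker, label) {
    const cite = citations[label];
    // leave unknown markers untouched so that no citation is lost silently
    if (!cite) {
      return marker;
    }
    // "numeric" prints the stored number, every other style prints the label
    return style === "numeric" ? "[" + cite.number + "]" : "[" + label + "]";
  });
}

const marked = "This is a text about Swarm Intelligence ___CITE_SomeAuthor2009___ and other content.";
const lookup = { SomeAuthor2009: { number: 1 } };
console.log(renderMarkers(marked, lookup, "numeric"));
console.log(renderMarkers(marked, lookup, "label"));
```

Because underscores are not part of the label alphabet, the closing ___ of a marker cannot be confused with the label itself.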

If we use the standard ref-name citation marker in the wiki source, it will look like this:

This is a text about Swarm 
Intelligence <ref name="SomeAuthor2009"/> and other content. 

Remark: This type of citation marker is currently removed by the function kill_xml() in the file /src/document/preProcess/kill_xml.js.

Remark: The marker injection is currently necessary because the parsing of references needs to preserve the location in the text where the literature was cited, and the currently designed parsers take strings as input. Non-conflicting markers seem to be a workaround for this until the parsing into an Abstract Syntax Tree allows the tree node to be generated at the time a citation or reference is detected. There might be a better interim solution. Please discuss and propose alternatives in this wiki prior to implementation, to minimize the implementation workload and dead ends in development.
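
As a sketch of the interim workaround, named self-closing ref-tags could be turned into non-conflicting markers before kill_xml() runs. The function injectNamedMarkers is hypothetical, and the accepted label alphabet is an assumption taken from the marker format above.

```javascript
// Sketch only: injectNamedMarkers is a hypothetical pre-processing step.
function injectNamedMarkers(wiki) {
  // <ref name="X"/> or <ref name='X' /> becomes ___CITE_X___
  return wiki.replace(
    /<ref\s+name\s*=\s*["']([A-Za-z0-9\-]+)["']\s*\/>/g,
    "___CITE_$1___"
  );
}

console.log(injectNamedMarkers(
  'This is a text about Swarm\nIntelligence <ref name="SomeAuthor2009"/> and other content.'
));
```

Names that contain characters outside A-Za-z0-9 and - are left untouched by this sketch and would still be removed by the existing clean-up.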

Status of Citation Management in wtf_wikipedia

Version 5.0

The parsing of references/citations is called in /src/section/index.js in the doSection() method:

const doSection = function(section, wiki, options) {
  ...
  //parse the <ref></ref> tags
  wiki = parse.references(section, wiki, options);
  ...

This version removes citation markers in the kill_xml() method (see /src/document/preProcess/kill_xml.js). If you want to play around with citation marker replacement, you must comment out the citation marker removal that eliminates citations of the form

<ref name="SomeAuthor2009"></ref>  or <ref name="SomeAuthor2009"/>

See /src/document/preProcess/kill_xml.js l.10:

  //only kill ref tags if they are selfclosing
  wiki = wiki.replace(/ ?< ?(ref) [a-zA-Z0-9=" ]{2,100}\/ ?> ?/g, ' '); 
  // removes tags like <ref name="asd"/> but not <ref name='asd'/> in 5.0 - not used/allowed??
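
The behaviour noted in the comment can be checked in isolation. The first expression below is copied from the 5.0 source quoted above; the second one is a hypothetical variant whose character class also allows single quotes, shown only to illustrate the difference.

```javascript
// regular expression as quoted above from wtf_wikipedia 5.0
const refDoubleQuotes = / ?< ?(ref) [a-zA-Z0-9=" ]{2,100}\/ ?> ?/g;
// hypothetical variant: the character class additionally allows '
const refAnyQuotes = / ?< ?(ref) [a-zA-Z0-9="' ]{2,100}\/ ?> ?/g;

const sample = "A <ref name=\"asd\"/> B <ref name='asd'/> C";
// only the double-quoted tag is removed
console.log(sample.replace(refDoubleQuotes, " "));
// both tags are removed
console.log(sample.replace(refAnyQuotes, " "));
```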

Parsing Citations/References

If you want to analyse how wtf_wikipedia parses the citations, you should look at /src/section/references.js. The parsing process is part of the section parsing method for the section body

doSection(section, wiki, options) 

in /src/section/index.js via the call of

wiki = parse.references(section, wiki, options);

The method for parsing the references is defined in /src/section/references.js (see also the basic structure of the parsing method in Parsing Wiki Source).

Helpful Links for Citation Handling in JavaScript

  • Use the template mechanism of MediaWiki to render the output in a specific format. wtf_wikipedia is able to resolve a template.
  • (Alternative) https://citation.js.org/demo/ how to convert citations with a specific style into an output format.
  • (Alternative) HandleBarsJS as a template engine might be helpful to convert JSON data about a citation into a specific output format.

Where to add the Citation Management - Output Format

The way citations are handled depends on the output format and the preferences of the wtf_wikipedia user:

  • export all citations in BibTeX format (use FileSaver.js by Eli Grey to offer the generated BibTeX file as a download in a browser).
  • generate a bibliography and inject this bibliography into the output text at the marker {{Reflist|2}}. For LaTeX the marker is replaced by
\bibliography{mybib}{}
\bibliographystyle{apalike}

The replacement inserts all cited literature in the bibliography.

  • create a citation helper function that is called whenever a citation is found. It determines what to inject at the location where the citation is found in the Wiki markdown text.

Conclusion: A solution for the citation management could be a citation.js for each output format in /src/output (e.g. /src/output/latex/citation.js). Such a library processes references like

<ref>{{cite web|title=What is OER?|url=http://wiki.creativecommons.org/What_is_OER|work=wiki.creativecommons.org|publisher=Creative Commons|accessdate=18 April 2013}}</ref>
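
How such a reference could be turned into a key/value record is sketched below. parseCiteTemplate is a hypothetical function, not the parser in /src/section/references.js, and it assumes that parameter values contain no nested templates or wiki links with a | character.

```javascript
// Sketch only: parseCiteTemplate is hypothetical and ignores nested templates.
function parseCiteTemplate(ref) {
  // capture the cite type ("web", "book", ...) and the parameter list
  const m = ref.match(/\{\{\s*cite\s+([^|}]+)\|([\s\S]*?)\}\}/i);
  if (!m) {
    return null;
  }
  const record = { cite: m[1].trim() };
  m[2].split("|").forEach(function (pair) {
    const pos = pair.indexOf("=");
    // skip positional parameters without a key
    if (pos === -1) {
      return;
    }
    record[pair.slice(0, pos).trim()] = pair.slice(pos + 1).trim();
  });
  return record;
}

console.log(parseCiteTemplate(
  "<ref>{{cite web|title=What is OER?|url=http://wiki.creativecommons.org/What_is_OER" +
  "|work=wiki.creativecommons.org|publisher=Creative Commons|accessdate=18 April 2013}}</ref>"
));
```

Only the first = in each parameter separates key and value, so URLs that contain further = characters are preserved.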

LaTeX Citation Handling

Replace the citation in Wiki Markdown with a cite-command in LaTeX that uses the id of the citation record.

citations : [
  {
    "id": "C1D20180327T1503",
    "type": "book",
    "title": "Swarm Intelligence: From Natural to Artificial Systems",
    "author": [
      {
        "given": "Eric",
        "family": "Bonabeau"
      },
      {
        "given": "Marco",
        "family": "Dorigo"
      },
      {
        "given": "Guy",
        "family": "Theraulaz"
      }
    ],
    "year": 1999,
    "isbn": "0-19-513159-2"
  },
  ....
]

The citation mechanism of BibTeX will work if the citations in the JSON array are part of the BibTeX database of your LaTeX environment. So conversion and/or export of the citations collected by wtf_wikipedia is necessary.

\cite{C1D20180327T1503}

The cite command will be replaced by LaTeX according to your selected citation style (e.g. APA with (Bonabeau, 1999)).
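
The export of one citation record into a BibTeX entry could be sketched like this. toBibTeX is a hypothetical helper that assumes the converted record format shown above (id, type, author array); it only writes the four fields of the example and does not escape LaTeX special characters.

```javascript
// Sketch only: toBibTeX is hypothetical and covers just the fields of the example.
function toBibTeX(cite) {
  // BibTeX separates several authors with " and "
  const authors = (cite.author || []).map(function (a) {
    return a.family + ", " + a.given;
  }).join(" and ");
  const fields = [
    ["author", authors],
    ["title", cite.title],
    ["year", cite.year],
    ["isbn", cite.isbn]
  ].filter(function (f) {
    // drop fields that are missing in the record
    return f[1] !== undefined && f[1] !== "";
  });
  const body = fields.map(function (f) {
    return "  " + f[0] + " = {" + f[1] + "}";
  }).join(",\n");
  return "@" + cite.type + "{" + cite.id + ",\n" + body + "\n}";
}

console.log(toBibTeX({
  id: "C1D20180327T1503",
  type: "book",
  title: "Swarm Intelligence: From Natural to Artificial Systems",
  author: [
    { given: "Eric", family: "Bonabeau" },
    { given: "Marco", family: "Dorigo" },
    { given: "Guy", family: "Theraulaz" }
  ],
  year: 1999,
  isbn: "0-19-513159-2"
}));
```

The record type is used directly as the BibTeX entry type, which works for book; a type such as journal would have to be mapped to the BibTeX entry type article first.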

Citation JSON Post-Processing (ToDo)

The citations in the JSON parsed by wtf_wikipedia.js need some post-processing.

Alteration of the current JSON format

The current JSON format for the citation array is a result of how citations are stored in the Wiki markdown language:

citations : [
  {
     "cite": "book",
     "title": "Swarm Intelligence: From Natural to Artificial Systems",
     "first1": "Eric",
     "last1": "Bonabeau",
     "first2": "Marco",
     "last2": "Dorigo",
     "first3": "Guy",
     "last3": "Theraulaz",
     "year": 1999,
     "isbn": "0-19-513159-2"
   },
   {
       "cite": "journal",
       "last1": "Bertin",
       "first1": "E.",
       "last2": "Droz",
       "first2": "M.",
       "last3": "Grégoire",
       "first3": "G.",
       "year": 2009,
       "arxiv": 907.4688,
       "title": "Hydrodynamic equations for self-propelled particles: microscopic derivation and stability analysis",
       "journal": "[[J. Phys. A]]",
       "volume": 42,
       "issue": 44,
       "page": 445001,
       "doi": "10.1088/1751-8113/42/44/445001",
       "bibcode": "2009JPhA...42R5001B"
     }
]

must be converted into the following format:

citations : [
  {
    "id": "C1D20180327T1503",
    "type": "book",
    "title": "Swarm Intelligence: From Natural to Artificial Systems",
    "author": [
      {
        "given": "Eric",
        "family": "Bonabeau"
      },
      {
        "given": "Marco",
        "family": "Dorigo"
      },
      {
        "given": "Guy",
        "family": "Theraulaz"
      }
    ],
    "year": 1999,
    "isbn": "0-19-513159-2"
  },
  {
    "id": "C1D20180327T1503",
    "type": "journal",
    "author": [
      {
        "family": "Bertin",
        "given": "E."
      },
      {
        "family": "Droz",
        "given": "M."
      },
      {
        "family": "Grégoire",
        "given": "G."
      }
    ],
    "year": 2009,
    "arxiv": 907.4688,
    "title": "Hydrodynamic equations for self-propelled particles: microscopic derivation and stability analysis",
    "journal": "[[J. Phys. A]]",
    "volume": 42,
    "issue": 44,
    "page": 445001,
    "doi": "10.1088/1751-8113/42/44/445001",
    "bibcode": "2009JPhA...42R5001B"
  }
]

After this conversion is done, the citations can be cross-compiled in the output format with a template or added to a BibTeX-file that is used for creating a LaTeX document.

  • Create an attribute author in every bibliographic record of the array citations and collect the first1/last1, first2/last2, ... pairs in that author array:
  var c = data.citations;
  for (var i = 0; i < c.length; i++) {
    var b = c[i];
    // add an (empty) author array to the bibitem record b = c[i]
    b["author"] = [];
    // add a unique id if the bibitem record has no id key yet
    if (!b.hasOwnProperty("id")) {
      // e.g. T1508330494000R2
      b["id"] = "T" + Date.now() + "R" + i;
    }
    // keys first1, last1, first2, last2, ... that are removed afterwards
    var delkeys = [];
    // there cannot be more numbered authors than keys in the record
    var max = Object.keys(b).length;
    for (var count = 1; count <= max; count++) {
      var given = "";
      var key = "first" + count;
      if (b.hasOwnProperty(key)) {
        // store given name and remember the key for deletion
        given = b[key];
        delkeys.push(key);
      }
      key = "last" + count;
      if (b.hasOwnProperty(key)) {
        // store family name and remember the key for deletion
        var family = b[key];
        delkeys.push(key);
        // add author to author array with family and given name
        b["author"].push({ "family": family, "given": given });
      }
    }
    // clean up: remove first1, last1, ... from the bibitem record b = c[i]
    // (a separate index j is used, because i is the index of the outer loop)
    for (var j = 0; j < delkeys.length; j++) {
      delete b[delkeys[j]];
    }
  }
  • HandleBarsJS can be used to generate the citation in a specific format. E.g. the content of data.citations[i]["title"] will replace the key marker {{title}} in an HTML template. The wrapping HTML tags render the title in italics.
... <i>{{title}}</i>, ({{year}}), {{journal}} ...
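
The replacement of the key markers can be illustrated without any library. fillTemplate below is a hypothetical stand-in that only mimics the simplest case of such a template engine; unlike HandleBarsJS it does not escape HTML in the inserted values and supports no helpers or nested paths.

```javascript
// Sketch only: fillTemplate is a minimal stand-in, not the HandleBarsJS API.
function fillTemplate(template, record) {
  return template.replace(/\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g, function (marker, key) {
    // unknown keys are rendered as an empty string
    return Object.prototype.hasOwnProperty.call(record, key) ? String(record[key]) : "";
  });
}

const citationTemplate = "<i>{{title}}</i>, ({{year}}), {{journal}}";
console.log(fillTemplate(citationTemplate, {
  title: "Swarm Intelligence: From Natural to Artificial Systems",
  year: 1999
}));
```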

HTML Citations - Replacement of Reflist-Marker

In the Wiki Markdown the references are listed either at the very end of the Wiki markdown text or at the reference marker {{Reflist|2}}, which renders a two-column reference list of all citations found in the Wiki markdown article. The citations collected in the parsed JSON of wtf_wikipedia can be converted, e.g. with a HandleBarsJS template, into an appropriate output format. A citation reference (Bertin2009) is inserted into the text and links to an HTML page anchor in the reference list.
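
The link between a citation reference in the text and its entry in the reference list could be sketched as follows. Both function names and the use of the label as anchor id are assumptions for this illustration; the label is expected to consist of A-Za-z0-9 and - only, so it is inserted without escaping.

```javascript
// Sketch only: citeLink and referenceItem are hypothetical helpers.
function citeLink(label) {
  // in-text reference that jumps to the anchor in the reference list
  return '<a href="#' + label + '">(' + label + ')</a>';
}

function referenceItem(label, renderedCitation) {
  // list item in the reference list that carries the anchor id
  return '<li id="' + label + '">' + renderedCitation + '</li>';
}

console.log(citeLink("Bertin2009"));
console.log(referenceItem("Bertin2009", "Bertin, E., Droz, M., Grégoire, G. (2009)"));
```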

LaTeX Citations - BibTex or Bibliography

LaTeX has its own citation management system, BibTeX. If you want to use BibTeX, convert the citations collected in the array data.citations.

wtf.from_api("Swarm intelligence", 'en', function (wikimarkdown, page_identifier, lang_or_wikiid) {
  var options = {
    page_identifier: page_identifier,
    lang_or_wikiid: lang_or_wikiid
  };
  var data = wtf.parse(wikimarkdown, options);
  console.log(JSON.stringify(data, null, 2));
});

Convert Citations in BibTex-Database or Bibliography

The JSON hash data contains an array with all parsed citations from the Wiki Markdown article. Loop over data.citations and convert all bibitem records from the array of collected citations into the BibTeX format (e.g. with HandleBarsJS). Without BibTeX it is possible to render each citation in the array data.citations into a bibitem in the bibliography. This is the same procedure without a database, using an explicit list of the collected citations, similar to the direct approach mentioned for HTML. The bibliography can be added to the end of the LaTeX file (see Bibliography in LaTeX).
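
The variant without BibTeX could be sketched with the LaTeX thebibliography environment. toTheBibliography is a hypothetical helper that assumes the converted record format (id, author array, title, year); the chosen layout of one entry is arbitrary and LaTeX special characters are not escaped.

```javascript
// Sketch only: toTheBibliography is hypothetical, the entry layout is arbitrary.
function toTheBibliography(citations) {
  const items = citations.map(function (c) {
    const names = (c.author || []).map(function (a) {
      return a.given + " " + a.family;
    }).join(", ");
    // \bibitem{id} defines the key that \cite{id} refers to
    return "\\bibitem{" + c.id + "} " + names + ". \\textit{" + c.title + "}. " + c.year + ".";
  });
  // the argument {99} reserves label width for up to two-digit numbers
  return "\\begin{thebibliography}{99}\n" + items.join("\n") + "\n\\end{thebibliography}";
}

console.log(toTheBibliography([{
  id: "C1D20180327T1503",
  title: "Swarm Intelligence: From Natural to Artificial Systems",
  author: [
    { given: "Eric", family: "Bonabeau" },
    { given: "Marco", family: "Dorigo" },
    { given: "Guy", family: "Theraulaz" }
  ],
  year: 1999
}]));
```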

Citation and References

In the Wiki markdown syntax the citation is inserted in the Wiki text at the position where the citation is mentioned. Later, in the HTML output generated by MediaWiki, the collected citations are listed at the very end of the document or (if applicable) at the marker position (e.g. {{Reflist|2}}) in the Wiki markdown source.

In LaTeX this marker can be replaced by the appropriate LaTeX command (see http://www.bibtex.org/Using/ )

\bibliography{mybib}{}
\bibliographystyle{plain}
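
The marker replacement for LaTeX could be sketched as follows. replaceReflist is a hypothetical function; the file name mybib and the style plain are taken from the commands above and passed in as parameters.

```javascript
// Sketch only: replaceReflist is a hypothetical helper for the LaTeX output.
function replaceReflist(wiki, bibfile, style) {
  const latex = "\\bibliography{" + bibfile + "}{}\n\\bibliographystyle{" + style + "}";
  // matches {{Reflist}} as well as {{Reflist|2}} and similar parameter variants;
  // a function is used as replacement so that the string is inserted literally
  return wiki.replace(/\{\{\s*Reflist\s*(\|[^}]*)?\}\}/gi, function () {
    return latex;
  });
}

console.log(replaceReflist("== References ==\n{{Reflist|2}}\n", "mybib", "plain"));
```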