Skip to content

cehteh/pipadoc

Repository files navigation

pipadoc - Documentation extractor
=================================
:author:   Christian Thaeter
:email:    ct@pipapo.org
:date:     16. September 2020


[preface]
Introduction
------------

Embedding documentation in program source files often results in the problem that the
structure of a program is not the optimal structure for the associated documentation.
Still there are many good reasons to maintain documentation together with the source right
within the code which defines the documented functionality. Pipadoc addresses this problem
by extracting special comments out of a source file and let one define rules how to compile
the documentation into proper order.  This is somewhat similar to ``literate Programming''
but it puts the emphasis back to the code.

Pipadoc is programming language and documentation system agnostic, all it requires is that
the programming language has some form of comments starting with a defined character sequence
and spanning to the end of the source line. Moreover documentation parts can be written in
plain text files aside from the sources.


Getting the Source
------------------

Pipadoc is managed the git revision control system. You can clone the repository with

 git clone --depth 1 git://git.pipapo.org/pipadoc

Pipadoc is developed in the 'devel' and feature branches using rolling releases. Whenever
stability is reached things get pushed to the 'master' branch. In few cases this may
include backward incompatible changes. When upgrading one should check and eventually fix
resulting issues.


Installation
------------

Pipadoc is single Lua source file `pipadoc.lua` which is portable among most Lua versions
(PUC Lua 5.1, 5.2, 5.3, 5.4, Luajit, Ravi). It ships with a `pipadoc.install` shell script
which figures a suitable Lua version out and installs `pipadoc.lua` as `pipadoc` in a
given directory (the current directory by default).

There are different ways how this can be used in a project:

- When a installed Lua version is known from the build tool chain one can include the
  `pipadoc.lua` into the project and call it with the known Lua interpreter.
- One can rely on a pipadoc installed in '$PATH' and just call that from the build tool chain
- One could ship the `pipadoc.lua` and `pipadoc.install` and do a local install in the build
  directory and use this pipadoc thereafter.


Usage
-----

Pipadoc is called with options and all input files. It may read a configuration file. When
generating output it is either send to _stdout_ or saved as a given output file.

.....
 pipadoc [options...] [inputs..]
   options are:
     -v, --verbose
                         increment verbosity level
 
     -q, --quiet
                         suppresses any messages
 
     -d, --debug
                         set verbosity to debug level
                         one additional -v enables tracing
 
     -n, --dry-run
                         do not generate output
 
     -h, --help
                         show this help
 
     -r, --register <name> <file> <comment>
                         register a filetype pattern
                         for files matching a file pattern
 
     -t, --toplevel <name>
                         sets 'name' as toplevel node [MAIN]
 
     -c, --config <name>
                         selects a config file [pipadoc_config.lua]
 
     --no-defaults
                         disables default filetypes and configfile loading
 
     --list-sections
                         Parses input and lists all sections on stdout
                         appended with '[section]', '[keys]' or [section keys]'
                         depending on the contents, includes '--dry-run'
 
     -m, --markup <name>
                         selects the markup engine for the output [text]
 
     -o, --output <file>
                         writes output to 'file' [stdout]
 
     -a, --alias <pattern> <as>
                         aliases filenames to another filetype.
                         for example, treat .install files as shell files:
                          --alias '(.*)%.install' '%1.sh'
 
     -D, --define <name>[=<value>]
                         define a GLOBAL variable to value or 'true'
     -D, --define -<name>
                         undefine a GLOBAL variable
 
     --
                         stops parsing the options and treats each
                         following argument as input file
 
   inputs are file names or a '-' that indicates standard input
.....


Basic concepts
--------------

Pipadoc is controlled by special line comments. This is chosen because it is the most common
denominator between almost all programming languages. To make a line comment recognized as
pipadoc comment it needs to be followed immediately by a operator sequence. Which in the
simplest case is just a single punctuation character.

These comments are stored to named documentation sections. One of the main concepts of pipadoc
is that one can define which and in what order these sections appear in the final output.

To add special functionality and extend the semantics one can define pre and post processors.
Preprocessors are defined per input programming language and process all source lines. They can
modify the source arbitrary before pipadoc does the parsing. This allows to generate completely
new content. Lift parts on the source code side over to the documentation and generate
additional information such as indices and glossaries.
Postprocessors are defined for output markup and process every line in output order. The allow
to augment output in a markup specific way.

There is a string substitution/template engine which can processes text. The string
substitution engine also implements a small lisp-like programming language which can be used
to generate content programmatically.


Syntax
------

.In short:
A pipadoc comment is any 'line-comment' of the programming language directly (without spaces)
followed by a optional alphanumeric section name which may have a sorting key appended by
a dot, followed by an operator, followed by an optional argument. Possibly followed by the
documentation text itself.

Only lines qualify this syntax are processed as pipadoc documentation. Preprocessors run
before the parsing is done and may translate otherwise non pipadoc sourcecode into documentation.

.The formal syntax looks like:
....
<pipadoc> ::= [source] <linecomment> <opspec> [ <space> [documentationtext]]

<source> ::= <any source code text>

<linecomment> ::= <the filetypes linecomment sequence>

<opspec> ::= [section['.'key]] <operator> [argument]

<section> ::= <alphanumeric text including underscore>

<operator> ::= ":" | "+" | "=" | "{" | "}" | "@" | "$" | "#" | "!"
               | <user defined operators>

<key> ::= <alphanumeric text including underscore>

<argument> ::= <alphanumeric text including underscore>

<documentationtext> ::= <rest of the line>
....


Sections and Keys
-----------------

Text in pipadoc is stored in named 'sections' and can be associated with some additional
alphanumeric key under that section. This enables later sorting for indices and glossaries.

One-line sections are defined when a section and maybe a key is followed by documentation
text. One-line doctext can be interleaved into Blocks.

At the start if a input file the default block section name is made from the files name
up to the first dot.

Sections are later brought into the desired order by pasting them into a 'toplevel' section.
This default name for the 'toplevel' section is 'MAIN_{markup}' or if that does not exist
just 'MAIN'.



Order of operations
-------------------

Pipadoc reads all files line by line. and processes them in the following order:

Preprocessing ::
  Preprocessors are Lua functions who may alter the entire content of a line before any
  further processing. They get a 'context' table passed in with the 'SOURCE' member
  containing the line read from the input file.

Parsing ::
  The line is broken down into its components and the operators processing function
  will be called. The ':' and '+' operators do a first string substitution pass to
  expand variables. This string substitution is done in input order.
  String substitution macros may leverage this for additional state and may generate
  extra content like indices and append that to the respective sections.

Output Ordering ::
  The output order is generated by assembling the '{toplevel}_{markup}' or
  if that does not exist the '{toplevel}' section.
  The paste and sorting operators there define the section order of the document.
  The conditional operators '{' and '}' are also evaluated at this stage and may omit some
  blocks depending on selection predicates.

Postprocessing ::
  For each output context the postprocessors run in output order.
  Finally a last string substitution pass is applied in output order.
  This pass can generate markup specific changes.

Writeout ::
  The finished document is written to the output.

Report empty sections, Orphans and Doubletes::
  Pipadoc keeps stats on how each section was used. Finally it gives a report (as warning)
  on sections which appear to be unused or used more than once. These warnings may be ok, but
  sometimes they give useful hints about typing errors. To suppress such reports of
  intentional left out sections one can use the '!' operator.

It is important to know that reading happens only line by line, operations can not span
lines. While Processing steps can be stateful and thus preserve information for further
processing.


Filetypes
---------

Pipadoc needs to know about the syntax of line comments of the files it is reading.
For this patterns are registered to be matched against the file name together with a
list of line comment characters.

Definitions for a common programming languages are already included. For languages
that support block comments the opening (but not the closing) commenting characters are
registered as well. This allows one to define section blocks right away. Note that using



Markup Languages
----------------

The core of pipadoc is completely agnostic about the markup used within the documentation
strings. The '--markup' option only sets the 'MARKUP' variable and output generation tries
include the markup in the toplevel. Usually only string substitution and postprocessors
should handle markup related things.

The shipped configuration file comes with postprocessors for 'asciidoc' and 'text'
markups. More will will be added in future (org-mode, markdown).


Operators
---------

Operators define how documentation comments are evaluated, they are the core functionality
of pipadoc and mandatory in the pipadoc syntax to define a pipadoc comment line. It is
possible to (re-)define operators. Operators must be a single punctuation character.


`:` ::
  The documentation operator. Defines normal documentation text. Each pipadoc comment using
  the `:` operator is processed as documentation. Later when generating the toplevel
  Section is used to paste all other documentation in proper order together.

`+` ::
  Concat operator. Like ':' but appends text at the last line instead creating a new line.
  Note that only the 'TEXT' is appended all other context information gets lost.

`=` ::
  Section paste operator. Takes a section name as argument and will paste that section in
  place.

`{` ::
  Conditional block start. Needs a string substitution predicate as ARG and its
  arguments (without the closing curly brace). Evaluated at 'output' times. When this
  predicate evaluates to *true* then the following documentation is included in the
  output, when *false* then the following output within this block is supressed. A
  conditional block must end with the matching closing curly brace operator. Conditional
  blocks can be nested. For possible predicates see <<Predicates,below>>.

`}` ::
  Conditional output block end. Must match a preceeding block start.

`@` ::
  Alphabetic sorting operator. Takes a section name as argument and will paste section
  text alphabetically sorted by its keys.

`$` ::
  Generic sorting operator. Takes a section name as argument and will paste section text
  sorted by its keys.

`#` ::
  Numerical sorting operator. Takes a section name as argument and will paste section text
  numerically sorted by its keys.

`!` ::
  Section drop operator. Deletes the section given as argument at output time.
  Used to clean up orphan warnings for unused sections for certain toplevels.



Processors
----------

Pre- and Post- processors are lua functions that allow to manipulate text
programmatically. They get the full context of the current processed line as input and can
return the modified line or choose to drop or keep the line. They are can freely store
some state elsewhere and thus allow some processing that spans lines which is normally not
available in pipadoc.


Preprocessors
~~~~~~~~~~~~~

Preprocessors are per filetypes. A preprocessor can modify any input line
prior it gets parsed and further processed. Preprocessors are used to autogenerate
extra documentation comments from code. Lifting parts of the code to the documentation
side. They operate on the whole source line, not only the pipadoc comment.

This is the place to generate data for new sections (Gloassaries, Indices).

Postprocessors
~~~~~~~~~~~~~~

Postprocessors run at output generation time. They are registered per markup type.
They are used to augment the generated output with markup specific things. They operate
only on parsed documentation comments and only those should be modified. All other data is
already available and stored, thus they may not generate new sections.


String Substitution Engine
--------------------------

Documentation text is be passed to the string substitution engine which recursively
substitutes macros within curly braces. The substitutions are taken from the passed
context (and GLOBAL's).

When a docline is entirely a single string substitution (starting and ending with a
curly brace) and the string substitution resulting in an empty string, then the
whole line becomes dopped. If this is not intended one could add a second empty
'{NIL}' FOO string substitution to the line.

A string substitution can be either a string or a Lua function which shall return
the substituted text.

* When the susbtitution is defined as string, then the argument passed as +__ARG__+ and the
  resulting string will become recursively evaluated by the engine.
* When it is a function, then this function is responsible for calling recursive evaluation
  on its arguments and results.


String Substitution Language
----------------------------

The string substitution engine comes with some macros predefined which implement a simple
lisp-like programming language to allow conditional evaluation and (in future) other useful
features. This language is enabled when assembling the output in order and evaluated in
the last step of the postprocessor.


Configuration File
------------------

Pipadocs main objective is to scrape documentation comments from a project and generate
output in desired order. Such an basic approach would be insufficient for many common cases.
Thus pipadoc has pre and postprocessors and the string substitution engine to generate and
modify documentation in an extensible way. These are defined in an user supplied
configuration file.

Pipadoc tries to load the configuration file on startup. By default it is named
+pipadoc_config.lua+ in the current directory. This name can be changed with the
'--config' option.

The configuration file is used to define pre- and post- processors, define states
for those, define custom operators and string substitution macros. It is loaded and
executed as it own chunk and may only access the global variables and call the
API functions described below.

Without a configuration file none of these processors are defined any only few
variables for string substitution engine are set.


External Libraries
~~~~~~~~~~~~~~~~~~

'pipadoc' does not depend on any external Lua libraries. Nevertheless modules can be loaded
optionally to augment the behavior and provide extra features. Plugin-writers should
use the 'request()' function instead the Lua 'require()', falling back to simpler but usable
functionality when some library is not available or call 'die()' when a reasonable fallback
won't do it.

Pipadoc already calls 'request "luarocks.loader"' to make rocks modules available.


[appendix]
GNU General Public License
--------------------------

----
pipadoc - Documentation extractor
Copyright (C)                        Pipapo Project
 2015, 2016, 2017, 2020              Christian Thaeter <ct@pipapo.org>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
----