New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP resolves Issue 3884: extension documentation #3898
base: main
Are you sure you want to change the base?
WIP resolves Issue 3884: extension documentation #3898
Conversation
97b9def
to
53c8424
Compare
This is now ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a great job -- there's a lot of really good information here and I learnt a lot.
I'm not the best reviewer for this but I've left a a few comments, hopefully which may be of use...
I'll keep reviewing.
Asciidoctor provides the following extension points: | ||
Document Processor:: | ||
A document processor does not have an explicit invocation in the document. | ||
Except for `IncludeProcessor`, a document processor is invoked at the appropriate stage of processing with the entire document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to explain the processing stages. Are these covered elsewhere in the docs?
From the Ruby API I think we need to include something like the section under "Overview" (points 1-6).
I would phrase something like:
Extensions participate during particular stages of Asciidoc processing as follows:
After source lines are read (and normalized) a
Preprocessor
will modify or replace the source lines before parsing begins.
AnIncludeProcessor
is used to process include directives for targets which they claim to handle.The Parser parses the block-level content into an abstract syntax tree (AST).
A custom blocks or block macros is then processed by the associatedBlockProcessor
orBlockMacroProcessor
.At this point each
TreeProcessor
is run on the abstract syntax tree.Conversion of the document begins. This is when inline markup is processed and converted.
This is where one or more custom inline macros are processed by the relevantInlineMacroProcessor
.After conversion of the document is complete, a
Postprocessor
can be used to modify or replace the converted document.The output is then written.
However this now duplicates the content partially of "Available extension points". Perhaps we should have both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to experiment to see if there's a way to interleave a description of the processing stages and the processor types. It may take a while :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reasonably happy with what I came up with for an interleaved description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks really great, much better than my attempts 😀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we are there yet. This is the very first time these two terms are introduced and I think we're focusing on the wrong aspects. Here's what I propose:
Document Processor:: An extension that operates on the document at a key phase during the processing lifecycle. It accepts the document itself and any additional objects that are relevant at that point in the lifecycle (reader, output, etc).
Syntax Processor:: An extension that expands the AsciiDoc syntax at the location of custom element (block, block macro, or inline macro) and is invoked at the time that element is encountered by the processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we're not there yet. I think the crucial distinctions are that a doc processor operates on text, and has no invocation, and a syntax processor operates on the DOM/AST tree.
== Using a registry | ||
|
||
The most direct method is to explicity create a registry using `Asciidoctor::Extensions.create`, directly add processors to that registry, and supply the registry to a document as `extension_registry: my_registry`. | ||
For instance: | ||
|
||
[source,ruby] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's overkill but I really like self-contained examples that an uneducated user can simply copy and paste and then run and get a sane output.
As I see it for the experienced it is not an obstacle but for beginners it is a very great advantage.
How about from: https://github.com/asciidoctor/asciidoctor-extensions-lab/blob/master/lib/shout-block/extension.rb
require 'asciidoctor'
require 'asciidoctor/extensions'
# An extension that transforms the contents of a paragraph
# to uppercase.
#
# Usage
# [shout, 5]
# Time to get a move on.
#
class ShoutBlock < Asciidoctor::Extensions::BlockProcessor
PeriodRx = /\.(?= |$)/
use_dsl
named :shout
on_context :paragraph
name_positional_attributes 'vol'
parse_content_as :simple
def process parent, reader, attrs
volume = ((attrs.delete 'vol') || 1).to_i
create_paragraph parent, (reader.lines.map {|l| l.upcase.gsub PeriodRx, '!' * volume }), attrs
end
end
registry = Asciidoctor::Extensions.create
registry.block ShoutBlock # We are supplying a processor class
source = 'A little extension test
[shout,5]
hey world'
doc = Asciidoctor.load source, extension_registry: registry
puts doc.convert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an advanced technique that just about no one will need to use, so I don't think an example is needed.
The idea is to show that the registration code normally in a group doesn't actually need the group.
== Adding processors to a Registry | ||
|
||
Processors are added to a Registry by calling a registry method named after the type of processor. | ||
For instance, to add a block macro, call `registry.blockMacro ...`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the names are not all intuitive, for instance a BlockProcessor
just is called with block
but most other extension points have _processor
in them. But block_macro
is also missing processor.
Should we list them? Or perhaps provide a link to the Ruby API docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the document processor method names all end in processor and the syntax ones don't, but I'm going to try mentioning the registration methods on the processor specific pages. More detail may be needed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djencks is exactly right. The DSL methods for the syntax processors do not include the term processor because the focus is on the syntax (block, block macro, inline macro). The lifecycle processors need this, otherwise it just doesn't read we'll (pre, tree, post). So I kept the processor term there one so it reads better, but also to distinguish them from syntax processors.
@@ -0,0 +1,109 @@ | |||
= Block extension example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example given is relatively complex to my way of thinking. It involves a specific backend and invokes some concepts of css which won't be known to all users. I'd prefer something like the ShoutBlock example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about this.... so far I like my example. Possibly it can be made simpler, I think removing the 'role' attribute is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code now could not be simpler IMO :-). Even if you don't understand why css classes might be interesting, you can see in the output that the blocks turn into "openblock". Since all the examples are html I think that's sufficient. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, looks fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that the opening example should be the shout block. It is the simplest functional example I could come up with and I think it gives the user a quick win. I'm fine with advancing on to the nestable block example right after that. You could say something like "with that under your belt, let's look at a slightly more complex example". This is a very typical progression. But we can't just throw the reader into the deep end right out of the gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared with the shout block, the nestable block is fewer lines of code, simpler, and useful. How is this the deep end? And how mixed are your metaphors :-)? However, this isn't worth arguing about.
=== Traverse the block tree | ||
|
||
The code here is more complicated and needs to be recursive to find all blocks, but the attributes present at each block may be computed as the traversal proceeds. | ||
To replay the attributes, for each block execute `document.playback_attributes block.attributes`. | ||
Remember to call `document.restore_attributes` before returning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be good to have an example of this -- is this so you can determine the state of an attribute at an arbitrary position in the document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right about needing an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subject of tree traversal deserves its own page. It's an important topic, but also an advanced one. That page can set up the scenario when tree traversal is needed and go over the caveats of doing it. It's not something that is really part of the standard Asciidoctor usage and you need to know that you are offroading it a bit (at least as it stands now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been looking for examples of when tree traversal is needed and the only one I can come up with is fixing STEM processing, which ought to be doable with a block processor rather than a tree processor anyway. Deemphasizing tree traversal is definitely a good idea.
|
||
=== Query the document blocks using the `document.find_by` method | ||
|
||
This provides quick simple access to the blocks of interest, but does not provide access to the attributes set inline in the document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This provides quick simple access to the blocks of interest, but does not provide access to the attributes set inline in the document. | |
This provides quick simple access to the blocks of interest, but does not provide access to the attributes set inside in the document. |
We can't set attributes inline can we? I think we're talking about setting attributes outside the document header, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"set in the document after the document header"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but does not provide access to the attributes set inside in the document.
This should be framed as "does not honor attributes that are set in the body of the document".
docs/modules/extensions/pages/processors/inline-macro-processor.adoc
Outdated
Show resolved
Hide resolved
docs/modules/extensions/pages/processors/inline-macro-processor.adoc
Outdated
Show resolved
Hide resolved
docs/modules/api/pages/options.adoc
Outdated
|Overrides the extensions registry instance. | ||
Instead of providing a Ruby block containing extensions to register, this option lets you replace the extension registry itself, giving you complete control over how extensions are registered for this processor. | ||
|Specifies an explicit xref:extensions:glossary.adoc#id_registry[extensions registry] instance. | ||
Instead of providing a processor extension group to register extensions, using the :extensions option, this option lets you replace the extension registry itself, giving you complete control over how extensions are registered for this processor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using the :extensions option, this option lets...
We should probably try to avoid this repetition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewritten extensively.
84c20d8
to
a0b5ecd
Compare
First, I want to say thank you for taking this on because this is not an easy task / subject. Right now, I think this PR is too big to review. It needs to be split it up. It definitely shouldn't include a new test for the processor. And it seems to contain edits to docs that are only tangentially related to extensions. Let's process those 1-by-1 to trim this down. I'm also not a fan of the use of a pipe to run the examples. That's a very abnormal usage pattern and we don't want to encourage it. Let's create separate example files and instruct the user to run them. (Just to be clear, I'm find with putting the AsciiDoc in the examples directory). I also don't want to include the output of the example in the docs. That's just going to be a pain to maintain. We should either leave that up to the reader to observe on their on machine or we can find a way to auto-generate them at build time (perhaps using some sort of extension). I'll try my best to provide feedback on the new written material. |
Asciidoctor also allows extensions to be written using the full power of a programming language (whether it be Ruby, Java, Groovy or JavaScript). | ||
You don't have to shave yaks to get the functionality you want, and you can distribute the extension using defacto-standard packaging mechanisms like RubyGems or JARs. | ||
Extensions are central to the power of AsciiDoc because they open up the language to new use cases. | ||
Asciidoctor provides an extension API that offers a set of extension points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important to emphasize here that AsciiDoc has the concept of extensions, but it's Asciidoctor, the processor, that provides the integration points to write extensions. In other words, extensions are standard in AsciiDoc, but how they are made is currently up to the processor. We'll eventually need something on the AsciiDoc side to clarify this, but that's for another day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, that's exactly what my wording says. I'll see if I can find a way to make it even clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about...
Extensions are central to the power of AsciiDoc because they open up the language to new use cases.
To implement this critical functionality, Asciidoctor provides an extension API that offers a set of extension points.
Extensions in Asciidoctor are easy to write, powerful, and simple to distribute. | ||
|
||
Asciidoctor also allows extensions to be written using the full power of the programming language of your Asciidoctor variant: Ruby, Java or other JVM based languages, or Javascript. | ||
An extension may be included in custom code or packaged and distributed using a language-standard mechanism such as Gems, Jars/Maven, or npm packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gems -> RubyGems
Jars/Maven -> jars (Maven is not a detail that is needed here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RubyGems and nom packages are both a packaging and distribution mechanism, whereas jars are only a packaging. To be parallel, Jars/Maven is necessary.
Asciidoctor provides the following extension points: | ||
Document Processor:: | ||
A document processor does not have an explicit invocation in the document. | ||
Except for `IncludeProcessor`, a document processor is invoked at the appropriate stage of processing with the entire document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we are there yet. This is the very first time these two terms are introduced and I think we're focusing on the wrong aspects. Here's what I propose:
Document Processor:: An extension that operates on the document at a key phase during the processing lifecycle. It accepts the document itself and any additional objects that are relevant at that point in the lifecycle (reader, output, etc).
Syntax Processor:: An extension that expands the AsciiDoc syntax at the location of custom element (block, block macro, or inline macro) and is invoked at the time that element is encountered by the processor.
Asciidoctor provides the following extension points, shown in the order of invocation, interleaved with a description of Asciidoctor processing stages. | ||
To be more specific, these are types of Processor, and a concrete implementation extends the corresponding Processor subclass. | ||
|
||
Asciidoctor starts with a source document and starts reading it into an array of lines and preprocessing it. First, as `include::` directives are processed as they are encountered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dlist that follows is missing list continuations, so it ends up getting split into multiple dlists. Either the dlist needs to be made continuous or the terms should be made into section titles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List continuations would conceal the interleaved aspect, and section headings would destroy it. How about making the processing stages a numbered list and the dlist items sublists?
Processes the [.class]#Asciidoctor::Document# (AST) once parsing is complete. | ||
See xref:tree-processor.adoc[]. | ||
The preprocessed lines are streamed into the parse stage. | ||
The parser converts the lines into a tree of blocks, calling block and block macro processors as their uses are encountered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation of the parser does not fit here. What this should say is that the preprocessor can manipulate the lines fed to the parser by pushing lines onto the reader, replacing the lines, or returning a new reader.
However, we need to add a HUGE disclaimer that manipulating the lines in a preprocessor has side effects and should only be used if you are fully aware of what those side effects are and how to deal with them. I'm not able to state what the are off the top of my head, but I know it entails a) line numbers in error messages will be off and b) reading the lines in the preprocessor can break includes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I now understand preprocessors better and their dangers even more so I'll see what I can come up with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like my text as is. The details you suggest belong, IMO, on the processor-specific pages.
|
||
registry = Asciidoctor::Extensions::Registry.new :nestable=> NestableGroup | ||
|
||
Asciidoctor.convert_file 'nestable-test.adoc', :extension_registry=> registry, :standalone=> false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same vein, this should be:
Asciidoctor.convert_file 'nestable-test.adoc', extension_registry: registry, standalone: false
| `:comment` | ||
| `////` | ||
|
||
| `:fenced_code` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fenced_code context is obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't documented anywhere I can find, I'm leaving a comment noting obsolescence in place.
Sets the contexts allowed for this block. | ||
The choices are: | ||
|
||
[cols=2,opts=headers] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table should be a referenced rather than defined inline. These are well-defined contexts and they need to be defined elsewhere. They aren't specific to extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about allowing it here for now and finding a better place for it later? I'm not interested in rewriting all of the Asciidoc documentation for now.
@@ -1,11 +1,14 @@ | |||
* xref:index.adoc[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The document processors should be grouped under "Document Processors" and the syntax processors grouped under "Syntax Processors". Otherwise, it just looks like a laundry list of extension points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are arranged currently in order of invocation. I've tried adding the type to the page titles, WDYT? Perhaps having a heading, "processors in order of invocation" would be useful? If you still want a type/alphabetic arrangement, OK.
|
||
BlockMacroProcessor is a SyntaxProcessor. | ||
|
||
== Registration method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like where you are going with this structure, but Registration method is only relevant when defining the extension using a Ruby block. If you define via a class, then we need to specify the class. So perhaps write this as:
== Registration
=== DSL method
block_macro
=== Parent class
BlockMacroProcessor
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that you always need to call the appropriate registry method to add a processor to a registry.
I agree that explicitly mentioning the superclass is a good idea.
I think there's an important lesson to be learned here in terms of docs management. When we're making a major change to the docs, such as this one, we need to approach that change in stages. The first stage would be some general organization and cleanup to prepare the docs to be changed. (This might involve extracting content to individual files so they can be included and restructuring the nav). Set the stage, so to speak. Then, the next change would be to introduce the informative pages or sections (aka non-normative docs). The final change would be to rework the reference and examples to be more accurate, in-depth, and actionable. Trying to do it all at once just creates gridlock during editing. It also does not give the information architect a chance to review the structure before having a pile of content to address. We need to be more disciplined in order to make the process more efficient for everyone involved and ultimatelyl guarantee a successful outcome. |
...
Your proposed workflow would be more plausible if you were paying me or there were prompt, say overnight, reviews. Without that, it's basically a request not to contribute. I did post my initial thoughts which incorporated a lot of the structure I ended up with about a week before the rest of the major work, with no response. Without really quick review, I think a reasonable way to approach this is to decide if it is a significant improvement on what is there, and if so commit it, and pursue changes from there. What if you approached this with the frame of there being no existing extension documentation, and asking if there is anything wrong (as opposed to badly expressed) in the new docs, and anything missing? At this point I think the plausible choices are:
Looking over your proposed workflow again, I really have no idea what would be involved in any of the steps. It also seems to me that it presupposes that the outcome is known in detail before any work starts, which has never been my experience in any activity, especially programming-related. |
IIRC the test verifies that a claim in the old documentation is wrong, so it still seems reasonable to me to include it here. Perhaps I'll look at the other changes you think are tangential. With the extremely slow PR process for doc changes, there seems to be little point in supplying small PRs, since they tend to go nowhere.
No problem about the pipe, it's used in a lot of tests and is easy and compact. The output is already generated by the test script in examples, I guess it could be committed and 'include::'-ed.
|
…lock-macro-processor is documented
…nd add a way to test
…inlinemacro example
… nestable example
5ee7dfc
to
8aa38ab
Compare
I've responded to most or possibly all of your comments in one way or another, in this branch and ...
(these might be a lot easier to review than the issue-3884-extension-documentation branch. All 3 branches should have the same content. I can open PRs for these if desirable, although it seemed like clutter.)
|
This has most of the basic structure and much of the content I'd like to see in the extensions documentation. Note that so far among the processor pages only block-macro-processor is filled in.