Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please clarify Preprocessor documentation #1262

Open
Vampire opened this issue Feb 22, 2024 · 9 comments
Open

Please clarify Preprocessor documentation #1262

Vampire opened this issue Feb 22, 2024 · 9 comments

Comments

@Vampire
Copy link

Vampire commented Feb 22, 2024

The docs at https://docs.asciidoctor.org/asciidoctorj/latest/extensions/preprocessor/ suggest that a preprocessor is working like I would expect it to and that it is meant for reading all lines and transforming them before the syntax is then further processed and parsed.

The example also shows exactly that.

But @mojavelinux in Zulip said quite the opposite and the canonical preprocessor page at https://docs.asciidoctor.org/asciidoctor/latest/extensions/preprocessor/ also objects.

You should not read from the reader in a preprocessor in its current state as it will break things further on and even using lines() only patially works as the preprocessor extension is only called on the main document but not on any included documents.

Please update the docs to make it clear that what is currently described and also shown in the example is not what you should do.

@mojavelinux
Copy link
Member

In truth, the preprocessor extension should never be used to read lines from the reader (unless they are lines that proceed the start of the AsciiDoc document, such as front matter). Although the prepreprocessor was originally designed to allow this use case, we learned after implementing it that it causes side effects that cannot be easily worked around.

The main issue is that when you read lines from the reader in a preprocessor, it blows by the header without processing it. This means it starts processing lines without considering any attributes defined in the document header. Although there's way to do this correctly, it requires a lot of extra code and that code ends up taking on parsing responsibilities. Only someone who is deeply familiar with how AsciiDoc parsed should attempt to do something like this.

The preprocessor can be used as an event listener and can safely modify options and attributes on the document. However, it should never read lines from the reader (except for front matter lines, as already mentioned).

@mojavelinux
Copy link
Member

even using lines() only patially works as the preprocessor extension is only called on the main document but not on any included documents.

lines is just the raw AsciiDoc source split into lines. It's safe to look at it. And that method is not partially working. It's doing exactly what it is supposed to do, which is to return the source of the current document. The source doesn't know anything about includes and thus isn't going to give you those documents.

The readLines method is what starts to invoke the preprocessor and that's what you should avoid.

@Vampire
Copy link
Author

Vampire commented Feb 22, 2024

I'm sorry I was unclear.
I didn't meant lines() per-se is only partially working.
What I meant was, that even with using lines() you cannot achieve what you want as the preprocessor only gets called for the main document but not for included documents.
At least that is what I observed. :-)

@robertpanzer
Copy link
Member

The current Preprocessor API in AsciidoctorJ unfortunately requires to work on the Reader that was passed as an argument.
Only the next major version of AsciidoctorJ will support creating a new reader from arbitrary content, and returning it from the Preprocessor.
Since this is a breaking change to the API this can only come in a major release.

Since there were complaints about breaking changes in a major release, I have no idea atm when I will be able to resolve this and create a new major release.

@mojavelinux
Copy link
Member

What I meant was, that even with using lines() you cannot achieve what you want as the preprocessor only gets called for the main document but not for included documents.

...and again, because it's not supposed to. The lines() method is returning the (remaining) raw source lines of the original document. The preprocessor, nor it's reader, should be doing anything with include directives. It's just giving you back the original document. See https://github.com/asciidoctor/asciidoctor/blob/main/lib/asciidoctor/reader.rb#L511-L516

From my perspective, no change is needed to AsciidoctorJ. We are simply discussing how Asciidoctor works.

@mojavelinux
Copy link
Member

I did notice that the Reader in AsciidoctorJ does not have methods for getSource() and getSourceLines(), which is truly the raw source and not just the remaining lines. Those should probably be added, though that's a separate issue.

@mojavelinux
Copy link
Member

the preprocessor only gets called for the main document but not for included documents.

...and a preprocessor is not called for included files because they are not themselves documents. The only time the preprocessor is called is just before the input document is parsed. It has no relationship to includes.

@Vampire
Copy link
Author

Vampire commented Feb 22, 2024

From my perspective, no change is needed to AsciidoctorJ. We are simply discussing how Asciidoctor works.

Yes, this issue is only about improving the documentation, not about changing the implementation. :-)

...and a preprocessor is not called for included files because they are not themselves documents.

Maybe this could also be changed with the changes @robertpanzer mentioned. :-)
Included files might themselves be documents or not, depends on the include.
But once the preprocessor is more what the term suggests so that you can actually preprocess and modify the input to the parser, it would be great if it would also work on included files, otherwise it again is much less useful than it could be. :-)

@mojavelinux
Copy link
Member

it would be great if it would also work on included files

That's what the include processor is for, not the preprocessor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants