Fo transformations #121

Open
rhwood opened this issue Jul 18, 2021 · 10 comments

@rhwood

rhwood commented Jul 18, 2021

Should the .fo transformations be ported from xslt 1.0 to this, or is this xslt set intentionally purely focused on HTML output?

@ndw
Contributor

ndw commented Jul 20, 2021

That is a question that I wrestle with regularly. I'm inclined to say that the way forward is through an XML to HTML transformation (not the same transformation as for web pages, but still HTML with extra classes and such) and then print formatting via CSS. Given that development of FO has stopped, that seems like the right approach.

On the other hand, FOP is good enough for a lot of folks and there doesn't seem to be a similarly featured open source CSS print formatter. Which surprises me a little bit.

If lack of print output is holding you back from adopting these stylesheets, I guess I want to know that. Particularly if you're relying on FOP. Getting some kind of print output is on my todo list, but it's currently somewhere below "finish XML Calabash 3.x". ☹️

@rhwood
Author

rhwood commented Aug 16, 2021

Sorry for the delay in responding.

My requirement is a CLI asciidoc->PDF workflow; I am attempting to make it a workflow that requires minimal installation, consistency across platforms, and some complex formatting of the PDF output (i.e., number every fifth line and print some sections in a two column layout). I have examined the existing asciidoc-to-pdf tools and have determined that an asciidoc->docbook->fo->PDF process run using maven is my current best shot at minimal install (user needs to install a JDK and a recent maven and the workflow takes care of everything else) while ensuring consistency (maven downloads the same JAR versions for every user) and allowing complex outputs (using fop within maven).

ianw added a commit to ianw/bottomupcs that referenced this issue Aug 5, 2022
Update to Docbook xslTNG toolchain.

This drops all sorts of now unneeded dependencies.

The pdf and epub are generated by the print output CSS
by "prince".  It seems to be the best option for now [1].

[1] docbook/xslTNG#121
@tomschr
Member

tomschr commented Jan 23, 2023

Just some comments/views from my side.

My company and I are also interested in an FO transformation, for several reasons:

  • FOP is getting better from release to release. It used to be very limited, but now it's quite good. Perhaps for Arabic or other languages it still needs some work, but it's a good alternative to XEP (and FOP is open source).
  • We still have an XSLT 1.0 code base for FO. We would like to move gradually to the new stylesheets if FO support becomes available.
  • In most cases, the combination of the XSLT 1.0 stylesheets and FOP works quite well. Of course, there are always some things that can be improved. Moving to a new set of tools with HTML+CSS usually comes with different problems and challenges. Some can be solved or mitigated, but I expect some cannot be solved.

Maybe some pro points why it would be beneficial to have a native XSLT 3.0 implementation of the FO stylesheets:

  • See above about how FOP improved over time.
  • It brings the new stylesheets closer to feature parity with the DocBook XSLT 1.0 stylesheets.
  • Although the XSL-FO spec isn't developed further, for most (technical) documentation it doesn't matter IMHO. It's completely sufficient. As DocBook deals mostly with technical documentation, these types don't tend to have fancy layouts. XSL-FO 1.1 is fully capable of doing this.
  • It gives us another option for print output, not only HTML+CSS.
  • We don't have open source CSS print formatters, but we do have FOP. As long as there is no open source alternative, it would be a good solution to have an FO workflow.
  • Although some browsers support HTML+CSS to PDF in a headless mode, as far as I can see it's subpar compared to FO.
  • Maybe as an interim solution, we could use the XSLT 1.0 stylesheets with Saxon 10/11? Over time, we could gradually move to XSLT 3.0.

On the other hand, we should also consider the cons:

  • For people who write customization layers, it's probably easier to just have one layer and do all the output for HTML and PDF with HTML+CSS. No need to maintain two different customization layers.
  • Maybe a headless browser output of the HTML+CSS rendering is just "good enough"?
  • If we start to implement the FO stylesheets, they need to be maintained to some degree.

@frank-steimke

I think that if we start to implement the FO stylesheets, they need to be maintained to the full degree. Either you do it completely or not at all; halfheartedly is not an option. This means double the effort, unless you find a more intelligent solution. There are some options, and they all need volunteers:

  1. Development of XSLT 3 FO stylesheets, independent of but carefully aligned with the xslTNG HTML+CSS stylesheets. IMHO not realistic.
  2. Development of an open source paged media processor. IMHO also not realistic.
  3. Development of some sophisticated mechanism to generate FO and HTML+CSS stylesheets from some higher-level intermediate language. Interesting, but also not realistic. You would need a language with enough expressive power to generate XSLT 3 stylesheets; what language would that be?
  4. Development of an open source engine that takes HTML+CSS as input and generates XSL-FO. Maybe this is what the people at the Oxygen company (SyncroSoft) are doing with the Oxygen Chemistry product? I tried to generate PDF from xslTNG output within the Oxygen suite with PDF Chemistry, but without success. Maybe it was just a minor issue? I can't tell, since it's not open source.

I would think that the last option is the only realistic one, but @tomschr, this would not help the people who still have an old XSLT 1 code base, would it? Also, you have to deal with CSS combined with XML / (X)HTML as the base for the translation to FO. You can find this technique under the name CSSa (meaning CSS as attributes in XML) as part of the transpect open source framework. So maybe there remains only one option, like this:

  1. Development of an xslTNG post-processing step which produces (X)HTML with CSSa, based on the existing CSS for xslTNG;
  2. Development of a translation engine which takes HTML+CSSa as input and generates XSL-FO for FOP.

This should be possible as an open source project based on transpect (maybe upgraded with a brand new Calabash processor for XProc 3?). I have absolutely no idea how much effort it would take.

Greetings, Frank

@tomschr
Member

tomschr commented Aug 21, 2023

Thanks @frank-steimke for your interesting perspective! You made some good points.

Well, developing an open source engine and developing XSL-FO stylesheets both need effort and time. The question is which one would be more useful.

I would like to bring up two other ideas which could mitigate the pain. I'm not sure whether they are completely insane or have some benefits.

  1. Why not use the existing XSL-FO 1.0 base stylesheets and transform it to get FO stylesheets that are more compatible with XSLT 3.0?

    Presumably the result would not be a complete replacement for hand-written XSLT 3.0 stylesheets, and there would certainly be some issues (extension functions, no test suite, etc.). If all issues could be solved, we would at least have stylesheets that could be used with Saxon >10.

  2. With regard to an open source engine that takes HTML+CSS and generates PDF, there are some possible solutions.

    We don't need to develop a new engine; we can use existing tools:

    • Using Google Chrome's Headless Mode
      You can run Google Chrome from the command line.

      $ chrome --headless --print-to-pdf="output.pdf" URL
      
    • Using wkhtmltopdf
      An open-source command-line tool which uses the WebKit rendering engine.

      $ wkhtmltopdf --page-size A4 URL output.pdf
      
    • Using chromehtml2pdf
      A JavaScript command-line tool that uses Chrome's headless mode.

      $ chromehtml2pdf --out=file.pdf --landscape=1 URL
      

    I haven't tested all of them (only wkhtmltopdf a bit). Maybe it helps.
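A minimal sketch of how the first two commands above could be wrapped so that a build script can switch between them. The function name `pdf_command` and its argument order are assumptions made for illustration, and the Chrome binary is often installed under another name (for example `google-chrome` or `chromium`), so adjust as needed. The function only prints the command line; it does not run the tool, and it does not quote file names that contain spaces.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): print the PDF command for one converter.
# Usage: pdf_command TOOL INPUT OUTPUT
pdf_command() {
    tool=$1
    input=$2
    output=$3
    case "$tool" in
        chrome)
            # Same flags as the headless Chrome example in the comment above.
            printf 'chrome --headless --print-to-pdf=%s %s\n' "$output" "$input"
            ;;
        wkhtmltopdf)
            # Same flags as the wkhtmltopdf example in the comment above.
            printf 'wkhtmltopdf --page-size A4 %s %s\n' "$input" "$output"
            ;;
        *)
            echo "unknown tool: $tool" >&2
            return 1
            ;;
    esac
}
```

Calling `pdf_command wkhtmltopdf book.html book.pdf` prints the command so it can be inspected before it is run against real xslTNG output.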

@frank-steimke

I think we are talking about different scenarios for using DocBook stylesheets. I am sure that in the future there will be more ways to transform HTML+CSS to PDF. But I don't think they will be suitable to produce large and long-lasting documents in high quality. To give a few examples:

  • Hyphenation with custom additions and exceptions for specific terms;
  • Support for double-sided printing, where chapters always start on odd pages;
  • Floating graphics while respecting typographic criteria;
  • Accessibility and tagged PDF;
  • Flexible design of title pages with company logos;
  • Systematic handling of external media such as images (relative vs. absolute URL).

The big advantage of DocBook 1.x stylesheets is that they are very mature and implement all the above requirements (and many more). And the biggest advantage: an active community that feels responsible for the stylesheets and patiently handles any kind of question.

xslTNG together with paged media CSS will provide at least the same features. The open question is whether an open-source solution is conceivable that provides this performance even if FO is generated instead of (or in a second step from) HTML+CSS.

"Why not use the existing XSL-FO 1.0 base stylesheets and transform it to get FO stylesheets that are more compatible with XSLT 3.0?"

Well, you can use the XSLT 1.0 stylesheets with the latest Saxon version. I did, and the only change that was absolutely necessary was a patch regarding an ancient node-set() function. So it is of course possible to take that as the baseline for further development in the XSLT 3 direction.

But in doing so you would open a new line of XSL stylesheets for DocBook, which would be in competition with the xslTNG line. This is not what I would like to support. My goal would be to support xslTNG as much as I can, and maybe contribute documents (migration guide, best practices) or maybe an add-on for translating HTML+CSS to XSL-FO.

But before doing so, one should be convinced that there is a real need. Maybe everyone who really needs high-quality output with the features named above already has a license for a commercial product like Prince or Antenna House and is totally fine with HTML paged media.

Cheers,
Frank

@tomschr
Member

tomschr commented Aug 21, 2023

"I think we are talking about different scenarios for using DocBook stylesheets."

Perhaps. 😉

"I am sure that in the future there will be more ways to transform HTML+CSS to PDF."

I hope so, really.

"But I don't think they will be suitable to produce large and long-lasting documents in high quality."

I'm aware that the ideas I've suggested are probably not a solution for high-quality docs. But for some it would be enough to get at least a "decent" PDF. What's really possible needs to be tested.

"xslTNG together with paged media CSS will provide at least the same features. The open question is whether an open-source solution is conceivable that provides this performance even if FO is generated instead of (or in a second step from) HTML+CSS."

And this is the crucial point. It's all nice and dandy, but with the lack of an open-source solution I fear this is difficult. I don't know of an open source implementation.

Who will use it when you have to pay for a license? Wouldn't that divide the community?

"But in doing so you would open a new line of XSL stylesheets for DocBook, which would be in competition with the xslTNG line."

Would it? DocBook and its stylesheets also compete with other documentation formats (AsciiDoc, Sphinx, Markdown, to name the best known). I don't see that as something bad. 🙂

I see it more as offering an alternative. If HTML+CSS paged media cannot be used, or people don't want to use it, you can also view it as an intermediate step to see if XSL-FO is really needed these days. If you get some feedback or statistics, then there is probably some need for it. If not, not much development time was wasted.

"Maybe everyone who really needs high-quality output with the features named above already has a license for a commercial product like Prince or Antenna House and is totally fine with HTML paged media."

This is a far-fetched assumption. 😉 Many open source projects would also like to get high-quality output. In former times it was possible with XSL-FO and it did the job well. FOP is now quite stable.

The formatting landscape today is different from when XSL-FO was created and when the 1.0 stylesheets were implemented. Now we have not only DocBook, but different formats all competing with each other.

If projects cannot jump on the HTML+CSS paged-media bandwagon, they will either use the limited options (as I showed) or switch to something else.

@frank-steimke

I do not intend to promote commercial products, quite the contrary. And yet the question remains: how much interest is there in a solution for high-quality FO for DocBook?

If most users are satisfied with free, but not so high quality solutions, there are already products. You have pointed that out.

If the others who need high quality already have or are willing to buy commercial products for HTML CSS, they have no need for an additional FO solution.

My only point is that a vibrant community needs to come about. Right now there are exactly two people participating in the discussion about FO for xslTNG. Add the original poster and there are three of us. That's too few in any case.

Maybe the Antenna House company knows more. Their product supports FO as well as HTML+CSS. They should know their customers' needs.

@tomschr
Member

tomschr commented Aug 24, 2023

"And yet the question remains, how much interest is there in a solution for high-quality FO for docbook?"

"If most users are satisfied with free, but not so high quality solutions, there are already products. You have pointed that out."

True.

"Right now there are exactly two people participating in the discussion about FO for xslTNG. Add the original poster and there are three of us. That's too few in any case."

Well, to some degree that's true, but I'm not completely sure that a hidden issue in a GitHub repo, which not many people know about or are aware of, is a good measure. Perhaps not many have it on their radar or think it's production ready.

It would probably be more efficient and reach many more users if we asked on the docbook-apps mailing list and see what the response is.

@fsteimke
Contributor

I found print-css.rocks. Various tools are presented and tested there, commercial as well as open source. This led me to WeasyPrint. The results are promising; see the attached file:

xnachweis-w.pdf

There is an issue with references to page numbers, which results in 0 (zero) in the table of contents; see issue #497. But I think it's an issue with the CSS from xslTNG, not the rendering engine.

Maybe WeasyPrint is the open source rendering engine we were looking for?
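For anyone who wants to try this, here is a minimal sketch of the invocation, assuming WeasyPrint is installed and its `weasyprint` command is on the PATH. The function name and the file names are placeholders for illustration, not part of xslTNG or WeasyPrint; the input is whatever HTML file the xslTNG transformation produced.

```shell
#!/bin/sh
# Hypothetical wrapper: render an HTML file to PDF with WeasyPrint.
# Usage: render_with_weasyprint INPUT.html OUTPUT.pdf
render_with_weasyprint() {
    html=$1
    pdf=$2
    # Fail early with a clear message if the tool is not installed.
    if ! command -v weasyprint >/dev/null 2>&1; then
        echo "weasyprint not found on PATH" >&2
        return 127
    fi
    # Basic CLI form: weasyprint INPUT OUTPUT
    weasyprint "$html" "$pdf"
}
```

For example, `render_with_weasyprint book.html book.pdf` would write `book.pdf` next to the HTML file, provided the CSS and images referenced by the HTML can be resolved from its location.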


5 participants