Migrating between the 2020-1 guidelines and the 2015-1 guidelines #404

josteinaj · 2020-11-13T12:43:18Z

Most organizations still use DTBook for certain parts of their systems. And organizations that are using EPUBs produced according to the 2015-1 guidelines might need some time to adjust to the 2020-1 guidelines.

So, while not part of the guidelines revision project, these migration paths would still be interesting to have:

Nordic EPUB 3 2020-1 -> Nordic DTBook 2015-1
Nordic DTBook 2015-1 -> Nordic EPUB 3 2020-1
Nordic EPUB 3 2020-1 -> Nordic EPUB 3 2015-1
Nordic EPUB 3 2015-1 -> Nordic EPUB 3 2020-1

If we do nothing, then the downgrade to DTBook can be covered by:

DP2 script: EPUB 3 to DTBook (generic script)
XSLT: generic-to-nordic-dtbook.xsl (from nordic migrator project)

If we do something, then conversion to and from DTBook can be covered by the 2015-1 scripts, as long as we have a conversion path back and forth between Nordic EPUB 2015-1 and Nordic EPUB 2020-1.

Conversion between Nordic EPUB 2015-1 and 2020-1 could be implemented with a set of XSLTs. These could be incorporated into the migrator scripts if we want to:

suggested new XSLT: nordic-epub-2020-1-to-html.xsl (given the OPF from a 2020-1 EPUB, combine the content documents it references into a single HTML file)
suggested new XSLT: nordic-html-2020-1-to-2015-1.xsl (convert a single HTML file marked up according to the 2020-1 guidelines into a single HTML file marked up according to the 2015-1 guidelines)
suggested new XSLT: nordic-html-2015-1-to-2020-1.xsl (convert a single HTML file marked up according to the 2015-1 guidelines into a single HTML file marked up according to the 2020-1 guidelines)
suggested new XSLT: nordic-html-2020-1-to-epub.xsl (create a package document, a navigation document, and all the content documents, based on a single HTML file)

martinpub · 2021-04-30T09:30:08Z

Setting priority to low for the time being. Currently I think only MTM and SPSM are starting to use the 2020-1 guidelines in production, and none of us prioritise this atm.

martinpub · 2021-10-14T12:03:52Z

@josteinaj We might be interested in this. How much work would this entail, in very rough terms? Big job, small, medium?

josteinaj · 2021-10-17T20:28:53Z

Maybe… medium?

If it's possible to run an XSLT on each HTML file (and the OPF) separately, then we could write some XSLTs and package them as an upgrader and a downgrader script. Do you think we need to do something other than transforming the file contents? We might need to generate an NCX when downgrading, delete it when upgrading, etc.

martinpub · 2022-03-29T09:42:22Z

So complexity increases considerably if there are actions that need to do checks/fixes across files, right? Not sure if that's the case actually. @AndersEkl, any thoughts?

josteinaj · 2022-03-29T11:06:39Z

Another path could be to go through a single-HTML representation of the book. Merge all HTML files into a single one, then run an XSLT on it, then use the 2015 script to convert from HTML to EPUB 3. I don't know if this is more work or less work than having a straight EPUB-to-EPUB conversion.

Can we simply concatenate the content of all the <body> elements?
Can we do without the OPF file and navigation document (i.e. have a "pure" HTML fileset)?

All the HTML files have a wrapping <section> element, so maybe we can attach some data-attributes or similar there to preserve metadata about the content (spine linear=yes/no, spine/manifest properties). Maybe this is not relevant when converting to the 2015-1 guidelines, I'm not sure.
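
To make the data-attribute idea concrete, here is a minimal sketch in Python that reads the spine and manifest from an OPF and works out which attributes could be attached to each file's wrapping <section>. The attribute names (data-spine-linear, data-spine-properties, data-manifest-properties) and the function name are hypothetical, chosen only for illustration; nothing in the guidelines or the existing scripts defines them.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def spine_annotations(opf_xml: str) -> dict[str, dict[str, str]]:
    """Map each spine item's href to hypothetical data-* attributes."""
    package = ET.fromstring(opf_xml)
    # Index manifest items by id so each itemref can be resolved to a file.
    manifest = {item.get("id"): item for item in package.iter(f"{{{OPF_NS}}}item")}
    annotations: dict[str, dict[str, str]] = {}
    for itemref in package.iter(f"{{{OPF_NS}}}itemref"):
        item = manifest.get(itemref.get("idref"))
        if item is None:
            continue  # dangling idref; nothing to annotate
        # "linear" defaults to "yes" when the attribute is absent.
        data = {"data-spine-linear": itemref.get("linear", "yes")}
        if itemref.get("properties"):
            data["data-spine-properties"] = itemref.get("properties")
        if item.get("properties"):
            data["data-manifest-properties"] = item.get("properties")
        annotations[item.get("href")] = data
    return annotations
```

The returned dictionary could then be applied to each <section> while merging the files. Whether any of this needs to survive a downgrade to 2015-1 is still an open question.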

Most of the package document metadata can be converted to HTML metadata. HTML has no mechanism for the refines attribute in OPF, so that metadata must be discarded. I don't think that's going to be a problem in practice.
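
A rough sketch of that metadata conversion, written in Python rather than XSLT so it is easy to try out. The mapping shown here (dc:title to <title>, other dc:* elements and <meta property="..."> to <meta name="..." content="...">) is an assumption made for illustration and may not match what the existing XSLT does. Anything carrying a refines attribute is simply skipped.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def opf_metadata_to_head(opf_xml: str) -> ET.Element:
    """Build an XHTML <head> from the OPF <metadata>, discarding refinements."""
    package = ET.fromstring(opf_xml)
    metadata = package.find(f"{{{OPF_NS}}}metadata")
    head = ET.Element(f"{{{XHTML_NS}}}head")
    if metadata is None:
        return head
    for child in metadata:
        if child.get("refines") is not None:
            continue  # HTML has no equivalent of refines, so this is dropped
        text = (child.text or "").strip()
        if child.tag == f"{{{DC_NS}}}title":
            ET.SubElement(head, f"{{{XHTML_NS}}}title").text = text
        elif child.tag.startswith(f"{{{DC_NS}}}"):
            # Assumed naming: dc:identifier -> <meta name="dc:identifier" .../>
            local_name = child.tag.split("}", 1)[1]
            ET.SubElement(head, f"{{{XHTML_NS}}}meta",
                          {"name": f"dc:{local_name}", "content": text})
        elif child.tag == f"{{{OPF_NS}}}meta" and child.get("property"):
            ET.SubElement(head, f"{{{XHTML_NS}}}meta",
                          {"name": child.get("property"), "content": text})
    return head
```

Calling ET.tostring(opf_metadata_to_head(opf_xml), encoding="unicode") gives the serialized <head> for inspection.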

As for the navigation document, I don't think there's any information there that we need to preserve.

The Nordic HTML 5 to EPUB 3 (2015-1) script will create a new OPF and navigation document based on the HTML file.

Here's a suggestion for a single HTML representation of a 2020-1 EPUB:

Copy the <html> tag from the first HTML file in the spine
Convert the <metadata> in the OPF to a <head> in HTML. There should be an XSLT for that already.
Concatenate the contents of all the <body> elements and wrap it into a new <body> element
All href attributes, src attributes etc. with relative references to a file with the xhtml file extension should have everything up to the # removed so that they reference the current document

This can be added to the EPUB 3 to HTML 5 script, when passing in an EPUB declaring that it follows the 2020-1 guidelines.
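
The merging steps listed above could be sketched roughly like this. It is a Python illustration of the idea only, not the DP2 script itself; the function names are made up, the <head> is left as an empty placeholder, and only href and src attributes are rewritten.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without a prefix


def localize_reference(value: str) -> str:
    """Drop everything up to the # in relative references to .xhtml files."""
    path, _, fragment = value.partition("#")
    if "://" in path or not path.lower().endswith(".xhtml"):
        return value  # absolute URL, same-document link, or non-XHTML resource
    # A reference without a fragment collapses to "#" (top of the merged file).
    return "#" + fragment


def merge_spine(documents: list[str]) -> ET.Element:
    """Merge XHTML documents, given in spine order, into one <html> element."""
    roots = [ET.fromstring(document) for document in documents]
    # Copy the <html> tag and its attributes from the first file in the spine.
    merged = ET.Element(roots[0].tag, dict(roots[0].attrib))
    # Placeholder <head>; it would be generated from the OPF metadata.
    ET.SubElement(merged, f"{{{XHTML_NS}}}head")
    body = ET.SubElement(merged, f"{{{XHTML_NS}}}body")
    for root in roots:
        source_body = root.find(f"{{{XHTML_NS}}}body")
        if source_body is not None:
            body.extend(list(source_body))  # concatenate the body contents
    for element in body.iter():
        for attribute in ("href", "src"):
            if attribute in element.attrib:
                element.set(attribute, localize_reference(element.get(attribute)))
    return merged
```

One thing this makes visible: fragment-only links only keep working if id values are unique across all the content documents, which would have to be checked or enforced before merging.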

Then we create a new script called, for instance, "HTML 5 Downgrade".

The only thing in this script is a single XSLT that transforms the HTML file.

josteinaj added Medium priority validator-revision EPUB 3 / HTML Validator revision: 2020-1 labels Nov 13, 2020

martinpub added Low priority and removed Medium priority labels Apr 30, 2021

josteinaj mentioned this issue Oct 18, 2022

Further development, 2023 ("phase two") #523

Open
