Move as much code as possible from book-to-pef to pipeline-modules #225

Open
7 of 17 tasks
bertfrees opened this issue Oct 10, 2019 · 18 comments
@bertfrees
Collaborator

bertfrees commented Oct 10, 2019

Hyphenation

  • Produce the hyphenation files in a separate project, deploy them somewhere (e.g. as a ZIP on Maven Central) and import them in libhyphen-utils.

    Some useful work has been done in this repository: https://github.com/nlbdev/spell-no (the "norsk/patterns" subdirectory). It includes a build script for the patterns file and a UI for viewing the input words (that are used to build the patterns file) and hyphenations.

    In progress: see "Pack hyphenation table in ZIP and publish to Maven" (nlbdev/spell-no#1).

Script options

  • The script options that are generic enough can be moved to xml-to-pef. This is linked to the "Style sheets" section below because most of the script options are implemented using style sheets. Note that, just like the "Style sheets" section, this is not an essential item because options that are solely used to set Sass variables can now be handled with the "Style sheet parameters" option.
  • The Norwegian translations can also be moved once we have support for internationalization. This is only relevant for manual productions.
  • (What would be really nice, but is probably out of scope, is if we were able to analyze a provided style sheet and, based on this, dynamically compute the relevant options and present only these to the user.)
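
As a rough illustration of the first point above: an option that only sets a Sass variable needs nothing more on the style sheet side than a variable declared with a default. This is a minimal sketch; the variable name `$caption-prefix` and the rule are hypothetical and not taken from the NLB style sheets.

```scss
// Hypothetical variable. "!default" means this value is used only when
// no value has been assigned from outside before this line is evaluated.
$caption-prefix: "Caption: " !default;

// Illustrative rule that consumes the variable.
figcaption::before {
  content: $caption-prefix;
}
```

With a declaration like this, a value supplied from outside the style sheet (for instance through the "Style sheet parameters" option) would presumably take precedence over the default, so no dedicated script option is needed.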

Translator features

Style sheets

The items below are not essential because style sheets are considered just another input for Pipeline. They don't need to be included inside the pipeline-mod-nlb module. With some small modifications, the existing style sheets should still work with the latest version. The benefit of porting CSS code is that it becomes available for everyone.

  • We can select bits of CSS code around certain elements, make it more configurable (less NLB specific) and move them to pipeline-modules, just like we did for the volume-breaking.scss module. Candidates are:
    • Captions
      • configurable prefix for image captions
      • configurable formatting of captions
      • option to omit captions
    • Images
      • configurable prefix/suffix for image descriptions
      • configurable formatting of images
      • option to omit images
      • option to omit image groups
    • Notes: done: see the "Notes placement" option (http://daisy.github.io/pipeline/Get-Help/User-Guide/Scripts/html-to-pef/)
      • configurable formatting of noterefs
      • configurable formatting of notes
      • configurable position of notes: bottom of page, end of volume, end of book, end of chapter, bottom of page with fallback to end of volume, beneath paragraph/table/verse
      • different positioning for different classes of notes
      • configurable footnotes page area
      • configurable title of endnotes section
      • configurable page style of endnotes section
      • configurable definition of "chapter" (for chapter notes)
      • configurable "on-volume-start" of end-of-book notes section
      • option to omit note references
    • Print page break indication
      • option to put page break marker in left margin
      • option to put page break marker in header/footer if print page break coincides with braille page break
      • option to render line across width of page
      • option to render print page number (at the position of the break)
      • option to render print page (range) in header/footer
      • configurable format of print page range
    • Tables: done: see http://daisy.github.io/pipeline/modules/braille/html-to-pef/src/main/resources/css/tables.html
      • table layout classes: matrix, simple list, nested list
      • option to transpose tables
      • option to dynamically choose table layout based on certain parameters
      • configurable formatting of different table classes
      • configurable formatting of table captions
    • Definition lists: done: see http://daisy.github.io/pipeline/modules/braille/html-to-pef/src/main/resources/css/definition-lists.html
      • option to group dt element with following dd elements (this is not something that can be done with plain CSS)
@bertfrees
Collaborator Author

analyze a provided style sheet and based on this dynamically compute the relevant options and present only these to the user

This is probably too complicated. However, what would be doable is to filter script options from a "base script XML" at build time (like in the px:extends mechanism), but based on a list of CSS (or XSLT) style sheets. This could work by allowing an option in the base script to be associated with a style sheet URL (and possibly a variable name). A parameter port in the "extending script" could then be associated with a list of style sheets.

Obviously you'll still need your own custom script, but it will require less maintenance because we can put a lot more options and style sheet modules in the Pipeline, available to everyone. Your custom script will only include some inputs, outputs and standard options (not associated with a style sheet, or specific to NLB), one or more parameter ports, and the logic to invoke the actual conversion (load, convert, store).

The other big advantage of this is that I can remove options from the generic scripts that currently say "Not implemented". I would also make the "default.scss" style sheet, that is currently always included, public and free to include or not by custom scripts.

It will be a good idea to remove the "stylesheet" option from custom scripts because the available options will be determined at build time and therefore related to a fixed style sheet. So all formatting options that you want to present to the user should be in this fixed style sheet.

I think I would wrap the content of style sheet modules inside a big @if, so that you could make conditional imports, for example:

$chose-between-module-x-and-y: x !default;
$enable-module-x: $chose-between-module-x-and-y == x;
@import "http://www.daisy.org/pipeline/modules/braille/html-to-pef/css/module-x.scss";
$enable-module-y: $chose-between-module-x-and-y == y;
@import "http://www.daisy.org/pipeline/modules/braille/html-to-pef/css/module-y.scss";
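
A module written for this pattern might look roughly like the sketch below. It is only an illustration: the contents of module-x.scss and the rule inside it are hypothetical. The reason for putting the condition inside the module is that, as far as I know, Sass does not allow `@import` itself to be nested inside a control directive such as `@if`.

```scss
// module-x.scss (hypothetical contents)

// Enabled unless the importing style sheet has already assigned a
// value to the flag before the @import.
$enable-module-x: true !default;

@if $enable-module-x {
  // Illustrative rule only; a real module would hold its actual rules here.
  figcaption {
    display: block;
    margin-left: 2;
  }
}
```

Importing the module with `$enable-module-x` set to false would then emit no rules at all, which is what makes the import effectively conditional.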

So that you don't have to override option documentation, I would remove the parts that say "includes the following rule by default ...", because you're likely to override some of these default (too simple) rules in your custom style sheet, and then the documentation would be wrong.

Obviously the option documentation should also be internationalized.

@josteinaj
Member

Sounds good.

I think it would be nice to have a stylesheet option still though, in case we want to try out some new rules, or override some existing rules (for testing purposes or as a one-off production).

@bertfrees
Collaborator Author

OK sure, that's still possible, as long as the user is aware that the options belong to the default style sheet.

bertfrees added a commit to daisy/pipeline-build-utils that referenced this issue Aug 11, 2020
…eter port

For steps that "px:extends" another step, the "px:options" attribute
copies all the options in the specified namespace, and connects the
options with the parameter port.

See nlbdev/pipeline#225 (comment)
josteinaj assigned bertfrees and unassigned bertfrees Mar 26, 2021
@josteinaj
Member

CC @kalaspuffar

@josteinaj
Member

For automated production (which we do for most of our books), I think it's only hyphenation that is remaining.

For manual production, we might need translations and script options, but we'll see. We can do some testing when hyphenation is done.

@bertfrees
Collaborator Author

@josteinaj @kalaspuffar I've updated the list.

@josteinaj
Member

Hi @bertfrees and @kalaspuffar.

Just checking in on the status here. Any progress? Any blocking issues?

I think we agreed in our last meeting that the only thing needed for us initially to be able to start testing is that Norwegian hyphenation needs to be moved/migrated/implemented in the main PIP version.

@bertfrees
Collaborator Author

bertfrees commented Feb 15, 2022

I am waiting for a Norwegian hyphenation table to become available somewhere, preferably on Maven (because it is permanent and versioned), so that I can import it in Pipeline. That is how I see my responsibility in this.

If there is no progress, I'm also willing to manually build and copy the table from https://github.com/nlbdev/spell-no into Pipeline, so that we at least have an initial version that you can test. But I think this is not a good solution for the long run because it does not allow easy updates to newer versions (or at least I don't want to be the one to do the updates).

I'm also willing to take on the job of writing a script to build and deploy the hyphenation table (based on https://github.com/nlbdev/spell-no). But I'd still need your help to set it up (e.g. the Maven groupId, assuming we'll use Maven).

It only really makes sense to go down this path, though, if somebody is actually going to maintain the hyphenation table. If it is not going to be updated, it does not make sense to create a dedicated project for it. And I'm kind of worried that no one might take up the responsibility to maintain it (given the time it takes to just publish an initial version).

@josteinaj
Member

josteinaj commented Feb 18, 2022

Ok, thanks.

If it is easy to maintain the table for non-technical people, then I think we could do that part of it.

As for the technical setup with Maven and other tooling, I don't think we have the time to do much work on it internally at NLB. But I suppose that once it's set up, it's not much work to maintain?

I'll try contacting Språkbanken ("the language bank") and Nasjonalbiblioteket (the national library) and see if there's someone interested in helping maintain this.

@josteinaj
Member

If there is no progress, I'm also willing to manually build and copy the table from https://github.com/nlbdev/spell-no into Pipeline, so that we at least have an initial version that you can test. But I think this is not a good solution for the long run because it does not allow easy updates to newer versions (or at least I don't want to be the one to do the updates).

@bertfrees I think it would be useful if you could build and copy the table from spell-no into Pipeline, yes. And for future updates of the hyphenation table, we would need to do it using a separate project. The current version of the hyphenation table is good enough for us to use in production (after all, we already use that version today).

@josteinaj
Member

I can set up the Maven repository when needed. I would like to do it maybe solely in GitHub using releases, to keep things simpler, but either way, I can set it up when needed.

@bertfrees
Collaborator Author

I would like to do it maybe solely in Github using releases

As you like. As long as it is easy to fetch updates. Like, change some version number in some pom file.

GitHub Packages might be another option?

I'll try contacting Språkbanken ("the language bank") and Nasjonalbiblioteket (the national library) and see if there's someone interested in helping maintain this.

Good idea.

@josteinaj
Member

GitHub Packages makes sense. So:

  1. you'll set up a Maven build for spell-no
  2. I'll set up GitHub Packages and make a release there
  3. you'll reference that release from Pipeline

seems right?

@bertfrees
Collaborator Author

Yes, seems right.

@bertfrees
Collaborator Author

@josteinaj I created a PR: nlbdev/spell-no#1

@josteinaj
Member

Thanks! I will have a look as soon as possible.

@bertfrees
Collaborator Author

bertfrees commented Mar 25, 2022

My contribution to solving this issue is now more or less done. Most things that could be done on the Pipeline side are done. The only thing that is left is to add support for internationalization. But that is a big change so we might want to prioritize other things over it.

Regarding the other boxes that have not been checked off yet:

I may also port some more CSS code in the future but that has lower priority.

@josteinaj
Member

Great, thanks 👍.

I have the hyphenation issue high on my list, but haven't gotten to it yet, sorry.

I'll leave this issue open until we have started testing hyphenation in the main pipeline branch.
