
FoLiA-page: add support for linebreaks #65

Open
proycon opened this issue Aug 18, 2021 · 6 comments
Labels
enhancement ready Ready but not yet in a published release


@proycon
Member

proycon commented Aug 18, 2021

PageXML text lines are currently not reflected in the original paragraph text. We can insert linebreaks to make the line boundaries explicit, and we can use <t-str> to mark the individual text lines more explicitly. These are linked to the <str> annotations that are already produced (which in turn store an explicit relation to the original PageXML TextLine). The use case for this is that FLAT requires this explicit information to display the document properly, and we may have an annotation task that relies on it (knaw-huc/golden-agents-htr#1).

I implemented this in the page-br branch, but it currently fails because of a text validation issue (proycon/folia#101).
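To illustrate the intended shape of the text content, here is a minimal Python sketch; it is not FoLiA-page's actual implementation. It assumes the TextLine strings of one paragraph are already available as a list, and the helper name `lines_to_folia_text` is hypothetical. It also leaves out the link from each `<t-str>` back to its `<str>` annotation, which real FoLiA output would need to carry.

```python
from xml.sax.saxutils import escape


def lines_to_folia_text(lines):
    """Build a FoLiA-style <t> element from PageXML TextLine strings.

    Hypothetical helper: each line is wrapped in <t-str> and consecutive
    lines are separated by an explicit <br/> linebreak. References from
    <t-str> to the corresponding <str> annotation are omitted here.
    """
    # Escape &, < and > so the line text is safe inside XML.
    parts = ["<t-str>" + escape(line) + "</t-str>" for line in lines]
    return "<t>" + "<br/>".join(parts) + "</t>"


print(lines_to_folia_text(["Dit is de eerste", "en dit de tweede regel"]))
```

Keeping one `<t-str>` per line means a viewer such as FLAT could, in principle, recover the original line layout from the paragraph text alone.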

@proycon proycon self-assigned this Aug 18, 2021
proycon added a commit that referenced this issue Aug 19, 2021
… like this but it seems needed (underlying libfolia issue?) #65
proycon added a commit that referenced this issue Aug 19, 2021
proycon added a commit that referenced this issue Aug 19, 2021
…ur, and an extra --nostrings parameter to omit the strings #65
@proycon proycon added the ready Ready but not yet in a published release label Aug 19, 2021
@kosloot
Contributor

kosloot commented Mar 13, 2023

Assuming this is solved.

@kosloot kosloot closed this as completed Mar 13, 2023
@pirolen

pirolen commented Mar 13, 2023

Would it be possible to treat end-of-line hyphens in the same way as FoLiA-txt does? #67

@kosloot
Contributor

kosloot commented Mar 14, 2023

I assume this is doable, but Page documents have a rather exotic structure, so this needs some study.

Could you provide me with a SHORT Page document with a few hyphens? Maybe soft hyphens too?

@kosloot kosloot reopened this Mar 14, 2023
@kosloot
Contributor

kosloot commented Mar 14, 2023

> Could you provide me with a SHORT Page document with a few hyphens? Maybe soft hyphens too?

No need to send it; I already found one.

@pirolen

pirolen commented Mar 14, 2023

Thank you for looking into it. It is by no means a priority; I was just wondering. I'm happy that the other tools can do it.

@kosloot
Contributor

kosloot commented May 9, 2023

I have now updated git master with the newpage branch.
FoLiA-page now interprets trailing hyphens (both - and ¬) and adds them as <t-hbr> nodes, just like FoLiA-abby does.

@pirolen I tested it quite a bit, but feedback is still welcome.
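The trailing-hyphen rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not FoLiA-page's code: the function names are hypothetical, and it assumes the hyphen character is replaced by an empty `<t-hbr/>` marker followed by a `<br/>`; the exact serialization FoLiA-page emits may differ.

```python
from xml.sax.saxutils import escape

# Trailing characters treated as end-of-line hyphens, as described in this issue:
# the ASCII hyphen-minus and the not sign (U+00AC).
HYPHENS = ("-", "\u00ac")


def split_trailing_hyphen(line):
    """Return (text_without_hyphen, had_hyphen) for one text line."""
    stripped = line.rstrip()
    if stripped.endswith(HYPHENS):
        return stripped[:-1], True
    return stripped, False


def join_with_hyphen_breaks(lines):
    """Join lines into one FoLiA-like text string (hypothetical serialization).

    A trailing hyphen is replaced by an empty <t-hbr/> marker, and every
    boundary between two lines gets a <br/>.
    """
    out = []
    for i, line in enumerate(lines):
        body, hyphenated = split_trailing_hyphen(line)
        out.append(escape(body))
        if hyphenated:
            out.append("<t-hbr/>")
        if i < len(lines) - 1:
            out.append("<br/>")
    return "".join(out)


print(join_with_hyphen_breaks(["Amster-", "dam ligt aan het IJ"]))
```

Note that this heuristic treats any line-final `-` as a hyphenation break, so a dash that really ends a line (for example a standalone "A -") would also be converted.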


3 participants