Failure to evaluate PAGE-XML with namespace prefix #26

Open
kba opened this issue Sep 30, 2021 · 0 comments

kba commented Sep 30, 2021

Evaluating PAGE-XML whose elements carry a namespace prefix, as is the case for OCR-D output, fails with:

Exception in thread "main" eu.digitisation.utils.input.WarningException: Unsupported file format (UNKNOWN format) for file OCR-D-OCR-TESS-ONLY_0001.xml
        at eu.digitisation.text.Text.<init>(Text.java:121)
        at eu.digitisation.text.Text.<init>(Text.java:153)
        at eu.digitisation.output.Report.<init>(Report.java:117)
        at eu.digitisation.Main.main(Main.java:99)

If I change

<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...

to

<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

using

sed -i 's,pc:,,g' "$f"
sed -i 's,xmlns:pc,xmlns,' "$f"

evaluation works as expected.
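
For illustration, here is a minimal Python sketch of why both forms of the root element are the same document to a namespace-aware parser: the prefix is only a local alias, and the element's identity is its namespace URI plus local name. This is not the evaluation tool's code. Whether the tool's format detection actually compares the prefixed root name is an assumption, and the helper names `split_tag` and `is_page_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Common stem of the PAGE namespace URI quoted in this issue. Accepting any
# URI that starts with this stem is an assumption made for this sketch.
PAGE_NS_STEM = "http://schema.primaresearch.org/PAGE/gts/pagecontent/"


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def is_page_xml(xml_text):
    """Hypothetical check: root is PcGts in a PAGE namespace, whatever the prefix."""
    root = ET.fromstring(xml_text)
    uri, local = split_tag(root.tag)
    return local == "PcGts" and uri.startswith(PAGE_NS_STEM)


# Reduced versions of the two root elements from this issue.
prefixed = '<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"/>'
default = '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"/>'

print(is_page_xml(prefixed), is_page_xml(default))
```

Both calls return `True`, because ElementTree resolves `pc:PcGts` and the unprefixed `PcGts` to the same `{uri}PcGts` tag. A check on the namespace URI and local name would therefore not need the `sed` workaround.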
