Simple OCR + Alfresco with pdfsandwich error with "ocr.extra.commands" #70

Open
stek6 opened this issue Sep 27, 2022 · 0 comments

BUG

The Alfresco OCR plugin, which is based on pdfsandwich, returns an error when an OCR transformation is run from Alfresco Share with additional options (for example "-rgb" or "-resolution") set in "ocr.extra.commands" in alfresco-global.properties.

Expected behavior

With extra parameters added to the OCR configuration ("ocr.extra.commands") in alfresco-global.properties, the PDF file should be transformed correctly using all of those parameters, whether the transformation is launched manually or triggered by the transformation rule. For example, with "-rgb" the transformed file should keep its colors.

Actual behavior

Extra parameters (for example "-rgb" or "-resolution") were added to "ocr.extra.commands" in alfresco-global.properties.

When the transformation is launched from Alfresco Share, either by uploading a file to a folder that has the transformation rule set or by executing the transformation manually, the PDF file is not transformed and an error appears in catalina.out.
Without the parameters in "ocr.extra.commands", the transformation works correctly, both from the command line and from Alfresco Share.

When the transformation is run from the command line (or from a bash script) as the alfresco user, however, the OCR transformation succeeds and the additional parameters listed in "ocr.extra.commands" in alfresco-global.properties are passed correctly.
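
Because the output captured in catalina.out is cut off, it may help to replay the same command outside Alfresco and keep the complete stdout and stderr. The following is only a diagnostic sketch, not the plugin's own code: the executable path, the "-verbose" and "-lang" arguments, and their order are copied from the logged command, while `build_command`, `run_and_capture`, and the sample file names are hypothetical names introduced here for illustration. It assumes the extra options are split on whitespace into separate arguments, which may or may not match what the plugin does internally.

```python
import shlex
import subprocess

# Path copied from the "command:" line in the log; adjust for your install.
PDFSANDWICH = "/usr/local/bin/pdfsandwich"


def build_command(extra_commands, lang, source_pdf, target_pdf):
    """Assemble the argument list in the same order as the logged command.

    extra_commands stands in for the value of ocr.extra.commands,
    for example "-rgb". It is split into separate tokens (an assumption).
    """
    extra = shlex.split(extra_commands)
    return [PDFSANDWICH, *extra, "-verbose", "-lang", lang,
            source_pdf, "-o", target_pdf]


def run_and_capture(argv):
    """Run the command and return (exit code, full stdout, full stderr)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # sample.pdf and sample_ocr.pdf are placeholder file names.
    argv = build_command("-rgb", "spa+eng+fra", "sample.pdf", "sample_ocr.pdf")
    code, out, err = run_and_capture(argv)
    print("exit code:", code)
    print("--- stdout ---")
    print(out)
    print("--- stderr ---")
    print(err)
```

Running this as the alfresco user, ideally with the same environment variables as the Tomcat process, gives the full pdfsandwich output to compare against the truncated excerpt in catalina.out.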

Steps to reproduce the behavior

  1. Edit alfresco-global.properties and add extra parameters (for example "-rgb") to "ocr.extra.commands", as described in the pdfsandwich documentation (http://www.tobias-elze.de/pdfsandwich/)
  2. Restart Alfresco and upload a PDF file for transformation
  3. Wait for the job to run, or launch the transformation manually
  4. Verify in Alfresco Share that the file has not been transformed (it remains at the same version)
  5. The log (catalina.out) shows the error reported below ([...] Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException:[...])

Additional details (analysis so far, log statements, references, etc.)

Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 08270022 Failed to perform OCR transformation:
Execution result:
os: Linux
command: /usr/local/bin/pdfsandwich -rgb -verbose -lang spa+eng+fra /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913.pdf -o /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913_ocr.pdf
succeeded: false
exit code: 2
out: pdfsandwich version 0.1.6
Checking for convert:
convert -version
Version: ImageMagick 7.0.5-2 Q16 x86_64 2017-04-04 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Featur
err: pdfinfo version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
pdfunite version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996
at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 08270022 Failed to perform OCR transformation:
Execution result:
os: Linux
command: /usr/local/bin/pdfsandwich -rgb -verbose -lang spa+eng+fra /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913.pdf -o /opt/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_5058787576520103913_ocr.pdf
succeeded: false
exit code: 2
out: pdfsandwich version 0.1.6
Checking for convert:
convert -version
Version: ImageMagick 7.0.5-2 Q16 x86_64 2017-04-04 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Featur
err: pdfinfo version 0.26.5
Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
pdfunite version 0.26.5

Tell us about your environment

Alfresco Community - 5.2.0
leptonica-1.74.4
pdfsandwich-0.1.6
tessdata-3.04.00
tesseract-3.05.00
unpaper-0.3-4
