Files was not modified #51

Open
Goku103 opened this issue Jul 27, 2018 · 10 comments

Comments

@Goku103

Goku103 commented Jul 27, 2018

Hello,

I've recently run into an issue.
When I try to run OCR on a PDF file, a message appears saying "the document will be available in minutes", but I can't find any converted file, and the original file is not modified.

I would like to know whether anyone has seen this issue before, or can help me get the PDF converted.

Thank you in advance.

@angelborroy-ks
Contributor

Please include a detailed stacktrace from alfresco.log or catalina.out.

Thanks.

@Goku103
Author

Goku103 commented Jul 27, 2018

There are no errors in alfresco.log.

catalina.out:

Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation:
Execution result:
   os: Linux
   command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
   succeeded: false
   exit code: 1
   out:
   err: Traceback (most recent call last):
     File "/usr/local/bin/ocrmypdf", line 7, in <module>
       from ocrmypdf.__main__ import run_pipeline
     File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
       verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
    at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation:
Execution result:
   os: Linux
   command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
   succeeded: false
   exit code: 1
   out:
   err: Traceback (most recent call last):
     File "/usr/local/bin/ocrmypdf", line 7, in <module>
       from ocrmypdf.__main__ import run_pipeline
     File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
       verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:181)
    ... 10 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation:
Execution result:
   os: Linux
   command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
   succeeded: false
   exit code: 1
   out:
   err: Traceback (most recent call last):
     File "/usr/local/bin/ocrmypdf", line 7, in <module>
       from ocrmypdf.__main__ import run_pipeline
     File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
       verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
    ... 11 more

@angelborroy-ks
Contributor

You can probably get more details by running the failing transformation from the command line:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf

@Goku103
Author

Goku103 commented Jul 27, 2018

OK. First I get:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
DEBUG - ocrmypdf 7.0.0
DEBUG - tesseract 4.0.0-beta.3-249-g607e
DEBUG - qpdf 8.0.2
ERROR - The installed version of tesseract does not have language data for the following requested languages: spa

Then I ran the command again without "spa".
Now I get:

`/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
DEBUG - ocrmypdf 7.0.0
DEBUG - tesseract 4.0.0-beta.3-249-g607e
DEBUG - qpdf 8.0.2
WARNING - The installed version of Ghostscript does not work correctly with the OCR languages you specified. Use --output-type pdf or upgrade to Ghostscript 9.20 or later to avoid this issue.Found Ghostscript 9.18
DEBUG - os.symlink(/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/origin)


Tasks which will be run:

Task enters queue = 'ocrmypdf._pipeline.triage'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/origin, /tmp/com.github.ocrmypdf.jqqkrft4/origin.pdf)
Completed Task = 'ocrmypdf._pipeline.triage'
Task enters queue = 'ocrmypdf._pipeline.repair_and_parse_pdf'
DEBUG - <PdfInfo('...'), page count=1>
Completed Task = 'ocrmypdf._pipeline.repair_and_parse_pdf'
Task enters queue = 'ocrmypdf._pipeline.marker_pages'
Task enters queue = 'ocrmypdf._pipeline.generate_postscript_stub'
Completed Task = 'ocrmypdf._pipeline.marker_pages'
Task enters queue = 'ocrmypdf._pipeline.ocr_or_skip'
Completed Task = 'ocrmypdf._pipeline.generate_postscript_stub'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.marker.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.page.pdf)
Completed Task = 'ocrmypdf._pipeline.ocr_or_skip'
Task enters queue = 'ocrmypdf._pipeline.orient_page'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.page.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.oriented.pdf)
Completed Task = 'ocrmypdf._pipeline.orient_page'
Task enters queue = 'ocrmypdf._pipeline.rasterize_with_ghostscript'
DEBUG - Rasterize 000001.ocr.oriented.pdf with pngmono
DEBUG - ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pngmono', '-dFirstPage=1', '-dLastPage=1', '-r600x600', '-o', '/tmp/tmpanbb5z_n', '-dAutoRotatePages=/None', '-f', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.oriented.pdf']
DEBUG -
DEBUG - Rotating output by 0
Completed Task = 'ocrmypdf._pipeline.rasterize_with_ghostscript'
Task enters queue = 'ocrmypdf._pipeline.preprocess_remove_background'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.page.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-background.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_remove_background'
Task enters queue = 'ocrmypdf._pipeline.preprocess_deskew'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-background.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-deskew.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_deskew'
Task enters queue = 'ocrmypdf._pipeline.preprocess_clean'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-deskew.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-clean.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_clean'
Task enters queue = 'ocrmypdf._pipeline.select_ocr_image'
Task enters queue = 'ocrmypdf._pipeline.select_visible_page_image'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-clean.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.png)
Completed Task = 'ocrmypdf._pipeline.select_ocr_image'
Task enters queue = 'ocrmypdf._pipeline.ocr_tesseract_textonly_pdf'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.page.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.image)
Completed Task = 'ocrmypdf._pipeline.select_visible_page_image'
Task enters queue = 'ocrmypdf._pipeline.select_image_layer'
DEBUG - 1: convert
DEBUG - 1: convert done
DEBUG - ['tesseract', '-l', 'eng+fra', '-c', 'textonly_pdf=1', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.png', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text', 'pdf', 'txt']
Completed Task = 'ocrmypdf._pipeline.select_image_layer'
Completed Task = 'ocrmypdf._pipeline.ocr_tesseract_textonly_pdf'
Task enters queue = 'ocrmypdf._weave.weave_layers'
DEBUG - 1
DEBUG - ['/tmp/com.github.ocrmypdf.jqqkrft4/000001.image-layer.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text.txt']
DEBUG - Replace
DEBUG - [0, 0, 0, 0]
DEBUG - Grafting
DEBUG - (0.9999522656985343, 0.9999523775592609)
Completed Task = 'ocrmypdf._weave.weave_layers'
Task enters queue = 'ocrmypdf._pipeline.metadata_fixup'
DEBUG - ['gs', '-dQUIET', '-dBATCH', '-dNOPAUSE', '-dCompatibilityLevel=1.6', '-dNumRenderingThreads=2', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=/RGB', '-sProcessColorModel=DeviceRGB', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-sOutputFile=/tmp/tmpxkb793zf', '/tmp/com.github.ocrmypdf.jqqkrft4/layers.rendered.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/pdfa.ps']
DEBUG -
Completed Task = 'ocrmypdf._pipeline.metadata_fixup'
Task enters queue = 'ocrmypdf._pipeline.optimize_pdf'
DEBUG - Optimizable images: JBIG2 groups: 0 JPEGs: 0 PNGs: 1 Errors: 0
INFO - Optimize ratio: 1.00 savings: -0.0%
INFO - Optimize did not improve the file - discarded
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/metafix.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/metafix.optimized.pdf)
Completed Task = 'ocrmypdf._pipeline.optimize_pdf'
Task enters queue = 'ocrmypdf._pipeline.copy_final'
DEBUG - /tmp/com.github.ocrmypdf.jqqkrft4/metafix.optimized.pdf -> /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
Completed Task = 'ocrmypdf._pipeline.copy_final'
INFO - Output file is a PDF/A-2B (as expected)
DEBUG - <PdfInfo('...'), page count=1>`

@Goku103
Author

Goku103 commented Jul 27, 2018

Now I can find the OCR'd file if I download it directly from my server over SSH.
But I have to run the command manually to transform the file.
From the web interface, the file is not transformed.

Sorry for my English.

@angelborroy-ks
Contributor

So you just need to include or exclude the relevant options ("spa" and so on) in your alfresco-global.properties so that the configured command matches what is installed.
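To see in advance which language codes would trigger the "does not have language data" error, the `-l` value can be compared with what Tesseract reports. The following is only a minimal sketch: it assumes `tesseract` is on the PATH and that `tesseract --list-langs` prints a header line followed by one code per line. The function names are made up for illustration.

```python
import subprocess


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header, assumed to look like "List of available languages (3):"
    return {line for line in lines if not line.lower().startswith("list of")}


def missing_languages(requested, installed):
    """Return the codes of a 'spa+eng+fra' style string that are not installed."""
    return [code for code in requested.split("+") if code and code not in installed]


def installed_languages(tesseract="tesseract"):
    """Ask the local tesseract binary for its languages (stdout and stderr merged)."""
    result = subprocess.run(
        [tesseract, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_list_langs(result.stdout)
```

For example, `missing_languages("spa+eng+fra", installed_languages())` would list the codes to drop from the configured options or to install on the server.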

@Goku103
Author

Goku103 commented Jul 27, 2018

Yes, but when I click the OCR button in the Alfresco web interface, nothing happens.
In catalina.out, I get:

Execution result:
   os: Linux
   command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf
   succeeded: false
   exit code: 1
   out:
   err: Traceback (most recent call last):
     File "/usr/local/bin/ocrmypdf", line 7, in <module>
       from ocrmypdf.__main__ import run_pipeline
     File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
       verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:181)
    ... 10 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 06270065 Failed to perform OCR transformation:
Execution result:
   os: Linux
   command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf
   succeeded: false
   exit code: 1
   out:
   err: Traceback (most recent call last):
     File "/usr/local/bin/ocrmypdf", line 7, in <module>
       from ocrmypdf.__main__ import run_pipeline
     File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
       verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
    ... 11 more

If I run the command manually:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf

It works, but nothing changes in Alfresco.

@angelborroy-ks
Contributor

Then it is probably an environment problem.

Check the FAQ section.
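The tracebacks above are cut off at `verify_python3_env(`, so the actual message is not visible. One plausible reading, and it is only an assumption, is that this is a check that Python 3 is not running with an ASCII-only locale, which could differ between an interactive shell and the Tomcat process. A minimal sketch of that kind of check, using only the standard library (the function name `is_ascii_encoding` is hypothetical):

```python
import codecs
import locale


def is_ascii_encoding(encoding):
    """Return True if the given encoding name resolves to plain ASCII."""
    try:
        return codecs.lookup(encoding).name == "ascii"
    except LookupError:
        # Unknown encoding name: treat it as a misconfigured locale.
        return True


if __name__ == "__main__":
    preferred = locale.getpreferredencoding()
    print("preferred encoding:", preferred)
    print("ASCII-only:", is_ascii_encoding(preferred))
```

If this reports an ASCII-only encoding when run under the same environment as Tomcat but not in your own shell, setting a UTF-8 locale (for example through `LANG` or `LC_ALL`) for the Tomcat process would be the thing to try.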

@Goku103
Author

Goku103 commented Jul 30, 2018

My environment?
Normally, when you click the OCR button, is the file supposed to change automatically?

@gtnieto

gtnieto commented Jun 26, 2020

I have the same problem as Goku103, but I'm working with pdfsandwich.
When I run the conversion from the command line, I get the extra converted file.
But when I request the conversion from Alfresco, the same message is shown and no new file is created. In fact, nothing happens.

I should add that my Alfresco installation runs on Ubuntu. Maybe the syntax in the properties should be different for this setting? ocr.server.os=linux
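When a command works in a login shell but not from Alfresco, one way to narrow it down is to run the same command with a stripped-down environment, closer to what a service process may see. This is a rough sketch, not how the OCR add-on itself launches the command; `run_with_env` and the minimal environment in the usage note are assumptions for illustration.

```python
import subprocess


def run_with_env(command, env):
    """Run command with exactly the given environment.

    Returns a tuple (exit code, stdout text, stderr text).
    """
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling it once with `dict(os.environ)` and once with something like `{"PATH": "/usr/local/bin:/usr/bin:/bin"}` and comparing the exit codes and stderr may show whether a missing variable (locale, PATH, HOME) is what makes the conversion fail under Tomcat.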
