Files was not modified #51

Goku103 · 2018-07-27T08:57:41Z

Hello,

I've been encountering an issue recently.
When I try to run OCR on a PDF file, a message appears saying "the document will be available in minutes", but I can't find any converted file, and the original file is not modified.

I would like to know whether anyone has seen this issue before, or whether someone can help me get the PDF converted.

Thank you in advance.

angelborroy-ks · 2018-07-27T08:59:32Z

Please include a detailed stacktrace from alfresco.log or catalina.out.

Thanks.

Goku103 · 2018-07-27T09:22:51Z

There are no errors in alfresco.log.

catalina.out:

Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation: Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
    verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
    at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
    at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
    at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
    at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation: Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
    verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:181)
    ... 10 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 06270022 Failed to perform OCR transformation: Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
    verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
    ... 11 more

angelborroy-ks · 2018-07-27T09:25:19Z

You could run the failing transformation from the command line to get more details:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
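To compare a manual run with what the add-on reports, it helps to capture the same fields that appear in catalina.out (`succeeded`, `exit code`, `out`, `err`). A minimal Python sketch, assuming Python 3.5+ (the version in the traceback) and only the standard `subprocess` module; `run_command` and `ocrmypdf_command` are hypothetical helpers, not part of the add-on or of ocrmypdf:

```python
import subprocess


def run_command(args, env=None):
    """Hypothetical helper: run a command and collect the same fields the
    add-on prints in catalina.out (succeeded, exit code, out, err)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.5
        env=env,  # None means "inherit the current environment"
    )
    return {
        "succeeded": proc.returncode == 0,
        "exit code": proc.returncode,
        "out": proc.stdout,
        "err": proc.stderr,
    }


def ocrmypdf_command(source, target, languages="spa+eng+fra"):
    """Hypothetical helper: build the same argument list as in the log."""
    return ["/usr/local/bin/ocrmypdf", "--verbose", "1", "--force-ocr",
            "-l", languages, source, target]
```

For example, `run_command(ocrmypdf_command(src, dst))` with `src` pointing at a PDF that still exists (the `OCRTransformWorker_source_*.pdf` files are temporary and may already have been deleted).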

Goku103 · 2018-07-27T09:35:39Z

OK, first I get:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
DEBUG - ocrmypdf 7.0.0
DEBUG - tesseract 4.0.0-beta.3-249-g607e
DEBUG - qpdf 8.0.2
ERROR - The installed version of tesseract does not have language data for the following requested languages: spa

I ran the command again without "spa".
Now I get:

`/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
DEBUG - ocrmypdf 7.0.0
DEBUG - tesseract 4.0.0-beta.3-249-g607e
DEBUG - qpdf 8.0.2
WARNING - The installed version of Ghostscript does not work correctly with the OCR languages you specified. Use --output-type pdf or upgrade to Ghostscript 9.20 or later to avoid this issue.
Found Ghostscript 9.18
DEBUG - os.symlink(/opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/origin)

Tasks which will be run:

Task enters queue = 'ocrmypdf._pipeline.triage'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/origin, /tmp/com.github.ocrmypdf.jqqkrft4/origin.pdf)
Completed Task = 'ocrmypdf._pipeline.triage'
Task enters queue = 'ocrmypdf._pipeline.repair_and_parse_pdf'
DEBUG - <PdfInfo('...'), page count=1>
Completed Task = 'ocrmypdf._pipeline.repair_and_parse_pdf'
Task enters queue = 'ocrmypdf._pipeline.marker_pages'
Task enters queue = 'ocrmypdf._pipeline.generate_postscript_stub'
Completed Task = 'ocrmypdf._pipeline.marker_pages'
Task enters queue = 'ocrmypdf._pipeline.ocr_or_skip'
Completed Task = 'ocrmypdf._pipeline.generate_postscript_stub'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.marker.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.page.pdf)
Completed Task = 'ocrmypdf._pipeline.ocr_or_skip'
Task enters queue = 'ocrmypdf._pipeline.orient_page'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.page.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.oriented.pdf)
Completed Task = 'ocrmypdf._pipeline.orient_page'
Task enters queue = 'ocrmypdf._pipeline.rasterize_with_ghostscript'
DEBUG - Rasterize 000001.ocr.oriented.pdf with pngmono
DEBUG - ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pngmono', '-dFirstPage=1', '-dLastPage=1', '-r600x600', '-o', '/tmp/tmpanbb5z_n', '-dAutoRotatePages=/None', '-f', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.oriented.pdf']
DEBUG -
DEBUG - Rotating output by 0
Completed Task = 'ocrmypdf._pipeline.rasterize_with_ghostscript'
Task enters queue = 'ocrmypdf._pipeline.preprocess_remove_background'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.page.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-background.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_remove_background'
Task enters queue = 'ocrmypdf._pipeline.preprocess_deskew'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-background.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-deskew.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_deskew'
Task enters queue = 'ocrmypdf._pipeline.preprocess_clean'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-deskew.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-clean.png)
Completed Task = 'ocrmypdf._pipeline.preprocess_clean'
Task enters queue = 'ocrmypdf._pipeline.select_ocr_image'
Task enters queue = 'ocrmypdf._pipeline.select_visible_page_image'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.pp-clean.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.png)
Completed Task = 'ocrmypdf._pipeline.select_ocr_image'
Task enters queue = 'ocrmypdf._pipeline.ocr_tesseract_textonly_pdf'
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/000001.page.png, /tmp/com.github.ocrmypdf.jqqkrft4/000001.image)
Completed Task = 'ocrmypdf._pipeline.select_visible_page_image'
Task enters queue = 'ocrmypdf._pipeline.select_image_layer'
DEBUG - 1: convert
DEBUG - 1: convert done
DEBUG - ['tesseract', '-l', 'eng+fra', '-c', 'textonly_pdf=1', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.ocr.png', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text', 'pdf', 'txt']
Completed Task = 'ocrmypdf._pipeline.select_image_layer'
Completed Task = 'ocrmypdf._pipeline.ocr_tesseract_textonly_pdf'
Task enters queue = 'ocrmypdf._weave.weave_layers'
DEBUG - 1
DEBUG - ['/tmp/com.github.ocrmypdf.jqqkrft4/000001.image-layer.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/000001.text.txt']
DEBUG - Replace
DEBUG - [0, 0, 0, 0]
DEBUG - Grafting
DEBUG - (0.9999522656985343, 0.9999523775592609)
Completed Task = 'ocrmypdf._weave.weave_layers'
Task enters queue = 'ocrmypdf._pipeline.metadata_fixup'
DEBUG - ['gs', '-dQUIET', '-dBATCH', '-dNOPAUSE', '-dCompatibilityLevel=1.6', '-dNumRenderingThreads=2', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=/RGB', '-sProcessColorModel=DeviceRGB', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-sOutputFile=/tmp/tmpxkb793zf', '/tmp/com.github.ocrmypdf.jqqkrft4/layers.rendered.pdf', '/tmp/com.github.ocrmypdf.jqqkrft4/pdfa.ps']
DEBUG -
Completed Task = 'ocrmypdf._pipeline.metadata_fixup'
Task enters queue = 'ocrmypdf._pipeline.optimize_pdf'
DEBUG - Optimizable images: JBIG2 groups: 0 JPEGs: 0 PNGs: 1 Errors: 0
INFO - Optimize ratio: 1.00 savings: -0.0%
INFO - Optimize did not improve the file - discarded
DEBUG - os.symlink(/tmp/com.github.ocrmypdf.jqqkrft4/metafix.pdf, /tmp/com.github.ocrmypdf.jqqkrft4/metafix.optimized.pdf)
Completed Task = 'ocrmypdf._pipeline.optimize_pdf'
Task enters queue = 'ocrmypdf._pipeline.copy_final'
DEBUG - /tmp/com.github.ocrmypdf.jqqkrft4/metafix.optimized.pdf -> /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_8049813577400375998_ocr.pdf
Completed Task = 'ocrmypdf._pipeline.copy_final'
INFO - Output file is a PDF/A-2B (as expected)
DEBUG - <PdfInfo('...'), page count=1>`
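The WARNING line in this log compares the installed Ghostscript (9.18) with 9.20. A minimal Python sketch of the same version comparison, assuming `gs --version` prints a bare version string such as `9.18`; the helper names are hypothetical, and this is not how ocrmypdf itself implements the check:

```python
import subprocess


def parse_gs_version(text):
    """Turn a version string such as '9.18' into a comparable tuple (9, 18)."""
    return tuple(int(part) for part in text.strip().split("."))


def ghostscript_needs_upgrade(version_text, minimum=(9, 20)):
    """True when the version is older than the 9.20 named in the warning."""
    return parse_gs_version(version_text) < minimum


def installed_gs_version(gs="gs"):
    """Ask the local Ghostscript binary for its version string."""
    proc = subprocess.run([gs, "--version"], stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)
    return proc.stdout.strip()
```

Comparing tuples rather than strings avoids `"10.0" < "9.20"` being true. The warning itself names the two ways out: pass `--output-type pdf`, or upgrade Ghostscript to 9.20 or later.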

Goku103 · 2018-07-27T10:07:26Z

Now I can find the OCRed file if I download it directly from my server over SSH.
But I have to run the command manually to transform the file.
From the web interface, the file is not transformed.

Sorry for my English.

angelborroy-ks · 2018-07-27T10:16:02Z

So you need to adjust the options in your alfresco-global.properties (for example, remove the missing "spa" language) and that should fix it.
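Before changing the properties, it can help to check which of the requested languages tesseract actually has. A minimal Python sketch, assuming `tesseract --list-langs` prints a header line followed by one language code per line; some tesseract releases write that list to stderr rather than stdout, so the sketch reads both. The helper names are hypothetical:

```python
import subprocess


def parse_tesseract_langs(output):
    """Collect language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the "List of available languages (N):" header and any
        # diagnostic text; codes such as "eng" or "chi_sim" contain no spaces.
        if not line or " " in line or line.endswith(":"):
            continue
        langs.add(line)
    return langs


def installed_tesseract_langs(tesseract="tesseract"):
    """Run tesseract and return the set of installed language codes."""
    proc = subprocess.run([tesseract, "--list-langs"], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return parse_tesseract_langs(proc.stdout + "\n" + proc.stderr)


def split_languages(lang_arg, installed):
    """Split an ocrmypdf '-l' value such as 'spa+eng+fra' into a value using
    only installed languages, plus the list of missing ones."""
    requested = [code for code in lang_arg.split("+") if code]
    present = [code for code in requested if code in installed]
    missing = [code for code in requested if code not in installed]
    return "+".join(present), missing
```

On the installation from this thread, `split_languages("spa+eng+fra", installed_tesseract_langs())` would be expected to report `spa` as missing, matching the ocrmypdf ERROR line earlier.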

Goku103 · 2018-07-27T11:58:59Z

Yes, but when I click the OCR button in the Alfresco web interface, nothing happens.
In catalina.out, I get:

Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
    verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
    at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:181)
    ... 10 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 06270065 Failed to perform OCR transformation: Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/__main__.py", line 70, in <module>
    verify_python3_env(
    at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
    ... 11 more

If I run the command manually:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng+fra /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_3205329177977384766_ocr.pdf

it works, but nothing changes in Alfresco.

angelborroy-ks · 2018-07-27T12:00:30Z

Then it is probably an environment problem.

Check the FAQ section.
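Because the same command succeeds in an SSH shell but fails when Tomcat runs it, one way to narrow down the environment difference is to run a child process with a controlled environment. The traceback is cut off right after `verify_python3_env(`, so the real error message is not in the log; assuming (this is not confirmed in the thread) that this function checks the locale encoding Python 3 sees, a minimal Python sketch can compare the encoding a child Python process reports with and without UTF-8 locale variables. `C.UTF-8` and the stripped-down `PATH` are assumptions; the helper names are hypothetical:

```python
import os
import subprocess
import sys

# Code run in the child interpreter: print the encoding Python 3 derives
# from the locale environment variables (LC_ALL, LC_CTYPE, LANG).
PROBE = "import locale; print(locale.getpreferredencoding())"


def build_env(base, extra=None):
    """Copy a base environment and overlay extra variables."""
    env = dict(base)
    env.update(extra or {})
    return env


def child_encoding(base_env=None, extra_env=None):
    """Return the preferred encoding seen by a child Python process.

    base_env defaults to this process's environment; pass a small dict such
    as {"PATH": "/usr/bin:/bin"} to imitate a service with few variables set.
    """
    env = build_env(os.environ if base_env is None else base_env, extra_env)
    proc = subprocess.run([sys.executable, "-c", PROBE],
                          stdout=subprocess.PIPE, universal_newlines=True,
                          env=env, check=True)
    return proc.stdout.strip()
```

Run it with the same Python 3.5 that ocrmypdf uses (Python 3.7 and later coerce the C locale to UTF-8, which hides the difference). If `child_encoding(base_env={"PATH": "/usr/bin:/bin"})` reports an ASCII encoding such as `ANSI_X3.4-1968` while adding `extra_env={"LANG": "C.UTF-8", "LC_ALL": "C.UTF-8"}` reports `UTF-8`, the locale of the Tomcat process is worth checking; whether that resolves this particular error is not established here.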

Goku103 · 2018-07-30T14:59:32Z

My environment?
Normally, when you click the OCR button, the file changes automatically?

gtnieto · 2020-06-26T19:21:50Z

I have the same problem as Goku103, but I'm using pdfsandwich.
When I run the conversion from the command line, I get the extra, converted file.
But when I trigger the conversion in Alfresco, I receive the same message, but no new file is created. In fact, nothing happens.

I should mention that my Alfresco installation runs on Ubuntu. Maybe the property syntax should be different for this part? ocr.server.os=linux
