Timeout for PDF extraction from OpenOffice supported document format. #34

vrybas · 2012-01-13T16:31:51Z

When we call extract_pdf() on a document of more than 400-500 pages,
JODConverter fails with an exception:

Exception in thread "main" org.artofsolving.jodconverter.office.OfficeException: task did not complete within timeout
    at org.artofsolving.jodconverter.office.PooledOfficeManager.execute(PooledOfficeManager.java:88)
    at org.artofsolving.jodconverter.office.ProcessPoolOfficeManager.execute(ProcessPoolOfficeManager.java:78)
    at org.artofsolving.jodconverter.OfficeDocumentConverter.convert(OfficeDocumentConverter.java:78)
    at org.artofsolving.jodconverter.OfficeDocumentConverter.convert(OfficeDocumentConverter.java:69)
    at org.artofsolving.jodconverter.cli.Convert.main(Convert.java:118)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
    at java.util.concurrent.FutureTask.get(FutureTask.java:91)
    at org.artofsolving.jodconverter.office.PooledOfficeManager.execute(PooledOfficeManager.java:85)
    ...

The new JODConverter 3.0b4 accepts a timeout parameter, which solves the problem.

I don't know whether the timeout should be hardcoded or exposed as a documented Docsplit option, so I did both in separate commits.
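For illustration only, here is a minimal sketch of the option-based variant. It is not the code from these commits. The helper names `parse_timeout` and `converter_args` are hypothetical, and it assumes the bundled JODConverter command line takes the limit as `-t <seconds>`, which should be checked against the version in use. When no timeout is given, nothing is passed and the converter keeps its own default.

```ruby
require 'optparse'

# Hypothetical helper: parse a --timeout flag in the same style as
# Docsplit's other command-line options.
def parse_timeout(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on('-t', '--timeout [SEC]', Integer,
            'Timeout for PDF extraction, in seconds') do |t|
      options[:timeout] = t
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's argv is untouched
  options
end

# Hypothetical helper: build the extra converter arguments.
# ASSUMPTION: the JODConverter CLI accepts "-t <seconds>".
# Returns an empty list when no timeout was requested.
def converter_args(options)
  return [] unless options[:timeout]
  ['-t', options[:timeout].to_s]
end
```

With these helpers, `converter_args(parse_timeout(['--timeout=600']))` yields the two extra arguments to append to the converter invocation.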


tienle · 2012-05-07T04:27:32Z

vote for supporting timeout option. 👍

jravetch · 2012-05-17T07:47:49Z

Me too. Have run into the issue before. Very heavy docs can take almost 5min to convert to pdf.

mromaine · 2012-05-21T10:05:02Z

+1 here too; what are the chances this pull request will be granted?

pzaich · 2012-11-01T03:07:10Z

+1 Has this been resolved yet? I am running into this problem as well. Anything over 1.5 MB in .doc format seems to time out, along with a lot of PDFs.

alxndrmlr · 2013-06-19T13:09:47Z

lib/docsplit/command_line.rb

@@ -94,6 +94,9 @@ def parse_options
        opts.on('--no-clean', 'disable cleaning of OCR\'d text') do |c|
          @options[:clean] = false
        end
+        opts.on('-t', '--timeout [SEC]', 'Timeout for PDF extraction from OpenOffice document format (default is 1 hour)') do |t|


Perhaps change this message to "Timeout for PDF extraction from OpenOffice supported document format", so as not to lead people into thinking the flag applies only to OpenOffice files and not to .doc, .xlsx, etc.

vrybas · 2013-06-19

@alxndrmlr, will do, thanks.


vrybas added 2 commits January 13, 2012 22:59

Command line parameter and documentation update for timeout option

51b6f7d

alxndrmlr reviewed Jun 19, 2013

Fixed help message for --timeout option

e52a17d

doxavore pushed a commit to ebp/docsplit that referenced this pull request Apr 25, 2014

Add timeout option to JODConverter.

856b122

Original work by documentcloud#34 with modification to not use a default timeout (causing no change from existing functionality).
