Fix escaping when extracting text using OCR #149

floehopper · 2018-07-02T20:48:09Z

Previously, the output filename passed to the tesseract command was not shell-escaped. As a result, the filename was truncated and did not match the filename expected by Docsplit::TextExtractor#clean_text, which raised the following exception:

Errno::ENOENT: No such file or directory @ rb_sysopen - test/output/PDF file with spaces 'single' and "double quotes".txt
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `initialize'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `open'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `clean_text'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:80:in `extract_from_ocr'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:36:in `block in extract'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:32:in `each'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:32:in `extract'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit.rb:52:in `extract_text'
test/unit/test_extract_text.rb:58:in `test_name_escaping_while_extracting_text_using_ocr'
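
For illustration only, here is a minimal sketch of the failure mode and the kind of fix described above. It is not the actual docsplit code: the helper names `unescaped_command` and `escaped_command` and the `.tif` input path are hypothetical, and the command is only built and re-parsed with Ruby's standard `Shellwords` module rather than executed.

```ruby
require 'shellwords'

# Hypothetical helpers, not docsplit's implementation. Both build a
# "tesseract <image> <output base>" command line as a single string.
def unescaped_command(image_path, output_base)
  "tesseract #{Shellwords.escape(image_path)} #{output_base}"
end

def escaped_command(image_path, output_base)
  "tesseract #{Shellwords.escape(image_path)} #{Shellwords.escape(output_base)}"
end

# Output base taken from the failing test's path (without the .txt suffix).
base  = %q{test/output/PDF file with spaces 'single' and "double quotes"}
image = "#{base}.tif" # assumed input image name, for illustration

# Shellwords.split parses a string using POSIX shell word rules, so it
# shows which arguments the command would actually receive.
broken_args = Shellwords.split(unescaped_command(image, base))
fixed_args  = Shellwords.split(escaped_command(image, base))

# Without escaping, the output base is cut at the first space.
puts broken_args[2]
# With escaping, the output base arrives as one intact argument.
puts fixed_args[2]
```

In the unescaped case the output base stops at the first space, so a file written under that truncated name would not be at the path that clean_text later tries to open, which is consistent with the Errno::ENOENT above.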
