Docsplit::TextExtractor#extract_text should return the path of the output text file? #139

Open
nruth opened this issue Jan 25, 2016 · 2 comments

Comments


nruth commented Jan 25, 2016

Related to #42.

After extracting the text from a PDF or Doc file I need to do something with it. I understand not loading the string into Ruby (it could be huge), but it'd be helpful to get the output file path as a return value. Otherwise we have to use a different output dir per document or try to reconstruct the path from other information, which feels wrong.

Currently Docsplit::TextExtractor#extract_text returns the source file paths. For transparent doc(x) file conversion it returns the intermediate tempfile PDF.
E.g. when I map over an array with a PDF and a doc in my project's tmp dir, I get back:

[
"/var/folders/_j/q3pr8b3s1vj85mhqvyb06gr40000gn/T/docsplit/sample.docx20160125-29577-go3upi.pdf",
"/Users/nruth/dev/monitor/tmp/AISB08.pdf20160125-29577-1svhpfo.pdf"
]

Instead I'd like to be given the path of the output text files, so I can open them.

Would this be a good PR, or is there a deliberate reason to return these other file paths that could be documented?


harssh commented Mar 18, 2016

👍 Are we going ahead with this, or is it already implemented?


nruth commented Mar 20, 2016

I didn't make a PR. I worked around the problem by putting the document into its own temporary subdirectory and then using ls to find the output. I do think it's something that can be fixed, as it's just a case of the return value never having been thought about. But the PR backlog is growing.
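
For anyone landing here, a rough sketch of that kind of workaround, not anything Docsplit itself provides. It is a variation that isolates the output directory rather than the document. `extract_to_isolated_dir` is a hypothetical helper name, and the commented usage assumes the `:output` option of `Docsplit.extract_text` writes the `.txt` files into the given directory, as the README describes.

```ruby
require "tmpdir"

# Hypothetical helper (not part of Docsplit): runs an extraction step against
# a fresh, empty directory and returns the .txt files that appear in it.
# The caller supplies the extraction itself as a block.
def extract_to_isolated_dir(source_path)
  out_dir = Dir.mktmpdir("docsplit-text")
  yield source_path, out_dir
  # The directory started empty, so every .txt file in it came from this run.
  Dir.glob(File.join(out_dir, "*.txt")).sort
end

# Assumed usage with Docsplit (requires the gem and its dependencies):
#
#   text_paths = extract_to_isolated_dir("tmp/AISB08.pdf") do |src, dir|
#     Docsplit.extract_text([src], output: dir)
#   end
```

The temporary directory is left in place so the returned paths stay valid; the caller has to remove it (for example with `FileUtils.remove_entry`) once the text has been read.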
