Cannot find text.pdf file #58

Open
getglad opened this issue Jan 4, 2017 · 2 comments

getglad commented Jan 4, 2017

When watching a directory, I intermittently get an error message saying that the x_text.pdf file cannot be found.

When the error is thrown, the currently running thread appears to finish and produce the OCR'ed version (as far as I can tell, the output is fine), so I'm not sure why the error occurs.

The observer dies, however, and so any other queued files are dropped.
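A minimal Python 3 sketch of the general pattern that keeps one failing file from dropping the rest of the queue, assuming the per-file work runs in a worker thread that pulls paths from a queue. `process_file` and `worker` are hypothetical names, not pypdfocr or watchdog APIs. The traceback's "most likely raised during interpreter shutdown" line hints that the real error path may end the whole process rather than raise; if so, this pattern alone would not be enough.

```python
import logging
import queue
import threading

log = logging.getLogger("watcher")


def process_file(path):
    """Hypothetical stand-in for the per-file OCR step (not pypdfocr's API).

    Simulates the intermediate file having already been deleted.
    """
    if path.endswith("_text.pdf"):
        raise FileNotFoundError(path)
    return path


def worker(jobs, done):
    """Drain the queue, logging per-file failures instead of letting them kill the thread."""
    while True:
        path = jobs.get()
        try:
            if path is None:  # sentinel value: stop the worker
                return
            try:
                done.append(process_file(path))
            except Exception:
                log.exception("Skipping %s", path)
        finally:
            jobs.task_done()


jobs = queue.Queue()
done = []
t = threading.Thread(target=worker, args=(jobs, done))
t.start()
for p in ["a.pdf", "a_text.pdf", "b.pdf"]:
    jobs.put(p)
jobs.put(None)
t.join()
print(done)  # ['a.pdf', 'b.pdf']
```

The failing path is logged and skipped, and b.pdf, queued after it, is still processed.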

ERROR: Cannot find specified pdf file ./x_text.pdf
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/root/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
  File "/root/anaconda2/lib/python2.7/site-packages/watchdog/observers/api.py", line 200, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'Empty'

A couple of additional notes:

  • The PDFs in question are large (125+ pages); I don't know whether that is part of the issue.

  • As far as I can tell, the error is not thrown consistently (it does not always fail on the same file, or after the same number of files, etc.), and I cannot reproduce the problem when I run pypdfocr individually on the same files that fail while watching (i.e., calling pypdfocr ./x.pdf instead of pypdfocr -w ./folder/).

TheLexus commented Aug 8, 2017

I have the same problem. It's quite simple: the x_text.pdf file is an intermediate file generated by pypdfocr. The problem is that it is generated inside the watch folder, and the watcher tries to process every PDF file there. After finishing x.pdf, the watcher has x_text.pdf in its queue and tries to process it, but x_text.pdf has already been deleted (because it was only an intermediate file). The error itself doesn't do any harm, but I think the watcher should filter out these files and check whether a file still exists before processing it.
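The fix suggested above (ignore the intermediate file, and skip paths that no longer exist) could be sketched as a predicate the watcher calls before queuing or processing a path. This is a minimal Python 3 sketch: `should_process` is a hypothetical helper, and the `_text.pdf` suffix is inferred from the error message rather than from pypdfocr's documentation.

```python
import os
import tempfile


def should_process(path, suffix="_text.pdf"):
    """Return True only for existing PDFs that are not intermediate *_text.pdf files.

    The suffix is an assumption based on the file name in this issue,
    not a documented pypdfocr contract.
    """
    name = os.path.basename(path)
    if not name.lower().endswith(".pdf"):
        return False
    if name.endswith(suffix):  # intermediate file written next to the input
        return False
    # Re-check at processing time: the file may already have been deleted.
    return os.path.isfile(path)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "x.pdf")
    tmp = os.path.join(d, "x_text.pdf")
    gone = os.path.join(d, "y.pdf")  # never created
    for p in (src, tmp):
        open(p, "wb").close()
    results = {os.path.basename(p): should_process(p) for p in (src, tmp, gone)}

print(results)  # {'x.pdf': True, 'x_text.pdf': False, 'y.pdf': False}
```

The existence check narrows the race rather than closing it: the file can still vanish between the check and the OCR call, so the processing step should still tolerate a missing file. If the watcher is built on watchdog, its `PatternMatchingEventHandler` also accepts an `ignore_patterns` argument (e.g. `["*_text.pdf"]`) that could handle the filename half of this.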

AurelioPuente commented Jul 23, 2019

Any update on how to fix this? I keep running into the same issue. The watcher basically stops because of this.
