Stuck on find /to-ocr -name '*.pdf' -type f #23

g-simmons · 2023-07-11T22:52:06Z

Hey there, I'm running pd3f on a workstation and accessing it through an SSH tunnel from my local machine.

I'm using the browser GUI to (hopefully 🤞) OCR a book scan with several hundred pages. The book scan is already in PDF format - the file size is around 30MB.

In the log output of the web GUI I am seeing:

INFO:root:setting up ocr
INFO:root:ocr finished successfully
INFO:pd3f.parsr_wrapper:sending PDF to Parsr
INFO:pd3f.parsr_wrapper:got response from Parsr
INFO:pd3f.doc_info:media line width: 174.0
INFO:pd3f.doc_info:median line height: 9.0
INFO:pd3f.doc_info:median line space: 4.159999999999968
INFO:pd3f.doc_info:counter width: [(409.44, 1036), (8, 1036), (409.68, 1014), (410.16, 982), (409.2, 974)]
INFO:pd3f.doc_info:counter height: [(10, 19582), (9, 11277), (8, 10001), (7, 1238), (9.24, 1180)]
INFO:pd3f.doc_info:counter lineheight: [(4.159999999999968, 3830), (4.160000000000025, 2457), (4.159999999999997, 2251), (4.399999999999977, 2118), (2.759999999999991, 1806)]
INFO:pd3f.export:export page #0

It's been at least 20 minutes since I started the conversion, so I'm surprised to see the tool is still on page #0.

In the terminal I'm seeing the following:

ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:00] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:01] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:02] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:03] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:04] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:05] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:06] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:07] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1
web_1         | 172.18.0.1 - - [11/Jul/2023 22:46:08] "GET /update/B3QUgcWaGCVeuaQTqy7u5Z HTTP/1.1" 200 -
ocr_worker_1  | ++ find /to-ocr -name '*.pdf' -type f
ocr_worker_1  | + sleep 1

I'm new to pd3f, but it looks like the OCR worker is stuck in a loop waiting to receive a file?
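For reference, the alternating `++ find ...` and `+ sleep 1` lines look like the `set -x` trace of a shell loop that polls a directory once per second. Here is a minimal sketch of such a loop. It is only a guess at the shape of the worker script, not pd3f's actual code: `poll_once`, `watch_loop`, and `WATCH_DIR` are hypothetical names, and the OCR step is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a directory-polling loop; NOT pd3f's actual worker script.
WATCH_DIR="${WATCH_DIR:-/to-ocr}"   # directory name taken from the trace above

# Print the PDFs currently waiting in the given directory, one path per line.
poll_once() {
  find "$1" -name '*.pdf' -type f
}

# Poll forever, sleeping one second between passes.
# Under `set -x`, an idle pass would show up as a `find` line followed by `+ sleep 1`.
watch_loop() {
  while true; do
    while IFS= read -r pdf; do
      echo "would OCR: $pdf"   # placeholder for the real OCR step (assumption)
    done < <(poll_once "$WATCH_DIR")
    sleep 1
  done
}
```

If the worker is built along these lines, an endless `find` / `sleep 1` trace would just mean the directory is empty and the worker is idle, so it may not be the part that is stuck.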

Any suggestions for troubleshooting are much appreciated.


rahulkrprajapati · 2024-03-02T14:00:20Z

I'm facing the same issue.
