Installation Problem #117

Open
PanosHatz opened this issue Mar 18, 2024 · 11 comments

Comments

@PanosHatz

Hi, first of all, is this project still active?

When trying to install on Windows 11 with Anaconda, I get the following error after running pip install . :

ERROR: Could not find a version that satisfies the requirement tensorflow==2.13.1 (from invoicenet) 
(from versions: 1.13.1, 1.13.2, 1.14.0, 1.15.0, 1.15.2, 1.15.3, 1.15.4, 1.15.5, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.1.0, 
2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 
2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 
2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 
2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 
2.11.0rc2, 2.11.0)
ERROR: No matching distribution found for tensorflow==2.13.1

Can anyone help me?

@GREGOR2000

GREGOR2000 commented Mar 18, 2024

Change two lines (258, 259) in setup.py so that tensorflow and numpy are listed without a version pin:

install_requires=[
    "tensorflow",
    "numpy",
    "six~=1.15.0",
    "datefinder==0.7.1",
    "opencv-python==4.5.1.48",
    "pdf2image==1.14.0",
    "pdfplumber==0.5.27",
    "PyPDF2==1.27.9",
    "pytesseract==0.3.7",
    "python-dateutil==2.8.1",
    "PyYAML==5.4.1",
    "simplejson==3.17.2",
    "tqdm==4.59.0",
    "google-api-python-client",
    "google-cloud-vision"
])
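
The version list in the error above stops at 2.11.0, which suggests that pip cannot see any newer TensorFlow wheel for the interpreter it is running under. One likely cause (an assumption, not confirmed anywhere in this thread) is an older Python in the conda environment: as far as I know, TensorFlow 2.13 publishes wheels only for Python 3.8 through 3.11. A minimal sketch to check this before editing setup.py; the helper name `supports_tf_2_13` and the version bounds are illustrative, not part of InvoiceNet:

```python
import sys

# Assumption: TensorFlow 2.13 publishes wheels for Python 3.8 through 3.11 only.
TF_2_13_MIN_PYTHON = (3, 8)
TF_2_13_MAX_PYTHON = (3, 11)


def supports_tf_2_13(version_info=sys.version_info):
    """Return True if the given Python version is inside the assumed range."""
    major_minor = (version_info[0], version_info[1])
    return TF_2_13_MIN_PYTHON <= major_minor <= TF_2_13_MAX_PYTHON


if __name__ == "__main__":
    # Report the interpreter that pip would install packages for.
    print("Python", sys.version.split()[0])
    print("Inside assumed TensorFlow 2.13 range:", supports_tf_2_13())
```

If this reports False, creating a fresh environment with a newer Python may be an alternative to removing the pin.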

@PanosHatz
Author

Thank you very much, it worked!

@eshsu

eshsu commented Mar 21, 2024

Have you run this repo successfully on Windows?

@GREGOR2000

Yes. On Win 10 with miniconda.

@PanosHatz
Author

I ran into some other problems and more or less gave up. Any idea whether it works on Windows 11?

@GREGOR2000

Please tell us what problems or errors you have.

@PanosHatz
Author

PanosHatz commented Mar 21, 2024

Thanks a lot for the quick response. Actually, I think I managed to make it work after a fresh reinstall.
Just two questions:
Can I train using a regular CPU? And will it work if my invoices are in Greek?

@GREGOR2000

You can train the network using only the CPU. TensorFlow detects which devices are available and runs on them.
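
To see what TensorFlow actually detected on a given machine, a minimal sketch along these lines can help. `describe_devices` is a hypothetical helper written for this example; `tf.config.list_physical_devices` is part of the TensorFlow 2.x API.

```python
def describe_devices(gpus):
    """Summarize a list of GPU devices as returned by TensorFlow."""
    if gpus:
        return f"{len(gpus)} GPU(s) visible; TensorFlow will use them by default"
    return "No GPU visible; training will run on the CPU"


if __name__ == "__main__":
    try:
        # Imported here so the helper above also works without TensorFlow.
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print(describe_devices(tf.config.list_physical_devices("GPU")))
```

No code change is needed to fall back to the CPU; the check only tells you which case you are in.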

As for the language, Tesseract OCR uses English by default. The program must explicitly request Greek or English+Greek. The available language data files are listed here:
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html

In the file InvoiceNet\invoicenet\common\util.py, line 95, change

data = pytesseract.image_to_data(img, output_type=Output.DICT)

to

data = pytesseract.image_to_data(img, lang='ell', output_type=Output.DICT)

Note that 'ell' is Tesseract's code for modern Greek, while 'grc' is Ancient Greek. For English+Greek, use lang='eng+ell'.
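
As a standalone illustration of the `lang` argument, here is a minimal sketch outside InvoiceNet. Tesseract accepts several language codes joined with `+`; for modern Greek the code is `ell` (`grc` selects Ancient Greek), and the matching traineddata files must be installed. The helper names `tesseract_lang` and `ocr_image` are made up for this example.

```python
def tesseract_lang(*codes):
    """Join Tesseract language codes into the 'a+b' form the lang argument expects."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(path, lang):
    """Run OCR on one image file and return pytesseract's word-level dict."""
    # Imported lazily so this module loads even without the OCR dependencies.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    with Image.open(path) as img:
        return pytesseract.image_to_data(img, lang=lang, output_type=Output.DICT)


if __name__ == "__main__":
    print(tesseract_lang("eng", "ell"))  # prints "eng+ell"
```

Usage would look like `ocr_image("invoice.png", tesseract_lang("eng", "ell"))`, where `invoice.png` is a placeholder file name.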

@GREGOR2000

You need to check which languages your tesseract-ocr installation supports:

"c:\Program Files\Tesseract-OCR\tesseract.exe" --list-langs
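
The same check can be scripted, for example to fail early when the Greek data is missing. This is a sketch under a few assumptions: the header line of `--list-langs` starts with "List of", the executable is reachable as `tesseract` (pass the full path otherwise), and the function names are illustrative.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line, e.g. "List of available languages (3):".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages(exe="tesseract"):
    """Run Tesseract and return the language codes it reports."""
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some Tesseract versions write the list to stderr, so read both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    try:
        langs = installed_languages()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("Could not run tesseract; pass the full path to the executable")
    else:
        print("Greek (ell) installed:", "ell" in langs)
```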

@PanosHatz
Author

PanosHatz commented Mar 25, 2024

Hi, I tried training using only the CPU, but it took a huge amount of time. Can I somehow use Google Colab's free GPUs for this? Would I have to make any modifications to the code?

@GREGOR2000

On a normal computer, 5,000 invoices are processed and trained on in a few hours. You only need to do this once; after that, the trained network works quickly.

The only Google-specific thing I see in the code is the Google OCR setup in util.py, line 37:

# API keys for google ocr
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="google_api_keys.json"
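
Because the path in that line is relative, it is resolved against the current working directory, which may differ between a local machine and a hosted notebook. A minimal sketch that sets the variable to an absolute path and fails early if the key file is missing; `set_google_credentials` is a hypothetical helper, not part of InvoiceNet:

```python
import os
from pathlib import Path


def set_google_credentials(key_file):
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_file; fail early if it is missing."""
    path = Path(key_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key not found: {path}")
    # Google Cloud client libraries read this variable to locate credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return path
```

This only matters when the Google OCR path is used; it has no effect on training with Tesseract.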
