textract.exceptions.ShellError: The command antiword is not installed on your system. Please make sure the appropriate dependencies are installed before using textract #444

Open
faridelya opened this issue Oct 27, 2022 · 0 comments

**Cannot execute antiword in production under Gunicorn, while in development on the same machine it works**

I installed all the system dependencies on Ubuntu before installing textract, following the instructions linked here. The apt output shows that everything is already present:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'python-dev-is-python2' instead of 'python-dev'
libjpeg-dev is already the newest version (8c-2ubuntu8).
antiword is already the newest version (0.37-16).
flac is already the newest version (1.3.3-1build1).
lame is already the newest version (3.100-3).
libmad0 is already the newest version (0.15.1b-10ubuntu1).
libsox-fmt-mp3 is already the newest version (14.4.2+git20190427-2).
pstotext is already the newest version (1.9-6build1).
python-dev-is-python2 is already the newest version (2.7.17-4).
sox is already the newest version (14.4.2+git20190427-2).
swig is already the newest version (4.0.1-5build1).
tesseract-ocr is already the newest version (4.1.1-2build2).
unrtf is already the newest version (0.21.10-clean-1).
libxml2-dev is already the newest version (2.9.10+dfsg-5ubuntu0.20.04.4).
libxslt1-dev is already the newest version (1.1.34-4ubuntu0.20.04.1).
poppler-utils is already the newest version (0.86.1-0ubuntu1.1).
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 44 not upgraded.

The following works on the same server:

  • When I run `gunicorn -b 0.0.0.0:8000 wsgi:app --workers 3 --timeout 600` by hand,
  • the application converts all docx and doc files to txt.

**Problem:**

When I apply my changes with `sudo systemctl restart gunicorn.service` and `sudo systemctl restart nginx`:

  • In production the application cannot convert docx and doc files to txt, and an error comes up.
  • The error also shows up when I check the Gunicorn service status:

`● app.service - Gunicorn instance to serve myproject
Loaded: loaded (/etc/systemd/system/app.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2022-10-27 06:11:06 UTC; 5min ago
Main PID: 389929 (gunicorn)
Tasks: 13 (limit: 38087)
Memory: 4.9G
CGroup: /system.slice/app.service
├─389929 /home/ubuntu/web-server/env/bin/python /home/ubuntu/web-server/env/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 wsgi:app --timeout 3600
├─389992 /home/ubuntu/web-server/env/bin/python /home/ubuntu/web-server/env/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 wsgi:app --timeout 3600
├─389993 /home/ubuntu/web-server/env/bin/python /home/ubuntu/web-server/env/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 wsgi:app --timeout 3600
└─389994 /home/ubuntu/web-server/env/bin/python /home/ubuntu/web-server/env/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 wsgi:app --timeout 3600

Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: byte_string = self.extract(filename, **kwargs)
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: File "/home/ubuntu/web-server/env/lib/python3.7/site-packages/textract/parsers/doc_parser.py", line 9, in extract
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: stdout, stderr = self.run(['antiword', filename])
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: File "/home/ubuntu/web-server/env/lib/python3.7/site-packages/textract/parsers/utils.py", line 96, in run
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: ' '.join(args), 127, '', '',
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: textract.exceptions.ShellError: The command antiword /home/ubuntu/web-server/data/test_cvs/Yassin.docx failed because the executable
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: antiword is not installed on your system. Please make
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: sure the appropriate dependencies are installed before using
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: textract:
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: http://textract.readthedocs.org/en/latest/installation.html`

  • Yet antiword is installed on Ubuntu. Running which antiword gives:

ubuntu@:~/web-server/data/test_cvs$ which antiword
/usr/bin/antiword
I also uninstalled and reinstalled antiword, but the problem persists. I am stuck: it does not work in production under systemd, yet when I start Gunicorn by hand on port 8000 it works and I get output. Why can't Gunicorn execute antiword? Any help would be appreciated. Thanks.
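One possible explanation, which nothing in this report confirms, is that the systemd service runs with a different PATH than my interactive shell. The traceback shows ShellError being raised with code 127, the conventional "command not found" code. The sketch below is not textract's code. It only illustrates, with a hypothetical helper `exit_code_with_path` and assumed PATH values, how the same command can be found under one PATH and missing under another.

```python
import subprocess


def exit_code_with_path(args, path):
    """Run args with PATH set to `path`; return 127 if the executable is not found."""
    try:
        completed = subprocess.run(
            args,
            env={"PATH": path},  # the executable lookup uses this PATH, not the shell's
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except FileNotFoundError:
        # Same "command not found" convention as the 127 seen in the traceback.
        return 127
    return completed.returncode


if __name__ == "__main__":
    # Assumed PATH values: a virtualenv-only PATH versus one that also has /usr/bin.
    print(exit_code_with_path(["antiword"], "/home/ubuntu/web-server/env/bin"))
    print(exit_code_with_path(["antiword"], "/home/ubuntu/web-server/env/bin:/usr/bin"))
```

If the first call reports 127 and the second does not, a PATH restricted to the virtualenv would be enough to reproduce the error, even though `which antiword` succeeds in the shell.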

python version = 3.7
OS = Ubuntu 20.04
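To compare the two environments, a small check like the one below could be logged from inside the application at startup, once under systemd and once when started by hand. It is a minimal diagnostic sketch: `describe_command` is a hypothetical helper, not part of textract or Gunicorn, and it relies only on the standard library's `shutil.which`.

```python
import os
import shutil


def describe_command(name, path=None):
    """Report whether `name` resolves on `path` (defaults to this process's PATH)."""
    search_path = os.environ.get("PATH", "") if path is None else path
    resolved = shutil.which(name, path=search_path)
    return {"command": name, "path": search_path, "resolved": resolved}


if __name__ == "__main__":
    # Run inside the service process and in the shell, then compare the two results.
    print(describe_command("antiword"))
```

If `resolved` is `None` under systemd but `/usr/bin/antiword` in the shell, the difference lies in the service's PATH and not in the antiword installation.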
