Drop python2 support #433

Open: 2 commits to merge into base: master

tehabstract

This drops Python 2 support and loosens the dependency requirements.
Please comment if you want the dependencies in a different format, or want any other changes, and I will adjust.

Introduced openpyxl for xlsx files.
Updated 2 test files:

  • pdf - since recent versions of pdfminer.six parse the file a little differently.
  • xlsx - since openpyxl parses the file a little differently than xlrd, notably booleans (1 -> True).
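
To illustrate why the expected xlsx output had to change, here is a minimal sketch. It does not call xlrd or openpyxl; it only simulates the two values described above (an integer 1 versus a Python bool) and shows how they differ once rendered as text. The helper name render_row and the sample row are hypothetical, not part of textract.

```python
def render_row(cells):
    """Join cell values into one line of text, as a plain-text dump might."""
    return " ".join(str(cell) for cell in cells)


# Simulated values for the same spreadsheet row; neither library is called.
row_with_int_flag = ["widget", 1]      # boolean cell surfaced as the integer 1
row_with_bool_flag = ["widget", True]  # boolean cell surfaced as a Python bool

print(render_row(row_with_int_flag))   # widget 1
print(render_row(row_with_bool_flag))  # widget True
```

The two values compare equal in Python (1 == True), so the difference only shows up in the extracted text, which is what the test fixtures compare.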

Updated travis, vagrant, dockerfile in tests.

Upped the version to 1.7.0, added to changelog.

Thanks

twolfvb commented Jun 16, 2023

@deanmalmgren Any chance this could get looked into?
Python 2 reached end of life on Jan 1, 2020, and the older packages required for textract to work with 2.7 cause dependency conflicts.
In particular, our team would appreciate bumping pdfminer.six to a newer version.

pdfminer.six >= 20200726 is required for using unstructured, which is required by langchain!

@thehunmonkgroup

Quick note: I've tested this patch lightly, and the only problem I've found so far comes from a change in Python's subprocess module (the mswindows attribute is now _mswindows):

diff --git a/textract/parsers/utils.py b/textract/parsers/utils.py
index 11ec8a1..efb0d9c 100755
--- a/textract/parsers/utils.py
+++ b/textract/parsers/utils.py
@@ -83,7 +83,7 @@ class ShellParser(BaseParser):
         """

         # run a subprocess and put the stdout and stderr on the pipe object
-        if subprocess.mswindows:
+        if subprocess._mswindows:
             startupinfo = subprocess.STARTUPINFO()
             startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
         else:

Otherwise it's been working well for me.
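
A possible alternative, sketched here as an assumption rather than as what the patch does: since _mswindows is a private name, the same platform check could be written with the public sys.platform. The function names build_startupinfo and run below are hypothetical and are not part of textract.

```python
import subprocess
import sys


def build_startupinfo():
    """Return a STARTUPINFO with the show-window flag set on Windows, else None."""
    if sys.platform == "win32":
        # STARTUPINFO and STARTF_USESHOWWINDOW only exist on Windows.
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
        return startupinfo
    return None


def run(args):
    """Run a command and return (returncode, stdout, stderr) as bytes."""
    pipe = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        startupinfo=build_startupinfo(),  # None is the default on every platform
    )
    stdout, stderr = pipe.communicate()
    return pipe.returncode, stdout, stderr


if __name__ == "__main__":
    print(run([sys.executable, "-c", "print('ok')"]))
```

This keeps the behavior of the diff above while not depending on an underscore-prefixed attribute that could change again.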
