CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', #529

Open
kdshreyas opened this issue Jun 22, 2023 · 4 comments

kdshreyas commented Jun 22, 2023

Summary of your issue

Refer: chezou/tabula-py#349

While processing a PDF file, one specific page consistently raises a "CalledProcessError" for the Java command that tabula-py runs (it begins with 'java', '-Dfile.encoding=UTF8', '-jar'; the message is quoted below). The error interrupts processing and prevents further execution.

CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'D:\Anaconda\envs\dev_env\lib\site-packages\tabula\tabula-1.0.5-jar-with-dependencies.jar', '--pages', '1', '--lattice', '--format', 'JSON'

Test PDF to reproduce the issue:
test_pdf_output.pdf

Code to reproduce the error:

import tabula

inputpdf = 'test_pdf_output.pdf'
page = 1
# lattice=True requests lattice mode; this is the call that raises CalledProcessError
tables = tabula.read_pdf(inputpdf, pages=page, lattice=True, guess=False)
df = tables[0]

Expected behavior:
The command should execute successfully on the page of the PDF file, without encountering any errors.

Actual behavior:
The error "CalledProcessError" is encountered when processing the specified page within the PDF file.
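
A `CalledProcessError` by itself reports only the command and its exit status; the underlying Java exception is written to the child process's standard error. As a minimal sketch, assuming the command is launched directly with Python's `subprocess` module (the helper name `run_and_report` is hypothetical and not part of tabula-py), the stderr text can be surfaced like this:

```python
import subprocess


def run_and_report(cmd):
    """Run cmd; return (stdout, None) on success or (None, stderr) on failure.

    Hypothetical helper for illustration; not part of tabula-py.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return done.stdout, None
    except subprocess.CalledProcessError as err:
        # err.stderr holds whatever the child process wrote to standard error,
        # which for a Java program includes the exception and its stack trace.
        return None, err.stderr
```

Called with the Java command from the error message (jar path and PDF path filled in locally), the second element of the result would contain the Java stack trace if extraction fails.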

@chezou
Contributor

chezou commented Jun 23, 2023

This looks like the same problem as #218: lattice mode triggers the exception.

@kdshreyas Could you please update the issue? Rather than copying your original tabula-py issue, reference my minimal reproducible command and its output. You should not use the tabula-py template.

@kdshreyas
Author

Hey @chezou,
I have updated the issue, but I am a bit unsure what exactly to change in it. Please guide me.

@chezou
Contributor

chezou commented Jun 26, 2023

This is the tabula-java repo. You should not describe tabula-py code.

This is the reproducible command for the issue:

$ java  -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1 --lattice ~/Downloads/test_pdf_output.pdf
Exception in thread "main" java.lang.IllegalArgumentException: lines must be orthogonal, vertical and horizontal
	at technology.tabula.Ruling.intersectionPoint(Ruling.java:214)
	at technology.tabula.Ruling.findIntersections(Ruling.java:378)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.findCells(SpreadsheetExtractionAlgorithm.java:134)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:63)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:41)
	at technology.tabula.CommandLineApp$TableExtractor.extractTablesSpreadsheet(CommandLineApp.java:452)
	at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:410)
	at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
	at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
	at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
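
The message suggests that the intersection routine accepts only a pairing of one horizontal and one vertical ruling, so a single slanted line detected on the page would be enough to abort lattice extraction. The sketch below illustrates that kind of check in Python. It is a simplified illustration inferred from the exception message, not tabula-java's actual code, and the names `Ruling`, `is_horizontal`, `is_vertical`, and `intersection_point` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ruling:
    """A straight segment from (x1, y1) to (x2, y2). Hypothetical type."""
    x1: float
    y1: float
    x2: float
    y2: float

    def is_horizontal(self) -> bool:
        return self.y1 == self.y2

    def is_vertical(self) -> bool:
        return self.x1 == self.x2


def intersection_point(a: Ruling, b: Ruling) -> Optional[Tuple[float, float]]:
    """Return where one horizontal and one vertical ruling cross.

    Returns None if the segments do not touch; raises ValueError if the
    pair is not exactly one horizontal plus one vertical line.
    """
    if a.is_horizontal() and b.is_vertical():
        h, v = a, b
    elif a.is_vertical() and b.is_horizontal():
        h, v = b, a
    else:
        # A slanted segment (or two parallel ones) ends up here.
        raise ValueError("lines must be orthogonal, vertical and horizontal")
    x, y = v.x1, h.y1
    within_h = min(h.x1, h.x2) <= x <= max(h.x1, h.x2)
    within_v = min(v.y1, v.y2) <= y <= max(v.y1, v.y2)
    return (x, y) if within_h and within_v else None
```

In this sketch, passing a diagonal segment such as `Ruling(0, 0, 10, 10)` together with any other ruling raises the `ValueError`, which mirrors how one non-axis-aligned line could stop the whole page from being processed.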

When I remove the lattice option, it works.

$ java  -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1  ~/Downloads/test_pdf_output.pdf
"","Utah Medicaid Preferred Drug List - Effective April 1, 2023"
"",Quinolones
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
Cipro suspension,Preferred Brand 02/01/10 Cipro susp
"ciprofloxacin 250, 500, 750mg Preferred",Generic 02/01/10
levofloxacin,Preferred Generic 02/01/16
moxifloxacin,Preferred Generic 01/01/21
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
Baxdela,Non Preferred Brand 10/01/17 Medication Coverage Exception
Cipro tablet,Non Preferred Brand 02/01/10 Medication Coverage Exception
ciprofloxacin 100mg tablet,Non Preferred Generic 01/01/22 Medication Coverage Exception
ciprofloxacin suspension,Non Preferred Generic 01/01/20 Medication Coverage Exception Cipro susp
ofloxacin tablet,Non Preferred Generic 02/01/10 Medication Coverage Exception
"",Tetracyclines
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
doxycycline monohydrate,
"",Preferred Generic 01/01/20
"50, 100mg capsule",
doxycycline hyclate,
"",Preferred Generic 01/01/20
"50, 100mg",
minocycline,
"",Preferred Generic 01/01/20
"50, 75, 100mg capsule",
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
demeclocycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Doryx,Non Preferred Brand 01/01/20 Medication Coverage Exception
doxycycline (unless listed preferred),Non Preferred Generic 01/01/20 Medication Coverage Exception
Minocin,Non Preferred Brand 01/01/20 Medication Coverage Exception
minocycline ER capsule,Non Preferred Generic 12/01/22 Medication Coverage Exception
minocycline tablet,Non Preferred Generic 01/01/20 Medication Coverage Exception
Minolira,Non Preferred Brand 01/01/20 Medication Coverage Exception
Nuzyra,Non Preferred Brand 01/01/20 Medication Coverage Exception
Solodyn,Non Preferred Brand 01/01/20 Medication Coverage Exception
tetracycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Vibramycin,Non Preferred Brand 01/01/20 Medication Coverage Exception
Ximino,Non Preferred Brand 01/01/20 Medication Coverage Exception
"",Page 11 of 111
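
Since the page extracts once `--lattice` is removed, one possible workaround on the tabula-py side is to try lattice mode first and retry without it when the Java process fails. This is a sketch under assumptions: `read_with_fallback` is a hypothetical helper, the `reader` parameter exists only so the logic can be exercised without Java installed, and, as the output above shows, extraction without lattice mode may merge several columns into one.

```python
import subprocess


def read_with_fallback(path, page, reader=None):
    """Try lattice extraction; on CalledProcessError retry without lattice.

    Hypothetical helper. `reader` defaults to tabula.read_pdf.
    """
    if reader is None:
        import tabula  # imported lazily so the logic can be tested with a stub
        reader = tabula.read_pdf
    try:
        return reader(path, pages=page, lattice=True, guess=False)
    except subprocess.CalledProcessError:
        # Lattice mode failed (for example on the exception shown above);
        # retry the same page without the lattice option.
        return reader(path, pages=page, lattice=False, guess=False)
```

Catching only `subprocess.CalledProcessError` keeps unrelated failures, such as a missing file, visible instead of silently retrying them.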

@kdshreyas
Author

Hey @chezou,
I am not very familiar with tabula-java, but thanks for the input; it is very helpful.
