Trying to test run and train #1

Open
sourcelead opened this issue Jun 13, 2019 · 1 comment

Comments

@sourcelead

I get the following error when testing and training with PDF files:

python3 main.py --type fixed "./src/data/test/Dong Xing_Catherine Zhang_Equity Research Intern.pdf" --model_name model
Loading nlp tools...
Loading pdf parser...
2019-06-13 12:32:38,162 [MainThread ] [WARNI] Tika server returned status: 500
Traceback (most recent call last):
  File "main.py", line 101, in <module>
    r.test(path_to_resume, infoExtractor)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/model.py", line 568, in test
    doc, _ = loadDocumentIntoSpacy(filename, self.parser, self.nlp)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 162, in loadDocumentIntoSpacy
    new_text = getPDFText(f, parser)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 144, in getPDFText
    raw = parser.from_file(filename)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 40, in from_file
    return _parse(jsonOutput)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 77, in _parse
    realJson = json.loads(jsonOutput[1])
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

@ongteckwu
Owner

I think it's a Tika parser problem. I did not want to use Tika because it needs to interface with Java, but sadly the other methods require a lot of dependencies. You can try restarting your Tika server, or maybe upgrading your Python to 3.7.

Also, Tika requires an Internet connection (unfortunately), so it is possible that you were not able to connect to Apache Tika.
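
In the meantime, a small wrapper around the `parser.from_file(filename)` call shown in the traceback could turn the raw `JSONDecodeError` into a more readable message. This is only a rough sketch, not the project's actual code: `PDFParseError` and `get_pdf_text_safely` are hypothetical names, and it assumes the parser returns a dict that may carry `status` and `content` keys.

```python
import json


class PDFParseError(RuntimeError):
    """Hypothetical error type: the parser could not return text for a file."""


def get_pdf_text_safely(filename, parser):
    """Return the extracted text, or raise PDFParseError with a readable message.

    `parser` is any object exposing from_file(filename), like the `parser`
    argument of getPDFText in the traceback. The shape of the returned dict
    ("status" and "content" keys) is an assumption.
    """
    try:
        raw = parser.from_file(filename)
    except json.JSONDecodeError as exc:
        # The traceback shows the server's response body being passed to
        # json.loads; after a failed request (status 500) that body is not
        # valid JSON, so decoding fails at the first character.
        raise PDFParseError(
            f"Tika server returned a non-JSON response for {filename!r}; "
            "check that the server is running and reachable"
        ) from exc

    # Treat an explicit non-200 status as a failure when one is reported.
    status = raw.get("status")
    if status is not None and status != 200:
        raise PDFParseError(f"Tika server returned status {status} for {filename!r}")

    # An empty or missing body means there is nothing to feed downstream.
    content = raw.get("content")
    if not content:
        raise PDFParseError(f"No text could be extracted from {filename!r}")
    return content
```

With something like this in place of the bare call, a failed parse would at least say which file failed and hint at the server, instead of ending in a JSON decoding error.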
