How to give test data as pdf instead of annotated json file #10

chaitanya1019 · 2019-04-15T12:20:59Z

In this repo, you have provided annotated test data for prediction, but what if the test data is not annotated beforehand? In other words, what modifications are needed to take a PDF as input instead of an annotated JSON file?

chaitanya1019 · 2019-04-17T08:54:46Z

I've used the pdfminer.six package to convert the PDF to text and applied my custom NER model to make predictions, but the predictions are not accurate. Of the 9 entities in the training data set (Companies Worked at, Skills, Graduation Year, College Name, Degree, Designation, Email Address, Location, Name), only 5-6 are recognized, even after 100 iterations over 200 resumes. Please see explosion/spaCy#3528 (comment).
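
For readers following along, the workflow described above (PDF to text with pdfminer.six, then a custom spaCy NER model) might look roughly like the sketch below. It is not the poster's actual code: `model_dir` is a hypothetical path to a model saved with `nlp.to_disk`, and the helper names are made up for illustration. The third-party imports are done inside the functions so the grouping helper can be used on its own.

```python
from collections import defaultdict


def pdf_to_text(pdf_path):
    """Extract plain text from a PDF using pdfminer.six."""
    # Imported lazily so group_entities works without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def group_entities(pairs):
    """Group (label, text) pairs into {label: [text, ...]}, skipping repeats."""
    grouped = defaultdict(list)
    for label, value in pairs:
        if value not in grouped[label]:
            grouped[label].append(value)
    return dict(grouped)


def predict_from_pdf(pdf_path, model_dir):
    """Run a trained spaCy NER model on the text of an unannotated PDF."""
    import spacy
    # model_dir is a hypothetical directory containing the trained model.
    nlp = spacy.load(model_dir)
    doc = nlp(pdf_to_text(pdf_path))
    return group_entities((ent.label_, ent.text) for ent in doc.ents)
```

Calling `predict_from_pdf("resume.pdf", "my_ner_model")` (both names are placeholders) would return a dictionary keyed by entity label. Text extracted from a PDF can differ in spacing and line breaks from the text the model was trained on, which may be one reason some entities are missed.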

srk86386 · 2019-04-30T17:34:36Z

I am getting this error when I try to run the script:

ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?

thongtran957 · 2019-10-31T03:39:14Z

Hi @chaitanya1019, were you able to test with a PDF file? If yes, can you share with me how to do it?

sayalraza · 2019-11-27T09:08:03Z

Hi @chaitanya1019, could you please share the code you are using for inference? How are you using pdf/doc/docx files to test the model instead of the already annotated testdata.json file?

sayalraza · 2019-12-10T10:06:50Z

@chaitanya1019 I was able to successfully give a pdf/txt file to the model and get a decent output. I don't think your code is the problem. I had the same issue of the model being messed up after loading from disk, and I found that it was an issue with an older version of spaCy. I am using spaCy 2.2.3. I trained the model with this version and was able to do inference without messing up the model. The only problem is the dataset. The new version does not allow dataset entities to overlap, and overlapping entities raise ValueError: [E103]. I had to manually remove all conflicting entities from the dataset, as there is no particular pattern. This was a time-consuming task. Anyway, I have the clean traindata.json and testdata.json with me. I am not able to attach JSON files here, but I can share them another way if you want.
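
As an illustration of the cleanup described above, here is a minimal sketch of removing overlapping entity spans automatically rather than by hand. It assumes the data has already been converted to spaCy 2.x style `(text, {"entities": [(start, end, label), ...]})` tuples, and it uses one possible policy (keep the longer span), which is not necessarily what was done manually in this thread. The function names are hypothetical.

```python
def remove_overlaps(entities):
    """Split (start, end, label) spans into (kept, dropped) lists.

    Longer spans win; ties go to the span that starts earlier.
    """
    kept, dropped = [], []
    for span in sorted(entities, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end = span[0], span[1]
        # Half-open spans [a, b) and [c, d) overlap when a < d and c < b.
        if any(start < k_end and k_start < end for k_start, k_end, _ in kept):
            dropped.append(span)
        else:
            kept.append(span)
    return sorted(kept), dropped


def clean_training_data(examples):
    """Apply remove_overlaps to every (text, {"entities": [...]}) example."""
    cleaned = []
    for text, annotations in examples:
        spans = [tuple(e) for e in annotations["entities"]]
        kept, _ = remove_overlaps(spans)
        cleaned.append((text, {"entities": kept}))
    return cleaned
```

The `dropped` list returned by `remove_overlaps` can be printed for inspection before trusting the automatic choice. This check works on character offsets only; spans that do not overlap by characters but fall inside the same token could still conflict once the text is tokenized.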

mhmadayad · 2020-01-02T16:26:58Z

@sayalraza could you please send me the clean traindata.json and testdata.json to mohamad.ayad@tum.de. Thanks in advance.

mhmadayad · 2020-01-02T16:28:38Z

@sayalraza I am working to generalize the model by adding thousands of labeled training examples, so I have to take care of entity conflicts. Otherwise it is going to be really time-consuming to clean a JSON file with 1000 entries.

Noorain99 · 2020-02-03T02:28:15Z

@sayalraza could you please send me the clean traindata.json and testdata.json to noorainzaidi99@gmail.com.
Thanks in advance.

hardikjamnal404 · 2020-03-31T11:23:57Z

Heyy! @sayalraza If you can please send the traindata.json and testdata.json to hardikjamnal@gmail.com Thank You :)

ziodos · 2020-05-14T22:50:31Z

@sayalraza Hey, can you please provide me with the cleaned version of traindata.json and testdata.json? This is my email:
zied.zanina14@gmail.com
Thanks in advance

puttapraneeth · 2020-05-24T18:18:26Z

@sayalraza could you please provide the cleaned version of traindata.json and testdata.json. My email:
puttapraneeth@gmail.com

Also, what is causing the conflict in the JSON provided? I am getting an error saying there is a conflict, and I am unable to understand what it is. Thanks in advance.

JasonLing95 · 2020-09-07T08:18:30Z

@sayalraza could you please provide a cleaned version of traindata.json for the new version of spaCy? Email: jason_ling95@hotmail.com

Really really appreciated.

swethasrinivasan16 · 2022-05-19T15:42:42Z

@sayalraza please could you send the cleaned version to mail: swethas162001@gmail.com

KaranvirSIdana · 2022-07-20T12:02:15Z

@sayalraza please could you send the cleaned version to mail: swethas162001@gmail.com

Have you got the cleaned version? If yes, could you please share it with me as well? TIA!
