How to run a pretrained model on unlabeled data? #30

serenalotreck · 2021-09-13T14:35:44Z

Hi,

I'm looking to apply your pretrained models to a new, unlabeled dataset. I have my dataset in DyGIE format. Looking at the script, it's unclear to me how to do this, because there are only two blocks of code in the script. The first is if args.do_train:, where the model is trained, and the second is if args.do_eval:, where the model is evaluated.

I don't want to train, since I'm using a pretrained model, but I also don't want to evaluate, since my data don't have labels. That makes my use case different from the example of applying the pretrained scibert models to the scierc dataset.

Wondering if you have pointers on how to do this?

Thanks!


a3616001 · 2021-09-13T21:16:11Z

Hi! I guess the easiest way for you to do this is to still create the "ner" and "relations" field in your unlabeled dataset, but set them to be empty for each sentence. For example, if a document contains 4 sentences, you can set the "ner" and "relations" as {..., "ner": [[], [], [], []], "relations":[[], [], [], []], ...}. After that, you can use --do_eval to generate the prediction file (and ignore the evaluation results in that case).

Thanks for pointing this out! I plan to add a --do_predict feature soon. For now, I think this could be an easy way to do only prediction.
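The workaround above (one empty "ner" list and one empty "relations" list per sentence) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the dataset is a JSON-lines file where each line is one document with a "sentences" field, as in DyGIE format, and the function names `add_empty_labels` and `convert_file` are hypothetical helpers, not part of this repository.

```python
import json


def add_empty_labels(doc):
    """Return a copy of a DyGIE-style document with empty labels.

    Assumes doc["sentences"] is a list of tokenized sentences.
    Any existing "ner" / "relations" values are overwritten.
    """
    n_sentences = len(doc["sentences"])
    labeled = dict(doc)  # shallow copy so the input dict is not modified
    labeled["ner"] = [[] for _ in range(n_sentences)]
    labeled["relations"] = [[] for _ in range(n_sentences)]
    return labeled


def convert_file(src_path, dst_path):
    """Read one JSON document per line and write it back with empty labels."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)
            dst.write(json.dumps(add_empty_labels(doc)) + "\n")
```

Calling `convert_file` with hypothetical input and output paths writes a file that can then be passed to the --do_eval path as described above; the evaluation scores it prints are meaningless for unlabeled data and can be ignored.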

serenalotreck · 2021-12-06T16:43:43Z

I just wanted to check in to see if you thought the --do_predict feature would be available soon!

serenalotreck · 2022-01-17T22:44:14Z

Just wanted to leave an update for anyone trying this: your data should be in a file called dev.json. I originally had mine in test.json and couldn't get it to work, but it ran once I renamed it to dev.json!

Edit: I had typed test.dev, but it should be test.json

Hubotcoder · 2023-01-19T03:42:28Z

Hi,
Allow me to ask a simple question: what is doc_key? According to 'please make sure doc_key can be used to identify a certain document', should I find a matching document in the sciERC processed data?

Shike-Cheng · 2023-05-09T13:11:17Z

Hi! I guess the easiest way for you to do this is to still create the "ner" and "relations" field in your unlabeled dataset, but set them to be empty for each sentence. For example, if a document contains 4 sentences, you can set the "ner" and "relations" as {..., "ner": [[], [], [], []], "relations":[[], [], [], []], ...}. After that, you can use --do_eval to generate the prediction file (and ignore the evaluation results in that case).

Thanks for pointing this out! I plan to add a --do_predict feature soon. For now, I think this could be an easy way to do only prediction.

I would like to know whether prediction on unlabeled datasets has been added yet, and where I can find the relevant code. Thank you very much!

a3616001 mentioned this issue Sep 26, 2021

What is the command to do predictions after training a custom model? #33

Closed

prasang-gupta mentioned this issue Jul 2, 2022

Multiple issues #47

Closed
