Question on train/val/test split when evaluating label model. #27

Closed
wurenzhi opened this issue Apr 30, 2022 · 1 comment

@wurenzhi
Contributor

wurenzhi commented Apr 30, 2022

It seems that in the original paper the label model is fitted on a training set and then evaluated on a test set. However, when using weak supervision to generate labeled data, we care more about the quality of the generated labels than about the generalization ability of the label model. For example, suppose a label model provides perfect labels on the training set (which it was fitted on with an unsupervised learning process) but random labels on a test set (on which it was not fitted). This is a perfect label model for the purpose of generating labeled data, yet it would be the worst label model in the benchmark. My question is:
For the purpose of generating labeled data (which is then used to train an end model), is it really necessary to do train/val/test split to evaluate the label model? Can we just fit the unsupervised label model on the whole dataset and then evaluate on the whole dataset?

I appreciate any explanations.

@JieyuZ2
Owner

JieyuZ2 commented Apr 30, 2022

Hey @wurenzhi

That's a great question and thanks for pointing it out!!

In fact, in our original paper, we adopted that setup to ease the comparison between the label model and the end model. However, if we only care about the quality of the generated labels (in which case no end model is involved), we can of course use the evaluation setup you mentioned. Actually, there is some work that follows the setup you mentioned, for example, this one.
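
For readers comparing the two evaluation setups discussed in this thread, here is a minimal sketch of both. It is not taken from the benchmark code: `ToyLabelModel`, `make_synthetic`, the `-1` abstain convention, and the synthetic labeling-function accuracies are all assumptions made for illustration, and the toy model (vote weights estimated from agreement with majority vote) only stands in for a real unsupervised label model.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: -1 means a labeling function abstains


class ToyLabelModel:
    """Hypothetical stand-in for an unsupervised label model (binary case).

    It estimates each labeling function's accuracy from its agreement
    with the majority vote, then predicts with a log-odds weighted vote.
    """

    def __init__(self, n_classes=2):
        self.n_classes = n_classes
        self.weights = None

    def _scores(self, L, weights):
        # Sum the weights of the labeling functions voting for each class.
        scores = np.zeros((L.shape[0], self.n_classes))
        for k in range(self.n_classes):
            scores[:, k] = ((L == k) * weights).sum(axis=1)
        return scores

    def fit(self, L):
        # No ground-truth labels are used here: fitting is unsupervised.
        mv = self._scores(L, np.ones(L.shape[1])).argmax(axis=1)
        voted = L != ABSTAIN
        agree = (L == mv[:, None]) & voted
        # Smoothed agreement rate, clipped to keep the log-odds finite.
        acc = (agree.sum(axis=0) + 1.0) / (voted.sum(axis=0) + 2.0)
        acc = np.clip(acc, 0.05, 0.95)
        self.weights = np.log(acc / (1.0 - acc))
        return self

    def predict(self, L):
        return self._scores(L, self.weights).argmax(axis=1)


def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())


def make_synthetic(n, lf_accs, rng, coverage=0.8):
    """Synthetic binary labels and a label matrix with noisy labeling functions."""
    y = rng.integers(0, 2, size=n)
    L = np.full((n, len(lf_accs)), ABSTAIN)
    for j, a in enumerate(lf_accs):
        fires = rng.random(n) < coverage
        correct = rng.random(n) < a
        L[:, j] = np.where(fires, np.where(correct, y, 1 - y), ABSTAIN)
    return L, y


rng = np.random.default_rng(0)
L, y = make_synthetic(2000, [0.9, 0.8, 0.7, 0.6, 0.55], rng)
split = 1500
L_train, L_test, y_test = L[:split], L[split:], y[split:]

# Setup A (train/test split): fit on the training label matrix,
# evaluate on held-out test points the model never saw.
inductive_acc = accuracy(y_test, ToyLabelModel().fit(L_train).predict(L_test))

# Setup B (label quality): fit on the whole label matrix and evaluate
# on the same points, since only the generated labels matter.
transductive_acc = accuracy(y, ToyLabelModel().fit(L).predict(L))

print(f"fit on train, evaluate on test: {inductive_acc:.3f}")
print(f"fit on all, evaluate on all:    {transductive_acc:.3f}")
```

In both setups the ground-truth labels are used only for scoring, never for fitting, which is why evaluating on the same points the label model was fitted on does not leak labels.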

@JieyuZ2 JieyuZ2 closed this as completed Apr 30, 2022