It seems the label model is also fitted on a training set and then evaluated on a test set in the original paper. However, when using weak supervision to generate labeled data, we care more about the quality of the generated labels than about the generalization ability of the label model. For example, suppose a label model produces perfect labels on the training set (which it was fitted on via an unsupervised process) but random labels on a test set (which it was never fitted on). For the purpose of generating labeled data this is a perfect label model, yet it would rank as the worst label model in the benchmark. My question is: for the purpose of generating labeled data (which is then used to train an end model), is it really necessary to do a train/val/test split to evaluate the label model? Can we just fit the unsupervised label model on the whole dataset and then evaluate it on the whole dataset?
I would appreciate any explanation.
That's a great question, and thanks for pointing it out!
In fact, in our original paper we adopted that setup to ease the comparison between the label model and the end model. However, if we only care about the quality of the generated labels (in which case no end model is involved), we can of course use the evaluation setup you mentioned. There are in fact some works that follow that setup, for example, this one.
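For concreteness, here is a minimal sketch of that "fit on everything, evaluate on everything" setup, using Snorkel's LabelModel as a stand-in for a generic label model (any unsupervised label model with fit/score would work the same way). The weak-label matrix L and the gold labels Y_gold below are made-up placeholders, not data from the benchmark:

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Hypothetical weak-label matrix: 3 labeling functions over 6 examples,
# with -1 denoting an abstain. Replace with your own data.
L = np.array([
    [ 1,  1, -1],
    [ 0,  0,  0],
    [ 1, -1,  1],
    [ 0,  1,  0],
    [-1,  0,  0],
    [ 1,  1,  1],
])
# Gold labels, used only for scoring the generated labels.
Y_gold = np.array([1, 0, 1, 0, 0, 1])

# Fit the (unsupervised) label model on the whole dataset...
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L, n_epochs=500, seed=123)

# ...and evaluate it on the same data: no train/val/test split involved.
metrics = label_model.score(L=L, Y=Y_gold, metrics=["accuracy"])
print(metrics)
```

Since fitting uses only the labeling-function outputs and never touches Y_gold, evaluating on the same examples does not leak label information; it simply measures the quality of the labels the model would hand to the end model.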