Question about the test set of the GLUE benchmark #145

James6Chou · 2023-11-01T14:09:20Z

While reading /examples/NLU/examples/text-classification/run_glue.py, I noticed that GLUE results are generated only on the validation set and that accuracy is never measured on the evaluation set. Would it be better to take the model that performs best on the validation set in run_glue.py and measure its accuracy on the evaluation set?

Edenzzzz · 2024-01-25T03:18:10Z

It's using exactly the evaluation set.
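For reference, the protocol the question describes (pick a checkpoint on the validation split, then score that one checkpoint on a separate held-out split) could look roughly like the sketch below. This is not what run_glue.py does. The checkpoint names, predictions, and labels are toy values made up for illustration, and `accuracy` and `select_best_checkpoint` are hypothetical helpers. Note that the official GLUE test labels are not publicly released, so the held-out split here would have to come from somewhere else, for example a portion carved out of the training data.

```python
# Hypothetical sketch: select a checkpoint on the validation split, then
# score that single checkpoint once on a separate held-out split.

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def select_best_checkpoint(val_predictions_by_checkpoint, val_labels):
    """Return the checkpoint name with the highest validation accuracy."""
    return max(
        val_predictions_by_checkpoint,
        key=lambda name: accuracy(val_predictions_by_checkpoint[name], val_labels),
    )


# Toy data, made up for illustration only.
val_labels = [1, 0, 1, 1]
heldout_labels = [0, 1, 1, 0]

val_preds = {
    "checkpoint-100": [1, 0, 0, 0],  # 2 of 4 correct on validation
    "checkpoint-200": [1, 0, 1, 0],  # 3 of 4 correct on validation
}
heldout_preds = {
    "checkpoint-100": [0, 1, 1, 0],
    "checkpoint-200": [0, 1, 0, 0],
}

# Selection uses validation data only; the held-out split is touched once.
best = select_best_checkpoint(val_preds, val_labels)
heldout_accuracy = accuracy(heldout_preds[best], heldout_labels)
print(best, heldout_accuracy)
```

In this toy data the checkpoint that wins on validation is not the one that scores highest on the held-out split, which is why the held-out number is a less optimistic estimate than the best validation score.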
