ListOps performance #15

Open
dido1998 opened this issue Jan 7, 2021 · 8 comments
@dido1998

dido1998 commented Jan 7, 2021

On running the ListOps task as-is from the repo, I got validation performance similar to that reported in the paper, but the test performance in results.json is very low:

{"accuracy": 0.17500001192092896, "loss": 3.032956123352051, "perplexity": 20.758506774902344}

I saw that the code saves the model from the last checkpoint rather than the model with the best validation performance. Could you detail the evaluation setup used in the paper, i.e., do you evaluate the model from the last checkpoint or from the best validation checkpoint?

Thank you very much! :-)
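
For anyone hitting this: one way to keep the best-validation model instead of the last one is to track the best score and only overwrite the saved checkpoint when validation improves. A minimal runnable sketch in plain Python (the dict-based `state` and the `maybe_save_best` helper are illustrative, not the repo's actual training loop):

```python
import json
import os

def maybe_save_best(state, val_acc, step, ckpt_dir, best_acc):
    """Save a checkpoint only when validation accuracy improves.

    `state` is whatever your framework serializes (a plain dict here,
    purely for illustration). Returns the updated best accuracy.
    """
    if val_acc <= best_acc:
        return best_acc
    os.makedirs(ckpt_dir, exist_ok=True)
    with open(os.path.join(ckpt_dir, "best_checkpoint.json"), "w") as f:
        json.dump({"step": step, "val_acc": val_acc, "state": state}, f)
    return val_acc

# Usage inside an eval loop (dummy validation accuracies):
best = 0.0
for step, val_acc in enumerate([0.18, 0.31, 0.29, 0.36]):
    best = maybe_save_best({"weights": [0.0]}, val_acc, step, "/tmp/ckpts", best)
```

In the repo's Flax training loop, the analogous change would be to call `flax.training.checkpoints.save_checkpoint(ckpt_dir, state, step, overwrite=True)` only under the same improvement condition.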

@sihyun-yu

Have you solved this problem? I have a similar issue.

@dido1998 (Author)

Hi @sihyun-yu, I was not able to solve it.

@apuaaChen

I ran into a similar issue with transformer_base. The evaluation accuracy curve is a bit odd: the highest accuracy reaches 0.3359 at step 2.5k, then drops below 0.2. I used the default configurations directly.
[Plot: evaluation accuracy curve peaking at ~0.34 around step 2.5k, then dropping below 0.2]

@jinfengr

It seems the issue is fixed with the latest code push. Please add a comment if the issue still comes up.

@renebidart

I found that either lowering the learning rate or increasing the batch size helped on this task. I think their hyperparameters assume a large effective batch size, since they run on TPUs.
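
If you are shrinking the per-device batch size to fit on a GPU, a common heuristic (linear learning-rate scaling, Goyal et al. 2017; not something the LRA authors prescribe) is to shrink the learning rate by the same factor. The numbers below are made up for illustration:

```python
# Hypothetical numbers: scale the learning rate linearly with the
# effective batch size (Goyal et al., 2017).
base_lr = 0.05      # learning rate tuned for the TPU setup
base_batch = 256    # assumed effective batch size on TPU
local_batch = 32    # what fits on a single GPU

scaled_lr = base_lr * local_batch / base_batch
print(f"scaled learning rate: {scaled_lr:.5f}")  # 0.00625
```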

@BalloutAI

I am still getting the same problem: my validation accuracy during training is high on ListOps, but when running with the test_only option I get very low accuracy!

@BalloutAI

The problem is that the data is shuffled every time the code is run, so the token-to-id mapping changes when the test script is run, giving essentially random accuracy.
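
If that is indeed the cause, one fix is to build the vocabulary deterministically and cache it to disk, so the training and test runs share the same token-to-id mapping. A minimal sketch (the `build_vocab` helper is hypothetical; the repo's actual input pipeline differs):

```python
import json
import os

def build_vocab(examples, vocab_path):
    """Build a deterministic token->id mapping and cache it to disk.

    Sorting the token set removes any dependence on shuffle or
    iteration order, so a later test-only run reloads identical ids.
    """
    if os.path.exists(vocab_path):
        with open(vocab_path) as f:
            return json.load(f)
    tokens = sorted({tok for ex in examples for tok in ex.split()})
    vocab = {tok: i + 1 for i, tok in enumerate(tokens)}  # id 0 reserved for padding
    with open(vocab_path, "w") as f:
        json.dump(vocab, f)
    return vocab

# Repeated calls return the same mapping regardless of example order.
v1 = build_vocab(["[MAX 2 9 [MIN 4 7 ] 0 ]"], "/tmp/listops_vocab.json")
```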

@yuzhenmao

yuzhenmao commented Sep 15, 2022

> The problem is that the data is shuffled every time the code is run, so the token-to-id mapping changes when the test script is run, giving essentially random accuracy.

@BalloutAI Hi, I also ran into this issue: high training accuracy, low test accuracy. I also found that if I run the training process multiple times, sometimes the model does not even converge. Could you explain your idea a little more? Thank you.
