Abnormal memory increase in eval step #77

aaron1aaron2 · 2024-04-02T10:04:12Z

This problem arises when using the Trainer provided by the transformers package. Memory usage increases abnormally during the eval step when a customized compute_metrics() is used, but there is no problem during training.

I fine-tuned on my own dataset, with an evaluation set of about 27k samples. At first, evaluation ran out of memory. I used the eval_accumulation_steps parameter to move the accumulated evaluation outputs to the CPU, and it worked, but in the end RAM usage reached 140 GB and evaluation took a long time.

My related settings:

model_max_length=500
per_device_eval_batch_size=16
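A back-of-the-envelope estimate shows how an evaluation set of this size can reach tens of gigabytes once a per-token tensor is accumulated alongside the logits. This is only a sketch: the hidden size of 768, the two output classes, float32 storage, and padding every sequence to the full model_max_length are assumptions, not values from this PR, and the helper name is hypothetical.

```python
def eval_buffer_gib(num_samples, seq_len, width, bytes_per_value=4):
    """Size in GiB of a dense (num_samples, seq_len, width) array."""
    return num_samples * seq_len * width * bytes_per_value / 2**30


# ~27k eval samples, model_max_length=500 (upper bound on padded length).
# ASSUMED: hidden size 768, float32 values.
hidden_gib = eval_buffer_gib(27_000, 500, 768)

# Classification logits only: one vector per sample.
# ASSUMED: 2 classes.
logits_gib = eval_buffer_gib(27_000, 1, 2)

print(f"per-token hidden states: {hidden_gib:.1f} GiB")
print(f"classification logits:   {logits_gib:.6f} GiB")
```

Under these assumptions, a single copy of the per-token tensor is already tens of GiB, and temporary copies made while concatenating batches can push peak usage well above that. The logits alone are negligible by comparison.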

Following this article, I fixed the issue by adding preprocess_logits_for_metrics() to the trainer.

In our code, preprocess_logits_for_metrics() needs to drop the DNABERT last-layer output before the trainer passes the outputs to compute_metrics(). By default, that output is passed along together with the classification logits, which takes up too much memory.
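A minimal sketch of such a hook, not necessarily the exact code in the commits below: it assumes the model returns a tuple whose first element is the classification logits, with the last-layer output after it. The Trainer calls the hook once per evaluation batch with the model outputs and the labels, and only the return value is accumulated.

```python
def preprocess_logits_for_metrics(logits, labels):
    """Keep only the classification logits for each eval batch.

    ASSUMPTION: when the model returns several outputs, the
    classification logits come first and the rest (for example the
    last-layer hidden states) are not needed by compute_metrics().
    """
    if isinstance(logits, (tuple, list)):
        # Drop everything except the classification logits.
        logits = logits[0]
    return logits
```

The function is wired in through the Trainer's preprocess_logits_for_metrics argument, for example Trainer(..., compute_metrics=compute_metrics, preprocess_logits_for_metrics=preprocess_logits_for_metrics). compute_metrics() then receives only the classification logits as predictions, so any tuple unpacking it did before has to be adjusted to match.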

aaron1aaron2 added 3 commits April 2, 2024 17:43

Update train.py - fix memory problem

c38be2f

Update train.py

7ebf6ae

Update train.py

f3eb930
