mre is increasing while mse is decreasing #171

Open
WhenMelancholy opened this issue Mar 25, 2024 · 1 comment

Comments

@WhenMelancholy

Hi, thank you for such wonderful work!

I am trying to pretrain scGPT on a small dataset, using the pipeline in the dev-temp branch (which I merged with the main branch). After solving some issues related to library versions and flash-attn, I finally got pretrain.py to work. However, the loss looks a little strange.

This is part of the training log:

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 186 | time:  8.76s | valid loss/mse 157.0726 | mre 1.4009
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - Saving the best model to ./save/eval-Mar25-14-06-2024
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 187 | time:  8.13s | valid loss/mse 158.1381 | mre 1.4314
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 188 | time:  5.81s | valid loss/mse 157.6718 | mre 1.3989
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 189 | time:  6.14s | valid loss/mse 158.9929 | mre 1.4236
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 190 | time:  8.69s | valid loss/mse 158.0198 | mre 1.4282
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 191 | time:  8.16s | valid loss/mse 158.5909 | mre 1.4189
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 192 | time:  9.12s | valid loss/mse 158.4677 | mre 1.4159
scGPT - INFO - -----------------------------------------------------------------------------------------

scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 193 | time:  8.12s | valid loss/mse 158.7186 | mre 1.4422
scGPT - INFO - -----------------------------------------------------------------------------------------

As you can see, the validation loss is quite large and the mre is increasing.
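
To illustrate how the two metrics can move in opposite directions, here is a minimal sketch in plain Python. The definitions of `mse` and `mre` below are assumptions (mean squared error, and the mean of |pred - target| / (target + eps)) and may not match what pretrain.py actually computes. The target and prediction values are made up for the example.

```python
# Illustrative only: these metric definitions are assumptions,
# not copied from the scGPT source.

def mse(preds, targets):
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def mre(preds, targets, eps=1e-6):
    """Mean relative error: |pred - target| / (target + eps), averaged."""
    return sum(abs(p - t) / (t + eps) for p, t in zip(preds, targets)) / len(targets)


# Hypothetical targets: one large value and three small ones.
targets = [100.0, 1.0, 1.0, 1.0]

# Model A: exact on the small values, off by 20 on the large one.
preds_a = [80.0, 1.0, 1.0, 1.0]
# Model B: closer on the large value, but off by 2 on each small one.
preds_b = [90.0, 3.0, 3.0, 3.0]

mse_a, mre_a = mse(preds_a, targets), mre(preds_a, targets)
mse_b, mre_b = mse(preds_b, targets), mre(preds_b, targets)

print(f"A: mse={mse_a:.4f} mre={mre_a:.4f}")
print(f"B: mse={mse_b:.4f} mre={mre_b:.4f}")
```

In this toy case model B has the lower MSE (28 vs. 100) but the higher MRE, because MSE is dominated by the error on the large target while MRE weights errors on small targets heavily. If the dataset contains many small expression values, something similar might be happening here, though that is only a guess.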

My training command is:

DATASET="path to dataset"
LOG_INTERVAL=100
VALID_SIZE_OR_RATIO=0.1
MAX_LENGTH=1200
per_proc_batch_size=64
LAYERS=4
MODEL_SCALE=1

python ./examples/pretrain.py \
    --data-source $DATASET \
    --save-dir ./save/eval-$(date +%b%d-%H-%M-%Y) \
    --max-seq-len $MAX_LENGTH \
    --batch-size $per_proc_batch_size \
    --eval-batch-size $(($per_proc_batch_size * 2)) \
    --epochs 10000 \
    --log-interval $LOG_INTERVAL --save-interval 10000 \
    --no-cls \
    --no-cce \
    --fp16 \
    --vocab-path "path to vocab.json" \
    --nlayers 2 --nheads 2 --embsize 32 --d-hid 32

I was wondering what a normal training run looks like. Any help is welcome!

@WhenMelancholy
Author

@subercui Hi, may I ask for details about your training curve? I would like to know whether the training log above looks correct.
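
One way to quantify the trend in the log excerpt is to parse the validation lines and fit a straight line to each metric. The sketch below does this with the standard library only. The regular expression is written against the line format shown in the log above, and `parse_validation_log` and `slope` are hypothetical helper names, not part of scGPT.

```python
import re

# Validation lines copied from the log excerpt above.
LOG = """\
scGPT - INFO - | end of epoch 186 | time:  8.76s | valid loss/mse 157.0726 | mre 1.4009
scGPT - INFO - | end of epoch 187 | time:  8.13s | valid loss/mse 158.1381 | mre 1.4314
scGPT - INFO - | end of epoch 188 | time:  5.81s | valid loss/mse 157.6718 | mre 1.3989
scGPT - INFO - | end of epoch 189 | time:  6.14s | valid loss/mse 158.9929 | mre 1.4236
scGPT - INFO - | end of epoch 190 | time:  8.69s | valid loss/mse 158.0198 | mre 1.4282
scGPT - INFO - | end of epoch 191 | time:  8.16s | valid loss/mse 158.5909 | mre 1.4189
scGPT - INFO - | end of epoch 192 | time:  9.12s | valid loss/mse 158.4677 | mre 1.4159
scGPT - INFO - | end of epoch 193 | time:  8.12s | valid loss/mse 158.7186 | mre 1.4422
"""

# Matches the end-of-epoch summary lines in the format shown above.
PATTERN = re.compile(
    r"end of epoch\s+(\d+)\s*\|.*?valid loss/mse\s+([\d.]+)\s*\|\s*mre\s+([\d.]+)"
)


def parse_validation_log(text):
    """Return a list of (epoch, mse, mre) tuples found in the log text."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            rows.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return rows


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


rows = parse_validation_log(LOG)
epochs = [r[0] for r in rows]
mse_slope = slope(epochs, [r[1] for r in rows])
mre_slope = slope(epochs, [r[2] for r in rows])

print(f"parsed {len(rows)} epochs")
print(f"mse slope per epoch: {mse_slope:.4f}")
print(f"mre slope per epoch: {mre_slope:.4f}")
```

On these eight epochs both slopes come out slightly positive, but the window is short and the values fluctuate from epoch to epoch, so a longer stretch of the log would be needed to say anything about the overall trend.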
