
Loss Increasing after few epochs #652

Open
alan-ai-learner opened this issue Mar 20, 2023 · 6 comments
alan-ai-learner commented Mar 20, 2023

Hi @rnyak, @benfred, @gabrielspmoreira, @oliverholworthy, I'm training the t4rec model on custom data, but the loss stops decreasing after a few epochs and instead starts increasing. The loss starts at 13.67, decreases to 6.43 over a few epochs of training, and then begins to climb.
[screenshot: training loss curve]
I'm not sure what can be done to improve the loss more.

Here are my params:

params = {
    'batch_size': 512,
    'lr': 0.0005,
    'lr_scheduler': 'cosine',
    'num_train_epochs': 1,
    'using_test': True,
    'using_type': False,
    'bl_shuffle': True,
    'masking': 'mlm',
    'd_model': 256,
    'n_head': 32,
    'n_layer': 3,
    'proj_num': 1,
    'act_mlp': 'None',
    'item_correction': False,
    'neg_factor': 4,
    'label_smoothing': 0.0,
    'temperature': 1.5734215681668653,
    'remove_false_neg': True,
    'item_correction_factor': 0.04152252077012748,
    'transformer_dropout': 0.05096800263401626,
    'mlm_probability': 0.35044384745899415,
    'top20': True,
    'loss_types': True,
    'loss_types_type': 'Simple',
    'multi_task_emb': 0,
    'mt_num_layers': 1,
    'use_tanh': False,
    'seq_len': 20,
    'split': 0
}

Any suggestion would be very helpful. Thanks in advance!

Originally posted by @alan-ai-learner in #493 (comment)


rnyak commented Mar 20, 2023

@alan-ai-learner hello. It is hard to tell what these values should be for your custom dataset; all of these params are hyperparameters. Did you do any hyperparameter tuning? If not, you can first play with your learning rate and batch size, then reduce n_head and mlm_probability. You can see our paper experiments here for different public datasets, but that does not mean the same hyperparameter values will work for your dataset.

Are you training your model only with item-id-list or with side features?
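The lr/batch-size search suggested above can be sketched as a simple grid search. This is a minimal, hypothetical sketch: `train_and_eval` is a placeholder stub standing in for the actual T4Rec training loop (it is not a T4Rec API), so the example runs on its own:

```python
# Hypothetical grid search over the two hyperparameters suggested above.
# train_and_eval is a stand-in for your real training loop; here it is
# stubbed with a toy objective so the sketch is runnable end to end.
import itertools

def train_and_eval(lr, batch_size):
    # Stub: replace with actual training; should return a validation loss.
    return abs(lr - 0.001) * 1000 + abs(batch_size - 256) / 256

def grid_search():
    lrs = [1e-4, 5e-4, 1e-3]
    batch_sizes = [128, 256, 512]
    # Pick the (lr, batch_size) pair with the lowest validation loss.
    best = min(
        itertools.product(lrs, batch_sizes),
        key=lambda cfg: train_and_eval(*cfg),
    )
    return best

best_lr, best_bs = grid_search()
print(best_lr, best_bs)
```

In practice you would replace the stub with a real training run per configuration and keep the rest of the params fixed while sweeping these two.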

alan-ai-learner (Author) commented

Thanks for responding @rnyak ,

  • I didn't do hyperparameter tuning; I took these params from this repo, which uses the same data. The only differences are the batch size (they use 1024) and some architectural changes they made to t4rec.
  • I tried their approach end to end and was able to start training, but after a few steps I got this error:
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

I was unable to fix it and gave up.

  • Also @rnyak, they wanted to open a pull request to add features that help the current t4rec train faster.
    The repo code I shared is the 3rd-place winning solution for the Otto competition.

For now, I went ahead with the default t4rec setup.
FYI, I'm training with the item-id-list plus side features (the event categories). For example, the data looks like:

[12, 34, 55, 56], [1, 2, 3, 1]

where the first list contains item ids and the second contains the event types, where 1: clicks, 2: carts, and 3: orders.
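With seq_len = 20 in the params above, parallel lists like these typically need to be truncated or padded to a fixed length before being fed to the model. A minimal sketch, assuming the most recent interactions are kept and 0 is the padding id (both assumptions, not the repo's actual preprocessing):

```python
def pad_or_truncate(seq, max_len=20, pad_id=0):
    """Keep the most recent max_len interactions, left-padding with pad_id."""
    seq = seq[-max_len:]
    return [pad_id] * (max_len - len(seq)) + list(seq)

item_ids = [12, 34, 55, 56]
event_types = [1, 2, 3, 1]  # 1: clicks, 2: carts, 3: orders

# Both parallel lists must be padded identically so positions stay aligned.
padded_items = pad_or_truncate(item_ids)
padded_types = pad_or_truncate(event_types)
print(padded_items[-4:])  # -> [12, 34, 55, 56]
```

The key constraint is that the item-id list and every side-feature list are padded the same way, so position i in each list still refers to the same interaction.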

Please let me know if there is anything I can do to make it work.
Also, is there any direct relationship between batch size and learning rate?


rnyak commented Mar 20, 2023

@bschifferer might be able to help you with that. His code is not merged into TF4Rec and has custom implementations.

alan-ai-learner (Author) commented

I see, @rnyak, but I'm only using his code to preprocess the dataset; after that, I'm trying to use the model architecture given in one of the examples in this repo.

So there is no straightforward way to overcome the problem I'm facing; I need to play with the params.


rnyak commented Mar 27, 2023

@alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

alan-ai-learner (Author) commented

> @alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

I'm using this manual schema: https://github.com/bschifferer/Kaggle-Otto-Comp/blob/master/01e_FE_Transformer/test.pb
