
Loss Increasing after few epochs #652

Open
alan-ai-learner opened this issue Mar 20, 2023 · 6 comments
alan-ai-learner commented Mar 20, 2023

Hi @rnyak, @benfred, @gabrielspmoreira, @oliverholworthy, I'm training the t4rec model on custom data, but the loss stops decreasing after a few epochs and instead starts increasing. The loss starts at 13.67, decreases to 6.43 over a few epochs of training, and then begins to climb.
[screenshot: training loss curve]
I'm not sure what can be done to improve the loss more.

Here are my params:

params = {
    'batch_size': 512,
    'lr': 0.0005,
    'lr_scheduler': 'cosine',
    'num_train_epochs': 1,
    'using_test': True,
    'using_type': False,
    'bl_shuffle': True,
    'masking': 'mlm',
    'd_model': 256,
    'n_head': 32,
    'n_layer': 3,
    'proj_num': 1,
    'act_mlp': 'None',
    'item_correction': False,
    'neg_factor': 4,
    'label_smoothing': 0.0,
    'temperature': 1.5734215681668653,
    'remove_false_neg': True,
    'item_correction_factor': 0.04152252077012748,
    'transformer_dropout': 0.05096800263401626,
    'mlm_probability': 0.35044384745899415,
    'top20': True,
    'loss_types': True,
    'loss_types_type': 'Simple',
    'multi_task_emb': 0,
    'mt_num_layers': 1,
    'use_tanh': False,
    'seq_len': 20,
    'split': 0
}

Any suggestion would be very helpful. Thanks in advance!

Originally posted by @alan-ai-learner in #493 (comment)


rnyak commented Mar 20, 2023

@alan-ai-learner hello. It is hard to tell what these values should be for your custom dataset; all of these params are hyperparameters. Did you do any hyperparameter tuning? If not, you can first play with your learning rate and batch size, then reduce n_head and mlm_probability. You can see our paper experiments here for different public datasets, but that does not mean the same hyperparameter values will work for your dataset.

Are you training your model only with item-id-list or with side features?
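The lr/batch-size search suggested above can be sketched as a simple grid search. This is a minimal, hypothetical sketch: `train_and_eval` is a placeholder stub standing in for the actual T4Rec training loop (it is not a T4Rec API), so the example runs on its own:

```python
# Hypothetical grid search over the two hyperparameters suggested above.
# train_and_eval is a stand-in for your real training loop; here it is
# stubbed with a toy objective so the sketch is runnable end to end.
import itertools

def train_and_eval(lr, batch_size):
    # Stub: replace with actual training; should return a validation loss.
    return abs(lr - 0.001) * 1000 + abs(batch_size - 256) / 256

def grid_search():
    lrs = [1e-4, 5e-4, 1e-3]
    batch_sizes = [128, 256, 512]
    # Pick the (lr, batch_size) pair with the lowest validation loss.
    best = min(
        itertools.product(lrs, batch_sizes),
        key=lambda cfg: train_and_eval(*cfg),
    )
    return best

best_lr, best_bs = grid_search()
print(best_lr, best_bs)
```

In practice you would replace the stub with a real training run per configuration and keep the rest of the params fixed while sweeping these two.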

alan-ai-learner (Author) commented

Thanks for responding @rnyak ,

  • I didn't do hyperparameter tuning; I took these params from this repo, which uses the same data. The only differences are the batch size (they use 1024) and some architectural changes they made to t4rec.
  • I tried their approach end to end and was able to start training, but after a few steps I got this error:
RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

I was unable to fix it and gave up.

  • Also @rnyak, they wanted to open a pull request to add features that help the current t4rec train faster.
    The repo code I shared is the 3rd-place winning solution for the Otto competition.

For now, I went ahead with the default t4rec setup.
FYI, I'm training with the item-id-list plus side features (the event categories). For example, the data looks like:

[12, 34, 55, 56], [1, 2, 3, 1]

where the first list contains item ids and the second contains the event types, where 1: clicks, 2: carts, and 3: orders.
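With seq_len = 20 in the params above, parallel lists like these typically need to be truncated or padded to a fixed length before being fed to the model. A minimal sketch, assuming the most recent interactions are kept and 0 is the padding id (both assumptions, not the repo's actual preprocessing):

```python
def pad_or_truncate(seq, max_len=20, pad_id=0):
    """Keep the most recent max_len interactions, left-padding with pad_id."""
    seq = seq[-max_len:]
    return [pad_id] * (max_len - len(seq)) + list(seq)

item_ids = [12, 34, 55, 56]
event_types = [1, 2, 3, 1]  # 1: clicks, 2: carts, 3: orders

# Both parallel lists must be padded identically so positions stay aligned.
padded_items = pad_or_truncate(item_ids)
padded_types = pad_or_truncate(event_types)
print(padded_items[-4:])  # -> [12, 34, 55, 56]
```

The key constraint is that the item-id list and every side-feature list are padded the same way, so position i in each list still refers to the same interaction.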

Please let me know if there is anything I can do to make it work.
Also, is there any direct relationship between batch size and learning rate?


rnyak commented Mar 20, 2023

@bschifferer might be able to help you with that. His code is not merged into TF4Rec and has custom implementations.

alan-ai-learner (Author) commented

I see, @rnyak, but I'm only using his code to preprocess the dataset; after that, I'm trying to use the model architecture given in one of the examples in this repo.

So there is no straightforward way to overcome the problem I'm facing; I need to play with the params.


rnyak commented Mar 27, 2023

@alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

alan-ai-learner (Author) commented

> @alan-ai-learner how are you generating the schema file if you are not using NVTabular? thanks.

I'm using this manual schema: https://github.com/bschifferer/Kaggle-Otto-Comp/blob/master/01e_FE_Transformer/test.pb
