Trained model gives constant value prediction outputs #16

Open
NickGeo1 opened this issue Dec 21, 2022 · 15 comments


@NickGeo1

Hello there,

First of all, I want to say this implementation seems really interesting. I am currently working on time series forecasting with a Transformer model for my thesis. I studied the following posts:
https://towardsdatascience.com/how-to-make-a-pytorch-transformer-for-time-series-forecasting-69e073d4061e
https://towardsdatascience.com/how-to-run-inference-with-a-pytorch-time-series-transformer-394fd6cbe16c
as well as your repository and code.

It was a huge help for implementing the model in PyTorch, so I want to thank you for sharing your implementation.
I implemented the TimeseriesTransformer class in my project, along with a few functions of my own for training, validation, batchifying the data, etc.
I modified the positional encoder part a bit so it is compatible with single-sequence input. I also fixed the incompatibility with batch_first=True, but I saw that someone else had already fixed that before me.

Now let me describe my issue. I tried to train the model on some time series data, using the MSE loss function and the Adam optimizer as you describe in the inference post. I trained it for 53 epochs of 21 batches with 16 sequences each, since it seemed to have converged by that point. The dataset has 357 time steps, with encoder_in_len = 10 and decoder_in_len = 1. Each sequence is shifted left by one step compared to the previous sequence in the batch.
For example, the first three sequences of the first batch:

seq1: 343, 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134 (T1->T10)
seq2: 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237 (T2->T11)
seq3: 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237, 698 (T3->T12)
...
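
For clarity, here is a rough sketch of how such shifted windows can be built (the function name and slicing below are illustrative, not my exact code):

import torch

def make_windows(series: torch.Tensor, enc_len: int = 10, dec_len: int = 1):
    # Slide a window of length enc_len + dec_len over a 1-D series,
    # shifting it left by one step each time.
    enc_in, dec_in, tgt = [], [], []
    for t in range(len(series) - enc_len - dec_len + 1):
        window = series[t : t + enc_len + dec_len]
        enc_in.append(window[:enc_len])                      # e.g. T1..T10
        dec_in.append(window[enc_len - dec_len : enc_len])   # last known value(s), e.g. T10
        tgt.append(window[enc_len:])                         # value(s) to predict, e.g. T11
    return torch.stack(enc_in), torch.stack(dec_in), torch.stack(tgt)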

The thing is, both on these data and on the dummy data I used earlier, the model makes wrong predictions at inference time.
Whenever I run inference on new or training data to test it, the prediction is the same constant value for every sequence in the batch.

For example, I got the following inference output for the above sequences:
427.4353, 427.4353, 427.4353 (decoder inputs 134, 237, 698 respectively; the outputs should be 237, 698 and 0, where T13 = 0)

What do you think is going on here? Is it the loss function, the optimizer, the model, or something else?
I would really appreciate your answer!

@qdwang

qdwang commented Mar 9, 2023

@NickGeo1 Almost the same problem here. Have you figured out the reason yet? I tried to make the transformer learn two very simple time series: [1, 2, 3] to predict 4, and [55, 56, 57] to predict 58.

The data I used for the encoder and decoder are:

enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]])
dec_seq = torch.tensor([[3.], [57.]])
goal = torch.tensor([[4.], [58.]])

The complete code is here: https://gist.github.com/qdwang/b6037c9117195cc07c4582fdd6d126a8
I don't know what's wrong.
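
For anyone else trying to reproduce this, a self-contained toy version along the same lines (using nn.Transformer directly rather than the code in the gist, so every layer size and name below is just illustrative) looks roughly like this:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: predict the next value from the previous three.
enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]]).unsqueeze(-1)  # (batch, src_len, 1)
dec_seq = torch.tensor([[3.], [57.]]).unsqueeze(-1)                    # (batch, tgt_len, 1)
goal    = torch.tensor([[4.], [58.]]).unsqueeze(-1)

d_model = 32
in_proj  = nn.Linear(1, d_model)   # lift scalar inputs to d_model
out_proj = nn.Linear(d_model, 1)   # project decoder output back to a scalar
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=64, batch_first=True)

opt = torch.optim.Adam(list(in_proj.parameters()) + list(model.parameters())
                       + list(out_proj.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    pred = out_proj(model(in_proj(enc_seq), in_proj(dec_seq)))
    loss = loss_fn(pred, goal)
    loss.backward()
    opt.step()

print(pred.detach().squeeze())  # note: dropout is still active here (train mode)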

@Singal0927

I tried it on the dataset given by the author, and the transformer returns a constant value as well.
[Attached plot: Figure_1]

@Athuliva

It seems there is a bug in the transformer encoder layer; source: https://discuss.pytorch.org/t/model-eval-predicts-all-the-same-or-nearly-same-outputs/155025/11

Try using a lower version of PyTorch (preferably 1.11).
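
If the culprit is the fused encoder fast path added around PyTorch 1.12 (I'm not certain that is the cause here), another thing to try instead of downgrading is to disable the nested-tensor conversion when building the encoder. The layer sizes below are just placeholders:

import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4,
                                enable_nested_tensor=False)  # skip the nested-tensor conversion in eval mode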

@MarkiemarkF

Hi, unfortunately I'm facing the same issue. I have tried PyTorch 1.11 (and PyTorch 2.0 as well), but that didn't help. I played around with the code a bit and wasn't able to fix it either. Did anyone find a solution to this problem?

@Priyanga-Kasthurirajan

Hi, did anyone solve this issue? I'm facing the same problem of predictions remaining constant.

@Athuliva

Athuliva commented May 5, 2023

@Priyanga-Kasthurirajan, the problem is with the transformer encoder layer in PyTorch. There are issues when you run the model in eval() mode. You can run the model in train() mode instead, but you won't get the same result every time you infer on the same input data, since the dropout layers are not disabled in train() mode (dropout randomly zeroes activations).
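
If you do stay in train() mode as a workaround, you can at least make the outputs repeatable by zeroing the dropout probabilities instead of calling eval(). A rough sketch (pass in your own trained model; nothing here is specific to this repo):

import torch.nn as nn

def zero_dropout(model: nn.Module) -> None:
    # Keep the model in train() mode (so the eval() code path is never taken),
    # but remove the randomness that dropout introduces.
    model.train()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0             # plain dropout layers
        elif isinstance(module, nn.MultiheadAttention):
            module.dropout = 0.0       # attention dropout is stored as a plain float here

After calling zero_dropout(your_model), repeated inference on the same input should give the same output.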

@Aimen1996

@Athuliva Yes, you're right. I am also getting different results every time for the same input data.

@Athuliva

@Aimen1996 @Priyanga-Kasthurirajan @MarkiemarkF I implemented transformers in TensorFlow, referring to this blog and TensorFlow's official tutorial on implementing a transformer language model, but my results are no better than an LSTM network. Would you like to work together? athul.suresh@iva.org.in

@Aimen1996

Aimen1996 commented May 29, 2023

@Athuliva I am working on TSF, basically doing a comparison between LSTM, Transformer, and Non-stationary Transformer.

@Athuliva

Athuliva commented May 29, 2023

@Aimen1996 I am also working on TSF, but my results from transformers need to be better. I need to do a comparison study and find the right model for the task.

@Aimen1996

Aimen1996 commented May 29, 2023 via email

@Athuliva

Athuliva commented May 29, 2023

@Aimen1996 Can you drop an email at athulksuresh21@gmail.com or athul.suresh@iva.org.in? This is the Git repo: https://github.com/athulvingt/transformers

@toibanoor

Did anyone find the bug in this code? I am also getting a straight line with this implementation. I would appreciate some assistance here.

@Athuliva


@toibanoor, I guess you have to override the encoder method.
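
By "override" I mean something along these lines: replace the stock nn.TransformerEncoder with a small module that just loops over the layers, so the fused eval-mode path in nn.TransformerEncoder is never taken. Whether this alone is enough depends on your PyTorch version (the individual layers also have their own fast path), so treat it as a sketch:

import copy
import torch.nn as nn

class PlainTransformerEncoder(nn.Module):
    # Runs the encoder layers in a plain Python loop instead of nn.TransformerEncoder.
    def __init__(self, encoder_layer: nn.TransformerEncoderLayer, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])

    def forward(self, src, mask=None, src_key_padding_mask=None):
        out = src
        for layer in self.layers:
            out = layer(out, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
        return out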

@toibanoor

@Athuliva @Priyanga-Kasthurirajan Do you have any idea how I should go about overriding the encoder method? I've been stuck with this for a long time.
