Trained model gives constant value prediction outputs #16

Open
NickGeo1 opened this issue Dec 21, 2022 · 15 comments


@NickGeo1

Hello there,

First of all, I want to say this implementation seems really interesting. I am currently working on time series forecasting with a Transformer model for my thesis. I studied the following posts:
https://towardsdatascience.com/how-to-make-a-pytorch-transformer-for-time-series-forecasting-69e073d4061e
https://towardsdatascience.com/how-to-run-inference-with-a-pytorch-time-series-transformer-394fd6cbe16c
as well as your repository and code.

It was a huge help for implementing the model in PyTorch, so I want to thank you for sharing your implementation.
I implemented the TimeseriesTransformer class in my project, along with a few functions of my own for training, validation, batchifying the data, etc.
I modified the positional encoder part a bit so it is compatible with single-sequence input. I also fixed the incompatibility with batch_first=True, but I saw that someone else had already fixed that before me.

Now let me describe my issue. I tried to train the model on some time series data, using the MSE loss function and the Adam optimizer as you describe in the inference post. I trained it for 53 epochs of 21 batches with 16 sequences each, since it seemed to have converged by that point. The dataset has 357 time steps, with encoder_in_len = 10 and decoder_in_len = 1. Each sequence is shifted left by one step compared to the previous sequence in the batch.
For example, the first three sequences of the first batch:

seq1: 343, 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134 (T1->T10)
seq2: 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237 (T2->T11)
seq3: 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237, 698 (T3->T12)
...
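
For clarity, here is a rough sketch of how such shifted windows can be built (the function name and slicing below are illustrative, not my exact code):

import torch

def make_windows(series: torch.Tensor, enc_len: int = 10, dec_len: int = 1):
    # Slide a window of length enc_len + dec_len over a 1-D series,
    # shifting it left by one step each time.
    enc_in, dec_in, tgt = [], [], []
    for t in range(len(series) - enc_len - dec_len + 1):
        window = series[t : t + enc_len + dec_len]
        enc_in.append(window[:enc_len])                      # e.g. T1..T10
        dec_in.append(window[enc_len - dec_len : enc_len])   # last known value(s), e.g. T10
        tgt.append(window[enc_len:])                         # value(s) to predict, e.g. T11
    return torch.stack(enc_in), torch.stack(dec_in), torch.stack(tgt)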

The thing is, both on these data and on the dummy data I used earlier, the model makes wrong predictions at inference time.
Whenever I run inference on new or training data to test it, the prediction is the same constant value for every sequence in the batch.

For example, I got the following inference output for the above sequences:
427.4353, 427.4353, 427.4353 (decoder inputs 134, 237, 698 respectively; the outputs should be 237, 698 and 0, where T13 = 0)

What do you think is going on here? Is it the loss function, the optimizer, the model, or something else?
I would really appreciate your answer!

@qdwang

qdwang commented Mar 9, 2023

@NickGeo1 Almost the same problem here. Have you figured out the reason yet? I tried to make the transformer learn two very simple time series: [1, 2, 3] to predict 4, and [55, 56, 57] to predict 58.

The data I used for the encoder and decoder are:

enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]])
dec_seq = torch.tensor([[3.], [57.]])
goal = torch.tensor([[4.], [58.]])

The complete code is here: https://gist.github.com/qdwang/b6037c9117195cc07c4582fdd6d126a8
I don't know what's wrong.
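
For anyone else trying to reproduce this, a self-contained toy version along the same lines (using nn.Transformer directly rather than the code in the gist, so every layer size and name below is just illustrative) looks roughly like this:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: predict the next value from the previous three.
enc_seq = torch.tensor([[1., 2., 3.], [55., 56., 57.]]).unsqueeze(-1)  # (batch, src_len, 1)
dec_seq = torch.tensor([[3.], [57.]]).unsqueeze(-1)                    # (batch, tgt_len, 1)
goal    = torch.tensor([[4.], [58.]]).unsqueeze(-1)

d_model = 32
in_proj  = nn.Linear(1, d_model)   # lift scalar inputs to d_model
out_proj = nn.Linear(d_model, 1)   # project decoder output back to a scalar
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, dim_feedforward=64, batch_first=True)

opt = torch.optim.Adam(list(in_proj.parameters()) + list(model.parameters())
                       + list(out_proj.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    pred = out_proj(model(in_proj(enc_seq), in_proj(dec_seq)))
    loss = loss_fn(pred, goal)
    loss.backward()
    opt.step()

print(pred.detach().squeeze())  # note: dropout is still active here (train mode)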

@Singal0927

I tried it on the dataset given by the author, and the transformer returns a constant value as well.
[Attached plot: Figure_1]

@Athuliva

It seems there is a bug in the transformer encoder layer; source: https://discuss.pytorch.org/t/model-eval-predicts-all-the-same-or-nearly-same-outputs/155025/11

Try using a lower version of PyTorch (preferably 1.11).
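
If the culprit is the fused encoder fast path added around PyTorch 1.12 (I'm not certain that is the cause here), another thing to try instead of downgrading is to disable the nested-tensor conversion when building the encoder. The layer sizes below are just placeholders:

import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4,
                                enable_nested_tensor=False)  # skip the nested-tensor conversion in eval mode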

@MarkiemarkF

Hi, unfortunately I'm facing the same issue. I have tried PyTorch 1.11 (and PyTorch 2.0 as well), but that didn't help. I played around with the code a bit and wasn't able to fix it either. Did anyone find a solution to this problem?

@Priyanga-Kasthurirajan

Hi, did anyone solve this issue? I'm facing the same problem of predictions remaining constant.

@Athuliva

Athuliva commented May 5, 2023

@Priyanga-Kasthurirajan, the problem is with the transformer encoder layer in PyTorch. There are issues when you run the model in eval() mode. You can run the model in train() mode instead, but you won't get the same result every time you infer on the same input data, since the dropout layers are not disabled in train() mode (dropout randomly zeroes activations).
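
If you do stay in train() mode as a workaround, you can at least make the outputs repeatable by zeroing the dropout probabilities instead of calling eval(). A rough sketch (pass in your own trained model; nothing here is specific to this repo):

import torch.nn as nn

def zero_dropout(model: nn.Module) -> None:
    # Keep the model in train() mode (so the eval() code path is never taken),
    # but remove the randomness that dropout introduces.
    model.train()
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0             # plain dropout layers
        elif isinstance(module, nn.MultiheadAttention):
            module.dropout = 0.0       # attention dropout is stored as a plain float here

After calling zero_dropout(your_model), repeated inference on the same input should give the same output.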

@Aimen1996

@Athuliva Yes, you're right. I am also getting different results every time for the same input data.

@Athuliva

@Aimen1996 @Priyanga-Kasthurirajan @MarkiemarkF I implemented transformers in TensorFlow, referring to this blog and TensorFlow's official tutorial on implementing a transformer language model, but my results are no better than an LSTM network. Would you like to work together? athul.suresh@iva.org.in

@Aimen1996

Aimen1996 commented May 29, 2023

@Athuliva I am working on TSF, basically doing a comparison between LSTM, Transformer, and Non-stationary Transformer.

@Athuliva

Athuliva commented May 29, 2023

@Aimen1996 I am also working on TSF, but my results from transformers need to be better. I need to do a comparison study and find the right model for the task.

@Aimen1996

Aimen1996 commented May 29, 2023 via email

@Athuliva

Athuliva commented May 29, 2023

@Aimen1996 Can you drop an email at athulksuresh21@gmail.com or athul.suresh@iva.org.in? This is the Git repo: https://github.com/athulvingt/transformers

@toibanoor

Did anyone find the bug in this code? I am also getting a straight line with this implementation. I would appreciate some assistance here.

@Athuliva


@toibanoor, I guess you have to override the encoder method.
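
By "override" I mean something along these lines: replace the stock nn.TransformerEncoder with a small module that just loops over the layers, so the fused eval-mode path in nn.TransformerEncoder is never taken. Whether this alone is enough depends on your PyTorch version (the individual layers also have their own fast path), so treat it as a sketch:

import copy
import torch.nn as nn

class PlainTransformerEncoder(nn.Module):
    # Runs the encoder layers in a plain Python loop instead of nn.TransformerEncoder.
    def __init__(self, encoder_layer: nn.TransformerEncoderLayer, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])

    def forward(self, src, mask=None, src_key_padding_mask=None):
        out = src
        for layer in self.layers:
            out = layer(out, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
        return out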

@toibanoor

@Athuliva @Priyanga-Kasthurirajan Do you have any idea how I should go about overriding the encoder method? I've been stuck with this for a long time.
