
Model performance changing a lot. #111

Open
himanshupant24 opened this issue May 17, 2023 · 7 comments
Labels
question Further information is requested

Comments

@himanshupant24

I tried the model on an example with Y_lag=26 and x_lag=10 and ran it 2-3 times, using NSE and RMSE as performance measures. I found my NSE changing from 0.45 to -0.3. My question is: is the performance of the model expected to change this much with the same data and configuration?
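For reference, the two metrics mentioned above can be sketched in plain NumPy (a minimal sketch; the array names are illustrative, not from the actual dataset):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    mean predictor, and negative values are worse than the mean."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

y = np.array([1.0, 2.0, 3.0, 4.0])
nse(y, y)   # -> 1.0 (perfect prediction)
rmse(y, y)  # -> 0.0
```

An NSE swing from 0.45 to -0.3 means the model goes from clearly better than the mean predictor to worse than it between runs.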

@ericglem

I think the best way to assess whether the variability really is that large is to run a proper benchmark (i.e., many more than 2-3 runs) and take some summary statistics (mean, std, median, and so on). Have you tried that?
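A repeated-run benchmark like the one suggested above can be sketched as follows; `run_once` here is a stand-in that only simulates noisy predictions, so you would replace it with your own fit/predict cycle:

```python
import numpy as np

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 - SSE/SST."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def benchmark(run_once, n_runs=30):
    """Repeat a stochastic train/evaluate cycle and summarize the scores."""
    scores = np.array([run_once() for _ in range(n_runs)])
    return {"mean": scores.mean(),
            "std": scores.std(ddof=1),
            "median": np.median(scores)}

# Stand-in for a real fit/predict cycle: noisy predictions of a sine wave.
rng = np.random.default_rng(42)
y_true = np.sin(np.linspace(0, 10, 200))

def run_once():
    y_pred = y_true + rng.normal(0, 0.1, size=y_true.size)
    return nse(y_true, y_pred)

stats = benchmark(run_once, n_runs=30)
```

If the std across 30 runs is large relative to the mean, the variability is real and not an artifact of the 2-3 runs observed so far.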

@wilsonrljr
Owner

I agree with @ericglem. Besides, as I mentioned by mail, if you could share a small sample of your data and the code you are using, that would be great. Without some details it's difficult to check what is happening.

For example: since you are using a neural network with a stochastic optimizer, the performance can change each time you run it. But if the performance is changing too much, we have to look at the data to check whether it is a data-related problem or something else behind the scenes in the package implementation.
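As a toy illustration of that point (plain NumPy, not the package itself): stochastic training varies between runs because of random initialization and sample shuffling, and fixing the seed makes runs reproducible.

```python
import numpy as np

def train_sgd(seed, x, y, lr=0.01, epochs=200):
    """Tiny SGD linear fit; weight init and sample order are stochastic."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(), rng.normal()
    for _ in range(epochs):
        for i in rng.permutation(x.size):  # random sample order
            err = (w * x[i] + b) - y[i]
            w -= lr * err * x[i]
            b -= lr * err
    return w, b

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0

# Same seed -> bit-identical model; different seeds -> different trajectories.
w1, b1 = train_sgd(0, x, y)
w2, b2 = train_sgd(0, x, y)
```

With a convex toy problem the runs still converge to nearly the same answer; with a neural network the random start can land in different local minima, which is why the run-to-run spread can be much larger.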

Moreover, you said in issue #110 that you are having some trouble defining the lags for multiple-input cases, so the problem could be related to a mistake in that part too.

But as I said, I have to look at your code and data (just a sample) to give a proper answer.

@himanshupant24
Author

Hi @wilsonrljr, I sent you the data and other information by email on 26th May. My email id is "himanshupant2411@gmail.com". Can you please have a look and give me your insights?

@wilsonrljr
Owner

Hey @himanshupant24 , I'll make some tests with the sample data you sent me.

Regarding the model performance changing: it can absolutely happen in your case (given the details you sent me by mail). You are changing the lags and, consequently, the model structure, so the performance can vary a lot depending on the data.

I'll keep you updated.

@himanshupant24
Author

Hi @wilsonrljr,

Sorry to bother you again. Did you get a chance to have a look at the dataset?

Thanks,
Himanshu

@wilsonrljr
Owner

Hey @himanshupant24 ! I was looking at your case:

First, you said you got the best results when you used water_level as both input and output. This is actually wrong, because you end up with a kind of "causal" relation between the input and output of your system. Maybe I'm not getting the idea right, but that is how it looks to me. Let me know if I'm wrong.

So you tried to use the rain data as an input and got worse results. Compared with using the same data as both input and output, this is expected. We should work on improving the model that uses rain as an input, but we probably can't reach the same accuracy level as the "causal" approach.

In this respect, I want to ask a few more questions:

  • Did you remove outliers from your data? I checked the data and there are some outliers in it, so we can try to improve the model a little by processing them.

  • Did you decimate your data in any way, or are you using all samples in the training process?

  • Have you tried models other than neural networks, or is a neural network your goal in this case?

As soon as I have your answers, we can try some new ideas.

@himanshupant24
Author

Hi @wilsonrljr, thanks so much for your response. Please find below my answers to your questions:

“First, you said you got the best results when you used water_level as input and output”: The logic behind it was to use the lagged values of the column for forecasting. In this case, I am trying to forecast water_level using its own lags. That's why it is both input and output.
“So you tried to use the rain data as an input and got worse results”: Sorry, I forgot to mention one thing: I used the rain data along with water_level as inputs. I expected better performance because the model has extra information in the form of rain data. Note: as displayed below, the cross-correlation of the rain data with water_level is high up to 20-25 lags.

[image: cross-correlation of rain data with water_level]
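The lagged cross-correlation described above can be computed like this (a minimal sketch: the series names and the synthetic 5-step lag are illustrative, not the actual sensor data):

```python
import numpy as np

def lagged_xcorr(x, y, max_lag):
    """Pearson correlation between x shifted forward by `lag` and y,
    for lag = 0 .. max_lag."""
    out = []
    for lag in range(max_lag + 1):
        if lag == 0:
            out.append(np.corrcoef(x, y)[0, 1])
        else:
            out.append(np.corrcoef(x[:-lag], y[lag:])[0, 1])
    return np.array(out)

# Synthetic example: water level follows rain with a 5-sample delay.
rng = np.random.default_rng(1)
rain = rng.random(300)
water_level = np.concatenate([np.zeros(5), rain[:-5]])
water_level = water_level + rng.normal(0, 0.05, size=300)

corrs = lagged_xcorr(rain, water_level, max_lag=10)
# np.argmax(corrs) recovers the delay at which rain best explains water_level
```

A correlation profile like this is one way to justify which input lags (e.g., the 20-25 mentioned above) to feed into the model.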

"Did you remove outliers from your data?": Actually, these are not outliers, because those points represent high water levels due to rain/blockage.

"Did you try to decimate your data in any way, or are you using all samples in the training process?": Yes, the data is split into train, test, and validation sets in the ratio 65/35/35.

"Have you tried other models than neural networks?": Yes, I started with ARIMA and Prophet:

[image: ARIMA and Prophet results]

If you want data from a different sensor, I can provide it. Let me know if you need any clarification from my side.

@wilsonrljr wilsonrljr added the question Further information is requested label Jan 3, 2024