
Model performance changing a lot. #111

Open
himanshupant24 opened this issue May 17, 2023 · 7 comments
Labels
question Further information is requested

Comments

@himanshupant24

I tried the model on an example with Y_lag=26 and x_lag=10 and ran it 2-3 times, using NSE and RMSE as performance measures. I found my NSE changing from 0.45 to -0.3. My question is: is the performance of the model expected to change this much with the same data and configuration?
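For reference, the two metrics mentioned above can be sketched in plain NumPy (a minimal sketch; the array names are illustrative, not from the actual dataset):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    mean predictor, and negative values are worse than the mean."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst

y = np.array([1.0, 2.0, 3.0, 4.0])
nse(y, y)   # -> 1.0 (perfect prediction)
rmse(y, y)  # -> 0.0
```

An NSE swing from 0.45 to -0.3 means the model goes from clearly better than the mean predictor to worse than it between runs.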

@ericglem

I think the best way to assess whether the variability really is that large is to run a proper benchmark (i.e., many more than 2-3 runs) and take some summary statistics (mean, std, median, and so on). Have you tried that?
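A repeated-run benchmark like the one suggested above can be sketched as follows; `run_once` here is a stand-in that only simulates noisy predictions, so you would replace it with your own fit/predict cycle:

```python
import numpy as np

def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 - SSE/SST."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def benchmark(run_once, n_runs=30):
    """Repeat a stochastic train/evaluate cycle and summarize the scores."""
    scores = np.array([run_once() for _ in range(n_runs)])
    return {"mean": scores.mean(),
            "std": scores.std(ddof=1),
            "median": np.median(scores)}

# Stand-in for a real fit/predict cycle: noisy predictions of a sine wave.
rng = np.random.default_rng(42)
y_true = np.sin(np.linspace(0, 10, 200))

def run_once():
    y_pred = y_true + rng.normal(0, 0.1, size=y_true.size)
    return nse(y_true, y_pred)

stats = benchmark(run_once, n_runs=30)
```

If the std across 30 runs is large relative to the mean, the variability is real and not an artifact of the 2-3 runs observed so far.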

@wilsonrljr
Owner

I agree with @ericglem. Besides, as I mentioned by mail, if you could share a small sample of your data and the code you are using, that would be great. Without some details it's difficult to check what is happening.

For example: since you are using a neural network with a stochastic optimizer, the performance can change each time you run it. But if the performance is changing too much, we have to look at the data to check whether it is a data-related problem or something else behind the scenes in the package implementation.
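As a toy illustration of that point (plain NumPy, not the package itself): stochastic training varies between runs because of random initialization and sample shuffling, and fixing the seed makes runs reproducible.

```python
import numpy as np

def train_sgd(seed, x, y, lr=0.01, epochs=200):
    """Tiny SGD linear fit; weight init and sample order are stochastic."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(), rng.normal()
    for _ in range(epochs):
        for i in rng.permutation(x.size):  # random sample order
            err = (w * x[i] + b) - y[i]
            w -= lr * err * x[i]
            b -= lr * err
    return w, b

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0

# Same seed -> bit-identical model; different seeds -> different trajectories.
w1, b1 = train_sgd(0, x, y)
w2, b2 = train_sgd(0, x, y)
```

With a convex toy problem the runs still converge to nearly the same answer; with a neural network the random start can land in different local minima, which is why the run-to-run spread can be much larger.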

Moreover, you said in issue #110 that you are having some trouble defining the lags for multiple-input cases, so the problem could be related to a mistake in that part too.

But as I said, I have to look at your code and data (just a sample) to give a proper answer.

@himanshupant24
Author

Hi @wilsonrljr, I sent you the data and other information by email on 26th May. My email id is "himanshupant2411@gmail.com". Can you please have a look and give me your insights?

@wilsonrljr
Owner

Hey @himanshupant24 , I'll make some tests with the sample data you sent me.

Regarding the model performance changing: it can absolutely happen in your case (given the details you sent me by mail). You are changing the lags and, consequently, the model structure, so the performance can vary a lot depending on the data.

I'll keep you updated.

@himanshupant24
Author

Hi @wilsonrljr,

Sorry to bother you again. Did you get a chance to have a look at the dataset?

Thanks,
Himanshu

@wilsonrljr
Owner

Hey @himanshupant24 ! I was looking at your case:

First, you said you got the best results when you used water_level as both input and output. This is actually wrong, because you end up with a kind of "causal" relation between the input and output of your system. Maybe I'm not getting the idea right, but that is how it looks to me. Let me know if I'm wrong.

So you tried to use the rain data as an input and got worse results. Compared with using the same data as both input and output, this is expected. We should work on improving the model that uses rain as an input, but we probably can't reach the same accuracy level as the "causal" approach.

In this respect, I want to ask a few more questions:

  • Did you remove outliers from your data? I checked the data and there are some outliers in it, so we can try to improve the model a little by processing them.

  • Did you decimate your data in any way, or are you using all samples in the training process?

  • Have you tried models other than neural networks, or is a neural network your goal in this case?

As soon as I have your answers, we can try some new ideas.

@himanshupant24
Author

Hi @wilsonrljr, thanks so much for your response. Please find below my answers to your questions:

“First, you said you got the best results when you used water_level as input and output”: The logic behind it was to use the lagged values of the column for forecasting. In this case, I am trying to forecast water_level using its own lags. That's why it is both input and output.
“So you tried to use the rain data as an input and got worse results”: Sorry, I forgot to mention one thing: I used the rain data along with water_level as inputs. I expected better performance because the model has extra information in the form of rain data. Note: as displayed below, the cross-correlation of the rain data with water_level is high up to 20-25 lags.

[image: cross-correlation of rain data with water_level]
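The lagged cross-correlation described above can be computed like this (a minimal sketch: the series names and the synthetic 5-step lag are illustrative, not the actual sensor data):

```python
import numpy as np

def lagged_xcorr(x, y, max_lag):
    """Pearson correlation between x shifted forward by `lag` and y,
    for lag = 0 .. max_lag."""
    out = []
    for lag in range(max_lag + 1):
        if lag == 0:
            out.append(np.corrcoef(x, y)[0, 1])
        else:
            out.append(np.corrcoef(x[:-lag], y[lag:])[0, 1])
    return np.array(out)

# Synthetic example: water level follows rain with a 5-sample delay.
rng = np.random.default_rng(1)
rain = rng.random(300)
water_level = np.concatenate([np.zeros(5), rain[:-5]])
water_level = water_level + rng.normal(0, 0.05, size=300)

corrs = lagged_xcorr(rain, water_level, max_lag=10)
# np.argmax(corrs) recovers the delay at which rain best explains water_level
```

A correlation profile like this is one way to justify which input lags (e.g., the 20-25 mentioned above) to feed into the model.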

"Did you remove outliers from your data?": Actually, these are not outliers, because those points represent high water levels due to rain/blockage.

"Did you try to decimate your data in any way, or are you using all samples in the training process?": Yes, the data is split into train, test, and validation sets in the ratio 65/35/35.

"Have you tried other models than neural networks?": Yes, I started with ARIMA and Prophet:

[image: ARIMA and Prophet results]

If you want data from a different sensor, I can provide it. Let me know if you need any clarification from my side.

@wilsonrljr wilsonrljr added the question Further information is requested label Jan 3, 2024