Model performance changing a lot. #111
Comments
I think the best way to confirm that the variability is large is to run a meaningful benchmark (i.e., many more than 2-3 runs) and compute some summary statistics (mean, standard deviation, median, and so on). Have you tried that?
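A minimal sketch of the kind of benchmark suggested above. It assumes a hypothetical `train_and_score(seed)` function that trains the model once with the given random seed and returns a single score (for example NSE on the test set). The `fake_train_and_score` stand-in below only simulates noisy scores and is not part of the package.

```python
import random
import statistics


def summarize_runs(scores):
    """Return summary statistics for a list of per-run scores."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }


def benchmark(train_and_score, n_runs=30):
    """Call train_and_score(seed) once per seed and summarize the scores."""
    scores = [train_and_score(seed) for seed in range(n_runs)]
    return summarize_runs(scores)


def fake_train_and_score(seed):
    # Hypothetical stand-in: replace with real training and evaluation.
    rng = random.Random(seed)
    return 0.1 + rng.gauss(0.0, 0.2)


if __name__ == "__main__":
    print(benchmark(fake_train_and_score, n_runs=30))
```

With 30 or more runs, the standard deviation and the min/max range give a far better picture of run-to-run variability than two or three individual scores.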
I agree with @ericglem. Besides, as I mentioned by email, it would be great if you could share a small sample of your data and the code you are using. Without those details it is difficult to check what is happening. For example, since you are using a neural network with a stochastic optimizer, the performance can change each time you run it. But if the performance is changing too much, we have to look at the data to check whether it is a data-related problem or something in the package implementation. Moreover, you said in issue #110 that you are having some trouble defining the lags for the multiple-input case, so the problem may be related to a mistake in that part too. But as I said, I have to look at your code and data (just a sample) to give a proper answer.
Hi @wilsonrljr, I sent you the data and other information by email on 26 May. My email ID is "himanshupant2411@gmail.com". Could you please have a look and give me your insight?
Hey @himanshupant24, I'll run some tests with the sample data you sent me. Regarding the model performance changing, it can absolutely happen in your case (given the details you sent me by email). You are changing the lags and, consequently, the model form, so the performance can vary a lot depending on the data. I'll keep you updated.
Hi @wilsonrljr, sorry to bother you again. Did you get a chance to have a look at the dataset? Thanks,
Hey @himanshupant24! I was looking at your case. First, you said you got the best results when you used water_level as both input and output. This is actually wrong, because you will have a kind of "causal" relation between the input and output of your system. Maybe I'm not getting the idea right, but it looks like that to me. Let me know if I'm wrong. Then you tried to use the rain data as an input and got worse results. Compared with using the same data as input and output, this is expected. We should work on improving the model that uses rain as input, but we probably can't reach the same accuracy level as the "causal" method. In this respect, I want to ask a few more questions:
As soon as I have your answers we can try to follow some new ideas.
Hi @wilsonrljr, thanks so much for your response. Please find below the answers to your questions. “First, you said you got the best results when you used water_level as input and output”: The logic behind it was to use the lagged values of the column for forecasting. In this case, I am trying to forecast the value of water_level using its own lags. That’s why it is both input and output. "Did you remove outliers from your data? I checked the data and there are some outliers on it, so we can try to improve the model a little by processing the outliers.": Actually, these are not outliers, because those points represent high water levels due to rain or blockage. "Did you try to decimate your data in anyways or you are using all samples in the training process?": Yes, the data is split into train, test and valid in the ratio of 65/35/35. "Have you tried other models than neural networks or neural network is your goal in this case?": Yes, I started with ARIMA and Prophet. If you want data from a different sensor, I can provide it. Let me know if you need any clarification from my side.
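To make the lag setup discussed above concrete, here is a minimal sketch of how lagged regressors can be built from an output series and an optional exogenous input. The function name `build_lagged_dataset` and its parameters are hypothetical, chosen only for illustration, and this is not how the package builds its regressor matrix internally.

```python
def build_lagged_dataset(y, x=None, y_lag=2, x_lag=0):
    """Build (features, targets) where each target y[t] is predicted from
    y[t-1..t-y_lag] and, if x is given, x[t-1..t-x_lag]."""
    # The first usable sample is the one that has every required lag available.
    start = max(y_lag, x_lag if x is not None else 0)
    rows, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - k] for k in range(1, y_lag + 1)]
        if x is not None:
            row += [x[t - k] for k in range(1, x_lag + 1)]
        rows.append(row)
        targets.append(y[t])
    return rows, targets


if __name__ == "__main__":
    water_level = [1.0, 2.0, 3.0, 4.0, 5.0]
    rain = [10.0, 20.0, 30.0, 40.0, 50.0]
    features, targets = build_lagged_dataset(water_level, rain, y_lag=2, x_lag=1)
    for f, t in zip(features, targets):
        print(f, "->", t)
```

Using only past values of water_level as regressors corresponds to calling the function with `x=None`, which is the autoregressive setup described in the comment above.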
I tried the model for an example with Y_lag=26 and x_lag=10. I ran the model 2-3 times. I am using NSE and RMSE as performance measures. I found my NSE changing from 0.45 to -0.3. My question is: is the performance of the model expected to change this much with the same data and configuration?
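For reference, a minimal sketch of the two metrics mentioned above, assuming NSE means the Nash-Sutcliffe efficiency in its standard form, $1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y})^2$. The function names are illustrative and are not taken from the package.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    accuracy of always predicting the observed mean."""
    mean_obs = sum(y_true) / len(y_true)
    residual = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    variance = sum((a - mean_obs) ** 2 for a in y_true)
    return 1.0 - residual / variance


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print("RMSE:", rmse(observed, predicted))
    print("NSE:", nse(observed, predicted))
```

Under this definition, a negative NSE such as -0.3 means the model's predictions on that run were worse than simply predicting the mean of the observed series.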