[Models] Return and store models parameters during forecast and CV #639

Open
vspinu opened this issue Sep 16, 2023 · 5 comments

Comments

@vspinu

vspinu commented Sep 16, 2023

Description

Currently, the models fitted during forecasting and cross-validation are lost. It would be nice to have a way to preserve the optimal parameters of the models.

One way to implement this would be to make the forecast method return the fitted parameters along with other metadata. For example, it could be a meta slot on the results object, alongside the vector outputs (cols_m, fitted, mean, etc.).

The same meta slot could be used for internal metadata, for example the time taken for fitting/forecasting per model, which is a very useful comparison metric that, to the best of my knowledge, is not easy to retrieve in the current setup.
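
Purely as an illustration of the idea (none of this exists in statsforecast today, and every name below is hypothetical), the returned object could look roughly like:

result = {
    'mean': ...,     # forecasted values, as today
    'fitted': ...,   # in-sample predictions, as today
    'meta': {        # hypothetical slot proposed here
        'AutoETS': {'params': ..., 'fit_time': ..., 'predict_time': ...},
    },
}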

Use case

  • Inspect the distribution of parameters of the fitted models in forecasting and CV tasks
  • Store the time taken to fit/predict per model/series in forecasting and CV tasks
@jmoralez
Member

Hey @vspinu, thanks for using statsforecast. The forecast method is designed to be more memory efficient by returning only the forecasted values. If you're interested in seeing the models' attributes, you should use fit + predict.

For CV it's the same: it's designed to just return the forecasts in order to evaluate the models' performance. If you want the attributes, you can also compute the splits manually and run fit + predict for each fold, as sketched below.
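
For reference, here's a minimal sketch of that manual approach for a single series (AirPassengersDF is monthly, so freq='M' is assumed; the expanding window with non-overlapping test sets is also an assumption, adjust both to your data):

import pandas as pd

from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF  # single series; for many series, split per unique_id instead
h, n_windows = 12, 3
fold_params, fold_forecasts = [], []
for i in range(n_windows, 0, -1):
    train = df.iloc[: len(df) - i * h]  # expanding training window for this fold
    sf = StatsForecast(models=[AutoETS(season_length=12)], freq='M')
    sf.fit(df=train)
    # fitted_ has shape (n_series, n_models): keep the learned model dict per fold
    fold_params.append(sf.fitted_[0, 0].model_)
    fold_forecasts.append(sf.predict(h=h, level=[80]))
cv_results = pd.concat(fold_forecasts)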

I'll take a look at what we can do to allow you to save the fitting and forecasting times.

@vspinu
Author

vspinu commented Sep 19, 2023

Thanks @jmoralez. Fit + predict is surely an option, but it would require fitting the models twice. Also, there are some implementation differences between fit + predict and forecast (e.g. progress bar, fallback model).

I wonder if some consistent abstraction of parameters is warranted more generally. Is there currently a way to fit, say AutoETS, retrieve and store the parameters without storing the AutoETS object itself, and finally recreate the AutoETS from the parameters?

@jmoralez
Member

Why would you need to fit the models twice? In your use case you said you wanted to inspect the parameters of the fitted models; this only requires fitting once.

About restoring a model: the parameters vary a lot between the different models and we currently don't have a consistent way to save/retrieve them, but I think it's something we could have on the roadmap. Depending on how you're using the library, you currently have a couple of options:

  1. Extracting the model type to avoid searching for it again.
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# first fit finds the best model type
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)
# fitted_ has shape (n_series, n_models)
learned_model = sf.fitted_[0, 0].model_['components']
# re-use the selected components (error, trend, season) and damping instead of searching again
single_ets = AutoETS(season_length=12, model=learned_model[:3], damped=learned_model[3] != 'N')
single_ets.fit(AirPassengersDF['y'].values)
forecasts = single_ets.predict(h=12, level=[80])
  2. Using the ETS functions directly; this recomputes only the residuals and some statistics.
from statsforecast import StatsForecast
from statsforecast.ets import ets_f, forecast_ets
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# first fit finds the best model type
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='D')
sf.fit(df=AirPassengersDF)
# fitted_ has shape (n_series, n_models)
fitted_model = sf.fitted_[0, 0].model_
# keep only the learned params & state
learned_params = {k: v for k, v in fitted_model.items() if k in ('components', 'par', 'm', 'fit', 'n_params')}
# re-run ets_f with the learned params; only residuals and some statistics are recomputed
single_ets = ets_f(AirPassengersDF['y'].values, m=12, model=learned_params)
forecasts = forecast_ets(single_ets, h=12, level=[80])

Please let us know if this helps.

@vspinu
Author

vspinu commented Sep 20, 2023

Why would you need to fit the models twice?

Once for forecasting and once to get the parameters from fit, or once for CV and once for the parameters. I am a bit confused and don't know all the details regarding the redundancy between fit + predict/predict_in_sample vs forecast (with or without fitted=True). I guess it should be possible to get the forecast and even the CV myself from the fit objects; then yes, one fit would be enough.

Please let us know if this helps.

It does, but both approaches require dealing with internals to some extent and are not exactly "user-friendly" or generic. Given the huge number of time series in real-life scenarios, one would ideally be able to store the parameters in a database and re-create the models on the fly in prediction or monitoring applications. In any case, not a big deal. Feel free to close this one if it's not considered of great importance.
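
For what it's worth, building on the second snippet above, something like the following gets close to that pattern (pickle is used purely for illustration; a DB blob column would work the same way, and the keys kept from model_ are an internal detail that may change):

import pickle

from statsforecast import StatsForecast
from statsforecast.ets import ets_f, forecast_ets
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# fit once to find the best ETS model ('M' assumed for the monthly AirPassengers data)
sf = StatsForecast(models=[AutoETS(season_length=12)], freq='M')
sf.fit(df=AirPassengersDF)
fitted_model = sf.fitted_[0, 0].model_

# keep only the learned params & state and serialize them, e.g. into a DB blob column
learned_params = {k: v for k, v in fitted_model.items() if k in ('components', 'par', 'm', 'fit', 'n_params')}
blob = pickle.dumps(learned_params)

# ... later, in a prediction/monitoring service: restore and forecast without re-running the search
restored = pickle.loads(blob)
single_ets = ets_f(AirPassengersDF['y'].values, m=12, model=restored)
forecasts = forecast_ets(single_ets, h=12, level=[80])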

@jmoralez
Member

Sorry for the confusion, the overview is:

  • fit: learns params.
  • predict: uses the params learned during fit to compute future predictions.
  • predict_in_sample: this is used to compute the predictions for the training set. This requires more work and thus is disabled by default. By setting fitted=True these are saved during the fit step.
  • forecast: learns params and computes future predictions, but returns only the predictions. This is designed to be more memory efficient, for example in a distributed setting where sending big objects back and forth is expensive.
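
To make the difference concrete, here's a minimal sketch of the two workflows (AirPassengersDF again; freq='M' is assumed for the monthly data):

from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

sf = StatsForecast(models=[AutoETS(season_length=12)], freq='M')

# fit + predict: the fitted models stay around for inspection
sf.fit(df=AirPassengersDF)
print(sf.fitted_[0, 0].model_['par'])  # learned parameters remain accessible
preds = sf.predict(h=12, level=[80])

# forecast: learns and predicts in one pass, only the predictions are returned
fcst = sf.forecast(df=AirPassengersDF, h=12, level=[80])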

I agree with you on the second point. We're working towards making deployments easier and more efficient; as a first step, we're trying to reduce the dependencies so that the size of the library is smaller (#509, #596, #631). We can address having a way to easily save/load models as a next step.
