
How to use best model and transformer from forecaster pipeline on new data without actual y #77

Open
fstayco opened this issue Aug 29, 2023 · 6 comments

fstayco commented Aug 29, 2023

Hi @mikekeith52. Sorry to ask again; I am still getting familiar with scalecast. This is related to #57.

I got the following from the forecaster pipeline I ran:

best model = knn
best params = {'n_neighbors': 43}
optimal transformer = Transformer(
  transformers = [
    ('DetrendTransform', {'loess': True}),
    ('DiffTransform', 1),
    ('ScaleTransform',)
  ]
)
f = Forecaster(
  DateStartActuals=2016-01-10T00:00:00.000000000
  DateEndActuals=2021-01-10T00:00:00.000000000
  Freq=None
  N_actuals=260
  ForecastLength=4
  Xvars=['month_8', 'quarter_2', 'quarter_3', 'COVID19', 'dengue_lag_4', 'dengue_lag_5', 'dengue_lag_6', 'dengue_lag_7', 'dengue_lag_8', 'symptoms_of_dengue_lag_8']
  TestLength=4
  ValidationMetric=rmse
  ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'xgboost', 'lightgbm', 'knn']
  CILevel=None
  CurrentEstimator=knn
  GridsFile=Grids
)

Is there a way I can use these in sklearn (if not in scalecast) to forecast new values of y based only on the values of the Xvars? From what I understand, the Forecaster needs y.

I would greatly appreciate your assistance.

mikekeith52 (Owner) commented

Hi, you don't need actual y values. You need historical data to train the models on, but the predictions over the unknown forecast horizon only use the Xvars you pass to the model. So for your data, calling f.export('lvl_fcsts') should give you the forecasted points over the next four periods.
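
A minimal sketch of that call (assuming the exported frame has a DATE column plus one column per evaluated model nickname):

# f is the Forecaster from the pipeline after the models have been evaluated
fcsts = f.export('lvl_fcsts')      # level forecasts, one column per evaluated model
knn_fcst = fcsts[['DATE', 'knn']]  # the four forecasted periods from the best model
print(knn_fcst)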

fstayco (Author) commented Aug 29, 2023

Thanks heaps for clarifying and for your patience. Regarding the code you previously shared: suppose f2 contains more recent data in addition to the data used in f1. Would the following be equivalent to getting predictions over the new, unknown forecast horizon from the original model?

from scalecast.Forecaster import Forecaster
from scalecast.util import find_optimal_transformation
 
f1 = Forecaster(...)
f2 = Forecaster(...)

f1.add_ar_terms(12)
f2.add_ar_terms(12)
 
# find optimal transformation on series 1
transformer, reverter = find_optimal_transformation(f1)
f1 = transformer.fit_transform(f1)
 
# tune lasso model on series 1
f1.set_estimator('lasso')
f1.tune()
chosen_params = f1.best_params # save best params -- these will also be in f1.history['lasso']['HyperParams']
f1.auto_forecast()
 
# apply transformation to series 2
f2 = transformer.fit_transform(f2)

# apply lasso model with optimal hyperparams to series 2
f2.set_estimator('lasso')
f2.manual_forecast(**chosen_params)

mikekeith52 (Owner) commented Aug 29, 2023

Oh, I think I know what you are asking. One of the nuances of scalecast is that models have to be retrained every time predictions are generated. To do what you are describing, you can use f.export_Xvars_df() to see the regressors actually used by the model and then f.history[model_nickname]['regr'] to get the fitted scikit-learn regression model to make predictions with. It would be somewhat manual, and it would also be very difficult to incorporate transformations. Maybe a good future enhancement would be to make that easier. But typically, for what I need time series forecasting for, I always want to retrain the model with the most recent known observations, which is the thinking behind the current scalecast functionality.
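
A rough sketch of that manual workaround (assumptions: the model nickname is 'knn', the last four rows of the exported Xvars frame are the unknown future periods, and no normalizer was applied to the regressors during training):

# f is the trained Forecaster from the pipeline above
Xvars_df = f.export_Xvars_df()       # all regressors, including the future rows
used = f.history['knn']['Xvars']     # the regressors the model was trained with
future_X = Xvars_df.tail(4)[used]    # the four unknown future periods
regr = f.history['knn']['regr']      # the fitted scikit-learn estimator
preds = regr.predict(future_X)       # predictions on the transformed scale
# note: these are on the detrended/differenced/scaled level; reverting them to
# the original level is the manual part mentioned above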

fstayco (Author) commented Aug 29, 2023

For our specific use case, we need to be able to monitor model drift. Regarding the workaround, could you share a code snippet showing how I might implement it?

mikekeith52 (Owner) commented

Sure, I'll start working on mocking something up. One potentially easier workaround would be to iteratively try longer and longer forecast horizons. As long as you know the actuals, the model predictions wouldn't change even if you used shorter horizons but didn't retrain the model.
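
One way to read that suggestion as code (a sketch only; it assumes the tuned knn model from above and that future values of the non-lag Xvars are available at each horizon):

# keep the training data fixed and only lengthen the horizon
for h in (4, 8, 12):
    f.generate_future_dates(h)             # extend the forecast horizon to h periods
    f.set_estimator('knn')
    f.manual_forecast(n_neighbors=43)      # retrains on the same actuals each time
    print(f.export('lvl_fcsts').head(4))   # first four periods should agree across horizons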

@mikekeith52 mikekeith52 self-assigned this Aug 29, 2023
mikekeith52 pushed a commit that referenced this issue Sep 15, 2023
- Added `Forecaster.transfer_predict()` method. Only univariate sklearn models supported for now (#77).
- Added `Forecaster.transfer_cis()` method.
- Added `carry_fit_models` attribute in `Forecaster` object that can be changed when object is initialized.
- Added `util.infer_apply_Xvar_selection()` function.
- Changed how many history attributes are stored for each evaluated model, making the `Forecaster` object more memory efficient.
- Refactored forecasting code for sklearn models so that model evaluation is more efficient.
- Changed the `max_ar = 'auto'` behavior in `Forecaster.auto_Xvar_select()`.
- Changed scikit-learn dependency to `<1.3.0` due to it not working with the shap library.
- Fixed an issue with combo modeling where defaults were not working when a previous model had been run test only.
mikekeith52 (Owner) commented

Instead of a code snippet, I decided to build a method for the Forecaster object that does what you are describing. Please see the notebook for an example of how to apply it. Right now, only univariate sklearn models are supported for this process, but I am planning to implement it for all model types, so let me know if there is one you would prefer to be implemented next.
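
A hedged sketch of how the new method might be called, based only on the release notes above (the argument names are assumptions; the linked notebook has the authoritative example):

from scalecast.Forecaster import Forecaster
from scalecast.util import infer_apply_Xvar_selection

# f_old: the original Forecaster, initialized with carry_fit_models=True per the
# release notes so the fitted sklearn models are retained
# updated_y / updated_dates: placeholders for the series extended with new observations
f_new = Forecaster(y=updated_y, current_dates=updated_dates, future_dates=4)
infer_apply_Xvar_selection(infer_from=f_old, apply_to=f_new)  # rebuild the same Xvars
f_new.transfer_predict(transfer_from=f_old, model='knn')      # predict without retraining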
