
How to use best model and transformer from forecaster pipeline on new data without actual y #77

Open
fstayco opened this issue Aug 29, 2023 · 6 comments

fstayco commented Aug 29, 2023

Hi @mikekeith52. Sorry to ask again; I am still getting familiar with scalecast. This is related to #57.

I got the following from the forecaster pipeline I ran:

best model = knn
best params = {'n_neighbors': 43}
optimal transformer = Transformer(
  transformers = [
    ('DetrendTransform', {'loess': True}),
    ('DiffTransform', 1),
    ('ScaleTransform',)
  ]
)
f = Forecaster(
  DateStartActuals=2016-01-10T00:00:00.000000000
  DateEndActuals=2021-01-10T00:00:00.000000000
  Freq=None
  N_actuals=260
  ForecastLength=4
  Xvars=['month_8', 'quarter_2', 'quarter_3', 'COVID19', 'dengue_lag_4', 'dengue_lag_5', 'dengue_lag_6', 'dengue_lag_7', 'dengue_lag_8', 'symptoms_of_dengue_lag_8']
  TestLength=4
  ValidationMetric=rmse
  ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'xgboost', 'lightgbm', 'knn']
  CILevel=None
  CurrentEstimator=knn
  GridsFile=Grids
)

Is there a way I can use these in sklearn (if not in scalecast) to forecast new values of y based only on the values of the Xvars? From what I understand, the Forecaster needs y.

I would greatly appreciate your assistance.

mikekeith52 (Owner) commented

Hi, you don't need actual y values. You need historical data to train the models on, but the predictions over the unknown forecast horizon only use the Xvars you pass to the model. So for your data, calling f.export('lvl_fcsts') should give you the forecasted points over the next four periods.
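
A minimal sketch of that call (assuming the exported frame has a DATE column plus one column per evaluated model nickname):

# f is the Forecaster from the pipeline after the models have been evaluated
fcsts = f.export('lvl_fcsts')      # level forecasts, one column per evaluated model
knn_fcst = fcsts[['DATE', 'knn']]  # the four forecasted periods from the best model
print(knn_fcst)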

fstayco (Author) commented Aug 29, 2023

Thanks heaps for clarifying and for your patience. Regarding the code you previously shared: suppose f2 contains more recent data in addition to the data used in f1. Would the following be equivalent to getting predictions over the new, unknown forecast horizon from the original model?

from scalecast.Forecaster import Forecaster
from scalecast.util import find_optimal_transformation
 
f1 = Forecaster(...)
f2 = Forecaster(...)

f1.add_ar_terms(12)
f2.add_ar_terms(12)
 
# find optimal transformation on series 1
transformer, reverter = find_optimal_transformation(f1)
f1 = transformer.fit_transform(f1)
 
# tune lasso model on series 1
f1.set_estimator('lasso')
f1.tune()
chosen_params = f1.best_params # save best params -- these will also be in f1.history['lasso']['HyperParams']
f1.auto_forecast()
 
# apply transformation to series 2
f2 = transformer.fit_transform(f2)

# apply lasso model with optimal hyperparams to series 2
f2.set_estimator('lasso')
f2.manual_forecast(**chosen_params)

mikekeith52 (Owner) commented Aug 29, 2023

Oh, I think I know what you are asking. One of the nuances of scalecast is that models have to be retrained every time predictions are generated. To do what you are describing, you can use f.export_Xvars_df() to see the regressors actually used by the model and then f.history[model_nickname]['regr'] to get the fitted scikit-learn regression model to make predictions with. It would be somewhat manual, and it would also be very difficult to incorporate transformations. Maybe a good future enhancement would be to make that easier. But typically, for what I need time series forecasting for, I always want to retrain the model with the most recent known observations, which is the thinking behind the current scalecast functionality.
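
A rough sketch of that manual workaround (assumptions: the model nickname is 'knn', the last four rows of the exported Xvars frame are the unknown future periods, and no normalizer was applied to the regressors during training):

# f is the trained Forecaster from the pipeline above
Xvars_df = f.export_Xvars_df()       # all regressors, including the future rows
used = f.history['knn']['Xvars']     # the regressors the model was trained with
future_X = Xvars_df.tail(4)[used]    # the four unknown future periods
regr = f.history['knn']['regr']      # the fitted scikit-learn estimator
preds = regr.predict(future_X)       # predictions on the transformed scale
# note: these are on the detrended/differenced/scaled level; reverting them to
# the original level is the manual part mentioned above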

fstayco (Author) commented Aug 29, 2023

For our specific use case, we need to be able to monitor model drift. Regarding the workaround, could you share a code snippet showing how I might implement it?

mikekeith52 (Owner) commented

Sure, I'll start working on mocking something up. One potentially easier workaround would be to iteratively try longer and longer forecast horizons. As long as you know the actuals, the model predictions wouldn't change even if you used shorter horizons but didn't retrain the model.
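
One way to read that suggestion as code (a sketch only; it assumes the tuned knn model from above and that future values of the non-lag Xvars are available at each horizon):

# keep the training data fixed and only lengthen the horizon
for h in (4, 8, 12):
    f.generate_future_dates(h)             # extend the forecast horizon to h periods
    f.set_estimator('knn')
    f.manual_forecast(n_neighbors=43)      # retrains on the same actuals each time
    print(f.export('lvl_fcsts').head(4))   # first four periods should agree across horizons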

@mikekeith52 mikekeith52 self-assigned this Aug 29, 2023
mikekeith52 pushed a commit that referenced this issue Sep 15, 2023
- Added `Forecaster.transfer_predict()` method. Only univariate sklearn models supported for now (#77).
- Added `Forecaster.transfer_cis()` method.
- Added `carry_fit_models` attribute in `Forecaster` object that can be changed when object is initialized.
- Added `util.infer_apply_Xvar_selection()` function.
- Changed how many history attributes are stored for each evaluated model, making the `Forecaster` object more memory efficient.
- Refactored forecasting code for sklearn models so that model evaluation is more efficient.
- Changed the `max_ar = 'auto'` behavior in `Forecaster.auto_Xvar_select()`.
- Changed scikit-learn dependency to `<1.3.0` due to it not working with the shap library.
- Fixed an issue with combo modeling where defaults were not working when a previous model had been run test only.
mikekeith52 (Owner) commented

Instead of a code snippet, I decided to build a method for the Forecaster object that does what you are describing. Please see the notebook for an example of how to apply it. Right now, only univariate sklearn models are supported for this process, but I am planning to implement it for all model types, so let me know if there is one you would prefer to be implemented next.
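
A hedged sketch of how the new method might be called, based only on the release notes above (the argument names are assumptions; the linked notebook has the authoritative example):

from scalecast.Forecaster import Forecaster
from scalecast.util import infer_apply_Xvar_selection

# f_old: the original Forecaster, initialized with carry_fit_models=True per the
# release notes so the fitted sklearn models are retained
# updated_y / updated_dates: placeholders for the series extended with new observations
f_new = Forecaster(y=updated_y, current_dates=updated_dates, future_dates=4)
infer_apply_Xvar_selection(infer_from=f_old, apply_to=f_new)  # rebuild the same Xvars
f_new.transfer_predict(transfer_from=f_old, model='knn')      # predict without retraining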
