How to use MultiplexForecaster
in auto-ml use case
#5122
-
Thanks for the code! My suggestion from discord would be to use the following:

from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler
from sktime.datasets import load_longley
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import (
ForecastingPipeline,
ForecastX,
MultiplexForecaster,
TransformedTargetForecaster,
make_reduction,
)
from sktime.forecasting.model_selection import (
    ForecastingGridSearchCV,
    temporal_train_test_split,
)
from sktime.performance_metrics.forecasting import mean_absolute_error
from sktime.transformations.compose import TransformerPipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer, Detrender
from sktime.transformations.series.impute import Imputer
sklearn_scaler = StandardScaler()
sktime_scaler = TabularToSeriesAdaptor(sklearn_scaler)
sktime_imputer = Imputer(method="median")
sktime_detrender = Detrender()
sktime_deseasonaliser = Deseasonalizer()
X_transformer = TransformerPipeline([sktime_scaler, sktime_imputer])
y_transformer = TransformerPipeline([sktime_detrender, sktime_deseasonaliser])
sklearn_lm = LinearRegression()
sklearn_ridge = Ridge()
sklearn_lasso = Lasso()
sktime_lm = make_reduction(sklearn_lm, windows_identical=False)
sktime_ridge = make_reduction(sklearn_ridge, windows_identical=False)
sktime_lasso = make_reduction(sklearn_lasso, windows_identical=False)
forecaster_multiplex = MultiplexForecaster(forecasters=[
("lm", sktime_lm),
("ridge", sktime_ridge),
("lasso", sktime_lasso),
])
model = TransformedTargetForecaster([y_transformer, forecaster_multiplex])
forecasterx = ForecastX(forecaster_multiplex.clone(), forecaster_multiplex.clone())
pipeline = ForecastingPipeline([X_transformer, forecasterx])
param_grid = {
    "ForecastX__forecaster_y__selected_forecaster": ["lm", "ridge", "lasso"],
    "ForecastX__forecaster_X__selected_forecaster": ["lm", "ridge", "lasso"],
}
# ForecastingGridSearchCV requires a cv splitter
from sktime.forecasting.model_selection import ExpandingGreedySplitter, ForecastingGridSearchCV
cv = ExpandingGreedySplitter(4, folds=1)
tuned_pipeline = ForecastingGridSearchCV(
    pipeline, cv=cv, param_grid=param_grid, scoring=mean_absolute_error
)
y, X = load_longley()
fh = ForecastingHorizon([1, 2, 3, 4])
-
Hi @fkiraly, @hazrulakmal, sorry to ask one naive question after another, but I am facing results that are different from what I expect, so asking again. Here's the reproducible code:

import time
import warnings
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler
from sktime.datasets import load_longley
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import (
ForecastingPipeline,
MultiplexForecaster,
TransformedTargetForecaster,
make_reduction,
)
from sktime.forecasting.model_selection import (
ExpandingGreedySplitter,
ForecastingGridSearchCV,
temporal_train_test_split,
)
from sktime.performance_metrics.forecasting import mean_absolute_error
from sktime.transformations.compose import TransformerPipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer, Detrender
from sktime.transformations.series.impute import Imputer
warnings.filterwarnings("ignore")
y, X = load_longley()
fh = ForecastingHorizon([1, 2])
y_past, y_future, X_past, X_future = temporal_train_test_split(y, X=X, test_size=max(fh))
sklearn_scaler = StandardScaler()
sktime_scaler = TabularToSeriesAdaptor(sklearn_scaler)
sktime_imputer = Imputer(method="median")
sktime_detrender = Detrender()
sktime_deseasonaliser = Deseasonalizer()
X_transformer = TransformerPipeline([("scaler", sktime_scaler), ("imputer", sktime_imputer)])
y_transformer = TransformerPipeline(
[("detrender", sktime_detrender), ("deseasonaliser", sktime_deseasonaliser)]
)
sklearn_lm = LinearRegression()
sklearn_ridge = Ridge()
sklearn_lasso = Lasso()
sktime_lm = make_reduction(sklearn_lm, windows_identical=False)
sktime_ridge = make_reduction(sklearn_ridge, windows_identical=False)
sktime_lasso = make_reduction(sklearn_lasso, windows_identical=False)
model_lm = TransformedTargetForecaster(
[("y_transformer", y_transformer), ("reduction_lm", sktime_lm)]
)
model_ridge = TransformedTargetForecaster(
[("y_transformer", y_transformer), ("reduction_ridge", sktime_ridge)]
)
model_lasso = TransformedTargetForecaster(
[("y_transformer", y_transformer), ("reduction_lasso", sktime_lasso)]
)
pipeline_lm = ForecastingPipeline([("X_transformer", X_transformer), ("model_lm", model_lm)])
pipeline_ridge = ForecastingPipeline(
[("X_transformer", X_transformer), ("model_ridge", model_ridge)]
)
pipeline_lasso = ForecastingPipeline(
[("X_transformer", X_transformer), ("model_lasso", model_lasso)]
)
y_train, y_test, X_train, X_test = temporal_train_test_split(y_past, X=X_past, test_size=max(fh))
tic_1 = time.perf_counter()
pipeline_lm.fit(y_train, X=X_train, fh=fh)
pipeline_ridge.fit(y_train, X=X_train, fh=fh)
pipeline_lasso.fit(y_train, X=X_train, fh=fh)
y_pred_lm = pipeline_lm.predict(fh=fh, X=X_test)
y_pred_ridge = pipeline_ridge.predict(fh=fh, X=X_test)
y_pred_lasso = pipeline_lasso.predict(fh=fh, X=X_test)
error_lm = mean_absolute_error(y_test, y_pred_lm)
error_ridge = mean_absolute_error(y_test, y_pred_ridge)
error_lasso = mean_absolute_error(y_test, y_pred_lasso)
errors = {"lm": error_lm, "ridge": error_ridge, "lasso": error_lasso}
pipelines = {"lm": pipeline_lm, "ridge": pipeline_ridge, "lasso": pipeline_lasso}
best_model = min(errors, key=errors.get)
best_pipeline = pipelines[best_model]
best_pipeline.update(y_test, X=X_test, update_params=True)
best_pipeline_preds = best_pipeline.predict(fh=fh, X=X_future)
toc_1 = time.perf_counter()
print("#" * 25, "best_pipeline", "#" * 25)
print(f"time {round(toc_1 - tic_1, 4)}")
print(f"model {best_model}")
print(f"score {errors[best_model]}")
print(f"{best_pipeline_preds}")
print("#" * 25, "best_pipeline", "#" * 25)
sktime_multiplex = MultiplexForecaster(
[
("reduction_lm", sktime_lm),
("reduction_ridge", sktime_ridge),
("reduction_lasso", sktime_lasso),
]
)
sktime_model = TransformedTargetForecaster(
[("y_transformer", y_transformer), ("sktime_multiplex", sktime_multiplex)]
)
sktime_pipeline = ForecastingPipeline(
[("X_transformer", X_transformer), ("sktime_model", sktime_model)]
)
cv = ExpandingGreedySplitter(max(fh), folds=1)
grid = {
    "sktime_model__sktime_multiplex__selected_forecaster": [
        "reduction_lm",
        "reduction_ridge",
        "reduction_lasso",
    ],
}
tuner = ForecastingGridSearchCV(
sktime_pipeline, cv, grid, mean_absolute_error, refit=True, update_behaviour="inner_only"
)
tic_2 = time.perf_counter()
tuner.fit(y_past, X=X_past, fh=fh)
tuner_preds = tuner.predict(fh=fh, X=X_future)
toc_2 = time.perf_counter()
print("#" * 25, "tuner", "#" * 25)
print(f"time {round(toc_2 - tic_2, 4)}")
print(f"model {tuner.best_params_}")
print(f"score {tuner.best_score_}")
print(f"{tuner_preds}")
print("#" * 25, "tuner", "#" * 25)

Based on my current understanding, the two code paths above (in the same script) are supposed to be identical with respect to forecasting logic. So, my expectations were at least these:
But this is not the case. I ran this five times, and here are the outputs:

❯ python multiplex_test.py
######################### best_pipeline #########################
time 0.4305
model ridge
score 446.0292865423544
1961 71012.904293
1962 70445.476100
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### best_pipeline #########################
######################### tuner #########################
time 0.9406
model {'sktime_model__sktime_multiplex__selected_forecaster': 'reduction_ridge'}
score 446.0292865423544
1961 71013.132734
1962 70445.628125
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### tuner #########################
❯ python multiplex_test.py
######################### best_pipeline #########################
time 0.4095
model ridge
score 446.0292865423544
1961 71012.904293
1962 70445.476100
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### best_pipeline #########################
######################### tuner #########################
time 0.9334
model {'sktime_model__sktime_multiplex__selected_forecaster': 'reduction_ridge'}
score 446.0292865423544
1961 71013.132734
1962 70445.628125
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### tuner #########################
❯ python multiplex_test.py
######################### best_pipeline #########################
time 0.4115
model ridge
score 446.0292865423544
1961 71012.904293
1962 70445.476100
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### best_pipeline #########################
######################### tuner #########################
time 0.8695
model {'sktime_model__sktime_multiplex__selected_forecaster': 'reduction_ridge'}
score 446.0292865423544
1961 71013.132734
1962 70445.628125
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### tuner #########################
❯ python multiplex_test.py
######################### best_pipeline #########################
time 0.3849
model ridge
score 446.0292865423544
1961 71012.904293
1962 70445.476100
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### best_pipeline #########################
######################### tuner #########################
time 0.9152
model {'sktime_model__sktime_multiplex__selected_forecaster': 'reduction_ridge'}
score 446.0292865423544
1961 71013.132734
1962 70445.628125
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### tuner #########################
❯ python multiplex_test.py
######################### best_pipeline #########################
time 0.3936
model ridge
score 446.0292865423544
1961 71012.904293
1962 70445.476100
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### best_pipeline #########################
######################### tuner #########################
time 0.9738
model {'sktime_model__sktime_multiplex__selected_forecaster': 'reduction_ridge'}
score 446.0292865423544
1961 71013.132734
1962 70445.628125
Freq: A-DEC, Name: TOTEMP, dtype: float64
######################### tuner #########################

I interpret the following from these:
Please let me know if you cannot reproduce these results, and/or disagree with these interpretations, and/or draw some additional inferences. Do these match your expectations? I also tried with a random sample dataset instead of the Longley dataset, and I draw similar conclusions for it. The only difference is the size of the speed gap, which rather supports the guess about fixed costs.

from sktime.utils._testing.hierarchical import _make_hierarchical
df = _make_hierarchical(
hierarchy_levels=(2, 3),
max_timepoints=(30 * 3),
min_timepoints=(30 * 3),
n_columns=4,
all_positive=True,
random_state=0,
)
y = df[["c0", "c1"]]
X = df[["c2", "c3"]]
fh = ForecastingHorizon(range(1, 30 + 1))
-
This is manual code that performs an auto-ml forecasting use case:
Minimal Reproducible Example
The main steps are as follows:

1. TransformerPipeline for X.
2. TransformerPipeline for y.
3. Reduction forecasters wrapping the sklearn regressors.
4. TransformedTargetForecaster for all forecasters, combining steps 2 and 3.
5. ForecastingPipeline combining steps 1 and 4.

Now, this is working, but here all transformers are being trained multiple times, and that's completely unnecessary. The training is also not parallel, which is a problem for any sizeable amount of data. So, I'm looking for a way to switch to sktime-native approaches. Of course the obvious choice was MultiplexForecaster, and that's where I start to face problems.

My first issue is that it requires specifying which model to use. My goal is to find the best forecaster among forecasters, so I'm not sure what to provide as selected_forecaster.

Suppose I don't provide any, and let it use the "first forecaster in the list". Then, as per the documentation, it picks the first one. This is the second issue, as this is not what I want: I want to train all forecasters to see which is best.
My third issue is that there does not seem to be a way to validate and compare models: I could not find an option to specify the validation dataset to use and the evaluation metric of choice.
Please let me know if any further details are required.