Add ARIMA Normalization Functionality #89

Open
wants to merge 10 commits into
base: master

Contributor

ml874 commented Jan 29, 2020

Here ARIMA is used without the moving-average (MA) component, i.e. with q = 0, to normalize and forecast time series data.

An ARIMA model is selected from 9 possible (p, d, q) orders: (0,0,0), (1,0,0), (2,0,0), (0,1,0), (1,1,0), (2,1,0), (0,2,0), (1,2,0), (2,2,0). The time series is split into train and test sets, and an ARIMA model is fit on the training set for each order. The model with the lowest mean squared error (MSE) on the test set is selected as the best model. The original time series can then be transformed by the best model.

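import numpy as np
import pandas as pd
import gs_quant.timeseries as ts

# 100 random integers in [0, 100) as a toy series
series = pd.Series(np.random.randint(0, 100, size=100))

# Fit each candidate order on the first 80% and keep the one with the lowest test MSE
arima = ts.arima()
arima.fit(series, train_size=0.8)

# Transform the original series with the best model
arima.transform(series)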
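
For reviewers, here is a minimal sketch of the selection procedure described above. It is not the code in this PR: it fits the AR part by ordinary least squares in plain NumPy instead of calling a library ARIMA estimator, and it assumes one-step-ahead forecasts on the test set. The names `fit_ar_ols`, `one_step_mse` and `select_order` are hypothetical.

```python
import itertools

import numpy as np


def fit_ar_ols(z, p):
    """Least-squares fit of z[t] = c + phi_1*z[t-1] + ... + phi_p*z[t-p]."""
    n = len(z)
    # Constant column plus one column per lag; row k corresponds to t = p + k.
    columns = [np.ones(n - p)] + [z[p - i:n - i] for i in range(1, p + 1)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, z[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def one_step_mse(series, p, d, train_size=0.8):
    """MSE of one-step-ahead forecasts over the held-out tail of the series."""
    x = np.asarray(series, dtype=float)
    n_train = int(len(x) * train_size)
    z = np.diff(x, n=d)   # differencing drops the first d points
    split = n_train - d   # index in z of the first test observation
    coef = fit_ar_ols(z[:split], p)
    # Each forecast uses the true lagged values, so the error on the
    # differenced scale equals the error on the original scale.
    preds = np.full(len(z) - split, coef[0])
    for i in range(1, p + 1):
        preds += coef[i] * z[split - i:len(z) - i]
    return float(np.mean((z[split:] - preds) ** 2))


def select_order(series, train_size=0.8):
    """Score the nine (p, d, 0) orders and return the one with the lowest MSE."""
    scores = {
        (p, d, 0): one_step_mse(series, p, d, train_size)
        for p, d in itertools.product(range(3), range(3))
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy data, seeded so the run is repeatable.
rng = np.random.default_rng(0)
series = rng.integers(0, 100, size=100)
best, scores = select_order(series)
print(best, scores[best])
```

Because the split is taken on the original series and shifted by d after differencing, every order is scored on the same test observations, which keeps the MSE values comparable across different d.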

@@ -0,0 +1,246 @@
# Copyright 2020 Goldman Sachs.
Contributor

The file should live in one of the packages, either statistics or econometrics.

self.best_params = {}


def _evaluate_arima_model(self, X: Union[pd.Series, pd.DataFrame], arima_order: Tuple[int, int, int], train_size: float, freq: str) -> Tuple[float, dict]:
Contributor

train_size should be an int.

Contributor Author

I changed it so that it can take a float, an int, or None (similar to what scikit-learn does).

Is that too complicated? Should we just use an int to simplify things?
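
To make the trade-off concrete, here is a rough sketch of what resolving a float, int or None `train_size` into a row count could look like. The helper name `resolve_train_size` is hypothetical, and the 75% default for None is an assumption borrowed from scikit-learn's `train_test_split`, not something this PR specifies.

```python
from typing import Optional, Union


def resolve_train_size(n_samples: int,
                       train_size: Optional[Union[int, float]] = None) -> int:
    """Return how many leading observations go into the training set."""
    if train_size is None:
        # Assumed default: 75% train, as in scikit-learn's train_test_split.
        train_size = 0.75
    if isinstance(train_size, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("train_size must be a float, an int or None")
    if isinstance(train_size, float):
        # A float is read as a fraction of the series.
        if not 0.0 < train_size < 1.0:
            raise ValueError("float train_size must lie strictly between 0 and 1")
        n_train = int(train_size * n_samples)
    elif isinstance(train_size, int):
        # An int is read as an absolute number of observations.
        if not 0 < train_size < n_samples:
            raise ValueError("int train_size must lie strictly between 0 and n_samples")
        n_train = train_size
    else:
        raise TypeError("train_size must be a float, an int or None")
    if n_train == 0:
        raise ValueError("train_size leaves no observations to train on")
    return n_train
```

Accepting only an int would remove every branch except the range check, which is the simplification being asked about.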

best_ma_coef = ma_coef
best_resid = resid
except Exception as e:
print(' {}'.format(e))
Contributor

Please raise the exception and remove the print.

Contributor Author

Certain combinations of (p, d, q) raise the following exception: Estimation requires the inclusion of least one AR term, MA term, a constant or an exogenous variable.

Raising the exception would then break the training loop. Maybe it's a better idea to just print the error and move on to the next combination of (p, d, q)?
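
One possible middle ground, sketched below under assumptions: skip an order that fails to fit, log a warning instead of printing, and raise only if no order could be fit at all. The name `pick_best_order` and the `score` callback are hypothetical, and the sketch assumes the estimation error surfaces as a `ValueError`.

```python
import logging
from typing import Callable, Dict, Iterable, Tuple

logger = logging.getLogger(__name__)
Order = Tuple[int, int, int]


def pick_best_order(orders: Iterable[Order],
                    score: Callable[[Order], float]) -> Tuple[Order, float]:
    """Return the order with the lowest score, skipping orders that fail to fit."""
    scores: Dict[Order, float] = {}
    failures: Dict[Order, Exception] = {}
    for order in orders:
        try:
            scores[order] = score(order)
        except ValueError as exc:
            # Assumed failure mode: the fit rejects this order. Log and move on.
            logger.warning("Skipping ARIMA order %s: %s", order, exc)
            failures[order] = exc
    if not scores:
        # Nothing could be fit, so the caller needs to know.
        raise ValueError(f"No ARIMA order could be fit: {failures}")
    best = min(scores, key=scores.get)
    return best, scores[best]


def fake_score(order: Order) -> float:
    """Stand-in for fitting a model and returning its test MSE."""
    if order == (0, 0, 0):
        raise ValueError("no AR term, MA term or constant")
    return float(sum(order))


print(pick_best_order([(0, 0, 0), (1, 0, 0), (2, 1, 0)], fake_score))
```

Catching only `ValueError` keeps unexpected errors visible, which addresses the concern behind the request to raise while still letting the loop finish.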
