Timeseries Cross Validation #200
Thanks for opening this, and thank you for the example code and traceback! It looks like the issue stems from the fact that I'd love to add support for this! Can you provide/recommend a toy dataset that looks like the one you're trying to use, so I can build some regression tests? At the risk of making myself sound like a dummy, I don't think I've worked on a problem that uses both
Lol! That is not possible!
They are not features in the DataFrame. They are distinct pandas Series holding the timestamps of when an equity trade is made. In finance, we typically want to use a walk-forward analysis in which we remove training samples whose eval times fall after the validation prediction times. These samples are removed based on the pred_times and eval_times that PurgedWalkForwardCV requires as arguments. In my case, I have a training set and two series of timestamps, pred_times and eval_times, whose date indexes all match.
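To make the purging idea concrete, here is a minimal sketch in plain pandas. It is not the timeseriescv implementation: the helper name `purged_train_index`, the toy daily data, and the choice to keep samples whose eval time equals the validation start are all assumptions made for illustration.

```python
import pandas as pd


def purged_train_index(pred_times, eval_times, val_index):
    """Hypothetical helper: training labels for one walk-forward fold."""
    # The validation fold starts at its earliest prediction time.
    val_start = pred_times.loc[val_index].min()
    # Walk-forward: only train on samples predicted before validation starts.
    before_val = pred_times < val_start
    # Purge: drop samples whose outcome is only known after validation starts.
    # Treating "equal to val_start" as safe is an assumption of this sketch.
    resolved = eval_times <= val_start
    keep = (before_val & resolved).to_numpy()
    return pred_times.index[keep]


# Toy data: one sample per day, each evaluated two days after prediction.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
pred_times = pd.Series(idx, index=idx)
eval_times = pred_times + pd.Timedelta(days=2)

val_index = idx[6:8]  # validate on 2020-01-07 and 2020-01-08
train_index = purged_train_index(pred_times, eval_times, val_index)
print(list(train_index.strftime("%Y-%m-%d")))
```

With this toy data, the sample predicted on 2020-01-06 is dropped from training because its eval time (2020-01-08) falls after the validation fold begins.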
I created 3 pickle files for the data set (X_train, eval_times, pred_times):
Here's a sample of how to make the splits:
Thank you so much for your help :)
Sorry about the delay! The TL;DR version of my findings is that I don’t think HH can support time-series CV right now. Here’s the long version:

I was able to throw together a quick/dirty subclass of So I don’t think we can get this working properly just by using some combination of custom

My quick and dirty

from hyperparameter_hunter import Environment, CVExperiment
from timeseriescv.cross_validation import PurgedWalkForwardCV
from xgboost import XGBClassifier
import pandas as pd


class UglyPurgedWalkForwardCV(PurgedWalkForwardCV):
    def __init__(self, pred_times=None, eval_times=None, split_by_time=False, **kwargs):
        """Override initialization to receive the three extra kwargs expected by
        :meth:`split`. Mangle the attribute names to avoid any possible
        collisions with the original attributes of :class:`PurgedWalkForwardCV`"""
        self.__pred_times = pred_times
        self.__eval_times = eval_times
        self.__split_by_time = split_by_time
        super().__init__(**kwargs)

    def split(self, X, y=None, **kwargs):
        """Override `split` to look more like SKLearn's CV classes, and fetch the
        mangled attributes set on initialization, rather than expecting them here"""
        return super().split(
            X,
            y,
            pred_times=self.__pred_times,
            eval_times=self.__eval_times,
            split_by_time=self.__split_by_time,
        )


if __name__ == "__main__":
    # NOTE: the three *_path variables are not defined in this snippet
    data_df = pd.read_pickle(train_data_path)
    p_times = pd.read_pickle(pred_times_path)
    e_times = pd.read_pickle(eval_times_path)

    env = Environment(
        train_dataset=data_df,
        target_column="bin",
        results_path="HyperparameterHunterAssets",
        metrics=["roc_auc_score"],
        cv_type=UglyPurgedWalkForwardCV,
        cv_params=dict(n_splits=5, pred_times=p_times, eval_times=e_times),
    )

    exp = CVExperiment(XGBClassifier)

Output/error traceback:
I'd love to hear your thoughts on adding support for time-series problems! Sorry it's not working at the moment, though!
Thank you so much for your effort to get this to work. I should be able to work with the sklearn TimeSeriesSplit method, so it's not a huge deal. I will definitely look into how I can help to support timeseriescv. I am working on a project at the moment that is taking a considerable amount of time, so I'm not sure I can look into it for a few weeks. I will definitely keep you posted on how it goes. On a side note, this package is really nice. I really appreciate you sharing this with the community! It will definitely be part of my toolbox going forward!
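For reference, a minimal sketch of the scikit-learn fallback mentioned above. It only shows how `TimeSeriesSplit` orders its folds on a toy array; the array `X` and the choice of `n_splits=3` are assumptions for illustration, and nothing here involves HH or any purging of overlapping samples.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 8 samples, assumed to be already sorted in time order.
X = np.arange(8).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in folds:
    # Every training index precedes every test index in each fold.
    print("train:", train, "test:", test)
```

Unlike PurgedWalkForwardCV, this splitter takes no pred_times or eval_times, so samples whose evaluation window overlaps the validation fold are not removed.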
Does SKLearn's
Sorry for the delay, I have been traveling and will be for the next couple of weeks. I recall that it accepted the parameters (i.e., the lack of event times). However, I don't recall whether I fully tested sklearn's TimeSeriesSplit. I will give it a shot soon and report back. Thanks again for your help on this.
Hi,
So far, I'm loving this package! A question: I am using time series data and would like to use a more sophisticated cross-validation scheme than the TimeSeriesSplit offered by sklearn. Specifically, I am interested in using the following CV package, which has an API similar to sklearn's:
https://github.com/sam31415/timeseriescv
Here is a snippet of my code:
Here is the error output:
It looks as though HH doesn't like the "pred_times" and "eval_times" arguments required by PurgedWalkForwardCV. Any way to allow the arguments to be passed?
Thanks for your help!