-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] ExpandingCutoffSplitter
- splitter by moving cutoff
#6360
Conversation
…ub.com/ninedigits/sktime # Conflicts: # sktime/forecasting/compose/_fallback.py # sktime/forecasting/compose/tests/test_fallback.py
@fkiraly I think there are some issues with other commits getting included here, but I think the core logic of |
…/sktime into ExpandingCutoffFilter
I don't see any other commits? All looks fine. Further, could you kindly ensure your code passes formatting, then the automated tests will run. |
restarted tests |
sktime/split/expandingcutoff.py
Outdated
↓ | ||
|---------------------|←---fh---→|------| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fkiraly Any issues with including characters like ↓←→
as seen in line 42-43? Not sure if this is causing one of the unit tests to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure - I cannot download or access the logs. So, perhaps? I will try to run the tests locally.
@yarnabrina or @benHeid - can you see the test logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran it locally - the failure is due to fh
being modified in __init__
. For any sklearn
-like object, the self.param
should be identical to what was passed in __init__
. If you want to check_fh
, you should write it to self._fh
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yarnabrina or @benHeid - can you see the test logs?
Yes, I can see CI logs. Sharing two sample failures:
FAILED sktime/tests/test_all_estimators.py::TestAllObjects::test_set_params_sklearn[ExpandingCutoffSplitter] - AssertionError: get_params result of ExpandingCutoffSplitter (x) does not match what was passed to set_params (y). Reason for discrepancy: [fh].type, x.type = <class 'sktime.forecasting.base._fh.ForecastingHorizon'> != y.type = <class 'list'>
assert False
FAILED sktime/split/tests/test_expandingcutoff.py::test_expandingcutoff_fh_list_007 - AttributeError: attribute 'freq' of 'pandas._libs.tslibs.timestamps._Timestamp' objects is not writable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yarnabrina and @fkiraly. I believe I fixed those test cases, I think a few more errors might pop up for other issues.
I think L119-L127
input_args = [y[0], self.cutoff]
if not _inputs_are_supported(input_args) and not _is_all_args_periodlike(
input_args
):
msg = (
"y indicies and cutoff must have the same datatypes, but instead "
f"found type(y) = {y.dtype} and type(cutoff) = {type(self.cutoff)}"
)
raise TypeError(msg)
Might conflict with:
def get_test_params(cls, parameter_set="default"):
"""..."""
return [{"fh": [2, 4], "cutoff": 3, "step_length": 1}]
I have the current class designed such that you can only use cutoff
as an int if the index of y
is also int, and conversely, when y
has a datelike index, then cutoff
must be datelike.
Here's one of the errors:
FAILED sktime/split/tests/test_all_splitters.py::TestAllSplitters::test_split_loc[ExpandingCutoffSplitter-False] - TypeError: y indicies and cutoff must have the same datatypes, but instead found type(y) = datetime64[ns] and type(cutoff) = <class 'int'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's probably related to one of the special characters - it is always a bit of a hassle to decode "position 1234" to the actual location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make sense? I might be overanalyzing it.
I think it makes sense, and it might be confusing to the user if not explained clearly how cutoff and index types combine.
Still, I can think of iloc
indexing in case of date-like index making sense: In the picture above:
index: 123456789
s1: --*******
s2: *********
s3: -********
this alignment happens if you use negative iloc
indices, e.g., -1 and -2 will index the last two indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this alignment happens if you use negative iloc indices, e.g., -1 and -2 will index the last two indices.
I’d considered this as well. I'd be open to modifying the code to work with datetime indices when the cutoff value is an int. Should we design it such that only negative cutoff values are allowed when working with datetime-like indices, or should we allow all integers? The former might still be a headache for the get_test_params
method.
We need to clarify some assumptions. For hierarchical data, it’s not guaranteed that subgroups will end with the same index. If they don't end with the same index (represented by a datetime-like value in this example), then negative indices would reference different datetime values. However, if they do end with the same index, which I suspect is the case for most scenarios, then negative indices should work appropriately.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we design it such that only negative cutoff values are allowed when working with datetime-like indices, or should we allow all integers?
This is really a difficult question, sorry for the delay in replying.
I would allow all integers, even in date-like case. That would be consistent with negative integers, so needs less case handling.
If they don't end with the same index (represented by a datetime-like value in this example), then negative indices would reference different datetime values. However, if they do end with the same index, which I suspect is the case for most scenarios, then negative indices should work appropriately.
I think it is fine if negative iloc
index refers to different loc
index - this could be a reasonable scenario, similar to the greedy splitter. In any case, as long as we explain this clearly to the user, it will be ok imo, all solutions seem to have an unpleasant aspect, but this is possibly the least confusing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fkiraly I actually implemented this in the last commit. The logic goes like this:
- When an index with dates is paired with an integer cutoff, the cutoff serves as an index locator (iloc).
- When the index is an integer, it directly compares the values, except when the integer is negative, in which case it functions as an iloc.
- example: index = 5 6 7 8; cutoff = 6 would be a direct comparison, whereas cutoff -2 would be the iloc equivalent. If, for example, someone used a cutoff of 10, then it would raise an error, since it's not present in the index.
…/sktime into ExpandingCutoffFilter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Minor points:
- I think the
Parameters
docstring section has one indent too much - do you want to say a few words how the different index/offset combinations are interpreted, in the docstring preamble?
- could you add at least one more test parameter set?
ExpandingCutoffFilter
ExpandingCutoffSplitter
Done. |
seems like the docstring example is failing - looks like a minor thing |
Ah, I think I fixed it now. Edit: apparently I did not fix it. Not sure what the issue is. |
The failure is unrelated, it is this sporadic problem: #6406 |
Oh, got it. Anything left to do on my end? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, all fine!
Forgot to approve, apologies for the oversight.
ExpandingCutoffSplitter
ExpandingCutoffSplitter
- splitter by moving cutoff
Reference Issues/PRs
#6327
#6359
What does this implement/fix? Explain your changes.
This PR introduces a new splitter,
ExpandingCutoffSplitter
, to the time series forecasting module of sktime. This splitter starts with an initial training window based on a specified cutoff point and expands incrementally in each split. This feature addresses the need for a dynamic adjustment of the training window in time series forecasting based on a cutoff date, especially in cases where many different time series will be tested on common cutoff dates but otherwise have different starting points.Does your contribution introduce a new dependency? If yes, which one?
No new dependencies are introduced with this PR.
What should a reviewer concentrate their feedback on?
ExpandingCutoffSplitter
logic.Did you add any tests for the change?
Yes, tests have been added to ensure the
ExpandingCutoffSplitter
works as intended and interacts correctly with other splitters and forecasting tools in sktime.Any other comments?
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
sktime
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
maintainers
tag - do this if you want to become the owner or maintainer of an estimator you added.See here for further details on the algorithm maintainer role.
For new estimators
docs/source/api_reference/taskname.rst
, follow the pattern.Examples
section.python_dependencies
tag and ensureddependency isolation, see the estimator dependencies guide.