Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] BroadcasterTransformer to broadcast a series transformer over a collection #1244

Draft
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

TonyBagnall
Copy link
Contributor

@TonyBagnall TonyBagnall commented Feb 24, 2024

Currently, BaseTransformers can handle both single series and collections by internally "Vectorize" (i.e. broadcast) across channels and instances depending on the input data. This is legacy code which to me very confusing, involves huge amounts of redundant checks and is not maintainable by the current team. We should imo move away from this model.

This is a quick mock up of my favoured solution, for comment. I've not tested it properly yet, putting it up before I go too far down the line. Basically have a BroadcastTransformer that we pass a BaseSeriesTransformer which is applied over all instances in a collection.

Main points

  1. It takes all its tags from the series transformer. These should then be checked in the base class
  2. It does not broadcast over channels. If the SeriesTransformer cannot handle multivariate, nor can the Broadcaster.
  3. If fit is empty is true in the SeriesTransformer, then Broadcaster only uses a single instance of the SeriesTransformer. If fit is empty is false, it clones it n_cases times and stores the transformers
  4. it calls the private fit/transform of the SeriesTransformer. The assumption is that if it passes checks in the BaseCollectionTransformer, and the tags are taken from SeriesTransformer, then this should be fine. I think this should be true. If its not, we should make it so.

@TonyBagnall TonyBagnall added the transformations Transformations package label Feb 24, 2024
@aeon-actions-bot aeon-actions-bot bot added the enhancement New feature, improvement request or other non-bug code enhancement label Feb 24, 2024
@aeon-actions-bot
Copy link
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ $\color{#FEF1BE}{\textsf{enhancement}}$ ].
I would have added the following labels to this PR based on the changes made: [ $\color{#41A8F6}{\textsf{transformations}}$ ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

@baraline
Copy link
Contributor

baraline commented Mar 6, 2024

A drawback of our current approach is that a transformer that needs to be fitted could only be broadcasted to a set of samples from the same size. For example if we had X1.shape = (5, 1, 100) used for fit and then transform, a set X2.shape = (10, 1, 100) could not be transformed, as we stored the 5 transformer fitted on X1.
But I don't know if this would ever happen considering we are broadcasting a single series transformer, which by essence (if fit is not empty) fits and transform one series.

@baraline
Copy link
Contributor

baraline commented Mar 6, 2024

Here is a draft for using joblib, need to add some tests and sort out tags for input format, as for now the example using MatrixProfileSeriesTransformer fails because it didn't convert the series to univariate as expected

@baraline
Copy link
Contributor

baraline commented Mar 11, 2024

To-dos left for this PR before it is ready for review:

  • Fix broadcaster test, seem to be failing some check in cases where fit_empty=True in the broadcasted series transformer
  • Add new broadcaster transformer to the docs
  • Add an example notebook on how to use the broadcaster for different use cases
  • Add type hints when not a union of lot of possible input types

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@TonyBagnall TonyBagnall marked this pull request as ready for review March 17, 2024 17:00
Copy link
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the essentially intended to be the parallel of https://github.com/aeon-toolkit/aeon/blob/main/aeon/transformations/collection/_collection_wrapper.py?

I think we should attempt to unify design a bit for these, the collection version currently does not use the series transformer base class. Don't have to change the collection version in this PR of course.

I prefer CollectionToSeriesWrapper or similar to the current name.

aeon/transformations/collection/_broadcaster.py Outdated Show resolved Hide resolved
aeon/transformations/collection/_broadcaster.py Outdated Show resolved Hide resolved
aeon/transformations/series/_dummy.py Outdated Show resolved Hide resolved
aeon/transformations/series/tests/test_matrix_profile.py Outdated Show resolved Hide resolved
@MatthewMiddlehurst MatthewMiddlehurst added the codecov actions Run the codecov action on a PR label Mar 22, 2024
@MatthewMiddlehurst
Copy link
Member

included the coverage tag, just to make sure most of it is happening correctly

@baraline
Copy link
Contributor

baraline commented Mar 22, 2024

Thanks for reviewing ! Concerning your questions:

So is the essentially intended to be the parallel of https://github.com/aeon-toolkit/aeon/blob/main/aeon/transformations/collection/_collection_wrapper.py?

Isn't _collection_wrapper.py a Collection to Series while the introduction of this PR is a Series to Collection ? (In the Collection to Series cases, there is no need for parallel I think ?)

I think we should attempt to unify design a bit for these, the collection version currently does not use the series transformer base class. Don't have to change the collection version in this PR of course.
I prefer CollectionToSeriesWrapper or similar to the current name.

I agree, both Collection to Series and Series to Collection transformers should be close to each other.
So It would be SeriesToCollectionWrapper and CollectionToSeriesWrapper (for _collection_wrapper.py, already implemented)

@TonyBagnall
Copy link
Contributor Author

there is in fact a problem calling _fit, since the X_inner_type of collections and series are fundamentally different, will unfortunately have to call base class fit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature, improvement request or other non-bug code enhancement transformations Transformations package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants