Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] Update similarity search with new base classes : Query Search #1508

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

baraline
Copy link
Contributor

@baraline baraline commented May 9, 2024

Reference Issues/PRs

Part of #1243

What does this implement/fix? Explain your changes.

As described in #1243, this is a first PR that implement base classes for query search task, and transfers the existing code under this new submodule.

Remaining TODOs:

  • Adapt distance profile function for unequal length
  • Add tests for unequal length for both dummy and top-k
  • Fix missing docstrings

Reported to another PR as this one is already big enough

  • Add typing to function/class parameters

PR checklist

For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.

@baraline baraline linked an issue May 9, 2024 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@aeon-actions-bot aeon-actions-bot bot added examples Example notebook related maintenance Continuous integration, unit testing & package distribution similarity search Similarity search package labels May 9, 2024
@aeon-actions-bot
Copy link
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ $\color{#EC843A}{\textsf{maintenance}}$ ].
I have added the following labels to this PR based on the changes made: [ $\color{#45FD64}{\textsf{examples}}$, $\color{#006b75}{\textsf{similarity search}}$ ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

@baraline baraline marked this pull request as ready for review May 18, 2024 20:12
@baraline baraline added the codecov actions Run the codecov action on a PR label May 18, 2024
Copy link
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. All experimental so there some leeway, not going to say I went through every function with a fine tooth comb. Some of the comments may be a bit more general and apply to more than the highlighted code.

Would be good to update aeon-toolkit/aeon-admin#1 with the contents of your comments just so there a centralised document for interested people until we decide to make this non-experimental.

I would be careful with the base class parameters, if it does not apply to all (or the vast majority) of inheriting classes then it probably should not be here. Not saying that is the case, but there are some bases i.e. clustering with n_clusters where we are going to run into issues later.



@njit(cache=True, fastmath=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike uncontrolled parallelization. AFAIK the only way to control this is to alter the numba num_threads global attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is kept, at the very least it should be noted in the docstring and we should change the test config to disable numba threading.

@final
def predict(
self,
q,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why q? We should stick to x and y where possible.

)


class BaseInstance(BaseSimiliaritySearch):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does it make sense to create a test class for this? I assumes it was abstract and each submodule would have its own base class
  2. I would continue with the "MockEstimator" format we current have in testing if possible.

@@ -0,0 +1 @@
"""Series search module."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty

from aeon.similarity_search.query_search import BaseQuerySearch


class DummyQuerySearch(BaseQuerySearch):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to a different comment, but I would say the dummy estimators provide a "reasonable" baseline with a strategy (i.e. intentional random guessing or mean of the label), while mock estimators try to go through the interface as quick as possible even if the output is nonsense. May not apply here, but the top of the docstring gives that impression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codecov actions Run the codecov action on a PR examples Example notebook related maintenance Continuous integration, unit testing & package distribution similarity search Similarity search package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MNT] Update similarity search with new base classes
2 participants