Add new argument for limiting the maximum epsilon #529

Open · prodrigues-tdx wants to merge 1 commit into master

@prodrigues-tdx commented Feb 22, 2022

This PR adds an argument to HDBSCAN that sets a maximum threshold on the epsilon used when selecting the best clusters. The new argument, cluster_selection_epsilon_max, applies to the EOM cluster selection method.

This is useful when domain knowledge tells you from the start that your samples should not be very far from each other.

The implementation treats cluster_selection_epsilon_max much like max_cluster_size: clusters with an epsilon larger than cluster_selection_epsilon_max can still appear if there are no valid clusters below that epsilon. This is exactly the same behavior as max_cluster_size.
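
To illustrate the behavior described above, here is a minimal, self-contained sketch of an EOM-style selection with an epsilon cap. It is not the code in this PR and does not use the hdbscan library: the `Node` class, the `select_eom` function, the toy tree, and the idea of attaching a single `epsilon` value to each cluster are all assumptions made for illustration. The fallback rule (a leaf is kept even when it exceeds the cap) is one reading of "can still appear if there are no valid clusters below that epsilon".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster in a toy condensed tree (hypothetical structure)."""
    name: str
    stability: float
    epsilon: float  # assumed distance scale associated with this cluster
    children: list = field(default_factory=list)


def select_eom(node, epsilon_max=float("inf")):
    """Return (selected cluster names, total stability) for the subtree."""
    if not node.children:
        # Assumption: a leaf has no valid cluster below it, so it is kept
        # even when its epsilon exceeds the cap.
        return [node.name], node.stability

    selected, subtree_stability = [], 0.0
    for child in node.children:
        names, stab = select_eom(child, epsilon_max)
        selected.extend(names)
        subtree_stability += stab

    # Reject the parent if its children are jointly more stable,
    # or if the parent's epsilon exceeds the cap.
    if subtree_stability > node.stability or node.epsilon > epsilon_max:
        return selected, subtree_stability
    return [node.name], node.stability


# Toy tree: parent A is more stable than its children B and C combined.
tree = Node("A", stability=5.0, epsilon=2.0, children=[
    Node("B", stability=1.5, epsilon=0.8),
    Node("C", stability=2.0, epsilon=0.6),
])

uncapped = select_eom(tree)[0]                    # A wins on stability
capped = select_eom(tree, epsilon_max=1.0)[0]     # A exceeds the cap, so B and C are chosen
fallback = select_eom(tree, epsilon_max=0.5)[0]   # B and C exceed the cap but nothing lies below them
```

In this sketch the cap only ever pushes the selection toward smaller clusters; it never discards points outright, which mirrors how the description above compares the new argument to max_cluster_size.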

@lmcinnes
Collaborator

Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?

@prodrigues-tdx
Author

> Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?

I totally missed your comment :s I'll do that, yes.
