Clusterers gonna cluster.
People will want this: it's a feature of e.g. pykanto, songexplorer, koe, and voice, and it frequently appears in papers; see e.g. https://royalsocietypublishing.org/doi/full/10.1098/rsos.231713.
So we should add:

- a `cluster` module where specific algorithms will live
- a `Clusterer` class that makes it easy to parallelize and makes code readable

The goal should be minimal magic and replication of existing algorithms where possible.
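Here is a rough sketch of what the `Clusterer` class could look like. It assumes an "algorithm" is any Python function that takes a feature array plus keyword parameters and returns one label per row. The class name comes from the proposal above, but its fields (`algorithm`, `params`), the `cluster_many` method, and the toy `threshold_1d` algorithm are hypothetical and not an existing vocalpy API. Parameters are stored as a plain dict and forwarded unchanged, which is one way to honor the "minimal magic" goal.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Sequence

# Hypothetical contract: an algorithm takes a 2-D feature array (rows = samples)
# plus keyword parameters and returns one integer label per row.
ClusterFn = Callable[..., Sequence[int]]


@dataclass
class Clusterer:
    """Sketch of a Clusterer: an algorithm bundled with its parameters."""

    algorithm: ClusterFn
    # Forwarded unchanged to ``algorithm``: nothing is renamed or hidden.
    params: dict[str, Any] = field(default_factory=dict)

    def __call__(self, features) -> list[int]:
        """Cluster one set of features, e.g. the syllables from one recording."""
        return list(self.algorithm(features, **self.params))

    def cluster_many(self, datasets, max_workers: int | None = None) -> list[list[int]]:
        """Cluster independent datasets (e.g. one per individual) in parallel.

        Results come back in the same order as ``datasets``.
        """
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(self, datasets))


def threshold_1d(features, cutoff=0.5):
    """Toy stand-in algorithm: label each row by whether its first feature exceeds ``cutoff``."""
    return [int(row[0] > cutoff) for row in features]


clusterer = Clusterer(threshold_1d, params={"cutoff": 0.5})
print(clusterer([[0.1], [0.9]]))  # → [0, 1]
print(clusterer.cluster_many([[[0.1], [0.9]], [[0.7], [0.2]]]))  # → [[0, 1], [1, 0]]
```

Threads keep the sketch dependency-free. For CPU-bound pure-Python algorithms, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface, but it requires the algorithm and its inputs to be picklable.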
My impression is that a common way to cluster is to use UMAP and HDBSCAN in some combination, as in pykanto and as in the rook paper above.
So an initial implementation would add dependencies on UMAP + HDBSCAN, and allow direct access to their parameters as in nilomr/pykanto#30.
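One way the UMAP + HDBSCAN combination could expose its parameters directly is to forward two keyword dicts unchanged to umap-learn's `umap.UMAP` and the `hdbscan` package's `hdbscan.HDBSCAN`. Both classes provide scikit-learn-style `fit_transform` / `fit_predict` methods. This is a minimal sketch under those assumptions, not pykanto's implementation, and the function name `umap_hdbscan` and the `reducer_cls` / `clusterer_cls` hooks are hypothetical. Importing inside the function keeps both packages optional. The hooks also allow swapping in any class with the same methods, e.g. `sklearn.cluster.HDBSCAN` (scikit-learn 1.3+), which bears on the dependency concern below.

```python
from __future__ import annotations

from typing import Any


def umap_hdbscan(
    features,
    umap_kwargs: dict[str, Any] | None = None,
    hdbscan_kwargs: dict[str, Any] | None = None,
    *,
    reducer_cls=None,
    clusterer_cls=None,
):
    """Hypothetical sketch: embed with UMAP, then cluster the embedding with HDBSCAN.

    Both kwargs dicts are forwarded unchanged, so every parameter documented
    by umap-learn and hdbscan stays directly accessible.
    """
    if reducer_cls is None:
        import umap  # umap-learn; imported lazily so it stays an optional dependency

        reducer_cls = umap.UMAP
    if clusterer_cls is None:
        import hdbscan  # imported lazily for the same reason

        clusterer_cls = hdbscan.HDBSCAN

    # Step 1: non-linear dimensionality reduction of the feature matrix.
    embedding = reducer_cls(**(umap_kwargs or {})).fit_transform(features)
    # Step 2: density-based clustering of the low-dimensional embedding;
    # HDBSCAN labels points it considers noise as -1.
    labels = clusterer_cls(**(hdbscan_kwargs or {})).fit_predict(embedding)
    return embedding, labels


# Example call (requires umap-learn and hdbscan; the parameter values are
# placeholders, not recommendations for any dataset):
# embedding, labels = umap_hdbscan(
#     features,
#     umap_kwargs={"n_neighbors": 15, "min_dist": 0.0, "n_components": 2},
#     hdbscan_kwargs={"min_cluster_size": 10},
# )
```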
But I'm worried about the number of dependencies we have already, and would rather limit dependencies to core scientific Python if possible. Long term we might want to vendor them, e.g. in a `vocalpy.cluster._vendor` sub-sub-package.
Also, we (I) need to actually understand all the parameters involved (see nilomr/pykanto#32 (comment)). We shouldn't add this if we can't provide tutorials with suggestions on what parameters to use for different datasets.
We want something like https://github.com/marathomas/tutorial_repo.
And we will want to clearly document assumptions and any work related to caveats.