V0.2 dev #55

Draft: wants to merge 59 commits into master

dillondaudert (Owner) commented Sep 14, 2023

To Do

Functionality

  • Allow multiple "views" of a dataset at fit/transform time (UMAP#206, UMAP#601, UMAP docs)
    • Discrete metric data with discrete distances (UMAP#624)
  • Support passing precomputed distances
    • As KNNGraph
    • As matrix
  • Helper functions to construct fit/transform config
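To illustrate the "As matrix" item: a precomputed distance matrix can be reduced to the (knns, dists) neighbor form that the fuzzy_simplicial_set discussion below works with. This is a minimal Python sketch. The function name knn_from_distance_matrix, the per-point list layout, and the choice to exclude each point from its own neighbor list are assumptions, not UMAP.jl behavior.

```python
from typing import List, Sequence, Tuple


def knn_from_distance_matrix(
    dist_matrix: Sequence[Sequence[float]], k: int
) -> Tuple[List[List[int]], List[List[float]]]:
    """Hypothetical helper: convert a precomputed n x n distance matrix into a
    (knns, dists) pair, where knns[i] lists the k nearest neighbors of point i
    (nearest first) and dists[i] holds the matching distances."""
    n = len(dist_matrix)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    knns: List[List[int]] = []
    dists: List[List[float]] = []
    for i, row in enumerate(dist_matrix):
        if len(row) != n:
            raise ValueError("distance matrix must be square")
        # Rank the other points by distance, breaking ties by index.
        # Point i itself is excluded here; some implementations instead
        # count each point as its own nearest neighbor.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (row[j], j))
        nearest = others[:k]
        knns.append(nearest)
        dists.append([row[j] for j in nearest])
    return knns, dists


# Usage: three points where point 1 sits between points 0 and 2.
knns, dists = knn_from_distance_matrix([[0, 1, 4], [1, 0, 2], [4, 2, 0]], k=1)
```

With k=1 this yields knns == [[1], [0], [1]] and dists == [[1], [1], [2]]. Passing a KNNGraph directly would skip this step, since the neighbors are already computed.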

Tests

  • config.jl
  • utils.jl
  • membership_fn.jl
  • neighbors.jl
  • simplicial_sets.jl
  • embeddings.jl
  • optimize.jl
  • fit.jl
  • transform.jl

Docs

  • UMAPConfig component structs and how to use them
  • UMAPResult/UMAPTransformResult
  • What is public API vs. unstable/in-progress functionality
  • Tutorials for basic vs. advanced usage
  • Documenter (?): we want docs that build automatically and come with some guarantee of staying up to date.

dillondaudert (Owner, Author) commented Sep 15, 2023

Need to find the right code path for categorical (e.g., supervised) views. Ultimately, we want to use the proper simplicial set intersection logic (coalesce_views / general_simplicial_set_intersection).
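For reference, combining two views edge by edge can be sketched with the product t-norm, one common fuzzy-set intersection. This is a simplified stand-in, not UMAP.jl's general_simplicial_set_intersection: the dict-of-edges representation, the name intersect_fuzzy_sets, and the floor parameter (which keeps an edge present in only one view from vanishing entirely) are all hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def intersect_fuzzy_sets(
    left: Dict[Edge, float], right: Dict[Edge, float], floor: float = 1e-8
) -> Dict[Edge, float]:
    """Simplified sketch: combine two views' fuzzy simplicial sets with the
    product t-norm over the union of their edges. An edge missing from one
    view gets membership `floor` there (instead of 0), so it is heavily
    down-weighted rather than dropped outright."""
    result: Dict[Edge, float] = {}
    for edge in left.keys() | right.keys():
        result[edge] = left.get(edge, floor) * right.get(edge, floor)
    return result


left = {(0, 1): 0.5, (1, 2): 1.0}
right = {(0, 1): 0.8, (0, 2): 0.4}
# (0, 1) is in both views; (1, 2) and (0, 2) each appear in only one.
combined = intersect_fuzzy_sets(left, right, floor=0.1)
```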

This function takes two views' fuzzy simplicial set representations and combines them. If one (or both?) of these is based on a categorical metric, the pairwise edge probabilities are updated as follows:

  1. If the vertices are in the same class, multiply by exp(0) = 1 (i.e., leave the weight unchanged).
  2. If either vertex's class is unknown, multiply by exp(-unknown_dist).
  3. If the vertices are in different classes, multiply by exp(-far_dist).
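The three rules amount to a single per-edge multiplier. Writing w_ij for the membership of edge (i, j) in the other view and y_i for the class of vertex i, and assuming (per my reading of the Python fast_intersection) that the unknown check is applied first, so that two unknown-class vertices are not treated as sharing a class:

```math
w'_{ij} = w_{ij} \cdot
\begin{cases}
e^{-\mathrm{unknown\_dist}} & \text{if } y_i \text{ or } y_j \text{ is unknown},\\
e^{0} = 1 & \text{if } y_i = y_j,\\
e^{-\mathrm{far\_dist}} & \text{if } y_i \neq y_j.
\end{cases}
```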

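Some of the complexity comes from representing this fuzzy simplicial set as a matrix: normally these are sparse matrices, but for a categorical view it is dense. The fast_intersection function in the Python implementation optimizes this by applying the multipliers only to the edges already stored in the other view's sparse graph, so the dense categorical matrix is never built.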
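A minimal sketch of that sparse optimization, assuming the other view's graph is stored in COO form (parallel rows, cols, values lists) and that unknown classes are marked with None (umap-learn uses -1, I believe; the encoding here is an assumption). Only stored edges are visited. The function name categorical_intersection and the default distances are illustrative, not UMAP.jl API.

```python
import math
from typing import Hashable, List, Optional, Sequence


def categorical_intersection(
    rows: Sequence[int],
    cols: Sequence[int],
    values: List[float],
    labels: Sequence[Optional[Hashable]],
    unknown_dist: float = 1.0,
    far_dist: float = 5.0,
) -> List[float]:
    """Scale the stored edges of a COO graph in place by the categorical
    multipliers and return the same values list. labels[i] is None when
    the class of vertex i is unknown."""
    for nz in range(len(values)):
        i, j = rows[nz], cols[nz]
        if labels[i] is None or labels[j] is None:
            values[nz] *= math.exp(-unknown_dist)  # unknown class
        elif labels[i] != labels[j]:
            values[nz] *= math.exp(-far_dist)  # different classes
        # same class: factor exp(0) = 1, so the weight is left unchanged
    return values


rows, cols = [0, 0, 1], [1, 2, 3]
values = [1.0, 1.0, 0.5]
labels = ["a", "a", "b", None]
categorical_intersection(rows, cols, values, labels)
```

The cost is proportional to the number of stored edges rather than n². As far as I know, umap-learn follows this step with a local-connectivity renormalization (reset_local_connectivity); that is omitted here.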

In order to dispatch to the proper logic, we may need to create relatively simple FuzzySimplicialSet structs that wrap the graphs. Then fuzzy_simplicial_set((knns, dists), knn_params, src_params::SourceViewParams) can dispatch on knn_params to create the appropriate FuzzySimplicialSet struct.
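A rough Python analogue of that constructor dispatch, using functools.singledispatch in place of Julia's multiple dispatch. Because singledispatch only dispatches on the first argument, knn_params comes first here, unlike the Julia signature above. MetricKNNParams, CategoricalKNNParams, FuzzySimplicialSet, and CategoricalSimplicialSet are hypothetical stand-ins; the metric branch uses a placeholder membership exp(-(d - rho_i)) with no per-point bandwidth search or symmetrization.

```python
import math
from dataclasses import dataclass
from functools import singledispatch
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


@dataclass
class MetricKNNParams:  # hypothetical: metric-space kNN settings
    k: int


@dataclass
class CategoricalKNNParams:  # hypothetical: categorical-view settings
    unknown_dist: float = 1.0
    far_dist: float = 5.0


@dataclass
class FuzzySimplicialSet:  # hypothetical wrapper around an explicit sparse graph
    graph: Dict[Edge, float]


@dataclass
class CategoricalSimplicialSet:  # hypothetical wrapper: labels only, no dense graph
    labels: List[Optional[str]]
    unknown_dist: float
    far_dist: float


@singledispatch
def fuzzy_simplicial_set(knn_params, data):
    raise NotImplementedError(f"no method for {type(knn_params).__name__}")


@fuzzy_simplicial_set.register
def _(knn_params: MetricKNNParams, data: tuple):
    knns, dists = data
    graph: Dict[Edge, float] = {}
    for i, (nbrs, ds) in enumerate(zip(knns, dists)):
        nbrs, ds = nbrs[: knn_params.k], ds[: knn_params.k]
        rho = min(ds)  # distance to the nearest neighbor
        for j, d in zip(nbrs, ds):
            # Placeholder membership: no sigma search, no symmetrization.
            graph[(i, j)] = math.exp(-(d - rho))
    return FuzzySimplicialSet(graph)


@fuzzy_simplicial_set.register
def _(knn_params: CategoricalKNNParams, data: Sequence[Optional[str]]):
    # data is the label vector; the dense categorical graph is never built.
    return CategoricalSimplicialSet(list(data), knn_params.unknown_dist, knn_params.far_dist)
```

Each params type selects its own construction method, so adding a new view kind means adding a params type and one registered method rather than branching inside a single function.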

Might first investigate creating such a struct only for the categorical metrics and dispatching on it in the simplicial set intersection.
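A sketch of that narrower approach: only the categorical view gets a wrapper type, the intersection dispatches on it, and everything else falls through to the general path. Again singledispatch stands in for Julia dispatch; CategoricalView and intersect are hypothetical names, and the general branch is a placeholder (product over shared edges) rather than the real general_simplicial_set_intersection.

```python
import math
from dataclasses import dataclass
from functools import singledispatch
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


@dataclass
class CategoricalView:  # hypothetical: the only view type given its own wrapper
    labels: List[Optional[str]]  # None marks an unknown class
    unknown_dist: float = 1.0
    far_dist: float = 5.0


@singledispatch
def intersect(view, graph: Dict[Edge, float]) -> Dict[Edge, float]:
    """General path: `view` is itself a plain {edge: weight} graph.
    Placeholder logic: product of memberships over the shared edges."""
    return {e: w * view[e] for e, w in graph.items() if e in view}


@intersect.register
def _(view: CategoricalView, graph: Dict[Edge, float]) -> Dict[Edge, float]:
    """Categorical fast path: rescale only the edges already in `graph`."""
    out: Dict[Edge, float] = {}
    for (i, j), w in graph.items():
        if view.labels[i] is None or view.labels[j] is None:
            w *= math.exp(-view.unknown_dist)
        elif view.labels[i] != view.labels[j]:
            w *= math.exp(-view.far_dist)
        out[(i, j)] = w
    return out
```

Since singledispatch only inspects the first argument, the view being intersected is passed first; in Julia, dispatch could consider both views' types.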

Labels: none yet
Projects: none yet
Development: no linked issues
1 participant