Is your feature request related to a problem? Please describe.
I'm working with the recent DreaMS paper, and I find the LSH approach to clustering mass spectra at the scale of 1e+15 quite useful. The original idea of this clustering approach is given in this paper.
Describe the solution you'd like
A new similarity score, something like:
class RandomProjection(BaseSimilarity):
    def __init__(self, n_elems: int, n_hyperplanes: int, seed=42):
        np.random.seed(seed)
        self.H = np.random.randn(n_hyperplanes, n_elems)

    def matrix(self, r: np.ndarray, q: np.ndarray):
        N, R = r.shape              # N = number of peaks, R = number of spectra
        r_proj = (self.H @ r) >= 0  # [n_hyp, N] x [N, R] -> boolean [n_hyp, R]
        q_proj = (self.H @ q) >= 0  # [n_hyp, N] x [N, Q] -> boolean [n_hyp, Q]
        r_hash = ...                # int64 [R], convert each boolean column of r_proj into a single hash number
        q_hash = ...                # int64 [Q]
        sparse_similarity = ...     # hash equality means similarity; build `SparseStack` from r_hash and q_hash here
        return sparse_similarity
It should be used like:
lsh = RandomProjection(128, 1024, 42)
lsh.matrix(r_large, q_large)  # -> outputs a sparse stack filled with binary similarities (0 or 1)
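The two `...` steps above can be sketched with plain NumPy: pack the sign bits of each projected column into one unsigned integer per spectrum, then mark hash-equal pairs as similar. This is a toy-sized sketch, not a proposed final implementation — the `lsh_hash` helper, the array sizes, and the use of raw index pairs in place of a `SparseStack` are all my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

n_elems, n_hyperplanes = 16, 12  # toy sizes; the issue suggests 128 bins / 1024 hyperplanes
H = rng.standard_normal((n_hyperplanes, n_elems))

# Toy "spectra": each column is one binned spectrum vector.
r = rng.standard_normal((n_elems, 5))
# Make the first two query columns identical to reference columns 0 and 1.
q = np.hstack([r[:, :2], rng.standard_normal((n_elems, 3))])

def lsh_hash(X, H):
    """Pack the sign bits of the hyperplane projections into one uint64 per column."""
    bits = (H @ X) >= 0  # boolean [n_hyperplanes, n_cols]
    weights = np.left_shift(np.uint64(1), np.arange(bits.shape[0], dtype=np.uint64))
    return weights @ bits.astype(np.uint64)  # one uint64 hash per spectrum

r_hash = lsh_hash(r, H)
q_hash = lsh_hash(q, H)

# Hash equality means similarity 1; all other entries stay implicit zeros.
rows, cols = np.nonzero(r_hash[:, None] == q_hash[None, :])
matches = list(zip(rows.tolist(), cols.tolist()))
print(matches)  # includes (0, 0) and (1, 1), since identical columns hash identically
```

The all-pairs `==` comparison at the end is only for the toy demo; at real scale the hashes would be matched through a bucket table instead (which is where the linear scaling comes from). Note this requires `n_hyperplanes <= 64` to fit one hash in a uint64; 1024 hyperplanes would need several uint64 words or byte strings per spectrum.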
This score will scale with O(R+Q), instead of O(R * Q), making it by far the fastest at large scales.
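The O(R + Q) scaling follows from bucketing: hash each reference spectrum once into a table, then look each query hash up in that table, so no all-pairs comparison is ever done. A minimal sketch, assuming hypothetical hash values (the uint64 arrays below are made up purely for illustration):

```python
from collections import defaultdict

import numpy as np

# Hypothetical LSH hashes for R = 4 reference and Q = 3 query spectra.
r_hash = np.array([7, 3, 9, 3], dtype=np.uint64)
q_hash = np.array([3, 5, 7], dtype=np.uint64)

# One pass over r_hash to build buckets, one pass over q_hash to probe them:
# O(R + Q) expected time, versus O(R * Q) for the dense all-pairs comparison.
buckets = defaultdict(list)
for i, h in enumerate(r_hash):
    buckets[h].append(i)

pairs = [(i, j) for j, h in enumerate(q_hash) for i in buckets.get(h, [])]
print(sorted(pairs))  # -> [(0, 2), (1, 0), (3, 0)]
```

The `(reference, query)` index pairs produced here are exactly the nonzero entries the sparse similarity matrix would need to hold.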
Any good starting points? This file in DreaMS is 90% of what we need.
Scientific reference
From DreaMS: