High-performance distances and similarities for various dense and sparse representations, with a primary focus on applications in NLP and recommender systems.

- `matrix` from base R
- `dgCMatrix`, `dgRMatrix` and `dgTMatrix` from the Matrix package
- `simple_triplet_matrix` from the `slam` package
- `data.frame`s in primary-secondary-value (psv) format
- `list` of named numeric or character vectors
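
To make the representations listed above concrete, here is a minimal sketch that stores the same toy document-term counts in each of them. It relies only on the Matrix package (and on `slam` if installed). The psv column names `primary`, `secondary` and `value` are an assumption chosen for illustration, not names confirmed by this package.

```r
library(Matrix)

# Toy 2 x 3 document-term counts; the dimnames are made up for illustration.
dense <- matrix(c(1, 0, 2,
                  0, 3, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

csc <- as(dense, "CsparseMatrix")  # column-compressed, class dgCMatrix
csr <- as(csc, "RsparseMatrix")    # row-compressed, class dgRMatrix
coo <- as(csc, "TsparseMatrix")    # triplet form, class dgTMatrix

# simple_triplet_matrix, only built if the slam package is available
if (requireNamespace("slam", quietly = TRUE)) {
  stm <- slam::as.simple_triplet_matrix(dense)
}

# psv data.frame: one row per non-zero cell.
# Column names are hypothetical; the @i and @j slots are 0-based.
psv <- data.frame(
  primary   = rownames(dense)[coo@i + 1L],
  secondary = colnames(dense)[coo@j + 1L],
  value     = coo@x,
  stringsAsFactors = FALSE
)

# list of named numeric vectors, one element per document
as_list <- lapply(split(psv, psv$primary),
                  function(d) setNames(d$value, d$secondary))
```

All of these objects hold the same three non-zero entries; they differ only in how the entries are indexed.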

| | matrix | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
|---|---|---|---|---|---|---|---|
| cosine | ✔ | ✔ | ✔ | ✔ | ✔ | | |
| euclidean | ✔ | ✔ | ✔ | ✔ | ✔ | | |
| mahalanobis | | | | | | | |
| jaccard | | | | | | | |
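
As a reference for what the cosine and euclidean measures compute, the following sketch derives both for the rows of a small `dgCMatrix` using only Matrix package operations. It is not this package's implementation or API; the variable names are illustrative.

```r
library(Matrix)

# Three sparse row vectors: (1, 0, 2), (0, 3, 0), (0, 0, 4)
x <- sparseMatrix(i = c(1, 1, 2, 3),
                  j = c(1, 3, 2, 3),
                  x = c(1, 2, 3, 4),
                  dims = c(3, 3))

# Cosine similarity: scale each row to unit L2 norm, then take row dot products.
row_sq <- rowSums(x^2)
x_unit <- Diagonal(x = 1 / sqrt(row_sq)) %*% x
cos_sim <- as.matrix(tcrossprod(x_unit))

# Euclidean distance via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * <a, b>.
# pmax() guards against tiny negative values caused by rounding.
sq_dist <- outer(row_sq, row_sq, "+") - 2 * as.matrix(tcrossprod(x))
euc <- sqrt(pmax(sq_dist, 0))
```

Both computations touch only the non-zero entries until the final dense result, which is why sparse inputs are worthwhile for these measures.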

| | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
|---|---|---|---|---|---|---|
| centroid | ✔ | ✔ | ✔ | ✔ | | |
| semantic_min_max [1] | ✔ | ✔ | ✔ | ✔ | | |
| semantic_min_sum [2] | ✔ | ✔ | ✔ | ✔ | | |

[1] More commonly known as "Relaxed Word Mover's Distance" (RWMD), proposed in Kusner et al., "From Word Embeddings To Document Distances" (2015).

[2] Similar to the RWMD measure; proposed in Mihalcea et al., "Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity" (2006).
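
For readers unfamiliar with RWMD, the sketch below follows the definition in Kusner et al.: each word moves all of its weight to the nearest word of the other document, and the larger of the two directional costs is the distance. It is written in base R under stated assumptions (Euclidean distance between word vectors, word weights normalised to sum to 1). The function `rwmd_sketch` and its arguments are hypothetical and do not describe this package's interface.

```r
# emb_a, emb_b: word-embedding matrices, one row per word in each document
# w_a, w_b:     non-negative word weights (e.g. term counts) for each document
rwmd_sketch <- function(emb_a, w_a, emb_b, w_b) {
  # Pairwise Euclidean distances between the words of the two documents
  sq <- outer(rowSums(emb_a^2), rowSums(emb_b^2), "+") -
    2 * tcrossprod(emb_a, emb_b)
  cost <- sqrt(pmax(sq, 0))

  # Normalise weights so each document carries a total mass of 1
  w_a <- w_a / sum(w_a)
  w_b <- w_b / sum(w_b)

  # Each word sends its whole mass to the closest word on the other side
  a_to_b <- sum(w_a * apply(cost, 1, min))
  b_to_a <- sum(w_b * apply(cost, 2, min))

  # The tighter of the two relaxed lower bounds
  max(a_to_b, b_to_a)
}

# Toy 2-dimensional embeddings, made up for illustration
doc_a <- rbind(c(0, 0), c(1, 0))
doc_b <- rbind(c(0, 0), c(3, 0))
d_ab <- rwmd_sketch(doc_a, c(1, 1), doc_b, c(1, 1))
```

In this toy case the two directional costs are 0.5 and 1, so `d_ab` is 1. A document compared with itself gives 0.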

`norm_l1`, `norm_l2`.