
vspinu/simdist


High-performance distances and similarities for dense and sparse representations, with a primary focus on applications in NLP and recommender systems.

Supported and Planned Object Types

  • matrix from base R
  • dgCMatrix, dgRMatrix and dgTMatrix from Matrix package
  • simple_triplet_matrix from slam package
  • data.frames in primary-secondary-value (psv) format
  • lists of named numeric or character vectors
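The psv layout is a long ("triplet") table, much like the Matrix package's dgTMatrix or slam's simple_triplet_matrix. The Python sketch below is only an illustration, not the simdist R API. It assumes that "primary" identifies the object (a document or user), "secondary" identifies the feature (a term or item), and "value" holds the weight. It groups the triplets into sparse rows and computes a cosine similarity between two of them. The helper names `psv_to_rows` and `sparse_cosine` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def psv_to_rows(triplets):
    """Group (primary, secondary, value) triplets into sparse rows.

    Assumption: 'primary' is the object id and 'secondary' the feature id.
    Duplicate (primary, secondary) pairs are summed.
    """
    rows = defaultdict(dict)
    for primary, secondary, value in triplets:
        rows[primary][secondary] = rows[primary].get(secondary, 0.0) + value
    return dict(rows)


def sparse_cosine(a, b):
    """Cosine similarity of two sparse rows stored as {feature: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter row only
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0  # convention for empty rows in this sketch
    return dot / (na * nb)


psv = [
    ("doc1", "cat", 2.0), ("doc1", "dog", 1.0),
    ("doc2", "cat", 1.0), ("doc2", "fish", 3.0),
]
rows = psv_to_rows(psv)
print(sparse_cosine(rows["doc1"], rows["doc2"]))
```

Only features shared by both rows contribute to the dot product. This is why sparse inputs can be much cheaper than dense ones when rows have few non-zero entries.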

Distances for 2D Representations

matrix dgCMatrix dgRMatrix dgTMatrix slam psv list
cosine
euclidean
mahalanobis
jaccard
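The four 2D metrics have standard definitions. The dense Python sketch below illustrates them and is not the package's implementation. Two points are assumptions. First, `mahalanobis_distance` takes a precomputed inverse covariance matrix instead of estimating one. Second, `jaccard_distance` uses the set Jaccard index on the non-zero positions. The README does not say whether simdist uses this or a weighted variant (sum of minima over sum of maxima).

```python
from math import sqrt


def cosine_distance(x, y):
    """1 - cosine similarity; raises ZeroDivisionError for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis_distance(x, y, cov_inv):
    """sqrt((x - y)^T S^-1 (x - y)), with S^-1 supplied by the caller."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return sqrt(q)


def jaccard_distance(x, y):
    """1 - |A & B| / |A | B| on the supports (non-zero positions) of x and y."""
    sx = {i for i, a in enumerate(x) if a != 0}
    sy = {i for i, b in enumerate(y) if b != 0}
    union = sx | sy
    if not union:
        return 0.0  # two all-zero vectors are treated as identical here
    return 1.0 - len(sx & sy) / len(union)


x, y = [1, 0, 2], [0, 1, 2]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(cosine_distance(x, y), euclidean_distance(x, y),
      mahalanobis_distance(x, y, identity), jaccard_distance(x, y))
```

With the identity matrix as `cov_inv`, the Mahalanobis distance reduces to the Euclidean distance. A non-identity inverse covariance reweights and decorrelates the coordinate differences.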

Aggregation Distances for 3D Representations

dgCMatrix dgRMatrix dgTMatrix slam psv list
centroid
semantic_min_max [1]
semantic_min_sum [2]

[1] More commonly known as the Relaxed Word Mover's Distance (RWMD), proposed in Kusner et al., 'From Word Embeddings To Document Distances' (2015).

[2] A measure similar to RWMD, proposed in Mihalcea et al., 'Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity' (2006).
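In the 3D case, each document is a weighted bag of words and each word has an embedding vector. The Python sketch below follows Kusner et al. (2015). It uses normalized bag-of-words weights and the Euclidean distance between embeddings. The first function is their word centroid distance, which is assumed here to correspond to `centroid`. The second is RWMD, which relaxes one of the two flow constraints at a time by sending each word to its nearest counterpart ("min") and then keeps the larger of the two one-sided bounds ("max"). The exact `semantic_min_sum` definition is not given in this README. `min_sum_sketch` is a hypothetical reading that averages the two directions, by analogy with Mihalcea et al.'s symmetric score. None of these names are simdist functions.

```python
from math import sqrt


def _euclid(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _normalize(doc):
    """Turn {word: count} into normalized bag-of-words weights."""
    total = sum(doc.values())
    return {w: c / total for w, c in doc.items()}


def centroid_distance(doc1, doc2, emb):
    """Word centroid distance: Euclidean distance of weighted mean embeddings."""
    dims = len(next(iter(emb.values())))

    def centroid(doc):
        d = _normalize(doc)
        return [sum(wt * emb[w][k] for w, wt in d.items()) for k in range(dims)]

    return _euclid(centroid(doc1), centroid(doc2))


def _one_sided(d_from, d_to, emb):
    """Each word in d_from sends all its weight to its nearest word in d_to."""
    return sum(wt * min(_euclid(emb[w], emb[v]) for v in d_to)
               for w, wt in d_from.items())


def rwmd(doc1, doc2, emb):
    """Relaxed WMD: the larger of the two one-sided relaxations."""
    d1, d2 = _normalize(doc1), _normalize(doc2)
    return max(_one_sided(d1, d2, emb), _one_sided(d2, d1, emb))


def min_sum_sketch(doc1, doc2, emb):
    """HYPOTHETICAL semantic_min_sum: average of the two one-sided terms."""
    d1, d2 = _normalize(doc1), _normalize(doc2)
    return 0.5 * (_one_sided(d1, d2, emb) + _one_sided(d2, d1, emb))


emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
doc1, doc2 = {"a": 1, "b": 1}, {"c": 2}
print(centroid_distance(doc1, doc2, emb), rwmd(doc1, doc2, emb),
      min_sum_sketch(doc1, doc2, emb))
```

The nearest-neighbour step costs one pass over all word pairs. This keeps RWMD much cheaper than solving the full transport problem behind the Word Mover's Distance.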

Transformations

norm_l1, norm_l2.
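The two transformations are standard vector normalizations. The Python sketch below assumes they act row-wise, with one row per object, and leave all-zero rows unchanged. The README states neither the axis nor the zero-row behaviour, so both are assumptions.

```python
from math import sqrt


def norm_l1(rows):
    """Scale each row so its absolute values sum to 1 (zero rows unchanged)."""
    out = []
    for r in rows:
        s = sum(abs(v) for v in r)
        out.append([v / s for v in r] if s else list(r))
    return out


def norm_l2(rows):
    """Scale each row to unit Euclidean length (zero rows unchanged)."""
    out = []
    for r in rows:
        s = sqrt(sum(v * v for v in r))
        out.append([v / s for v in r] if s else list(r))
    return out


print(norm_l1([[1, 3], [0, 0]]), norm_l2([[3, 4]]))
```

After L2 normalization, the cosine similarity of two rows is simply their dot product. Normalizing once up front therefore avoids recomputing norms for every pair.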
