A library implementing different string similarity and distance measures for ease of use. About a dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented.

swelcker/cmd.csp.similarity

cmd.csp.similarity

A library implementing different string similarity and distance measures for ease of use. About a dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented. It is used in the Cognitive Service Platform cmd.csp for its NLP and classifier components.

Prerequisites

There are no prerequisites.

Included dependencies:

<dependency>
    <groupId>net.jcip</groupId>
    <artifactId>jcip-annotations</artifactId>
    <version>1.0</version>
</dependency>

Installing/Usage

To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):

<repository>
  <id>github</id>
  <name>GitHub swelcker Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/swelcker</url>
</repository>

<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>cspsimilarity</artifactId>
  <version>1.0.0</version>
</dependency>

Then import cspsimilarity.* in your application:

// Example
import cspsimilarity.*;
...
    // One reusable engine per algorithm; the int argument is the n-gram/shingle size
    private NormalizedLevenshtein engineNL = new NormalizedLevenshtein();
    private JaroWinkler engineJW = new JaroWinkler();
    private MetricLCS engineMLCS = new MetricLCS();
    private NGram engineNGRAM = new NGram(3);
    private Cosine engineCOSINE = new Cosine(9);
    private Jaccard engineJACARD = new Jaccard(9);
    private SorensenDice engineSOREDICE = new SorensenDice(9);
...
    String source = sourceText;
    String search = toSearch;

    double sS = 0d;

    // Each call overwrites sS; keep or compare only the scores you need
    sS = engineNL.similarity(source, search);
    sS = engineJW.similarity(source, search);
    // MetricLCS and NGram are used through distance(), so convert to a similarity
    sS = 1d - engineMLCS.distance(source, search);
    sS = 1d - engineNGRAM.distance(source, search);
    sS = engineCOSINE.similarity(source, search);
    sS = engineJACARD.similarity(source, search);
    sS = engineSOREDICE.similarity(source, search);
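
To make the scores above easier to interpret, here is a minimal standalone sketch of two of the ideas involved: a normalized edit-distance similarity (the `1 - distance` conversion used in the example) and a Jaccard index over character shingles of size k. It is an illustration under stated assumptions, not this library's implementation; the class name `SimilaritySketch` and all of its methods are hypothetical, and the library's handling of edge cases (empty strings, strings shorter than k) may differ.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; hypothetical names, not part of cspsimilarity. */
final class SimilaritySketch {

    /** Levenshtein edit distance computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Edit distance scaled to [0, 1] by the length of the longer string. */
    static double normalizedDistance(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 0.0 : (double) levenshtein(a, b) / longest;
    }

    /** Similarity derived from a normalized distance: 1 - distance. */
    static double similarity(String a, String b) {
        return 1.0 - normalizedDistance(a, b);
    }

    /** Set of all substrings of length k (character shingles). */
    static Set<String> shingles(String s, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    /** Jaccard index |A ∩ B| / |A ∪ B| over k-shingle sets. */
    static double jaccard(String a, String b, int k) {
        Set<String> sa = shingles(a, k);
        Set<String> sb = shingles(b, k);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0; // assumption of this sketch: two shingle-free inputs count as identical
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) intersection.size() / union.size();
    }
}
```

In this sketch a string shorter than k yields no shingles at all, which is worth keeping in mind when choosing a shingle size as large as 9 for short inputs.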

Built With

  • Maven - Dependency Management

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

  • Stefan Welcker - Modifications based on tdebatty/java-string-similarity

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
