Implement Kondrak n-gram similarity #15

Open
tdebatty opened this issue Aug 11, 2016 · 4 comments

Comments

@tdebatty
Owner

No description provided.

@MpoMp
Contributor

MpoMp commented Sep 8, 2016

How about using the Lucene implementation?

@paulirwin
Contributor

@MpoMp Sounds like a good idea. Based on my interpretation of the Apache license, I assume we would just need to make sure to include the Apache license header and indicate that we changed the code (to fit this library's constraints)? Despite being an ASF member and my familiarity with Lucene and its respective projects, I'm not an expert in licenses and copying the actual code.

(To introduce myself, I'm a member of the team that ported this library to .NET, and a member of the Lucene.NET PMC.)

cc @jamesmblair

@paulirwin
Contributor

Disregard my deleted comment. What I meant was, isn't this the same as src/main/java/info/debatty/java/stringsimilarity/NGram.java?

@MpoMp
Contributor

MpoMp commented Sep 26, 2016

Looks like I misinterpreted the issue in the first place.

As @tdebatty mentions in the README:

The algorithm uses affixing with special character '\n' to increase the weight of first characters. The normalization is achieved by dividing the total similarity score by the original length of the longest word.

In the paper, Kondrak also defines a similarity measure, which is not implemented (yet).

Which probably refers to the algorithms described on page 8 here.
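
For anyone following along, here is a rough sketch of what the README quote describes: prefix both words with n-1 copies of '\n', run an edit-distance-style dynamic program whose substitution cost is the fraction of mismatched positions between the two n-grams, and divide the total by the length of the longer word. The class and method names (`NGramDistanceSketch`, `distance`, `ngramCost`, `pad`) are made up for illustration. This is a simplified reading of the idea, not the code in NGram.java or in Lucene, which both handle matches on the padding character differently and are more memory-efficient.

```java
// Hypothetical illustration only; names are not taken from the library.
class NGramDistanceSketch {

    // Special affix character mentioned in the README.
    static final char PAD = '\n';

    // Returns a normalized distance in [0, 1]: 0 for identical strings.
    static double distance(String s, String t, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (s.equals(t)) {
            return 0.0;
        }
        if (s.isEmpty() || t.isEmpty()) {
            return 1.0;
        }
        int sl = s.length();
        int tl = t.length();

        // Affixing: n-1 pad characters in front, so the n-gram that ends
        // at the first real character is still n characters long.
        String sp = pad(n - 1) + s;
        String tp = pad(n - 1) + t;

        // d[i][j] = cost of aligning the first i chars of s with the first j of t.
        double[][] d = new double[sl + 1][tl + 1];
        for (int i = 0; i <= sl; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= tl; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= sl; i++) {
            for (int j = 1; j <= tl; j++) {
                // The n-gram ending at character i of s starts at index i-1 in sp.
                double cost = ngramCost(sp, i - 1, tp, j - 1, n);
                double best = Math.min(d[i - 1][j] + 1.0, d[i][j - 1] + 1.0);
                d[i][j] = Math.min(best, d[i - 1][j - 1] + cost);
            }
        }
        // Normalization: divide by the original length of the longer word.
        return d[sl][tl] / Math.max(sl, tl);
    }

    // Fraction of positions at which the two n-grams differ.
    static double ngramCost(String a, int ai, String b, int bi, int n) {
        int mismatches = 0;
        for (int k = 0; k < n; k++) {
            if (a.charAt(ai + k) != b.charAt(bi + k)) {
                mismatches++;
            }
        }
        return (double) mismatches / n;
    }

    // Builds a string of count pad characters.
    static String pad(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(PAD);
        }
        return sb.toString();
    }
}
```

With n = 1 there is no padding and the sketch reduces to Levenshtein distance divided by the longer length. A similarity could be taken as 1 minus this value, but that is only an assumption on my part and not necessarily the similarity measure Kondrak defines in the paper.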
