Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N-Gram Example Comment Incorrect? #53

Open
paulirwin opened this issue Oct 2, 2019 · 2 comments
Open

N-Gram Example Comment Incorrect? #53

paulirwin opened this issue Oct 2, 2019 · 2 comments

Comments

@paulirwin
Copy link
Contributor

This issue was reported to my team's .NET port of your library but I confirmed that it is an issue here as well.

The example on the README shows that the N-Gram code expects values of 0.416666 and 0.97222. However, different results are given when the code is actually ran. I am not sure if this is a bug in the code, or that the README comment is outdated/incorrect.

I created a unit test for the README example, and sure enough it fails:

    @Test
    public void exampleFromReadme() {
        // produces 0.416666
        NGram twogram = new NGram(2);
        assertEquals(0.416666, twogram.distance("ABCD", "ABTUIO"), 0.001);

        // produces 0.97222
        String s1 = "Adobe CreativeSuite 5 Master Collection from cheap 4zp";
        String s2 = "Adobe CreativeSuite 5 Master Collection from cheap d1x";
        NGram ngram = new NGram(4);
        assertEquals(0.97222, ngram.distance(s1, s2), 0.001);
    }

Results:

java.lang.AssertionError: 
Expected :0.416666
Actual   :0.5833333134651184

This result (0.583) is the same that we get on the .NET side of things. As I am not an expert in these algorithms, I am unsure if this is a code bug or a need to update the README.

@paulirwin
Copy link
Contributor Author

... and I just noticed after posting this that 0.583333 ~= (1 - 0.416666). So this is an issue of difference vs similarity I guess. But I suppose we still need to confirm if the returned value from the library is incorrect, or if the comment is incorrect.

@paulirwin
Copy link
Contributor Author

I think this might be a case of the README being incorrect, as it looks like the logic was changed as a result of #22

tdebatty added a commit that referenced this issue Oct 7, 2019
Fix issue #53 
Many thanks to @paulirwin  for the thorough issue analysis!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant