Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference between java and python implementation: Spoiler, the problem is the round #99

Open
timotta opened this issue Apr 8, 2022 · 3 comments

Comments

@timotta
Copy link

timotta commented Apr 8, 2022

Here is the example where I was stuck. The python implemention gets 22, and the java implementation gets 23:

fuzz.token_set_ratio(
  "Vêndo ou troco por outro carro pode ser atrasado negócio volta ",
  "Titan 150 ano 2005 ", 
  False
)
FuzzySearch.tokenSetRatio(
  "Vêndo ou troco por outro carro pode ser atrasado negócio volta", 
  "Titan 150 ano 2005"
);

Debugging both code I could find that the problem is when rounding the value: 22.5

Python code, located in utils.py:

int(round(n))

Java code, located in SimpleRatio class is:

(int) Math.round(100 * DiffUtils.getRatio(s1, s2));

TLDR:

Java: Math.round(22.5) => 23
Python: round(22.5) => 22

Don't know which one is correct for this algorithm...

@timotta
Copy link
Author

timotta commented Apr 8, 2022

For now, I'm working with this local solution to have equivalence:

import me.xdrop.diffutils.DiffUtils;
import me.xdrop.fuzzywuzzy.ratios.SimpleRatio;

public class SimplePythonRatio extends SimpleRatio {
    @Override
    public int apply(String s1, String s2) {
        double ratio = 100 * DiffUtils.getRatio(s1, s2);
        double floored = Math.floor(ratio);
        double pythonRatio = ratio - floored <= 0.5 ? floored : Math.round(ratio);
        return (int) pythonRatio;
    }
}
new TokenSet().apply(s1, s2, new SimplePythonRatio());

What do you think of adding this to the java lib?

@timotta
Copy link
Author

timotta commented Apr 12, 2022

@leafah warned about HALF_EVEN python round behaviour ( https://en.wikipedia.org/wiki/Rounding#Round_half_to_even ), so the round(22.5) give us 22, and round(21.5) also returns 22. Java have this round implemented in BigDecimal:

import me.xdrop.diffutils.DiffUtils;
import me.xdrop.fuzzywuzzy.ratios.SimpleRatio;

import java.math.BigDecimal;
import java.math.RoundingMode;

public class SimplePythonRatio extends SimpleRatio {
    @Override
    public int apply(String s1, String s2) {
        double ratio = 100 * DiffUtils.getRatio(s1, s2);
        return (int) bankerRound(ratio);
    }

    // https://en.wikipedia.org/wiki/Rounding#Round_half_to_even
    double bankerRound(double value) {
        return new BigDecimal(value)
                .setScale(0, RoundingMode.HALF_EVEN)
                .doubleValue();
    }
}

@burdoto
Copy link
Contributor

burdoto commented Apr 13, 2022

Honestly at this point I'm kinda confused regarding whats the deal here.
If Python rounds 22.5 to 22 by default, then that's a Python issue (and honestly imo, yet another reason not to use Python at all).
Resolving algebraic problems is really not the task of this library, thus such an implementation would be far out of scope (plus there is a workaround in Java's stdlib already).

I'm still interested in what @xdrop says about this, but I doubt it would be much different from this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants