Division By Zero in def is_mispelling #39

Open
gffde3 opened this issue Jan 13, 2018 · 12 comments

@gffde3

gffde3 commented Jan 13, 2018

Hey,

I've been using your lib on 0.0.1 and just updated recently (I had to hack some of the SQLite fts keywords and will fix that up again) but I've come across a problem:

You get a div zero error in tokencomparison.py -> def is_mispelling(self, token1, token2)

Here are the values of the vars in that function when it throws:

float division by zero
token1: 0
token2: 2
mis_t1: []
mis_t2: []
common: []

I know you're comparing edit distance for string tokens, but what is the logic for numeric values? How do you decide whether two numbers are misspellings of each other (even ignoring the 0 case)?

Even if you swap max() / min() to min() / max() and take the inverse, you'll still get 0 for 0 values.

Maybe an absolute difference is better, but that breaks down when a digit is added by mistake (e.g. 1 typed as 10).

Maybe edit distance is still best used here?
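
To make the trade-off between these three options concrete, here is a small standalone sketch. It is not the library's code: the helper names `ratio` and `edit_distance` are made up for illustration, and the token pairs are just examples taken from or suggested by the discussion above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row (illustrative helper)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def ratio(x: float, y: float):
    """max/min ratio, or None when the smaller value is 0 and the ratio is undefined."""
    lo, hi = min(x, y), max(x, y)
    if lo == 0:
        return None
    return hi / lo


# Compare the three candidate measures on a few numeric token pairs.
for t1, t2 in [("0", "2"), ("1", "10"), ("100", "101")]:
    x, y = float(t1), float(t2)
    print(t1, t2, "ratio:", ratio(x, y), "abs diff:", abs(x - y),
          "edit distance:", edit_distance(t1, t2))
```

All three pairs are a single edit apart as strings, while the ratio is undefined for the first pair and the absolute difference is large for the second, which is the tension described above.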

As an aside, thanks for making this library; it's saved me some time so far :)

@gffde3
Author

gffde3 commented Jan 18, 2018

So I just set the exception for div 0 to return False. Seems to work alright.
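
For anyone wanting to see the idea in isolation, a minimal sketch of "return False on division by zero" might look like the following. This is a standalone function, not the actual body of is_mispelling: the name `is_numeric_misspelling` and the default `threshold=1.2` are assumptions made for the example.

```python
def is_numeric_misspelling(token1, token2, threshold=1.2):
    """Hypothetical standalone version of the numeric check.

    Both tokens are assumed to be numeric strings; a non-numeric token
    would still raise ValueError here.
    """
    try:
        t1f = float(token1)
        t2f = float(token2)
        return max(t1f, t2f) / min(t1f, t2f) < threshold
    except ZeroDivisionError:
        # The smaller value is 0, so no ratio exists: report "not a misspelling".
        return False


print(is_numeric_misspelling("0", "2"))
print(is_numeric_misspelling("100", "101"))
```

One consequence of this choice is that two zero tokens ("0" and "0") also come back as False, because 0.0 / 0.0 raises the same exception.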

@lalalandau

I had this same issue but can't seem to replicate your fix. Do you mind posting the snippet of the is_mispelling function that you changed?

And thank you to you both, for making this package and working on this issue, as it would be a huge help.

@jacobod

jacobod commented Mar 12, 2018

This bit seemed to work for me, though not sure if it is the most efficient:

        # A zero on either side makes the ratio undefined, so treat it as "not a misspelling".
        if t1f == 0 or t2f == 0:
            return False

        return max(t1f, t2f) / min(t1f, t2f) < self.number_fuzz_threshold

@junaidahmed361

junaidahmed361 commented May 2, 2018

I'm also getting the ZeroDivisionError and can't figure out how to get around it while still returning the correctly linked dataframe. An earlier comment mentioned changing the exception for div 0 to return False, and I would also like to see a snippet showing what to change and how. I've tried the snippet above, but the same issue persisted.

@ghost

ghost commented Sep 7, 2018

As pointed out by @gffde3, I added:

except ZeroDivisionError:
    pass 

on line 40 of tokencomparison.py and it did the trick. 🎉

@kennethzhu88

As pointed out by @gffde3, I added:

except ZeroDivisionError:
    pass 

on line 40 of tokencomparison.py and it did the trick. 🎉

This works for me too, many thanks @gregobf

@chris1610
Contributor

chris1610 commented Dec 1, 2018

I think changing line 42 to this is a little cleaner than adding a whole new exception line:

except (ValueError, ZeroDivisionError):
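
As a rough illustration of how a single handler can cover both cases, here is a standalone sketch. It does not reproduce the code in tokencomparison.py: the function name `numeric_verdict`, the `threshold=1.2` default, and the choice to return None from the handler are all assumptions made for the example.

```python
def numeric_verdict(token1, token2, threshold=1.2):
    """Return True/False for numeric tokens, or None when no ratio can be computed."""
    try:
        t1f = float(token1)
        t2f = float(token2)
        return max(t1f, t2f) / min(t1f, t2f) < threshold
    except (ValueError, ZeroDivisionError):
        # ValueError: at least one token is not numeric.
        # ZeroDivisionError: the smaller value is 0.
        # Either way there is no numeric answer, so the caller can fall back
        # to whatever string comparison it normally uses.
        return None


print(numeric_verdict("abc", "2"))
print(numeric_verdict("0", "2"))
print(numeric_verdict("100", "101"))
```

Catching both exception types in one tuple treats a zero token the same way as a non-numeric one, so no separate branch is needed for it.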

chris1610 added a commit to chris1610/fuzzymatcher that referenced this issue Dec 1, 2018
Fix Zero Division Error as described in RobinL#39 and RobinL#42
RobinL pushed a commit that referenced this issue Feb 22, 2019
Fix Zero Division Error as described in #39 and #42
@RobinL
Owner

RobinL commented Feb 22, 2019

Closed by #43

@RobinL
Owner

RobinL commented Feb 22, 2019

Thanks @chris1610 and everyone who reported this

@7cb15

7cb15 commented Mar 27, 2019

I am still getting this error despite the update to tokencomparison.py (a ZeroDivisionError on line 40, as noted above). Note that I installed the package with pip, so perhaps that is the issue. Any help is much appreciated!
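
One way to narrow this down is to check which file Python actually imports and which version pip has recorded. The sketch below uses only the standard library. The helper names are made up, and the module path `fuzzymatcher.tokencomparison` and distribution name `fuzzymatcher` are assumptions based on the file and repository names mentioned in this thread.

```python
import importlib.util
from importlib import metadata


def locate_module(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def installed_version(dist_name):
    """Return the installed distribution's version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed names; adjust them if your install differs.
print(locate_module("fuzzymatcher.tokencomparison"))
print(installed_version("fuzzymatcher"))
```

Opening the printed file and looking at the except clause near the numeric comparison shows whether the installed copy includes the change discussed above.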

@ghost

ghost commented Apr 23, 2019

Same here, I used regular pip install and pulled from GitHub.

@kanlancb

Same here, in colab through pip
