Wrong formula for the Coleman–Liau index #6

Open
kduxin opened this issue Aug 7, 2022 · 4 comments

Comments

@kduxin

kduxin commented Aug 7, 2022

Hi Lee,

There are two errors in the formula.

First, the original Coleman–Liau index counts the number of letters per 100 words, whereas the code counts the number of tokens per 100 words.

Second, dividing the count by 100 does not give a "per 100 words" rate. The count should instead be divided by the number of 100-word blocks:
$$\frac{n_{\text{letters}}}{n_{\text{tokens}} / 100}$$

As a result, the produced score is always around -15.0.
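
The two corrections above can be sketched as follows. This is a minimal illustration, not the library's code: `coleman_liau` uses the standard published coefficients, while `coleman_liau_as_described` is only a hypothetical reconstruction of the behaviour reported here (its sentence term in particular is a guess). The function names and the counts are made up for the example.

```python
def coleman_liau(n_letters: int, n_tokens: int, n_sents: int) -> float:
    """Coleman-Liau index from raw counts, using the published coefficients."""
    if n_tokens == 0:
        raise ValueError("need at least one token")
    letters_per_100 = n_letters / n_tokens * 100  # L: letters per 100 words
    sents_per_100 = n_sents / n_tokens * 100      # S: sentences per 100 words
    return 0.0588 * letters_per_100 - 0.296 * sents_per_100 - 15.8


def coleman_liau_as_described(n_tokens: int, n_sents: int) -> float:
    """Hypothetical reconstruction of the reported behaviour (NOT copied from lingfeat):
    token count instead of letter count, and a plain division by 100."""
    return 0.0588 * (n_tokens / 100) - 0.296 * (n_sents / 100) - 15.8


# Made-up counts for a short passage: 100 tokens, 450 letters, 5 sentences.
correct = coleman_liau(n_letters=450, n_tokens=100, n_sents=5)
reported = coleman_liau_as_described(n_tokens=100, n_sents=5)
print(round(correct, 2))   # 9.18
print(round(reported, 2))  # -15.76
```

Under that reading, both variable terms stay far below 1 for passages of ordinary length, so the constant -15.8 dominates the result.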

My installed lingfeat version is 1.00b19. Has this been fixed in a later release?

@brucewlee
Owner

I see. You are correct.

Thank you @kduxin. I'll make the appropriate changes.

@MarioGalindoQ

MarioGalindoQ commented Oct 6, 2022

Hi Bruce,
I discovered the same bug.
In the file TraF.py at line 84, I think the correct code is:
result = 0.0588 * (self.n_char / self.n_token * 100) - 0.296 * (self.n_sent * 100 / self.n_token) - 15.8
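
Read term by term, this line matches the published Coleman–Liau form, with $L$ the letters per 100 words and $S$ the sentences per 100 words. The mapping below assumes that `self.n_char` counts letters only, which this thread does not confirm:

$$\mathrm{CLI} = 0.0588\,L - 0.296\,S - 15.8, \qquad L = 100 \cdot \frac{n_{\text{char}}}{n_{\text{token}}}, \qquad S = 100 \cdot \frac{n_{\text{sent}}}{n_{\text{token}}}$$
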
Hope this will help you.
Thanks

@brucewlee
Owner

Thank you, Professor Queralt, for the suggestions, including the one in #4. I am planning to restructure this project and release a better version. Although I have been busy with my job since releasing this library, I sincerely appreciate the continued attention.

@brucewlee
Owner

The new update will likely be released in November.
