Unicode word count #3

Open
creativefctr opened this issue Aug 17, 2016 · 2 comments

Comments

@creativefctr

The code calls str_word_count in many places. The problem with this function is that, when it is given Unicode (multibyte) strings, it returns an excessively high word count, which can make the spam detector miss some spam.
The only substitute function that worked well for me was this one:

/**
 * Returns the number of words in a Unicode string.
 *
 * @param string $string UTF-8 encoded input
 * @param int    $mode   0 returns the word count; any other value returns the list of words
 * @return array|int
 */
function utf8WordCount($string, $mode = 0) {
    static $it = null;

    // Create the word-break iterator once and reuse it across calls.
    if (is_null($it)) {
        $it = IntlBreakIterator::createWordInstance(ini_get('intl.default_locale'));
    }

    $l = 0; // offset where the current segment starts
    $it->setText($string);
    $ret = $mode == 0 ? 0 : array();
    if (IntlBreakIterator::DONE != ($u = $it->first())) {
        do {
            // Skip segments that are not words (whitespace, punctuation).
            if (IntlBreakIterator::WORD_NONE != $it->getRuleStatus()) {
                if ($mode == 0) {
                    ++$ret;
                } else {
                    $ret[] = substr($string, $l, $u - $l);
                }
            }
            $l = $u;
        } while (IntlBreakIterator::DONE != ($u = $it->next()));
    }

    return $ret;
}
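For setups where the intl extension is not installed, a rough approximation can be sketched with PCRE Unicode properties. This is not part of the original suggestion and is not equivalent to IntlBreakIterator (it will not segment languages written without spaces, for example). The name regexWordCount and the exact pattern are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical fallback (not from the project): approximates a Unicode-aware
 * word count using PCRE when the intl extension is unavailable.
 * Follows the same $mode convention as utf8WordCount: 0 returns a count,
 * any other value returns the list of matched words.
 *
 * @param string $string UTF-8 encoded input
 * @param int    $mode
 * @return int|string[]
 */
function regexWordCount($string, $mode = 0)
{
    // A "word" here is a run of letters, combining marks, or digits,
    // optionally joined by single apostrophes or hyphens ("don't", "well-known").
    $pattern = "/[\\p{L}\\p{M}\\p{N}]+(?:['’-][\\p{L}\\p{M}\\p{N}]+)*/u";

    // preg_match_all returns false on failure, e.g. for invalid UTF-8 input.
    if (preg_match_all($pattern, $string, $matches) === false) {
        return $mode == 0 ? 0 : array();
    }

    return $mode == 0 ? count($matches[0]) : $matches[0];
}

echo regexWordCount('naïve résumé'), PHP_EOL;                   // 2
echo implode('|', regexWordCount('Привет, мир!', 1)), PHP_EOL;  // Привет|мир
```

Whether this is accurate enough depends on the languages the spam detector has to handle, so it should be treated as a stopgap rather than a replacement for the IntlBreakIterator version.
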
@morrelinko
Owner

@xfactor5

Thanks for pointing this out. Please make a pull request so I can merge it, or I will make the update in my spare time.

@creativefctr
Author

@morrelinko You're welcome. I did not use this function in your code directly because it seems incompatible, as it only receives one argument. Instead, I extended the rife detector and rewrote its check function using this one for my own use. It would be wiser for you to integrate this function into the code yourself and then run the necessary tests or checks.
