BUG: EndIndex returned by the Tokenize method is 1 greater than the actual index #69

Open
agebullhu opened this issue May 23, 2020 · 0 comments

    public IEnumerable<Token> Tokenize(string text, TokenizerMode mode = TokenizerMode.Default, bool hmm = true)
    {
        var result = new List<Token>();

        var start = 0;
        if (mode == TokenizerMode.Default)
        {
            foreach (var w in Cut(text, hmm: hmm))
            {
                var width = w.Length;
                // Reported bug: 1 should be subtracted here; otherwise the range includes one extra character.
                result.Add(new Token(w, start, start + width));
                start += width;
            }
        }
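
To make the reported off-by-one concrete, here is a minimal, self-contained sketch of the same index arithmetic. It is not the library's code: `Span`, `SpanDemo`, and the hand-written word list are hypothetical stand-ins for `Token` and the output of `Cut`. It shows that `start + width` points one position past the last character of each word, so the value only recovers the word when it is read as an exclusive end, and that the last character's index is `EndIndex - 1`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's Token type (word plus two indices).
public sealed class Span
{
    public string Word { get; }
    public int StartIndex { get; }
    public int EndIndex { get; }

    public Span(string word, int startIndex, int endIndex)
    {
        Word = word;
        StartIndex = startIndex;
        EndIndex = endIndex;
    }
}

public static class SpanDemo
{
    // Same arithmetic as the loop in the report: EndIndex = start + width.
    public static List<Span> ToSpans(IEnumerable<string> words)
    {
        var result = new List<Span>();
        var start = 0;
        foreach (var w in words)
        {
            result.Add(new Span(w, start, start + w.Length));
            start += w.Length;
        }
        return result;
    }

    // Recovers the word when EndIndex is treated as exclusive.
    public static string Slice(string text, Span s) =>
        text.Substring(s.StartIndex, s.EndIndex - s.StartIndex);

    // Index of the word's last character (the value the report expects).
    public static int LastCharIndex(Span s) => s.EndIndex - 1;
}

public static class Program
{
    public static void Main()
    {
        var text = "我来到北京";
        // Assumed segmentation, written by hand instead of calling Cut.
        var words = new[] { "我", "来到", "北京" };

        foreach (var s in SpanDemo.ToSpans(words))
        {
            Console.WriteLine(
                $"{s.Word}: start={s.StartIndex}, end={s.EndIndex}, " +
                $"last char index={SpanDemo.LastCharIndex(s)}, slice={SpanDemo.Slice(text, s)}");
        }
    }
}
```

With the assumed segmentation, the spans come out as (0, 1), (1, 3), and (3, 5). Whether to subtract 1 inside `Tokenize` or to document `EndIndex` as exclusive is a design choice for the maintainers. Callers who need the inclusive index today can compute `EndIndex - 1` themselves.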