Fix cut_all mixed Chinese & English issue #76

Open
messense opened this issue Jul 19, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@messense
Owner

This needs the same fix as the Python version: fxsjy/jieba@97c3246

@messense added the bug label Jul 19, 2020
@messense
Owner Author

cc @MnO2

@MnO2
Collaborator

MnO2 commented Jul 19, 2020

@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of Chinese vocabulary. Not only the English alphabet can appear in product names, but Japanese hiragana can as well. I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a work-around like the one in the Python implementation for practical reasons.
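
A minimal sketch of what such a work-around could look like in Rust, assuming the segmenter has already produced a token list in which Latin letters that match no dictionary entry come out one per token. The names `merge_ascii_runs` and `is_single_ascii_alnum` are hypothetical; this is not the code from the Python commit or from this crate. It only illustrates buffering a run of ASCII letters and digits and emitting the run as a single token.

```rust
/// Returns true when `tok` is exactly one ASCII letter or digit.
/// (Hypothetical helper, not part of any existing API.)
fn is_single_ascii_alnum(tok: &str) -> bool {
    let mut chars = tok.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => c.is_ascii_alphanumeric(),
        _ => false,
    }
}

/// Merges consecutive single-character ASCII alphanumeric tokens
/// into one token and leaves every other token untouched.
/// Assumption: the input is the raw output of a full-mode cut.
fn merge_ascii_runs(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new(); // pending run of ASCII characters

    for &tok in tokens {
        if is_single_ascii_alnum(tok) {
            buf.push_str(tok);
        } else {
            // A non-ASCII token ends the current run, so flush it first.
            if !buf.is_empty() {
                out.push(std::mem::take(&mut buf));
            }
            out.push(tok.to_string());
        }
    }
    // Flush a run that reaches the end of the input.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

Called with the tokens `["我", "用", "i", "P", "h", "o", "n", "e", "手机"]`, this returns `["我", "用", "iPhone", "手机"]`. It deliberately handles ASCII only, so hiragana and other scripts are left as they are, which is the scope limit raised in the comment above.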
