This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Does Stork support CJK languages? #293

Closed
Jieiku opened this issue May 4, 2022 · 1 comment

Comments


Jieiku commented May 4, 2022

Does Stork support CJK languages? (Chinese, Japanese, and Korean)

I am interested in using Stork for Zola. I proposed it here: getzola/zola#1849

It was mentioned there that Stork may not have stemmers or stopword lists for languages other than English.

EDIT: (I researched this through some of the open issues, please correct me if I am wrong on any of this, thank you.)

Stopword lists: not implemented yet: #250

Stemmers: multilingual stemming is already supported through Snowball (#48), but CJK languages do not appear on Snowball's list of stemmers: https://snowballstem.org/algorithms/

I also see that stemming may not be applicable to CJK at all:

Stemming is not a concept applicable to all languages. It is not, for example, applicable in Chinese. [ source ]

Even if stemming is not applicable to CJK, it seems that CJK text can still be analyzed, and search results improved, through tokenization? https://www.microfocus.com/documentation/starteam/163/en/Help/SvrAdmin/GUID-DAC55170-60DC-490B-BC4F-42F4F45F6029.html
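
To illustrate what I mean by tokenization: CJK text usually has no spaces between words, so one common approach is to index overlapping character bigrams instead of whitespace-separated words. Below is a minimal Python sketch of that idea. It is not how Stork works (I don't know what its tokenizer does with CJK input); the names `is_cjk`, `char_class`, and `tokenize` are made up for this example, and the code point ranges are a rough, non-exhaustive assumption.

```python
from itertools import groupby


def is_cjk(ch: str) -> bool:
    """Rough check against a few common CJK ranges (assumed, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def char_class(ch: str) -> str:
    """Classify a character as CJK, part of a non-CJK word, or a separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text: str) -> list:
    """Split text into lowercase words (non-CJK) and overlapping bigrams (CJK)."""
    tokens = []
    # groupby yields maximal runs of consecutive characters of the same class.
    for kind, group in groupby(text, key=char_class):
        run = "".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                # A lone CJK character becomes its own token.
                tokens.append(run)
            else:
                # Overlapping two-character windows over the CJK run.
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        # Separator runs (spaces, punctuation) produce no tokens.
    return tokens


if __name__ == "__main__":
    print(tokenize("Stork 検索エンジン"))
```

With bigrams, a query is tokenized the same way as the indexed text, so a query like 搜索 would match any document containing that two-character sequence without needing a dictionary-based word segmenter. The trade-off is a larger index and some false matches across word boundaries.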

@jameslittle230
Owner

I'm going to migrate this to a Discussion and continue there - hope that's okay.

Repository owner locked and limited conversation to collaborators May 5, 2022
jameslittle230 converted this issue into discussion #294 on May 5, 2022
