
Packaging mecab-ko for easier use #398

Open
polm opened this issue Feb 16, 2022 · 6 comments

@polm

polm commented Feb 16, 2022

Hello. I'm a spaCy core developer, and we currently use mecab-ko for our Korean language support, but we're not entirely satisfied with it because it requires mecab-ko to be installed outside of Python, which is inconvenient for many users.

Separately from my work on spaCy I also maintain MeCab related packages for Japanese, mecab-python3 and fugashi. These packages use wheels so that MeCab is included in them and doesn't have to be installed outside of Python - you can get a working setup with just pip install.

Would there be any interest in creating a similar package for mecab-ko? I don't speak Korean so I can't check if results are correct or not on my own beyond a very basic level, but I'd be glad to help with packaging or getting started. I could even just reproduce fugashi and replace MeCab with mecab-ko and set everything up if someone could check things and (better) take over the project from me after that.

@combacsa
Contributor

Hello, @polm!

  • There is a project called python-mecab-ko (https://github.com/jonghwanhyeon/python-mecab-ko), and I have heard of some alternatives whose documentation is not well written in languages other than Korean.

  • Since spaCy is MIT-licensed, KoNLPy itself is not a desirable option (we are GPLv3).

  • Personally, I'd be very glad to help check the correctness of the results. Please let the community know when you've finished porting fugashi to mecab-ko; I'd be happy to contribute then.

@polm
Author

polm commented Feb 18, 2022

Thanks for the heads-up on python-mecab-ko and for the offer to help check! It might take me a bit, but I'll see about creating a fugashi-ko and check back here when it's ready to be checked for reasonable output.

@polm
Author

polm commented Aug 18, 2022

I never got around to doing this with fugashi, but someone made a package called pymecab-ko that's like mecab-python3 for mecab-ko, so that might be useful to anyone who was waiting on it. Like mecab-python3, it allows you to install a working dictionary and MeCab using just pip.
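Since pymecab-ko is described as working like mecab-python3, a tagger's `parse()` presumably returns MeCab's default plain-text output: one `surface<TAB>features` line per token, followed by `EOS`. The helper below is a minimal sketch, not part of pymecab-ko, that turns such text into (surface, POS) pairs. It assumes mecab-ko-dic's comma-separated feature layout with the POS tag in the first field, and it runs on a hand-written sample rather than real tagger output.

```python
from typing import List, Tuple


def parse_mecab_output(text: str) -> List[Tuple[str, str]]:
    """Split default-format MeCab output into (surface, pos) pairs.

    Hypothetical helper. Assumes each token line looks like
    'surface<TAB>feature1,feature2,...' and that the first feature
    is the POS tag, as in mecab-ko-dic.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        pos = features.split(",", 1)[0]  # first feature field = POS tag
        tokens.append((surface, pos))
    return tokens


# Hand-written sample in mecab-ko-dic's output format (illustrative only).
SAMPLE = (
    "아버지\tNNG,*,F,아버지,*,*,*,*\n"
    "가\tJKS,*,F,가,*,*,*,*\n"
    "EOS\n"
)

if __name__ == "__main__":
    print(parse_mecab_output(SAMPLE))  # [('아버지', 'NNG'), ('가', 'JKS')]
```

With pymecab-ko installed, the input would come from something like `MeCab.Tagger().parse(sentence)`, assuming it keeps mecab-python3's `MeCab` module name.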

In spaCy we're considering switching to using this package, but are concerned it might be disruptive for existing users (PR here). Do you have any idea how common it is to use a customized dictionary with mecab-ko? Are there alternatives to mecab-ko-dic in use? (In Japanese there's ipadic, different UniDics, and NEologd, for example, but I'm not aware of anything in Korean.)
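For context on what a "customized dictionary" means in practice: MeCab selects its system dictionary directory with the `-d` option and loads extra compiled user dictionaries with `-u` (a comma-separated list, like the `userdic` setting in mecabrc), and mecab-python3-style wrappers take such options as the string argument to `Tagger`. The hypothetical helper below only builds that option string. The paths are placeholders, paths containing spaces are not handled, and whether pymecab-ko accepts exactly these options is an assumption.

```python
from typing import Optional, Sequence


def mecab_options(dicdir: Optional[str] = None,
                  userdics: Sequence[str] = ()) -> str:
    """Build a MeCab option string (hypothetical helper).

    -d: system dictionary directory (e.g. a custom mecab-ko-dic build)
    -u: one or more compiled user dictionaries, comma-separated
    """
    parts = []
    if dicdir:
        parts += ["-d", dicdir]
    if userdics:
        parts += ["-u", ",".join(userdics)]
    return " ".join(parts)


if __name__ == "__main__":
    opts = mecab_options("/path/to/mecab-ko-dic", ["company.dic"])
    print(opts)  # -d /path/to/mecab-ko-dic -u company.dic
    # With a mecab-python3-style wrapper this would be passed as
    # MeCab.Tagger(opts).
```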

@NoUnique

NoUnique commented Aug 30, 2022

@polm I made pymecab-ko for people who misuse MeCab (e.g. by using mecab, not mecab-ko, with mecab-ko-dic), and it was heavily influenced by your great work. Thank you.

I'm also working on applying pymecab-ko to KoNLPy and am planning to make a PR this week.

General users rarely use custom dictionaries, but I've heard that some companies have their own custom dictionaries built from their in-house corpora.

Recently, a dictionary trained on a new Korean corpus has been released. I will upload it to PyPI this week.
(Edit) I released a Python package for the dictionary: openkorpos-dic-py (openkorpos-dic on PyPI).

@polm
Author

polm commented Sep 1, 2022

@NoUnique Thank you for making pymecab-ko, it's a great project to have! Thank you also for the extra information about custom dictionary usage and the new dictionary release.

It's great to have these resources for Korean NLP and for things to be easier to use in general.

@NoUnique

NoUnique commented Nov 7, 2022

Unfortunately, the switch to pymecab-ko cannot be made until KoNLPy's Python 2 support has completely ended.

This is because pymecab-ko only supports Python 3.6 or higher.
