MeCab Text Cleaner

This is a simple Python package for getting Japanese readings (yomigana) and accents using MeCab. It does not account for accent changes in compound words, so please also consider pyopenjtalk (no accents) or ESPnet's pyopenjtalk_g2p_prosody (with accents).

Installation

Install this via pip or pipx (or your favourite package manager):

pipx install "mecab-text-cleaner[unidecode,unidic]"

pip install "mecab-text-cleaner[unidecode,unidic]"
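To confirm that the optional extras are available in your environment, a quick check like the one below may help. It is a minimal sketch that assumes the `unidecode` and `unidic` extras each install an importable module with the same name. It only checks importability and does not exercise mecab-text-cleaner itself.

```python
import importlib.util

# Assumption: each extra installs a top-level module with the same name.
OPTIONAL_EXTRAS = ("unidecode", "unidic")


def check_extras(names=OPTIONAL_EXTRAS) -> dict[str, bool]:
    """Return {module name: True if it can be found in this environment}."""
    # find_spec locates a module without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_extras().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```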

Usage

> mtc いい天気ですね。
イ]ー テ]ンキ デス ネ。
> mtc いい天気ですね。 --ascii
i] te]nki desu ne.
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words
イーテンキデスネ
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words -r kana
イイテンキデスネ
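In the default output, words are separated by spaces and `]` follows the mora after which the pitch falls (in テ]ンキ the fall comes right after テ). If you need that information as numbers, a small parser along these lines could work. It is a hypothetical helper, not part of mecab-text-cleaner, and it assumes katakana output with space-separated words (that is, without `--ascii` or `--no-add-blank-between-words`).

```python
import re

# Small kana merge with the preceding kana into one mora (キャ is 1 mora),
# whereas ッ (geminate) and ー (long vowel) each count as a mora.
SMALL_KANA = set("ァィゥェォャュョヮ")

# Keep only katakana letters (U+30A1 to U+30FA), ー (U+30FC) and the ] marker.
NON_KANA = re.compile(r"[^\u30A1-\u30FA\u30FC\]]")


def count_morae(kana: str) -> int:
    """Count morae in a katakana string."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


def parse_accents(reading: str) -> list[tuple[str, int]]:
    """Turn spaced, ]-marked output into (word, accent nucleus) pairs.

    The nucleus is the 1-based mora after which the pitch falls;
    0 means no fall is marked inside that word.
    """
    pairs = []
    for token in reading.split():
        token = NON_KANA.sub("", token)  # drop punctuation such as 。、！
        if not token.replace("]", ""):
            continue  # the token was punctuation only
        head, marker, _ = token.partition("]")
        nucleus = count_morae(head) if marker else 0
        pairs.append((token.replace("]", ""), nucleus))
    return pairs


print(parse_accents("イ]ー テ]ンキ デス ネ。"))
# [('イー', 1), ('テンキ', 1), ('デス', 0), ('ネ', 0)]
```

Counting morae rather than characters matters for words containing small kana, where キャ occupies two characters but only one mora.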

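from mecab_text_cleaner import to_reading, to_ascii_clean

# to_reading: katakana reading with words separated by spaces and "]" after
# the mora where the pitch falls; leading whitespace is dropped.
assert to_reading("     空、雲。\n雨！（") == "ソ]ラ、 ク]モ。\nア]メ！（"

# to_ascii_clean: romanized output with ASCII punctuation; non-Japanese
# text is transliterated as well (the Hangul 한 becomes "han").
assert to_ascii_clean("      한空、雲。\n雨！（") == "han so]ra, ku]mo. \na]me!("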
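If you already have `to_reading` output and also want the unmarked, unspaced form (what the CLI prints with `--no-add-atype --no-add-blank-between-words`), stripping the markers may be cheaper than running MeCab again. This is a hypothetical post-processing helper, not part of the package. It assumes `]` only ever appears as the accent marker and that words are separated by ASCII spaces (so any spaces in the original text are removed too), and it is only expected to match the CLI output for simple inputs like the ones above.

```python
def strip_accent_marks(reading: str) -> str:
    """Remove accent markers and inter-word spaces from to_reading output.

    Assumes "]" appears only as an accent marker and that words are
    separated by ASCII spaces; newlines are left untouched.
    """
    return reading.replace("]", "").replace(" ", "")


# Inputs copied from the example outputs above.
print(strip_accent_marks("イ]ー テ]ンキ デス ネ。"))  # イーテンキデスネ。
print(strip_accent_marks("ソ]ラ、 ク]モ。\nア]メ！（"))
```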

Contributors ✨

Thanks go to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!

Repository: 34j/mecab-text-cleaner on GitHub
