KikiYomu is a lightweight, real-time Text-to-Speech (TTS) application that monitors your clipboard and instantly uses AI voice models to have anime-style characters narrate Japanese text from games or visual novels.
- Real-Time TTS: Automatically reads aloud Japanese text copied to your clipboard.
- Speaker Tag Handling: Option to remove speaker tags like 【Name】, commonly found in RPGMaker and WolfRPG games.
- Game Compatibility: Designed to work well with most visual novels and games that use stylized dialogue formatting.
- User-Friendly GUI: Simple GUI.
- Manual Control Over TTS: Lets you force-read text even when it has been filtered out.
- Image OCR Support: Extracts Japanese text from images in your clipboard using OCR. Use it with a snipping tool for best results.
- GPU Acceleration: Optional. Uses the GPU if available for faster OCR and voice-over.
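As a rough illustration of the speaker-tag option listed above, the sketch below strips a leading 【Name】 tag with a regular expression. The function name `strip_speaker_tag` and the pattern itself are assumptions made for this example; KikiYomu's own implementation may differ.

```python
import re

# Hypothetical pattern: a 【...】 tag at the very start of a line,
# optionally surrounded by whitespace.
SPEAKER_TAG = re.compile(r"^\s*【[^】]*】\s*")


def strip_speaker_tag(line: str) -> str:
    """Remove one leading 【Name】 tag and leave the rest of the line untouched."""
    return SPEAKER_TAG.sub("", line, count=1)


print(strip_speaker_tag("【アリス】「おはよう」"))
print(strip_speaker_tag("「おはよう」"))
```

Because the pattern is anchored to the start of the line, brackets that appear in the middle of a sentence are left alone.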
- Python 3.8 or later
- PyTorch (with CUDA if using GPU)
- SoundDevice (for audio playback)
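A quick way to confirm these requirements before installing anything else is a small check script. This is only a sketch: the helper `check_environment` is hypothetical, and it assumes PyTorch is imported as `torch` and SoundDevice as `sounddevice`.

```python
import importlib.util
import sys

# Import names are assumptions: PyTorch as "torch", SoundDevice as "sounddevice".
REQUIRED_MODULES = ("torch", "sounddevice")


def check_environment(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or later is required" % min_version)
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


for problem in check_environment():
    print(problem)
```

If PyTorch is installed, `torch.cuda.is_available()` reports whether the GPU path can be used.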
- Clone the Repository
git clone https://github.com/yourusername/KikiYomu.git
cd KikiYomu
- Install Python Dependencies
pip install -r requirements.txt
- Download Pretrained Voice Models
Download the pretrained AI voice models from zomehwh's VITS Models repository on Hugging Face (credited below).
Place the .pth model files into the models/ directory.
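To verify that the models landed in the right place, a short script can list what is in the folder. This is a minimal sketch: `list_models` is a hypothetical helper, and it assumes the script is run from the repository root so that `models/` resolves correctly.

```python
from pathlib import Path


def list_models(models_dir="models"):
    """Return the .pth files found directly inside models_dir, sorted by name."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix == ".pth")


found = list_models()
if found:
    for path in found:
        print(path.name)
else:
    print("No .pth files found in models/")
```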
- Start the App
python gui.py
- Alternatively, you can run the KikiYomu.py file from the command line, as it still offers most of the features.
- Load a Model
- In the "Models" panel, select a .pth model and click "Select Model".
- Configure Settings (if needed)
- Set the opening/closing signs used for spoken text (e.g., 「 and 」).
- If you are playing an RPGMaker or WolfRPG game, enable the checkbox to remove speaker tags (【Name】) at the start of lines.
- Adjust playback speed with the slider.
- Copy Text to Speak
Copy any Japanese line of text to the clipboard. If it passes the filters, KikiYomu will automatically speak it aloud using the selected AI voice.
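The copy-and-speak flow above can be pictured as a polling loop. The sketch below is an illustration under stated assumptions, not KikiYomu's actual code: `passes_filter` and `watch_clipboard` are hypothetical names, the filter is assumed to require the opening sign followed by the closing sign, and the clipboard reader and speech function are passed in as plain callables so the example runs without any clipboard or audio library.

```python
import time


def passes_filter(text, opening="「", closing="」"):
    """True if the text contains the opening sign followed later by the closing sign."""
    start = text.find(opening)
    return start != -1 and text.find(closing, start + len(opening)) != -1


def watch_clipboard(read_clipboard, speak, polls, interval=0.0):
    """Poll read_clipboard `polls` times; speak each new text that passes the filter."""
    last = None
    for _ in range(polls):
        text = read_clipboard()
        if text != last:  # only react when the clipboard content changes
            last = text
            if passes_filter(text):
                speak(text)
        time.sleep(interval)


# Stand-ins for a real clipboard reader and TTS engine (both hypothetical).
fake_clipboard = iter(["「こんにちは」", "「こんにちは」", "menu", "「またね」"])
spoken = []
watch_clipboard(lambda: next(fake_clipboard), spoken.append, polls=4)
print(spoken)
```

Remembering the last clipboard value keeps the same line from being read twice while it sits in the clipboard.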
- Voice Models: zomehwh's VITS Models on Hugging Face