StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Updated Jun 11, 2024 - Python
OBS plugin for local speech recognition and captioning using AI
Open Voice OS Status Page
Dockerized Whisper C++ speech-to-text API for easy deployment and rapid integration. Offering the latest stable and nightly builds for efficient audio transcription.
Translate a video from one language to another and add dubbing.
A library for real-time voice processing in web browsers
Running speech to text model (whisper.cpp) in Unity3d on your local machine.
The AssemblyAI Java SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
A CLI app to interact with LLMs via text or audio using Hugging Face Transformers, with customizable models and generation parameters.
A simple speech-to-text and text-to-speech program/frontend.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Port of OpenAI's Whisper model in C/C++
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
BanterBrain Buddy is a Windows-based Speech-To-Text to LLM to Text-To-Speech client program for general entertainment or as a streaming companion.
🧠 Leon is your open-source personal assistant.
A Python library for solving reCAPTCHA v2 and v3 with Playwright
Introducing NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
A voice recognition-based tool for translating languages in real-time.
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
🏃 💡 Talabat Hackathon 2022 API project