Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
Updated Jun 11, 2024 · Python
A voice-operated emailing mobile application that allows you to compose and send email messages through voice commands.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Open-source, accurate and easy-to-use video speech recognition and clipping tool, with integrated LLM-based AI clipping.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Customizable TTS Chat Bot using OpenAI & Google Cloud TTS/ElevenLabs
Talk to Rawan voice-to-voice using speech recognition or text-to-speech, with ElevenLabs technology and ChatGPT on the web.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
OBS plugin for local speech recognition and captioning using AI
Official Python SDK for Deepgram's automated speech recognition APIs.
Official repository for the open-source text dataset for NMT for local languages in West Africa (EWE Corpus)
A library for real-time voice processing in web browsers
Running a speech-to-text model (whisper.cpp) in Unity3d on your local machine.
A simple speech-to-text and text-to-speech program/frontend.
Application that lets the user save notes, either as audio or as text.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Port of OpenAI's Whisper model in C/C++