A PyTorch-based Speech Toolkit
Updated May 14, 2024 - Python
Reading list for research topics in multimodal machine learning
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
WaveNet vocoder
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
A speech-to-text framework and bot for Discord. Take control of your Discord server using speech and voice commands. It can also be useful for deaf and hearing-impaired people.
SincNet is a neural architecture for efficiently processing raw audio samples.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Foundation Architecture for (M)LLMs
A neural network for end-to-end speech denoising
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Real-time GCC-NMF Blind Speech Separation and Enhancement
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
An implementation of "Neural Voice Cloning With Few Samples"
Open source audio annotation tool for humans
General Speech Restoration