🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Updated Jun 12, 2024 - HTML
Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
PyTorch implementation of SoundCTM
Creative Text-to-Audio Generation via Synthesizer Programming @ ICML'24
A web UI for various audio-related neural networks
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
PyTorch implementation of Make-An-Audio (ICML'23), a text-to-audio generative model
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
A family of diffusion models for text-to-audio generation.
Generate music from natural-language prompts using Meta's MusicGen Small model.
A text-to-audio application that turns words and sentiments into melodies.
Mustango: Toward Controllable Text-to-Music Generation
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal sounds such as laughing, sighing and crying.
Python program to convert text to speech.
Whether it’s text or a link, it can be turned into a podcast!
TalkItOut is a Python and Flask-based web application that converts text to speech, lets you choose your preferred language for audio output, provides a built-in dictionary for word meanings, and can even extract text from images and generate audio from it.
Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.