0417keito/UTAUTAI

UTAUTAI: Unrestricted Tune Automated Technology Artificial Intelligence

📖 Quick Index

🚀Model Architecture

UTAUTAI main architecture (🙇 sorry for the hand-drawn diagram)

🤔What is UTAUTAI?

An open-source repository aimed at generating matching vocal and instrumental tracks from lyrics, similar to Suno AI's Chirp and Riffusion.

🐍Method

UTAUTAI's method is mainly inspired by SPEAR-TTS.

During training, the input consists of semantic tokens and acoustic tokens. One set of semantic tokens comes from 'lyrics2semantic AR', which maps lyrics to semantic tokens. A second set is obtained by extracting MERT representations from the music and applying k-means quantization to them.
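As a rough illustration of the k-means quantization step, here is a minimal NumPy sketch. It is not the repository's code: the feature array is random data standing in for MERT representations, and the names `fit_kmeans` and `quantize`, the feature dimension (32), and the codebook size (16) are all assumptions chosen for the example.

```python
import numpy as np

def quantize(features, centroids):
    """Map each frame (row) to the index of its nearest centroid."""
    # Squared Euclidean distances, shape (num_frames, num_clusters).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def fit_kmeans(features, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns the learned centroids (the codebook)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen frames.
    idx = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[idx].copy()
    for _ in range(n_iters):
        tokens = quantize(features, centroids)
        for k in range(n_clusters):
            members = features[tokens == k]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[k] = members.mean(axis=0)
    return centroids

# Hypothetical stand-in for frame-level MERT features: 500 frames, 32 dims.
rng = np.random.default_rng(0)
mert_features = rng.normal(size=(500, 32)).astype(np.float32)

centroids = fit_kmeans(mert_features, n_clusters=16)
mert_semantic_tokens = quantize(mert_features, centroids)  # one token id per frame
```

Each frame becomes a single integer id, so the music is represented as a discrete token sequence that can sit alongside the lyric-derived semantic tokens and the acoustic tokens.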

At inference time, however, there is no music from which to extract MERT representations. We therefore train a Style Module, following the methodology of PromptTTS 2, to predict the target MERT representations from the prompt. The Style Module is a transformer-based diffusion model.
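To make the diffusion idea concrete, the sketch below shows only the forward noising step that a diffusion model is typically trained against. It assumes a standard DDPM-style linear noise schedule, which the README does not specify, and it uses random vectors in place of MERT-derived targets. The helper names `make_alpha_bars` and `add_noise` are hypothetical, and the transformer denoiser itself is left out.

```python
import numpy as np

def make_alpha_bars(n_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_t) * x0, (1 - a_t) * I)."""
    noise = rng.normal(size=x0.shape)
    a_t = alpha_bars[t]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * noise
    return x_t, noise

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars(1000)

# Hypothetical stand-in for target MERT representations: 8 vectors, 32 dims.
target_style = rng.normal(size=(8, 32))

t = 500
noisy_style, noise = add_noise(target_style, t, alpha_bars, rng)

# A denoiser (a transformer, per the text above) would receive noisy_style, t,
# and a prompt embedding, and be trained to predict `noise` or `target_style`.
# With the true noise in hand, the clean target is recovered exactly:
recovered = (noisy_style - np.sqrt(1.0 - alpha_bars[t]) * noise) / np.sqrt(alpha_bars[t])
```

At inference, such a model would start from pure noise and denoise step by step, conditioned on the prompt, to produce a MERT-like representation without needing any reference music.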

I think that using this approach, we can successfully accomplish the target tasks. What do you think?

🧠TODO

  • How can we obtain lyrics that match the cropped audio? Or should we crop the audio at all?
  • Examine the handling of phonemization and special tokens, and make the necessary code modifications.
  • Correct the collator in the dataset.
  • Complete the StyleModule inference code.
  • Other minor code fixes, such as masking strategies.
  • Replace the diffusion model with a consistency model.

🙏Appreciation

⭐️Show Your Support

If you find UTAUTAI interesting and useful, give us a star on GitHub! ⭐️ It encourages us to keep improving the model and adding exciting features.

🙆Welcome Contributions

Contributions are always welcome.
