CNN-based Bangla text-to-speech model with an attention mechanism.

hrahmansha/TTS_Bn

This Bangla text-to-speech model uses a CNN-based architecture with an attention mechanism.

Methodology based on: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
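The "guided attention" in the paper title refers to a penalty that pushes the text-to-audio attention matrix toward the diagonal, since speech mostly follows text order. Below is a minimal pure-Python sketch of that penalty as described in the paper. The function names and the value g = 0.2 are illustrative assumptions and are not taken from this repository's code.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Penalty matrix W[n][t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    Entries near the diagonal are close to 0; entries far from it approach 1.
    g is an assumed width parameter, not a value read from this repo.
    """
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
            for t in range(n_frames)
        ]
        for n in range(n_text)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of attention[n][t] * W[n][t] over all text and frame positions."""
    n_text = len(attention)
    n_frames = len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(
        attention[n][t] * weights[n][t]
        for n in range(n_text)
        for t in range(n_frames)
    )
    return total / (n_text * n_frames)


# Hypothetical 4x4 attention matrices: one diagonal, one anti-diagonal.
diagonal = [[1.0 if n == t else 0.0 for t in range(4)] for n in range(4)]
anti_diagonal = [[1.0 if n + t == 3 else 0.0 for t in range(4)] for n in range(4)]
```

A perfectly diagonal attention matrix incurs no penalty, while the anti-diagonal one is penalized, which is the behavior that helps training converge quickly.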

Implementation based on: pytorch-dc-tts

Bengali Text-to-Speech Dataset: The Bangla TTS dataset by Google contains approximately 3100 Bangla sentences. It was collected from native Bengali speakers from India and Bangladesh.

Result

Because of hardware limitations, the SSRN stage, which converts the coarse mel spectrogram to the full STFT spectrogram, was trained for only 60 iterations. The audio samples and pretrained models can be found here: link

About The Model Architecture

This TTS model consists of two networks: (1) Text2Mel, which synthesizes a mel spectrogram from an input text, and (2) the Spectrogram Super-resolution Network (SSRN), which converts a coarse mel spectrogram to the full STFT (short-time Fourier transform) spectrogram. The figure below shows the overall architecture of the model. For more details, read this.
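To make the two-stage flow concrete, here is a small sketch of how spectrogram shapes change between Text2Mel and SSRN. The defaults (80 mel bins, an FFT size of 1024, and a 4x upsampling in time) follow the settings described in the referenced paper as far as I recall them. They are assumptions here and may not match this repository's hyperparameters; the function name is hypothetical.

```python
def dctts_shapes(n_coarse_frames, n_mels=80, n_fft=1024, time_upsample=4):
    """Return (mel_shape, stft_shape) as (frequency_bins, time_frames).

    Text2Mel emits a coarse mel spectrogram; SSRN raises both the
    frequency resolution (to 1 + n_fft // 2 bins) and the time resolution.
    All default values are assumptions, not values read from this repo.
    """
    mel_shape = (n_mels, n_coarse_frames)
    stft_shape = (1 + n_fft // 2, time_upsample * n_coarse_frames)
    return mel_shape, stft_shape


# Hypothetical utterance with 50 coarse mel frames.
mel_shape, stft_shape = dctts_shapes(50)
```

Under these assumed settings, SSRN produces many more values than Text2Mel does, which is why training it is split off as a separate step.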

Training Process

  1. Download the dataset into the /datasets folder.
  2. Preprocess the dataset.
  3. Train the Text2Mel model.
  4. Train the SSRN model.
  5. Test the model.

Colab Notebook: https://colab.research.google.com/drive/1AjsxzBu6ekcv0GF3dyWubj04hhwwHkjE?usp=sharing (note that this Colab playground is not well organized).
