
skipNet

A TensorFlow implementation of 'Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck' (ICCV 2019): https://arxiv.org/abs/1908.07094

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Installing Dependencies

  1. Install Python 3

  2. Install TensorFlow for your platform. For better performance, install a build with GPU support if one is available. This code works with TensorFlow 1.4 (see the version check after this list).

  3. Install the other dependencies: pip install -r requirements.txt
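
Because the code targets the TensorFlow 1.x APIs, it is worth confirming the installed version before training. The snippet below is a minimal sanity check, not part of the repository:

```python
# Minimal sanity check (not part of this repository): confirm that the
# installed TensorFlow is a 1.x release, since the code works with TF 1.4.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
assert tf.__version__.startswith("1."), (
    "This code works with TensorFlow 1.4; a 1.x build is expected.")
```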

Training

  1. Prepare Data

Download Dataset

For image-text pairs, our model is trained on COCO, which can be downloaded from http://cocodataset.org/#download

For speech-text pairs, our model is trained on a Microsoft-internal dataset, but it can also be trained on other public datasets, e.g. LibriSpeech: http://www.openslr.org/12

Preprocess the Data

For audio samples, our model is trained on mel-spectrograms.

Write your training list in this format: 'mel-spec-path'|'linear-spec-path'|'length'|'text'
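
As a reference point, the sketch below shows one way such a training-list entry could be produced. It is not the authors' preprocessing script: it assumes librosa for feature extraction, .npy files for the saved spectrograms, and that 'length' is the number of spectrogram frames; the sample rate, FFT size, hop length, mel dimension, file names, and caption are illustrative only.

```python
# Preprocessing sketch (assumptions: librosa features, .npy outputs,
# 'length' = number of mel frames). Parameters below are illustrative.
import librosa
import numpy as np

def preprocess_utterance(wav_path, text, mel_path, linear_path,
                         sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    # Load audio and compute linear (magnitude) and mel spectrograms.
    y, _ = librosa.load(wav_path, sr=sr)
    linear = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))        # (1 + n_fft/2, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)  # (n_mels, T)

    # Save features as (frames, bins) arrays and return one training-list line.
    np.save(mel_path, mel.T.astype(np.float32))
    np.save(linear_path, linear.T.astype(np.float32))
    return "{}|{}|{}|{}".format(mel_path, linear_path, mel.shape[1], text)

# Example: append one entry to a training list (file names are hypothetical).
with open("train_list.txt", "a") as f:
    f.write(preprocess_utterance("sample.wav", "a man riding a horse",
                                 "mels/sample.npy", "linear/sample.npy") + "\n")
```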

  2. Train the model

python3 train.py

Finetune or Train on Your Own Data

You can set appropriate hyperparameters for your own data in hparams.txt (the example below is purely illustrative).
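
The keys in hparams.txt are specific to this repository; the names below are not taken from this project. TensorFlow 1.x Tacotron-style code commonly manages such values with tf.contrib.training.HParams, which also allows overriding individual values when finetuning:

```python
# Hypothetical illustration only: these hyperparameter names are NOT the
# repository's actual keys; they show the kind of values typically tuned
# when finetuning a spectrogram-based synthesis model.
from tensorflow.contrib.training import HParams

hparams = HParams(
    batch_size=32,
    initial_learning_rate=1e-3,
    num_mels=80,
    outputs_per_step=5,
)

# Override selected values without editing code, e.g. for finetuning on a
# smaller dataset with a lower learning rate.
hparams.parse("batch_size=16,initial_learning_rate=2e-4")
print(hparams.values())
```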
