DeepPredSpeech

Computational models of predictive speech coding based on deep learning

This repository contains the source code used to design the computational models of predictive speech coding described in Hueber. T., Tatulli, E., Girin, L., Schwartz, J-L, "How predictive can be predictions in the neurocognitive processing of auditory and audiovisual speech? A deep learning study.". The initial submission is available as a preprint at https://doi.org/10.1101/471581 (the paper is currently under revision in a peer-reviewed journal).

This repository contains the following scripts (Python 2.7):

"do_sim_pred_coding_ntcdtimit_cnn.py" and "do_sim_pred_coding_ntcdtimit_cnn_audio.py" which are related to the experiments conducted on the NTCD-TIMIT audiovisual speech database. The first one is related to the experiments based on MFCC-spectrogram, the second one on log-magnitude spectrogram (both eventually combined with visual input, i.e. lip movements). See the preprint for more details about these audio representations and the corresponding model architectures (e.g. feed-forwared deep neural network or convolutional neural network or a combination of both). All training/test data (including audio features and raw lip images), pre-trained models, and experimental results have been made publicly available on Zenodo (DOI 10.5281/zenodo.1487974).
do_sim_pred_coding_librivox.py which are related to the experiments conducted on the Librispeech (audio-only database), based on the MFCC spectrogram (not presented in the preprint).

Many options related to the feature extraction, the architecture of audio, visual and audio-visual predictive models, their training, etc. can be modified by editing directly the "CONFIG" section at the beginning of each script (see comments for a brief explanation of each options).

This repository contains also a jupyter notebook DeepPredSpeechXplore allowing to simulate predictive coding of speech with our models pre-trained on Librispeech from any audio file. Pretrained models and audio sound examples are available in the DeepPredSpeechXplore_res.zip archive file (which has to me unzip and placed in the same directory as the notebook).

Don't hesitate to contact me for more details.

Thomas Hueber, Ph. D. CNRS researcher, GIPSA-lab, Grenoble, France thomas.hueber@gipsa-lab.fr

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
DeepPredSpeechXplore.ipynb		DeepPredSpeechXplore.ipynb
LICENSE		LICENSE
README.md		README.md
do_sim_pred_coding_librivox.py		do_sim_pred_coding_librivox.py
do_sim_pred_coding_ntctimit_cnn.py		do_sim_pred_coding_ntctimit_cnn.py
do_sim_pred_coding_ntctimit_cnn_audio.py		do_sim_pred_coding_ntctimit_cnn_audio.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DeepPredSpeechXplore.ipynb

DeepPredSpeechXplore.ipynb

LICENSE

LICENSE

README.md

README.md

do_sim_pred_coding_librivox.py

do_sim_pred_coding_librivox.py

do_sim_pred_coding_ntctimit_cnn.py

do_sim_pred_coding_ntctimit_cnn.py

do_sim_pred_coding_ntctimit_cnn_audio.py

do_sim_pred_coding_ntctimit_cnn_audio.py

Repository files navigation

DeepPredSpeech

About

Releases

Packages

Languages

License

thueber/DeepPredSpeech

Folders and files

Latest commit

History

Repository files navigation

DeepPredSpeech

About

Resources

License

Stars

Watchers

Forks

Languages