Automatic Speech Recognition Lab

Motivation

This repo covers most of the mathematical theory needed to tackle the Voice User Interfaces Capstone project, part of the Artificial Intelligence Engineer Nanodegree VUI Concentration. Within each notebook you will find many useful references, some recommended by Udacity and others found along the way.

A good starting point for putting yourself in context is this video lecture by Dr. Lee from Apple, suggested by Udacity.

These notebooks are meant as a first approach to implementing these algorithms in Python, so you will find code mixed in with the notes and the maths.

Index

Basic Acoustics

This first notebook is an introduction to acoustics. It is needed to understand the complexity of an acoustic signal and the scale of the problem we face in ASR.

Concepts covered:

- Speech signal
- Fourier theory
- Spectrograms and FFT (+ intro to implementation)
- Feature extraction (cepstral coefficients)
  - Mel scale + math
  - Computation of the cepstrum
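The feature-extraction steps listed above (framing, FFT-based spectrogram, Mel scale, cepstrum) can be sketched end to end with NumPy alone. This is a minimal sketch under common default assumptions, not the notebook's implementation: 16 kHz audio, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, 26 triangular Mel filters built with the widely used formula $m = 2595 \log_{10}(1 + f/700)$, and an unnormalized DCT-II that keeps 13 coefficients. All function names and parameter values are hypothetical choices for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges/centres are equally spaced on the Mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=0.025, hop=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    signal = np.asarray(signal, dtype=float)
    flen, fhop = int(round(frame_len * sr)), int(round(hop * sr))
    if len(signal) < flen:                     # pad very short inputs to one frame
        signal = np.pad(signal, (0, flen - len(signal)))
    # 1) Split the signal into overlapping frames
    n_frames = 1 + (len(signal) - flen) // fhop
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    # 2) Taper each frame with a Hamming window
    frames = signal[idx] * np.hamming(flen)
    # 3) Power spectrum per frame (each row is one spectrogram column)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4) Mel filterbank energies, then log compression
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5) Unnormalized DCT-II of the log energies -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return log_energies @ dct.T                # shape: (n_frames, n_ceps)

# Usage: 1 second of a 440 Hz test tone
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # (98, 13)
```

Building the DCT as an explicit cosine matrix keeps the example dependent on NumPy only; libraries such as SciPy or librosa provide tuned equivalents.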

Automatic Speech Recognition (IN PROGRESS)

This notebook covers the main components of an ASR pipeline and explains, component by component, the problem, the task to be solved and the mathematical models behind it. Most references for this notebook have been taken from [https://web.stanford.edu/~jurafsky/slp3/9.pdf] and [https://web.stanford.edu/class/cs224s/lectures/224s.17.lec3.pdf], which are excellent references but quite long to work through.
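A common way to summarize how these components fit together is the noisy-channel formulation of ASR, written here in its usual textbook form rather than copied from the notebook. Given a sequence of acoustic observations $O$ (for example, MFCC vectors), the recognizer searches for the word sequence $\hat{W}$ that is most probable:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic model}} \,
                       \underbrace{P(W)}_{\text{language model}}
```

The denominator $P(O)$ can be dropped because it is the same for every candidate $W$, which leaves the product of the acoustic model and the language model.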

Concepts covered:

- Challenges in ASR
- Task dimensions
- Pipeline
- Acoustic Model and Hidden Markov Models
- Language Model and n-grams
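The two models listed above can be illustrated in a few lines of Python. This is a minimal sketch, not the notebook's code: `forward` computes the likelihood of an observation sequence under a discrete-emission HMM (real acoustic models use continuous emissions such as Gaussian mixtures or neural networks), and `bigram_lm` is an unsmoothed maximum-likelihood bigram model with `<s>`/`</s>` sentence markers. Both names, the toy HMM parameters and the toy sentences are hypothetical.

```python
import numpy as np
from collections import Counter

def forward(pi, A, B, obs):
    """Forward algorithm: P(obs | HMM) for a discrete-emission HMM.
    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = P(state j | state i)
    B:  (N, M) emissions,   B[i, o] = P(symbol o | state i)
    obs: sequence of observed symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # recursion: sum over previous states
    return alpha.sum()                   # termination

def bigram_lm(sentences):
    """Unsmoothed maximum-likelihood bigram model P(word | prev)."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])                 # words used as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs

    def prob(prev, word):
        if histories[prev] == 0:
            return 0.0                                # unseen history
        return bigrams[(prev, word)] / histories[prev]
    return prob

# Usage with toy values
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward(pi, A, B, [0, 1, 2]))      # approximately 0.03628

lm = bigram_lm(["i want to eat", "i want food"])
print(lm("want", "food"))                # 0.5
```

The forward recursion sums over all state paths in O(N^2 T) time instead of enumerating all N^T paths, and the unsmoothed bigram model assigns zero probability to any unseen word pair, which is why practical language models add smoothing.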

Hands-On LibriSpeech Dataset Minilab

This minilab comes from Udacity and is meant as a hands-on introduction to the VUI Capstone project. You may notice some differences from Udacity's original Voice lab, since some related quizzes from the Student Interface have been included here.

The purpose of this lab is to gain familiarity with speech data you might use to train an Automatic Speech Recognition (ASR) system. For this problem, we will be using a subset of the LibriSpeech dataset.

Concepts covered:

- Explore the LibriSpeech dataset and its format
- Create your own audio files
- Build your own audio dataset
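Assuming the standard LibriSpeech layout (`<subset>/<speaker>/<chapter>/` folders holding `<speaker>-<chapter>-<utterance>.flac` audio files plus one `<speaker>-<chapter>.trans.txt` file whose lines read `<utterance-id> <TRANSCRIPT>`), the sketch below pairs each transcript with its audio path, which is the core of building your own dataset. It also writes a synthetic tone to a mono 16-bit WAV file with Python's standard `wave` module, as a stand-in for recording your own audio. Both helper names are hypothetical and are not taken from the Udacity lab.

```python
import math
import struct
import wave
from pathlib import Path

def parse_transcripts(trans_path):
    """Return (audio_path, transcript) pairs from a LibriSpeech-style *.trans.txt file."""
    trans_path = Path(trans_path)
    pairs = []
    for line in trans_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue                               # skip blank lines
        utt_id, text = line.split(" ", 1)          # '<utterance-id> <TRANSCRIPT>'
        pairs.append((trans_path.parent / f"{utt_id}.flac", text))
    return pairs

def write_tone_wav(path, freq=440.0, seconds=1.0, sr=16000):
    """Write a mono, 16-bit PCM WAV file containing a sine tone at half amplitude."""
    n_samples = int(sr * seconds)
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / sr)))
        for i in range(n_samples)
    )
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sr)     # samples per second
        wf.writeframes(frames)
```

For a whole subset, collecting every `*.trans.txt` file with `Path.rglob("*.trans.txt")` and concatenating the returned pairs gives a simple (audio, text) manifest.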
