deepspeech2

input

Audio file (16 kHz)

LibriSpeech ASR corpus: http://www.openslr.org/12
Sample file: 1221-135766-0000.wav
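
Since the model expects 16 kHz audio, it can help to check a file before passing it in. A minimal sketch using only the standard-library wave module, run against the bundled sample file:

import wave

# Inspect the sample file; deepspeech2 expects 16 kHz mono input.
with wave.open("1221-135766-0000.wav", "rb") as wav:
    print("sample rate :", wav.getframerate())                    # expected: 16000
    print("channels    :", wav.getnchannels())                    # expected: 1
    print("duration (s):", wav.getnframes() / wav.getframerate())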

output

Recognized text (one result per pretrained model)

#librispeech_pretrained_v2
how strange it seemed to the sad woman as she watched the growth and the beauty that became every day more brilliant and the intelligence that through its quivering sunshine over the tiny features of this child

#an4_pretrained_v2
sthiee sixtysx s one cs one stwp teoh ten teny kwenth three t four eineaieteen twonr two seven te ine  thine shirn i np twe tseiox sven sie

#ted_pretrained_v2
howstrange at seemed to the sad woman she wachd the grolt han the beauty that became every day more brilliant and the intelligence that through its equivering sunching over the tiny peacturs of this child

Usage

Basic

File input

$ python3 deepspeech2.py -i 1221-135766-0000.wav -s output.txt

Mic input

$ python3 deepspeech2.py -V
  1. Speak into the microphone when "Please speak something." is displayed.
  2. Recording stops after about 1 second of silence, and speech recognition runs (a rough sketch of this record-until-silence loop is shown below).
  3. After the recognition result is displayed, the program returns to step 1.
  4. Press Ctrl+C to exit.
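
The record-until-silence behaviour above can be sketched with pyaudio and numpy; the chunk size, silence threshold, and 1-second cutoff below are illustrative assumptions, not the values used by deepspeech2.py.

import numpy as np
import pyaudio

RATE = 16000          # deepspeech2 expects 16 kHz input
CHUNK = 1024          # frames per read (assumption)
SILENCE_RMS = 500.0   # silence threshold on int16 samples (assumption)
SILENCE_SEC = 1.0     # stop after about 1 second of silence

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)

print("Please speak something.")
frames, silent_chunks = [], 0
while True:
    data = stream.read(CHUNK)
    frames.append(data)
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32)
    rms = float(np.sqrt(np.mean(samples ** 2)))
    silent_chunks = silent_chunks + 1 if rms < SILENCE_RMS else 0
    # Stop once ~1 s of trailing silence follows some non-silent audio.
    if silent_chunks * CHUNK / RATE >= SILENCE_SEC and len(frames) > silent_chunks:
        break

stream.stop_stream()
stream.close()
p.terminate()
audio = np.frombuffer(b"".join(frames), dtype=np.int16)  # ready for recognition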

Options

With the -d option, the recognition result is decoded with a beam-search decoder that uses a language model (see "Install ctcdecode" below). With the -a option, you can use one of the other pretrained models.

Setup

Install pyaudio

macOS:

brew install portaudio
pip install pyaudio

Linux:

sudo apt-get install portaudio19-dev
pip install pyaudio
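
After installing, a quick check that pyaudio can see a usable microphone:

import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:   # input-capable devices only
        print(i, info["name"])
p.terminate()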

Install ctcdecode (for -d option)

This module is required to perform beam-search decoding with the language model.

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
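
A minimal sketch of what beam-search decoding with ctcdecode looks like; the label set, language-model path, and alpha/beta weights below are illustrative assumptions, not the exact values deepspeech2.py uses with -d.

import torch
from ctcdecode import CTCBeamDecoder

labels = list("_'ABCDEFGHIJKLMNOPQRSTUVWXYZ ")   # "_" as the CTC blank (assumption)
LM_PATH = "lm.binary"                            # hypothetical KenLM binary; use None to decode without an LM

decoder = CTCBeamDecoder(labels,
                         model_path=LM_PATH,
                         alpha=0.8, beta=1.0,    # LM weight and word bonus (assumption)
                         beam_width=100,
                         blank_id=labels.index("_"))

# probs: (batch, time, vocab) softmax output of the acoustic model
probs = torch.softmax(torch.randn(1, 50, len(labels)), dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)
best = beam_results[0][0][:out_lens[0][0]]       # best beam of the first utterance
print("".join(labels[i] for i in best))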

Reference

deepspeech.pytorch

Framework

PyTorch

Model Format

ONNX opset = 10
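
The opset of an exported model can be checked with the onnx package; the file name below assumes the LibriSpeech weights are saved locally as librispeech_pretrained_v2.onnx.

import onnx

model = onnx.load("librispeech_pretrained_v2.onnx")   # hypothetical local file name
for opset in model.opset_import:                      # operator sets the graph was exported with
    print(opset.domain or "ai.onnx", opset.version)   # expected: ai.onnx 10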

Netron

an4_pretrained_v2.onnx.prototxt

librispeech_pretrained_v2.onnx.prototxt

ted_pretrained_v2.onnx.prototxt