Developed a versatile speech recognition system with advanced diarization and integrated multiple recognition engines using the SpeechRecognition library in Python and Google Cloud Speech-to-Text API. Optimized transcription models for enhanced accuracy. Utilized Apache Spark for large-scale analysis of transcribed data.

Speech-Recognition-Exercise

This project explores speech recognition: converting spoken language into written text. By leveraging various libraries and technologies, it transcribes audio from diverse sources and applies a range of recognition engines.

Data Source

The project uses a collection of audio samples as its primary data source, recorded live from the computer's microphone.

Libraries Used

The project utilizes a variety of libraries to facilitate speech recognition and data analysis:

  • SpeechRecognition: the primary Python library for speech-to-text conversion.
  • Google Cloud Speech-to-Text: cloud-based recognition engine.
  • Apache Spark: large-scale processing and analysis of transcribed data.

Analysis

The project's primary objective is to transcribe spoken language with high accuracy. To achieve this, the analysis involves:

  • Audio Processing: Direct audio capture from microphones and handling pre-recorded audio files.
  • Recognition Engines: Integration with Google Web Speech API, Google Cloud Speech API, CMU Sphinx, and other engines.
  • Diarization: Using algorithms to separate speakers in the audio files, thus attributing spoken content to individual participants.
  • Model Optimization: Techniques and algorithms to enhance the accuracy of transcriptions.
  • Data Analysis: Utilizing Apache Spark for large-scale data processing and analysis of transcribed data.
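The audio-processing and engine-integration steps above can be sketched as follows. This is an illustrative snippet, not code from the repository: it assumes the SpeechRecognition package is installed and that a WAV file named "sample.wav" exists, and the helper `transcribe_with_fallback` is a hypothetical name introduced here.

```python
import os


def transcribe_with_fallback(audio, engines):
    """Try each (name, recognize_fn) pair in order; return the first
    successful transcription, or (None, "") if every engine fails."""
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except Exception:  # engine offline, quota exceeded, or unintelligible audio
            continue
    return None, ""


try:
    import speech_recognition as sr
except ImportError:  # library not installed; the fallback helper above still works
    sr = None

if sr is not None and os.path.exists("sample.wav"):
    recognizer = sr.Recognizer()
    with sr.AudioFile("sample.wav") as source:
        audio = recognizer.record(source)  # read the entire file into memory

    engines = [
        ("Google Web Speech", recognizer.recognize_google),  # online engine
        ("CMU Sphinx", recognizer.recognize_sphinx),         # offline fallback
    ]
    engine, text = transcribe_with_fallback(audio, engines)
    print(f"{engine}: {text}")
```

Ordering the engines from most to least accurate and falling back on failure is one simple way to combine the multiple engines the project integrates.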

Key Achievements

  • Implemented a versatile speech recognition system capable of handling varied speech patterns.
  • Enhanced transcription granularity through diarization, allowing for a detailed breakdown of spoken content.
  • Undertook model optimization efforts to refine transcriptions, achieving notable improvements in accuracy.
  • Applied large-scale data analysis techniques using Apache Spark, deriving valuable insights from transcribed content.
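The Spark-based analysis mentioned above can be sketched as a simple word-frequency job over transcripts. This is a hedged illustration assuming PySpark is installed; the two transcript strings are placeholders, not real project output.

```python
import re


def tokenize(line):
    """Lowercase a transcript line and split it into word tokens."""
    return re.findall(r"[a-z']+", line.lower())


try:
    from pyspark.sql import SparkSession
except ImportError:  # PySpark not installed; the tokenizer still works standalone
    SparkSession = None

if SparkSession is not None:
    spark = SparkSession.builder.appName("transcript-analysis").getOrCreate()

    # Placeholder transcripts; in practice these would be loaded from storage.
    transcripts = spark.sparkContext.parallelize([
        "the quick brown fox",
        "the lazy dog",
    ])

    counts = (transcripts
              .flatMap(tokenize)               # one record per word
              .map(lambda w: (w, 1))
              .reduceByKey(lambda a, b: a + b))  # aggregate counts per word

    print(counts.collect())
    spark.stop()
```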

Conclusion

The "Speech-Recognition-Exercise" project demonstrates the power and versatility of modern speech recognition techniques. By combining various libraries and methodologies, the project offers a comprehensive system for transcribing spoken language. This system holds potential for a wide array of applications, from transcription services to voice assistants and beyond.

Future Work

Further advancements in this project could encompass:

  • Integration with more advanced recognition engines.
  • Exploration of neural network-based models for enhanced accuracy.
  • Extension of the diarization process to handle more complex audio samples with multiple speakers.
  • Incorporation of natural language processing techniques to refine and structure transcribed content.
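One concrete building block for the diarization extension mentioned above is merging consecutive segments attributed to the same speaker. The helper below is an engine-agnostic sketch, not code from the repository; it assumes segments are represented as (start, end, speaker) tuples in seconds.

```python
def merge_turns(segments, gap=0.5):
    """Merge consecutive segments with the same speaker when the pause
    between them is at most `gap` seconds."""
    merged = []
    for start, end, speaker in segments:
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= gap:
            prev_start = merged[-1][0]
            merged[-1] = (prev_start, end, speaker)  # extend the previous turn
        else:
            merged.append((start, end, speaker))
    return merged


turns = [(0.0, 1.2, "A"), (1.3, 2.0, "A"), (2.6, 4.0, "B")]
print(merge_turns(turns))  # [(0.0, 2.0, 'A'), (2.6, 4.0, 'B')]
```

A post-processing pass like this makes speaker-attributed transcripts easier to read regardless of which diarization engine produced the raw segments.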

Note

To fully understand the conclusions drawn in this analysis, it is recommended to go through the entire notebook, including the code and its outputs. You can view the HTML version of the notebook here.

Author

Jesus Cantu Jr.

Last Updated

June 6, 2023
