MNIST-Audio-Digit-Classifier

This project applies deep learning techniques to classify the free-spoken-digit-dataset, an audio counterpart to the classic MNIST dataset.

Getting Started

To run this project, ensure that you have Python 3 installed on your system. Follow these steps:

  1. Clone this repository and navigate to its root folder.

  2. Create and activate a Python environment:

    conda create -n audio python=3
    conda activate audio
  3. Install the necessary libraries listed in the requirements.txt file:

    pip install -r requirements.txt
  4. Run the main.py file:

    python src/main.py

Enhanced Workflow

When you execute the main.py file, it will perform the following workflow:

  1. Folder Preparation:

    • Create the necessary folders for organizing data, models, and reports (see the first sketch after this list):
      • data/processed_data: Processed data will be stored here.

      • data/production_data: Random audio samples for production testing will be placed here.

      • models: Trained models will be saved in this folder.

      • reports: Graphical representations and evaluation metrics will be stored here.

  2. Data Arrangement:

    • Arrange the raw audio files into folders by digit and save them into the data/processed_data folder (see the first sketch after this list).

  3. Production Testing Data Selection:

    • Select random audio samples from each digit to create a production testing dataset. These samples will be stored in the data/production_data folder.


  4. Graphical Representations:

    • Take random audio files, plot graphical representations, and save the figures into the reports folder.


  5. Feature Extraction:

    • Extract log mel spectrograms from the processed data to construct the dataset for training and evaluation (a sketch follows this list).

  6. Data Splitting:

    • Split the audio features into train, validation, and test sets to prepare for model training (a sketch follows this list).

  7. Model Training:

    • Train the models using the prepared dataset.
  8. Performance Assessment:

    • Assess the models' performance using metrics such as accuracy, precision, and recall.

    • Save figures such as the confusion matrix, ROC AUC, and loss and accuracy curves into the reports folder (a sketch follows this list).

  9. Model Saving:

    • Save the trained models in the models folder for future use in production (see the final sketch after this list).
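
The sketches below illustrate how some of these steps might be implemented. They are minimal illustrations under stated assumptions, not the project's exact code. The first covers steps 1-3; it assumes the raw FSDD recordings sit in a hypothetical data/raw folder and follow FSDD's {digit}_{speaker}_{index}.wav naming convention:

    import os
    import random
    import shutil

    RAW_DIR = "data/raw"  # assumed location of the raw FSDD recordings
    PROCESSED_DIR = "data/processed_data"
    PRODUCTION_DIR = "data/production_data"

    # Step 1: create the folder layout used by the workflow.
    for folder in [PROCESSED_DIR, PRODUCTION_DIR, "models", "reports"]:
        os.makedirs(folder, exist_ok=True)

    # Step 2: arrange raw recordings into one folder per digit. FSDD
    # filenames start with the spoken digit, e.g. "7_jackson_12.wav".
    for name in os.listdir(RAW_DIR):
        if name.endswith(".wav"):
            digit = name.split("_")[0]
            os.makedirs(os.path.join(PROCESSED_DIR, digit), exist_ok=True)
            shutil.copy(os.path.join(RAW_DIR, name),
                        os.path.join(PROCESSED_DIR, digit, name))

    # Step 3: hold out one random sample per digit for production testing.
    for digit in sorted(os.listdir(PROCESSED_DIR)):
        digit_dir = os.path.join(PROCESSED_DIR, digit)
        sample = random.choice(os.listdir(digit_dir))
        shutil.move(os.path.join(digit_dir, sample),
                    os.path.join(PRODUCTION_DIR, sample))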

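For step 5, the log mel spectrograms can be computed with librosa. This is a minimal sketch: the 8 kHz sample rate matches FSDD's recordings, while the mel-band count is an illustrative choice that may differ from the project's settings:

    import librosa
    import numpy as np

    def extract_log_mel(path, sr=8000, n_mels=64):
        # Load the clip at a fixed sample rate, compute a mel
        # spectrogram, and convert power values to decibels.
        signal, sr = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)
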
In addition to the saved images, refer to the console logs for a clear view of the program's progress.
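
For step 6, the split can be sketched with scikit-learn's train_test_split applied twice, here with an illustrative 80/10/10 ratio. fix_width is a hypothetical helper that pads or crops each spectrogram to a common width so the examples stack into one array; extract_log_mel is the helper sketched above:

    import os
    import numpy as np
    from sklearn.model_selection import train_test_split

    def fix_width(spec, width=32):
        # Pad or crop the time axis so every example has the same shape.
        if spec.shape[1] < width:
            spec = np.pad(spec, ((0, 0), (0, width - spec.shape[1])))
        return spec[:, :width]

    # Build the dataset from the per-digit folders.
    X, y = [], []
    for digit in sorted(os.listdir("data/processed_data")):
        folder = os.path.join("data/processed_data", digit)
        for name in os.listdir(folder):
            X.append(fix_width(extract_log_mel(os.path.join(folder, name))))
            y.append(int(digit))
    X, y = np.array(X), np.array(y)

    # 80% train, 10% validation, 10% test, stratified by digit.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)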

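For steps 8 and 9, a sketch of the assessment and saving stages using scikit-learn metrics and matplotlib. Here model stands in for whichever network was trained in step 7, and the file names are illustrative:

    import matplotlib
    matplotlib.use("Agg")  # render figures without a display
    import matplotlib.pyplot as plt
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, ConfusionMatrixDisplay)

    # Step 8: score the model on the held-out test set; argmax turns
    # per-class scores into digit predictions.
    y_pred = model.predict(X_test).argmax(axis=1)

    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred, average="macro"))
    print("recall   :", recall_score(y_test, y_pred, average="macro"))

    # Save the confusion matrix figure into the reports folder.
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
    plt.savefig("reports/confusion_matrix.png")

    # Step 9: persist the model for production use (Keras-style API).
    model.save("models/audio_digit_classifier.h5")
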
Testing the App in Production

Let's try the app in production by following these steps:

  1. Run the Application:

    • Execute the app.py file to launch the application.
  2. Access the Application:

    • Open the local address the app prints to the console in your browser.
  3. Choose Model for Prediction:

    • Choose one of the trained models saved in the models folder to use for prediction.
  4. Audio File Prediction:

    • Choose an audio file from the data/production_data folder.
  5. Insert Parameter:

    • Insert the selected file's name as a parameter in the prediction route URL.
  6. Run Prediction:

    • Execute the link to observe the prediction output (a sketch of such a route follows this list).
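
What such a prediction route could look like is sketched below, assuming a Flask app and a Keras model. The route path, model filename, and input reshaping are all assumptions, and extract_log_mel and fix_width are the helpers sketched earlier:

    import os
    from flask import Flask, jsonify
    from tensorflow.keras.models import load_model

    app = Flask(__name__)
    model = load_model("models/audio_digit_classifier.h5")  # illustrative name

    @app.route("/predict/<filename>")  # the actual route shape may differ
    def predict(filename):
        # Look the requested file up in the production hold-out folder.
        path = os.path.join("data/production_data", filename)
        features = fix_width(extract_log_mel(path))
        # Add batch and channel axes; the exact input shape depends on the model.
        digit = int(model.predict(features[None, :, :, None]).argmax())
        return jsonify({"file": filename, "predicted_digit": digit})

    if __name__ == "__main__":
        app.run()

With the app running, a request such as /predict/0_jackson_0.wav (an FSDD-style filename) would return the predicted digit as JSON.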

References

  1. Audio-Classification by Seth Adams: a repository on audio classification whose insights and code contributed to the development of this project.

  2. Deep Learning (Audio) Application: From Design to Deployment: a video tutorial giving a comprehensive overview of designing and deploying deep learning applications for audio.

  3. Deep Learning for Audio Classification: a video series covering deep learning techniques tailored to audio classification, which served as a valuable resource during the project.

Feel free to explore these references for deeper insights and guidance on audio classification, deep learning, and related topics.

Happy coding!
