
Acoustic Sentiment Analysis

I. Aim

To analyse sentiments based on acoustic features and to classify them into 10 classes.

II. Classes

  1. Female angry
  2. Female calm
  3. Female fearful
  4. Female happy
  5. Female sad
  6. Male angry
  7. Male calm
  8. Male fearful
  9. Male happy
  10. Male sad

III. Dataset

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). Of the speech and song portions, we have used the speech dataset. The database contains 24 professional actors (12 female, 12 male) vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. We have used the audio-only (16-bit, 48 kHz) data. The speech archive (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440. Out of the 8 emotions, we have chosen calm, happy, sad, angry, and fearful for classification, i.e. 960 samples.
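
As a rough sketch, the 960 selected samples can be gathered by filtering on the emotion code embedded in each RAVDESS filename (assuming the standard RAVDESS convention, where the third hyphen-separated field is the emotion code: 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful); the directory name below is a placeholder:

```python
import glob
import os

# Emotion codes kept for classification (RAVDESS filename convention, assumed here):
# 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful
KEPT_EMOTIONS = {"02", "03", "04", "05", "06"}

def select_files(dataset_dir):
    """Collect the speech files whose emotion code is one of the five kept classes."""
    selected = []
    for path in glob.glob(os.path.join(dataset_dir, "Actor_*", "*.wav")):
        emotion = os.path.basename(path).split("-")[2]  # third field = emotion code
        if emotion in KEPT_EMOTIONS:
            selected.append(path)
    return selected

files = select_files("Audio_Speech_Actors_01-24")
print(len(files))  # expected: 960
```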

IV. Speech file loading parameters

Sampling rate: 44.1 kHz
Speech file duration: 2.5 seconds
Hop length: 512
Number of frames: (44100 * 2.5) / 512 ≈ 216 frames

We experimented with different hop lengths and sampling rates; the values chosen above give good accuracy.
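
A minimal loading sketch with librosa under the parameters above (the file name is a placeholder):

```python
import librosa

SAMPLING_RATE = 44100   # 44.1 kHz
DURATION = 2.5          # seconds of audio taken from each clip
HOP_LENGTH = 512

# Load one clip at the chosen sampling rate, truncated to 2.5 seconds.
signal, sr = librosa.load("example.wav", sr=SAMPLING_RATE, duration=DURATION)

# With librosa's centred framing, the number of analysis frames is
# 1 + len(signal) // HOP_LENGTH = 1 + 110250 // 512 = 216.
n_frames = 1 + len(signal) // HOP_LENGTH
print(sr, len(signal), n_frames)
```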

V. Requirements

python | tensorflow | librosa | matplotlib | keras | sklearn

VI. Feature

MFCC: Mel Frequency Cepstral Coefficients (MFCCs) are features widely used in automatic speech and speaker recognition. The key idea is that the sounds generated by a human are filtered by the shape of the vocal tract, including the tongue, teeth, etc. This shape determines what sound comes out; if we can determine the shape accurately, we get an accurate representation of the phoneme being produced. The shape of the vocal tract manifests itself in the envelope of the short-time power spectrum, and the job of MFCCs is to accurately represent this envelope.

Steps to find MFCC:

  1. Frame the signal into short frames and for each frame calculate the periodogram estimate of the power spectrum.
  2. Apply the mel filterbank to the power spectra, sum the energy in each filter.
  3. Take the logarithm of all filterbank energies.
  4. Take the DCT of the log filterbank energies.
  5. Keep DCT coefficients 2-13, discard the rest.
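
In code, these steps are wrapped up by `librosa.feature.mfcc`. A minimal sketch with the parameters used in this project (13 coefficients, hop length 512); the file name and the scaling choice are illustrative assumptions:

```python
import librosa

def extract_mfcc(path, sr=44100, duration=2.5, n_mfcc=13, hop_length=512):
    """Load one clip and return a scaled (n_mfcc x frames) MFCC matrix."""
    signal, _ = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    # One possible scaling: standardise each coefficient across frames
    # (the exact scaling used in the project may differ).
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc

features = extract_mfcc("example.wav")
print(features.shape)  # roughly (13, 216) for a 2.5 s clip at 44.1 kHz
```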

VII. Exploratory Data Analysis (EDA)

  1. Waveform plot for speech sample (see the plotting sketch after this list)

(Figure: waveform of a random sample)

  2. Scaled MFCC (13 x 216): number of MFCC coefficients (n_mfcc) = 13, number of frames = 216

(Figure: inter-cluster distance between emotions)

  3. CNN result

(Figure: CNN result)
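
A minimal sketch of how the waveform and scaled-MFCC plots above can be generated with librosa and matplotlib (the file name and figure styling are illustrative):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# "example.wav" is a placeholder for any file from the selected dataset.
signal, sr = librosa.load("example.wav", sr=44100, duration=2.5)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# 1. Waveform of the sample.
ax1.plot(np.arange(len(signal)) / sr, signal)
ax1.set(title="Waveform of random sample", xlabel="Time (s)", ylabel="Amplitude")

# 2. MFCC matrix (13 x ~216), shown as a time-frequency image.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13, hop_length=512)
img = librosa.display.specshow(mfcc, sr=sr, hop_length=512, x_axis="time", ax=ax2)
ax2.set(title="MFCC (13 coefficients)")
fig.colorbar(img, ax=ax2)

plt.tight_layout()
plt.show()
```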

We also tested the MFCC features with MLP and LSTM models, but the CNN gave better performance than both.
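
For illustration, a minimal Keras CNN over the (13 x 216) scaled MFCC input with 10 output classes might look like the sketch below; the layer sizes and dropout rate are assumptions, not the exact architecture used in this repository.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10  # 5 emotions x 2 genders

# Illustrative CNN on the scaled 13 x 216 MFCC matrix, treated as a one-channel image.
model = keras.Sequential([
    layers.Input(shape=(13, 216, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Integer-encoded class labels are assumed, hence sparse categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```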

The flowchart below shows the overall flow of the EDA and the use cases presented to the customer.

(Figure: overall sentiment analysis flowchart)
