
Acoustic Sentiment Analysis

I. Aim

To analyse sentiments based on acoustic features and to classify them into 10 classes.

II. Classes

  1. Female angry
  2. Female calm
  3. Female fearful
  4. Female happy
  5. Female sad
  6. Male angry
  7. Male calm
  8. Male fearful
  9. Male happy
  10. Male sad

III. Dataset

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). Of the speech and song portions, we have used the speech dataset. The database contains 24 professional actors (12 female, 12 male) vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. We have used the audio-only (16-bit, 48 kHz) data. The speech archive (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440. Out of the 8 emotions, we have chosen calm, happy, sad, angry, and fearful for classification, i.e. 960 samples.
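
As a rough sketch, the 960 selected samples can be gathered by filtering on the emotion code embedded in each RAVDESS filename (assuming the standard RAVDESS convention, where the third hyphen-separated field is the emotion code: 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful); the directory name below is a placeholder:

```python
import glob
import os

# Emotion codes kept for classification (RAVDESS filename convention, assumed here):
# 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful
KEPT_EMOTIONS = {"02", "03", "04", "05", "06"}

def select_files(dataset_dir):
    """Collect the speech files whose emotion code is one of the five kept classes."""
    selected = []
    for path in glob.glob(os.path.join(dataset_dir, "Actor_*", "*.wav")):
        emotion = os.path.basename(path).split("-")[2]  # third field = emotion code
        if emotion in KEPT_EMOTIONS:
            selected.append(path)
    return selected

files = select_files("Audio_Speech_Actors_01-24")
print(len(files))  # expected: 960
```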

IV. Speech file loading parameters

Sampling rate: 44.1 kHz
Speech file duration: 2.5 seconds
Hop length: 512
Number of frames: (44100 * 2.5) / 512 ≈ 216 frames

We experimented with different hop lengths and sampling rates; the values chosen above give good accuracy.
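
A minimal loading sketch with librosa under the parameters above (the file name is a placeholder):

```python
import librosa

SAMPLING_RATE = 44100   # 44.1 kHz
DURATION = 2.5          # seconds of audio taken from each clip
HOP_LENGTH = 512

# Load one clip at the chosen sampling rate, truncated to 2.5 seconds.
signal, sr = librosa.load("example.wav", sr=SAMPLING_RATE, duration=DURATION)

# With librosa's centred framing, the number of analysis frames is
# 1 + len(signal) // HOP_LENGTH = 1 + 110250 // 512 = 216.
n_frames = 1 + len(signal) // HOP_LENGTH
print(sr, len(signal), n_frames)
```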

V. Requirements

python | tensorflow | librosa | matplotlib | keras | sklearn

VI. Feature

MFCC: Mel Frequency Cepstral Coefficients (MFCCs) are features widely used in automatic speech and speaker recognition. The key idea is that the sounds generated by a human are filtered by the shape of the vocal tract, including the tongue, teeth, etc. This shape determines what sound comes out; if we can determine the shape accurately, we get an accurate representation of the phoneme being produced. The shape of the vocal tract manifests itself in the envelope of the short-time power spectrum, and the job of MFCCs is to accurately represent this envelope.

Steps to find MFCC:

  1. Frame the signal into short frames and for each frame calculate the periodogram estimate of the power spectrum.
  2. Apply the mel filterbank to the power spectra, sum the energy in each filter.
  3. Take the logarithm of all filterbank energies.
  4. Take the DCT of the log filterbank energies.
  5. Keep DCT coefficients 2-13, discard the rest.
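
In code, these steps are wrapped up by `librosa.feature.mfcc`. A minimal sketch with the parameters used in this project (13 coefficients, hop length 512); the file name and the scaling choice are illustrative assumptions:

```python
import librosa

def extract_mfcc(path, sr=44100, duration=2.5, n_mfcc=13, hop_length=512):
    """Load one clip and return a scaled (n_mfcc x frames) MFCC matrix."""
    signal, _ = librosa.load(path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    # One possible scaling: standardise each coefficient across frames
    # (the exact scaling used in the project may differ).
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc

features = extract_mfcc("example.wav")
print(features.shape)  # roughly (13, 216) for a 2.5 s clip at 44.1 kHz
```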

VII. Exploratory Data Analysis (EDA)

  1. Waveform plot for speech sample (see the plotting sketch after this list)

(Figure: waveform of a random sample)

  2. Scaled MFCC (13 x 216): number of MFCC coefficients (n_mfcc) = 13, number of frames = 216

(Figure: inter-cluster distance between emotions)

  3. CNN result

(Figure: CNN result)
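
A minimal sketch of how the waveform and scaled-MFCC plots above can be generated with librosa and matplotlib (the file name and figure styling are illustrative):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# "example.wav" is a placeholder for any file from the selected dataset.
signal, sr = librosa.load("example.wav", sr=44100, duration=2.5)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# 1. Waveform of the sample.
ax1.plot(np.arange(len(signal)) / sr, signal)
ax1.set(title="Waveform of random sample", xlabel="Time (s)", ylabel="Amplitude")

# 2. MFCC matrix (13 x ~216), shown as a time-frequency image.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13, hop_length=512)
img = librosa.display.specshow(mfcc, sr=sr, hop_length=512, x_axis="time", ax=ax2)
ax2.set(title="MFCC (13 coefficients)")
fig.colorbar(img, ax=ax2)

plt.tight_layout()
plt.show()
```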

We also tested the MFCC features with MLP and LSTM models, but the CNN gave better performance than both.
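
For illustration, a minimal Keras CNN over the (13 x 216) scaled MFCC input with 10 output classes might look like the sketch below; the layer sizes and dropout rate are assumptions, not the exact architecture used in this repository.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10  # 5 emotions x 2 genders

# Illustrative CNN on the scaled 13 x 216 MFCC matrix, treated as a one-channel image.
model = keras.Sequential([
    layers.Input(shape=(13, 216, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Integer-encoded class labels are assumed, hence sparse categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```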

The flowchart below shows the overall flow of the EDA and the use cases presented to the customer.

(Figure: overall sentiment analysis flowchart)
