Film Quality Prediction using Acoustic, Prosodic and Lexical Cues

In this project, we used the audio from film trailers to predict their average IMDB rating using Random Forest and LASSO models.

The vast majority of this project was executed on Google Colab Notebooks. Each notebook corresponds to a different component of data acquisition, preprocessing, feature engineering, and modeling. Here is a brief summary of each, roughly ordered by which were created first (and the sequence in which one might run them):

Data Acquisition

get_youtube_data.ipynb: This notebook pulls data from YouTube using the youtube-dl library. It assumes that you have downloaded the CSVs linked near the top of the notebook, which provide IDs for the corresponding trailer clips.
remove_music.ipynb: This notebook uses the Spleeter library to separate the background music from the spoken utterances within the audio clips.
transcription.ipynb: This notebook transcribes the audio using the Google Speech-to-Text API. To run it, you must have a JSON key file and edit the credentials cell so that it reads "%env GOOGLE_APPLICATION_CREDENTIALS=<path to your json file>".
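
As a rough illustration of the download step, the sketch below builds a youtube-dl command line for each trailer ID. It is not the code from get_youtube_data.ipynb: the function name, the output directory, and the sample IDs are hypothetical, and the notebook may call the library differently. The flags used (--extract-audio, --audio-format, --output) are standard youtube-dl options.

```python
from typing import List


def youtube_dl_command(video_id: str, out_dir: str = "audio") -> List[str]:
    """Build a youtube-dl command that saves one trailer's audio as WAV.

    `out_dir` is a hypothetical output directory, not one used by the notebooks.
    """
    url = f"https://www.youtube.com/watch?v={video_id}"
    return [
        "youtube-dl",
        "--extract-audio",                         # keep only the audio track
        "--audio-format", "wav",                   # convert to WAV (needs ffmpeg or avconv)
        "--output", f"{out_dir}/%(id)s.%(ext)s",   # name each file by its video ID
        url,
    ]


# Hypothetical IDs standing in for the values read from the CSVs.
for vid in ["abc123XYZ_0", "def456UVW_1"]:
    print(" ".join(youtube_dl_command(vid)))
```

Each command list can be passed to subprocess.run(cmd, check=True) once youtube-dl is installed.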

Data Preprocessing and Feature Engineering

prosodic_embeddings.ipynb: This notebook trains a Tacotron + prosodic embeddings model.
preprocess_utterances.ipynb: This notebook takes the audio clips with music removed, keeps only the spans between timestamps where speech was detected (by the Google Cloud transcription), and converts them into mel-spectrograms so that they can be fed into the Tacotron model.
prosody_inference.ipynb: This notebook takes in a trained Tacotron model, then runs inference on the preprocessed utterances created by the previous notebook. This yields a set of embeddings for each utterance for each trailer.
Praat_Extraction.ipynb: This notebook extracts pitch and intensity statistics (min, max, mean, sd), jitter, shimmer, HNR, and energy. It also contains code for segmenting speech into inter-pausal units (IPUs).
aeneas.ipynb: This notebook uses aeneas (https://github.com/readbeyond/aeneas) for forced alignment of the transcripts to the audio. Install aeneas by following the instructions in its GitHub repo, then refer to aeneas.ipynb in this repo for the processing steps used in our project.
sox: We used sox to extract speech segments. This step was run locally; the following code installs sox and trims each audio file to its speech segments.

brew install sox

Here, dir_name is the directory into which you uncompressed all the output_example*.zip files.

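import os
from glob import glob

import pandas as pd

# Each CSV lists one speech segment per row: fragment number, start time, end time.
for file_path in glob('dir_name/*/OEBPS/Resources/*.csv'):
    # The CSVs have no header row, so the column names are supplied via `names`.
    df = pd.read_csv(file_path, header=None, names=['f_no', 'start_time', 'end_time'])
    # Assumes the CSV's base name is also the name of the audio file in the working directory.
    infile = os.path.basename(file_path)
    outfile_template = infile.replace('.wav', '_{}.wav')
    for count, (start_time, end_time) in enumerate(zip(df['start_time'], df['end_time'])):
        outfile = outfile_template.format(count)
        # "=" marks an absolute end time; without it, sox reads the second value as a duration.
        !sox $infile $outfile trim $start_time =$end_time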
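
The loop above relies on the notebook-only ! shell escape. Outside a notebook, the same trimming step could be sketched with subprocess, as below. This is an illustration rather than the code we ran; the function name and the sample file names are hypothetical. Note that sox's trim effect treats a second position as relative to the first unless it is prefixed with "=".

```python
import subprocess
from typing import List


def sox_trim_command(infile: str, outfile: str, start: float, end: float) -> List[str]:
    """Build a sox command that copies the audio between start and end (in seconds)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    # The "=" prefix makes `end` an absolute position instead of a length after `start`.
    return ["sox", infile, outfile, "trim", str(start), f"={end}"]


# Hypothetical segment list: (start, end) pairs in seconds.
segments = [(0.5, 2.0), (3.25, 7.0)]
commands = [
    sox_trim_command("trailer.wav", f"trailer_{i}.wav", start, end)
    for i, (start, end) in enumerate(segments)
]
for cmd in commands:
    print(" ".join(cmd))
    # To actually run the command (requires sox on PATH), uncomment:
    # subprocess.run(cmd, check=True)
```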

opensmile_extraction.ipynb: We extracted low-level acoustic-prosodic features using OpenSMILE (https://www.audeering.com/opensmile/).

Setup:

!brew install autoconf automake libtool m4 gcc

!tar -zxvf opensmile-2.3.0.tar.gz

%cd opensmile-2.3.0

!bash autogen.sh (run this command twice)

!./configure

!make -j4 ; make

!make install
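
Once the build finishes, features are extracted with the SMILExtract binary. The sketch below only assembles the command line; the -C (config), -I (input), and -O (output) options are standard SMILExtract flags, but the config file shown is an assumption chosen for illustration, not necessarily the one used in opensmile_extraction.ipynb, and the file names are hypothetical.

```python
from typing import List


def smilextract_command(config: str, wav_path: str, arff_path: str) -> List[str]:
    """Build a SMILExtract command for one audio file."""
    return [
        "SMILExtract",
        "-C", config,      # feature configuration file
        "-I", wav_path,    # input audio file
        "-O", arff_path,   # output file (ARFF for the standard feature-set configs)
    ]


# Assumed config and hypothetical file names, for illustration only.
cmd = smilextract_command("config/IS09_emotion.conf", "trailer_0.wav", "features.arff")
print(" ".join(cmd))
```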

ARFF to CSV converter: We used the following repo to convert the ARFF file produced by OpenSMILE feature extraction to CSV: https://github.com/haloboy777/arfftocsv. Move the ARFF file produced by OpenSMILE into your clone of that repo, then run python arffToCsv.py.
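
For readers unfamiliar with the format, the conversion can be sketched in a few lines of Python. This is not the implementation from the repo linked above. It assumes a dense ARFF file whose attribute names contain no spaces, and the sample attribute names are made up for illustration.

```python
import csv
import io


def arff_to_csv(arff_text: str) -> str:
    """Convert dense ARFF text to CSV text, using @attribute names as the header row."""
    names, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and ARFF comments
            continue
        if in_data:
            # ARFF quotes strings with single quotes; let the csv module unquote them.
            rows.append(next(csv.reader([line], quotechar="'")))
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1].strip("'\""))
        elif line.lower().startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample in the general shape of an ARFF feature file.
sample = """@relation example_features
@attribute name string
@attribute F0_mean numeric
@attribute class numeric

@data
'trailer_0',187.5,?
"""
print(arff_to_csv(sample))
```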

Modeling

ML_notebook_RF.ipynb: This notebook first merges all the features in our dataset, then trains a Random Forest model to predict IMDB average rating.
ML_notebook.ipynb: This notebook first merges all the features in our dataset, then trains a LASSO model to predict IMDB average rating.
metascore.ipynb: This notebook trains RF/Lasso models to predict Metascore rating, which is the final metric used in the SLT paper.
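
The merge step that precedes training can be pictured as an inner join of the per-trailer feature tables on trailer ID. The sketch below uses plain dictionaries to show the idea; it is not the notebooks' code, and the feature names, IDs, and values are hypothetical.

```python
from typing import Dict, List

FeatureTable = Dict[str, Dict[str, float]]  # trailer ID -> {feature name: value}


def merge_features(tables: List[FeatureTable]) -> FeatureTable:
    """Inner-join feature tables on trailer ID, keeping only trailers present in all of them."""
    if not tables:
        return {}
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)
    merged: FeatureTable = {}
    for trailer_id in sorted(shared_ids):
        row: Dict[str, float] = {}
        for table in tables:
            row.update(table[trailer_id])   # later tables win on duplicate feature names
        merged[trailer_id] = row
    return merged


# Hypothetical feature tables from two extraction steps.
praat = {"t1": {"pitch_mean": 180.0}, "t2": {"pitch_mean": 210.0}}
opensmile = {"t1": {"hnr": 12.5}, "t3": {"hnr": 9.0}}
print(merge_features([praat, opensmile]))
```

Only trailers with a complete feature row survive the join, so every row handed to the regressor has the same columns.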

Other Links

Slides:

Speech Files:

Folder Containing Extracted Features: https://drive.google.com/drive/folders/1lsKFIMiyJEkInHC0fZiYB6BtCSGWMAdE?usp=sharing