Leveraging various traditional and modern NLP approaches to analyse a dataset with script lines from the US TV-show "The Office".

timo282/NLP-The-Office

That’s what the data said:
An NLP Analysis of Script Lines from the US TV-Show "The Office"

Project Goal

Our objective is to apply various traditional and modern NLP methods to gain interesting insights into the show and its characters by looking only at "what the data says". More specifically, we analyze characters, relationships, sentiments and topics to identify speaking styles and developments. We want to provide additional insights both for fans and for people who have not watched the show.

The data we used can be found here.

This repository also contains scripts to train models to generate scenes (such as the scene above) and to classify the speaker of a line.

Use our models

We uploaded the fine-tuned models to HuggingFace to make them easily accessible to everyone. There you can find the Speaker Classification and Scene Generation models and test them directly via the Inference API.
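For local use instead of the Inference API, the following is a minimal sketch of how a speaker classification model could be loaded with the `transformers` library. The model ID is a hypothetical placeholder (the real ID is listed on the HuggingFace page), and the helper names `top_speaker` and `classify_line` are illustrative assumptions, not part of this repository.

```python
from typing import List

# Hypothetical placeholder: replace with the actual model ID from the HuggingFace page.
SPEAKER_MODEL_ID = "<hf-user>/<speaker-classification-model>"


def top_speaker(predictions: List[dict]) -> str:
    """Return the label with the highest score from text-classification output."""
    # Some transformers versions wrap single-input results in an extra list.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return max(predictions, key=lambda p: p["score"])["label"]


def classify_line(line: str, model_id: str = SPEAKER_MODEL_ID) -> str:
    """Predict the most likely speaker of a script line (assumes a text-classification model)."""
    # Imported lazily so top_speaker works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None asks the pipeline to return scores for all labels.
    return top_speaker(classifier(line, top_k=None))
```

Calling `classify_line("That's what she said!")` with a valid model ID would return one label; which labels are available depends on the model's configuration.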

Read more in our blog articles

This project was carried out as part of the lecture "Intelligent Text Analysis" at Ravensburg Cooperative State University (DHBW). The paper we wrote on our results can also be found in this repository.
