whatsapp-language-modeling

A language model for WhatsApp conversations, with LSTM-based architectures in PyTorch.

General

This project models instant messaging chats between two WhatsApp users using deep learning. The focus is on producing realistic conversations rather than just realistic messages, which means also modeling how the users alternate their messages within the conversation time window. To this end, a single conversation represents a corpus, and a consecutive exchange of messages represents a training instance (or document). Words and punctuation are the fundamental tokens. Change of user, end of message and end of conversation are all modeled through corresponding metatokens. Emoji are also included in the project-level dictionary, on equal footing with any other word or punctuation symbol. These additions require some extra effort from the neural network to learn how a realistic WhatsApp message exchange unfolds.
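The token scheme described above can be illustrated with a minimal sketch. The metatoken strings (`<eom>`, `<cou>`, `<eoc>`), the function names and the regular expression are assumptions made for illustration; the project's own code may use different symbols and a different tokenizer. Emoji made of several code points would be split into several tokens by this simple pattern.

```python
import re

# Hypothetical metatoken names; the project's actual symbols may differ.
END_OF_MESSAGE = "<eom>"
CHANGE_OF_USER = "<cou>"
END_OF_CONVERSATION = "<eoc>"

# A word is a run of word characters; any other non-space character
# (punctuation or a single-code-point emoji) becomes its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_conversation(messages):
    """Turn a list of (user, text) pairs into one flat token sequence."""
    tokens = []
    for i, (user, text) in enumerate(messages):
        tokens.extend(TOKEN_PATTERN.findall(text.lower()))
        tokens.append(END_OF_MESSAGE)
        # Mark a change of user only when the next message has another author.
        has_next = i + 1 < len(messages)
        if has_next and messages[i + 1][0] != user:
            tokens.append(CHANGE_OF_USER)
    tokens.append(END_OF_CONVERSATION)
    return tokens


def build_vocabulary(token_sequences):
    """Map every distinct token (words, emoji, metatokens) to an integer id."""
    vocab = {}
    for sequence in token_sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab
```

For example, `tokenize_conversation([("A", "Hi there!"), ("B", "yes 😀")])` yields the words, punctuation and emoji of both messages, with an end-of-message token after each one, a change-of-user token between them and an end-of-conversation token at the end.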

Data

The input data are WhatsApp conversation exports, as downloadable from the Android version of the mobile app. From such an export, the loader system segments the conversations, creates the token dictionary and trains a custom language prediction/generation model.
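As a rough sketch of the loading step, the code below parses export lines and splits them into conversations. It is not the code in `loader.py`. It assumes each message starts with a header of the form `31/12/2019, 22:15 - Alice: text` (the actual layout depends on the phone's locale), and it assumes conversations are separated by a silence longer than a hypothetical `max_gap` threshold. All function and parameter names are illustrative.

```python
import re
from datetime import datetime, timedelta

# Assumed header layout: "31/12/2019, 22:15 - Alice: Happy new year!"
HEADER = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_export(lines):
    """Return a list of (timestamp, user, text) tuples from export lines."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match is None:
            # No header: treat the line as a continuation of the last message.
            if messages:
                timestamp, user, text = messages[-1]
                messages[-1] = (timestamp, user, text + "\n" + line)
            continue
        date, time, rest = match.groups()
        user, separator, text = rest.partition(": ")
        if not separator:
            continue  # header without a sender: a system notice, skipped here
        timestamp = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M")
        messages.append((timestamp, user, text))
    return messages


def segment_conversations(messages, max_gap=timedelta(hours=1)):
    """Start a new conversation whenever the silence exceeds max_gap."""
    conversations = []
    for message in messages:
        if conversations and message[0] - conversations[-1][-1][0] <= max_gap:
            conversations[-1].append(message)
        else:
            conversations.append([message])
    return conversations
```

Each resulting conversation is a list of messages in chronological order, which can then be tokenized and used to build the dictionary.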


paolo-notaro/whatsapp-language-modeling
