Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach (SIGIR 2021)

Folder "data" :

Contains the dataset we train our model on.

Folder "analysis_data" :

This folder has the COVID-19-related tweets from India on which we perform our aspect-based analysis. It has two CSV files that contain our model's predictions along with the cleaned tweets:

panacea_india_data.csv: contains all tweets from January to July 4, 2020
panacea_india_data_filt.csv: contains tweets from March 1, 2020 to July 4, 2020 (day 61 to day 186 of the year)
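The day numbers above count January 1, 2020 as day 1 (2020 is a leap year, so February has 29 days). A quick check with Python's standard library, assuming the range is inclusive at both ends; `day_of_year` and `in_filtered_window` are hypothetical helpers, not functions from this repository:

```python
from datetime import date

# Day range used for panacea_india_data_filt.csv (assumed inclusive on both ends).
FIRST_DAY, LAST_DAY = 61, 186

def day_of_year(d: date) -> int:
    """Return the 1-based day number of d within its year (January 1 -> 1)."""
    return d.timetuple().tm_yday

def in_filtered_window(d: date) -> bool:
    """True if d falls between March 1 and July 4 of 2020 by day number."""
    return d.year == 2020 and FIRST_DAY <= day_of_year(d) <= LAST_DAY

print(day_of_year(date(2020, 3, 1)))  # 61
print(day_of_year(date(2020, 7, 4)))  # 186
```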

Folder "aspects" :

It has two subfolders:

raw: the raw ABAE (Attention-Based Aspect Extraction) output, i.e. 7 aspects each for Annoyed, Optimistic and Surprised, with 100 support words and their scores for each aspect.
filtered: hand-filtered output in which incoherent aspects have been discarded. The remaining aspects have been named, and a few generic, irrelevant support words have also been removed. This was done only for Annoyed and Optimistic. The final data is saved in JSON format.
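The filtering itself was done by hand, but applying those decisions can be sketched as below. This assumes the raw output for one emotion is a mapping from aspect index to (word, score) pairs and that each filtered JSON file maps an aspect name to its list of support words; both layouts, and the names `filter_aspects`, `names` and `generic_words`, are hypothetical and may not match the actual files.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical layout: aspect index -> [(support word, score), ...] for one emotion.
RawAspects = Dict[int, List[Tuple[str, float]]]

def filter_aspects(raw: RawAspects, names: Dict[int, str],
                   generic_words: Set[str]) -> Dict[str, List[str]]:
    """Keep only the aspects given a name, dropping generic support words.

    Aspects missing from `names` are treated as incoherent and discarded.
    Support words are kept in descending order of their ABAE score.
    """
    filtered = {}
    for idx, name in names.items():
        ranked = sorted(raw[idx], key=lambda pair: pair[1], reverse=True)
        filtered[name] = [word for word, _ in ranked if word not in generic_words]
    return filtered

raw = {0: [("curfew", 0.7), ("lockdown", 0.9), ("people", 0.8)],
       1: [("xyz", 0.5)]}
aspects = filter_aspects(raw, names={0: "Lockdown"}, generic_words={"people"})
print(json.dumps(aspects))  # {"Lockdown": ["lockdown", "curfew"]}
```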

word2vec.py :

We use this Python script to obtain the word2vec models that ABAE requires to generate the aspects.
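A minimal sketch of training such a model with gensim's `Word2Vec` class (gensim 4.x keyword names). It assumes the input is already-normalized tweet text split on whitespace; the hyperparameter values and the output path `w2v_embedding` are illustrative assumptions, not the settings used in word2vec.py. gensim is imported inside the training function so the tokenizing helper can be used without it.

```python
from typing import Iterable, List

def tokenize_corpus(texts: Iterable[str]) -> List[List[str]]:
    """Split each normalized tweet on whitespace, skipping blank tweets."""
    return [text.split() for text in texts if text.strip()]

def train_word2vec(texts: Iterable[str], path: str = "w2v_embedding"):
    """Train and save a word2vec model (hyperparameters are assumed, not the repo's)."""
    from gensim.models import Word2Vec  # lazy import: gensim is an assumed dependency

    sentences = tokenize_corpus(texts)
    model = Word2Vec(
        sentences=sentences,
        vector_size=200,  # embedding dimension
        window=5,         # context window size
        min_count=10,     # ignore rare words
        workers=4,
    )
    model.save(path)
    return model
```

The saved model can later be reloaded with `Word2Vec.load(path)`.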

normalize_tweets.py :

We use the tweet normalization function in this file to normalize the tweets before running word2vec.py, and also to generate the "clean_text" field of panacea_india_data_filt.csv.
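The exact rules live in normalize_tweets.py. As a rough illustration only, a regex-based normalizer of the kind commonly applied before word2vec might lowercase the text, replace URLs and user mentions with placeholder tokens, and strip the `#` from hashtags. The placeholder tokens `<url>` and `<user>` and the function name `normalize_tweet` are assumptions, not the repository's actual choices.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")

def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and replace URLs/mentions with placeholder tokens."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # URLs first, so '@' or '#' inside them survive
    text = MENTION_RE.sub(" <user> ", text)   # @handles -> <user>
    text = HASHTAG_RE.sub(r" \1 ", text)      # '#lockdown' -> 'lockdown'
    return WHITESPACE_RE.sub(" ", text).strip()

print(normalize_tweet("Stay HOME @user123 #Lockdown https://t.co/abc"))
# stay home <user> lockdown <url>
```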

For scraping/hydrating (scrape.py) :

python scrape.py -s True -q [queries] -l [limit on tweets]  
python scrape.py -H True -f [files containing tweet IDs]

Note: -H stands for hydration and -s for scraping. Restrictions such as coordinates and time intervals can be modified directly in the script.
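Hydration turns tweet IDs back into full tweets. Twitter's tweet-lookup endpoints accept at most 100 IDs per request, so a hydrator typically reads the ID files and sends the IDs in batches of 100. The sketch below shows only that batching step; it assumes one tweet ID per line in each file, and the helper names are hypothetical rather than taken from scrape.py.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

BATCH_SIZE = 100  # maximum number of IDs per tweet-lookup request

def parse_ids(lines: Iterable[str]) -> List[str]:
    """Collect non-empty, de-duplicated tweet IDs, preserving first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if tweet_id and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids

def read_tweet_ids(paths: Iterable[str]) -> List[str]:
    """Read IDs from several files (assumed format: one ID per line)."""
    lines: List[str] = []
    for path in paths:
        lines.extend(Path(path).read_text().splitlines())
    return parse_ids(lines)

def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```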

For plotting graphs (plot_graphs.ipynb) :

It is used to plot the counts of aspects (filtered/annoyed.json and filtered/optimistic.json) for tweets read from panacea_india_data_filt.csv. For each emotion, we take the tweets that contain that emotion in their predictions (e.g. for Annoyed, each tweet must have annoyed among its predictions), split them into chunks of 4000 tweets, and count the occurrences of each aspect category in every chunk. An occurrence of any of an aspect's support words contributes 1 to that aspect's total count. Run all the cells of plot_graphs.ipynb to generate the plots.
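The counting step can be sketched as follows. This is a sketch under stated assumptions, not the notebook's code: predictions are taken to be a list of emotion labels per tweet, each filtered JSON file is assumed to map an aspect name to its support words, tweets are matched on whitespace tokens, and each tweet adds at most 1 to a given aspect's count (one reading of the rule above). The final chunk may hold fewer than 4000 tweets; whether the notebook keeps it is not stated.

```python
from typing import Dict, List, Sequence

def chunk_aspect_counts(tweets: Sequence[str],
                        predictions: Sequence[Sequence[str]],
                        aspects: Dict[str, List[str]],
                        emotion: str,
                        chunk_size: int = 4000) -> List[Dict[str, int]]:
    """Return one {aspect name: count} dict per chunk of emotion-bearing tweets."""
    # Keep only tweets whose predicted labels include the emotion of interest.
    relevant = [text for text, labels in zip(tweets, predictions) if emotion in labels]
    support = {name: set(words) for name, words in aspects.items()}
    results = []
    for start in range(0, len(relevant), chunk_size):
        counts = {name: 0 for name in aspects}
        for text in relevant[start:start + chunk_size]:
            tokens = set(text.split())
            for name, words in support.items():
                if tokens & words:  # any support word present -> +1 for this aspect
                    counts[name] += 1
        results.append(counts)
    return results
```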

atharva-naik/VADEC