Mining Archive.org's Twitter Stream Grab for Pharmacovigilance Research Gold

This repository contains the code and data from the following paper - Mining Archive.org's Twitter Stream Grab for Pharmacovigilance Research Gold - https://www.biorxiv.org/content/10.1101/859611v1

Please use the following instructions if you intend to reproduce the results. If you are looking for the dataset, scroll all the way down!!

The Internet Archive's files must be first downloaded from - https://archive.org/details/twitterstream.

Example - wget https://archive.org/download/archiveteam-twitter-stream-2018-10/twitter-2018-10-01.tar.

Unzip the files using - sh separate.sh Note: Please change the source directly location on the first line of separate.sh
Use spacy_separate_onlyjson.py to filter drug tweets using the dictionary.

Eg: python spacy_separate_onlyjson.py -i Oct -d drug_dict_singlestr.csv -o oct_singlestr_tweets.json

To train the deep learning or classical models, please download data as given under the Claassification Section of the paper. Once you obtain the training data, you can use the code for training Classical Models and Deep Learning models. Please follow the readme instructions on Classical and Deep Learning models for specific instructions and examples

Dataset

The 3,346,758 tweet ids dataset is currently available at - https://doi.org/10.5281/zenodo.3571328

Contact Information

Ramya Tekumalla - rtekumalla1@student.gsu.edu

Juan M Banda - jbanda@gsu.edu

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
Classical Models		Classical Models
Deep_Learning_models		Deep_Learning_models
README.md		README.md
drug_dict_difference.csv		drug_dict_difference.csv
drug_dict_singlestr.csv		drug_dict_singlestr.csv
separate.sh		separate.sh
spacy_separate_onlyjson.py		spacy_separate_onlyjson.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Classical Models

Classical Models

Deep_Learning_models

Deep_Learning_models

README.md

README.md

drug_dict_difference.csv

drug_dict_difference.csv

drug_dict_singlestr.csv

drug_dict_singlestr.csv

separate.sh

separate.sh

spacy_separate_onlyjson.py

spacy_separate_onlyjson.py

Repository files navigation

Mining Archive.org's Twitter Stream Grab for Pharmacovigilance Research Gold

Dataset

Contact Information

About

Releases

Packages

Languages

thepanacealab/InternetArchive-Pharmacovigilance-Tweets

Folders and files

Latest commit

History

Repository files navigation

Mining Archive.org's Twitter Stream Grab for Pharmacovigilance Research Gold

Dataset

Contact Information

About

Topics

Resources

Stars

Watchers

Forks

Languages