Skip to content

EricAsna/Twitter-Archive-Data-Wrangling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Twitter Archive Data Wrangling

WeRateDogs is a Twitter account that rates people’s dogs with a humorous comment about the dog. In this project, the wrangling efforts are conducted on the tweet archive of WeRateDogs account using Python and its libraries.

Data Gathering

Data has been gathered from the following sources:

  1. The WeRateDogs twitter archive which is given to be downloaded manually and uploaded to the jupyter notebook.
  2. The tweet image predictions that contains the breed of dogs identified from dog images bu running every image through a neural network. This data was downloaded programmatically using the requests library.
  3. The Twitter API for each tweet’s JSON data using Python’s Tweepy library and store each tweet’s entire set of JSON data in a file. The outputs of the twitter API are in json and these outputs were stored in a text file called tweet_json.txt in separate lines. Deleted tweets were also stored in a dictionary. After storing the data, the text file was opened and read to extract tweet_id, retweet_counts and favorite_counts from each json content.

Data Source

All data are available in Source folder.

Codes

Python codes are available in Code folder.

Insights

Created insights are summarized in act_report.pdf