WeRateDogs Twitter Handle Analysis

Overview

WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. The account was started in 2015 by college student Matt Nelson, and has received international media attention both for its popularity and for the attention it drew to social media copyright law when Twitter suspended it for breaking those laws.
The main objective of this project is data wrangling. I gathered the data using the Requests library and Tweepy. I also performed some exploratory and explanatory analysis, drew insights from it, and suggested ways to increase retweets.

This project required gathering three data sets, each with a different method:

Twitter archive file: This can be downloaded manually or programmatically using the Requests library.
Tweet image predictions: The file image_predictions.tsv is hosted on Udacity's servers and has to be downloaded programmatically using the Requests library.
Tweets: At minimum, each tweet's retweet count and favorite ("like") count are gathered, along with any additional data found to be interesting. This is done by:
- Extracting the tweet IDs from the WeRateDogs Twitter archive and storing them in another file (tweet_ids.txt)
- Querying the Twitter API for each tweet's JSON data using Python's Tweepy library and storing the data in another file (tweet_json.txt)
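
The two file-handling steps around the API query can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names are hypothetical, and it assumes that the archive CSV has a tweet_id column and that tweet_json.txt holds one JSON object per line with the usual id, retweet_count and favorite_count fields of a Twitter API tweet object. The Tweepy query itself is left out because it needs API credentials.

```python
import csv
import json
from typing import Dict, Iterable, List, TextIO


def extract_tweet_ids(archive_csv: TextIO) -> List[str]:
    """Read the tweet_id column from an open archive CSV file."""
    return [row["tweet_id"] for row in csv.DictReader(archive_csv)]


def parse_tweet_json(lines: Iterable[str]) -> List[Dict]:
    """Pull the counts out of line-delimited tweet JSON (assumed layout)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        records.append({
            "tweet_id": str(tweet["id"]),
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return records
```

Both functions accept any open text file, so they could be called as extract_tweet_ids(open("twitter_archive_enhanced.csv", newline="")) and parse_tweet_json(open("tweet_json.txt")).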

In the archive table

In the image table

In the tweet table

Data Tidiness

A new data set named 'twitter_archive_master' was produced by merging the three data sets described above on tweet_id.
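
A merge of this kind might look like the sketch below. It assumes pandas is used and that each table has exactly one row per tweet_id; the function name build_master and the choice of an inner join are assumptions for illustration, since the join type is not stated here.

```python
import pandas as pd


def build_master(archive: pd.DataFrame,
                 predictions: pd.DataFrame,
                 tweets: pd.DataFrame) -> pd.DataFrame:
    """Join the three tables on tweet_id, keeping only tweets present in all."""
    # validate="one_to_one" raises MergeError if a tweet_id is duplicated,
    # which would otherwise silently multiply rows.
    master = archive.merge(predictions, on="tweet_id", how="inner",
                           validate="one_to_one")
    master = master.merge(tweets, on="tweet_id", how="inner",
                          validate="one_to_one")
    return master
```

The result could then be saved with master.to_csv("twitter_archive_master.csv", index=False).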

Favorite count and retweet count both peak in June. This may be because dog festivals normally take place around this time. Second place goes to January for favorite count and December for retweet count, and third place to December and January respectively. This may be due to increased festive activity during these periods.
Saturday usually has the highest favorite count, followed by Friday. This is probably because schedules are less busy around the weekend.
Also, as expected, the correlation between favorite count and retweet count is positive and very strong (0.86). Hence, tweets that are favorited more are more likely to be retweeted.
On the other hand, the correlation of each feature (favorite count and retweet count) with the ratings is very weak: positive for the numerator rating and negative for the denominator rating.
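
Figures like these could be computed along the following lines. This is only a sketch under assumptions: the column name timestamp and the function name engagement_summary are hypothetical, and the counts are averaged per month and per weekday, which may not be the aggregation used in the report.

```python
import pandas as pd


def engagement_summary(df: pd.DataFrame):
    """Return (by_month, by_weekday, correlation) for a table of tweets."""
    ts = pd.to_datetime(df["timestamp"])  # assumed column name
    metrics = df[["favorite_count", "retweet_count"]]
    # Average counts per calendar month and per day of the week.
    by_month = metrics.groupby(ts.dt.month_name()).mean()
    by_weekday = metrics.groupby(ts.dt.day_name()).mean()
    # Series.corr computes the Pearson correlation by default.
    corr = df["favorite_count"].corr(df["retweet_count"])
    return by_month, by_weekday, corr
```

Sorting by_month or by_weekday on one of the count columns gives the ranking of months or days.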

It is preferable to post on Fridays and Saturdays.
Dog events should be hosted around June, December or January.
Another factor should be used to predict the probability of retweeting, as the numerator and denominator ratings are not effective predictors.
