In creating visualizations and analyzing the datasets, several insights were found. The histogram of rating_numerator values showed that, apart from some extremely high outliers, the distribution was unimodal with a slight left skew and a mode at 12. The distributions of both retweet_count and favorite_count were best described on a log scale. On that scale, retweet_count was closer to normally distributed and unimodal, while favorite_count was more left skewed and slightly bimodal. Bar charts of the ten most frequent dog breed predictions consistently showed that Labrador retrievers and golden retrievers were the most commonly predicted breeds, though the validity of this would need to be verified in a larger project. Despite there being some differences between the histograms of retweet_count and favorite_count, the scatter plot of the two, with log scaling on both axes, showed an extremely strong, positive, linear correlation between them. The scatter plots of rating_numerator against retweet_count and against favorite_count looked very similar to each other, each suggesting at most a very weak positive correlation. Finally, while 'pupper' was the most common stage category, there was a potential trend for tweets with higher retweet_count values to have the stage category 'doggo' or 'puppo'. More analysis outside the scope of this project would be needed to determine what is causing this trend; one possibility is that 'doggo' and 'puppo' tweets receive relatively higher ratings and therefore more retweets.
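
The log-log relationship between retweet_count and favorite_count described above could be checked numerically with something like the following minimal sketch. It is not the code used in the project: the function name log_pearson is hypothetical, the sample values are made up for illustration, and it assumes that pairs with a zero or negative count are simply skipped because their logarithm is undefined.

```python
import math


def log_pearson(xs, ys):
    """Pearson correlation between log10(x) and log10(y).

    Pairs where either value is not positive are skipped (an assumption
    of this sketch), since log10 is undefined for them.
    """
    pairs = [
        (math.log10(x), math.log10(y))
        for x, y in zip(xs, ys)
        if x > 0 and y > 0
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs of positive values")

    mean_x = sum(lx for lx, _ in pairs) / n
    mean_y = sum(ly for _, ly in pairs) / n

    # Sums of squared deviations and cross deviations on the log scale.
    cov = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pairs)
    var_x = sum((lx - mean_x) ** 2 for lx, _ in pairs)
    var_y = sum((ly - mean_y) ** 2 for _, ly in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")

    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative counts, NOT values from the project's dataset.
retweet_count = [10, 100, 1000, 10000]
favorite_count = [30, 300, 3000, 30000]

r = log_pearson(retweet_count, favorite_count)
print(f"log-log Pearson r = {r:.3f}")
```

A value of r close to 1 would correspond to the tight, upward-sloping line seen in the log-log scatter plot; the same function could be applied to rating_numerator against either count to quantify the much weaker relationships.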