The goal of this analysis is to use PySpark, PostgreSQL, Pandas, and AWS services to determine whether reviews in the Health Personal Care category show any bias between the Vine (paid) program and the non-Vine program.
- How many Vine reviews and non-Vine reviews were there?
There are 497 reviews from the Vine program.
There are 120,825 reviews from the non-Vine program.
- How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?
After filtering to reviews with at least 20 votes, where helpful_votes made up at least 50% of those votes, we got the following review counts:
Total reviews: 121,322
Total Vine reviews: 497
Total non-Vine reviews: 120,825
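The filter above can be sketched in plain Python. This is a minimal illustration, not the project's actual PySpark code: it assumes each review is a dict with `total_votes` and `helpful_votes` keys (column names assumed from the public Amazon reviews schema), and the sample rows are hypothetical. In PySpark the same condition would be written as `df.filter((col("total_votes") >= 20) & (col("helpful_votes") / col("total_votes") >= 0.5))`, with `col` imported from `pyspark.sql.functions`.

```python
# Minimal sketch of the helpfulness filter (assumed column names:
# "total_votes", "helpful_votes"; sample data is hypothetical).

def is_helpful(review, min_votes=20, min_ratio=0.5):
    """Keep reviews with at least `min_votes` total votes where at least
    `min_ratio` of those votes were marked helpful."""
    total = review["total_votes"]
    # Guard against too few votes and against dividing by zero
    if total < min_votes or total == 0:
        return False
    return review["helpful_votes"] / total >= min_ratio


def filter_helpful(reviews, min_votes=20, min_ratio=0.5):
    """Return only the reviews that pass is_helpful()."""
    return [r for r in reviews if is_helpful(r, min_votes, min_ratio)]


# Hypothetical sample rows for illustration only
sample = [
    {"review_id": "a", "total_votes": 25, "helpful_votes": 20, "vine": "Y"},
    {"review_id": "b", "total_votes": 10, "helpful_votes": 10, "vine": "N"},
    {"review_id": "c", "total_votes": 40, "helpful_votes": 15, "vine": "N"},
    {"review_id": "d", "total_votes": 20, "helpful_votes": 10, "vine": "N"},
]

kept = filter_helpful(sample)
print([r["review_id"] for r in kept])  # ['a', 'd']
```

Review "b" is dropped for having fewer than 20 votes, and "c" for a helpful ratio of 37.5%; "d" sits exactly on both thresholds and is kept because both comparisons are inclusive.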
About 44% of the Vine reviews are five-star reviews.
About 62% of the non-Vine reviews are five-star reviews.
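The five-star percentages above are the number of five-star reviews divided by the total number of reviews in each program. A minimal sketch of that calculation, assuming each filtered review is a dict with a `vine` flag ("Y"/"N") and a `star_rating` field (both names assumed from the public Amazon reviews schema; the rows are hypothetical):

```python
from collections import Counter


def five_star_share(reviews):
    """Return {vine_flag: (total_reviews, five_star_reviews, percent_five_star)}."""
    totals = Counter()
    fives = Counter()
    for r in reviews:
        totals[r["vine"]] += 1
        if r["star_rating"] == 5:
            fives[r["vine"]] += 1
    return {
        flag: (totals[flag], fives[flag], 100 * fives[flag] / totals[flag])
        for flag in totals
    }


# Hypothetical rows, assumed to have already passed the helpfulness filter
sample = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 4},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 3},
    {"vine": "N", "star_rating": 5},
]

for flag, (total, five, pct) in sorted(five_star_share(sample).items()):
    print(f"vine={flag}: {five}/{total} five-star ({pct:.1f}%)")
```

In PySpark, the counts could be obtained with `filtered_df.filter(col("star_rating") == 5).groupBy("vine").count()` and divided by the per-program totals.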
After the filters were applied, only 497 of the 121,322 reviews (about 0.4%) came from the Vine program, which is a very small sample. About 44% of those 497 Vine reviews were five-star reviews, vs. about 62% of the 120,825 non-Vine reviews, so the non-Vine program has a substantially higher share of five-star reviews. If Vine reviewers were biased in favor of the products, we would expect the opposite pattern; from these results, I see no sign of bias in the Vine reviews.
For further analysis, we could check how many of the Vine five-star reviews and how many of the non-Vine five-star reviews are verified purchases, or whether Vine reviews receive more helpful_votes than non-Vine reviews.
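Both follow-up checks could start from a breakdown like the one below. This is a hypothetical sketch, not part of the completed analysis: it assumes `vine`, `star_rating`, `verified_purchase` ("Y"/"N"), and `helpful_votes` fields (names assumed from the public Amazon reviews schema) and uses made-up rows. In PySpark, roughly the same results would come from `groupBy("vine", "verified_purchase").count()` on the five-star rows and `groupBy("vine").avg("helpful_votes")`.

```python
from collections import Counter, defaultdict


def verified_five_star_counts(reviews):
    """For each vine flag, count five-star reviews by verified_purchase value."""
    counts = defaultdict(Counter)
    for r in reviews:
        if r["star_rating"] == 5:
            counts[r["vine"]][r["verified_purchase"]] += 1
    return dict(counts)


def mean_helpful_votes(reviews):
    """Average helpful_votes per vine flag."""
    sums = defaultdict(int)
    n = defaultdict(int)
    for r in reviews:
        sums[r["vine"]] += r["helpful_votes"]
        n[r["vine"]] += 1
    return {flag: sums[flag] / n[flag] for flag in n}


# Hypothetical rows for illustration only
sample = [
    {"vine": "Y", "star_rating": 5, "verified_purchase": "N", "helpful_votes": 30},
    {"vine": "Y", "star_rating": 4, "verified_purchase": "Y", "helpful_votes": 10},
    {"vine": "N", "star_rating": 5, "verified_purchase": "Y", "helpful_votes": 22},
    {"vine": "N", "star_rating": 5, "verified_purchase": "N", "helpful_votes": 18},
    {"vine": "N", "star_rating": 2, "verified_purchase": "Y", "helpful_votes": 26},
]

print(verified_five_star_counts(sample))
print(mean_helpful_votes(sample))
```

Using a `Counter` per program means a missing combination (for example, no verified five-star Vine reviews) reads as 0 instead of raising a `KeyError`.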