andymithamclarke/Pundits-Review-Scraping

Pundits Review Scraping Process

This repository documents the development of the scraping process used to collect data for the Pundits Review website: https://www.punditsreview.com/

Pundits Review scrapes and processes news articles about the Premier League in order to give players and teams a review score each week. Each Monday, the project collects articles, divides them into phrases, identifies the player or club each phrase refers to, and then predicts the sentiment of that phrase. See more on how it works here!
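The weekly steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the alias table, the word lists and every function name are hypothetical, and the toy word-count scorer only stands in for the prediction models, which are not part of this repository.

```python
import re

# Hypothetical alias table; the real player and team dictionary lives in
# the separate Resources repository.
ENTITY_ALIASES = {
    "salah": "Mohamed Salah",
    "liverpool": "Liverpool",
    "kane": "Harry Kane",
}

# Toy word lists standing in for the removed prediction models.
POSITIVE_WORDS = {"brilliant", "superb", "clinical"}
NEGATIVE_WORDS = {"poor", "sloppy", "wasteful"}


def split_into_phrases(article):
    """Split article text after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", article.strip())
    return [part for part in parts if part]


def find_entities(phrase):
    """Return canonical names whose alias appears as a whole word."""
    words = set(re.findall(r"[a-z']+", phrase.lower()))
    return sorted(name for alias, name in ENTITY_ALIASES.items() if alias in words)


def score_sentiment(phrase):
    """Count positive words minus negative words (placeholder scorer)."""
    words = re.findall(r"[a-z']+", phrase.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def review_article(article):
    """Accumulate a sentiment total per entity across all phrases."""
    totals = {}
    for phrase in split_into_phrases(article):
        score = score_sentiment(phrase)
        for entity in find_entities(phrase):
            totals[entity] = totals.get(entity, 0) + score
    return totals
```

Calling `review_article` on the text of one article returns a dictionary mapping each recognised player or club to a running score; summing those dictionaries over a week's articles would give a weekly review score in the spirit of the description above.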

About this repository

This repository shows how the method used to scrape and process football articles from news sites evolved. Each directory contains the work from one phase of building the solution. Phase One is the first method used, and Final Solution is the method eventually integrated into the Pundits Review project.

NOTE:

Prediction models have been removed from this repository.

Contents

Phase One Method: Combination of the Beautiful Soup & Requests libraries, used inside a notebook

Phase Two Method: Scrapy takes the place of Beautiful Soup & Requests inside the notebook

Phase Three Method: More functions are moved into modules. The pipeline takes shape, but the crawler is still called from the notebook

Final Solution: Core files used inside the Scrapy spider that was eventually integrated into the project
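For readers unfamiliar with the Phase One approach, a minimal sketch of combining Requests and Beautiful Soup might look like the following. The function names are hypothetical, and the assumption that article text sits in `<p>` tags will not hold for every news site; the notebooks in this repository are the reference for what was actually done.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_paragraphs(html):
    """Return the non-empty text of every <p> element in the page."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return [text for text in paragraphs if text]
```

Keeping the download and the parsing in separate functions makes the parsing step easy to try on saved HTML without hitting a news site on every run.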

Associated Repositories

Pundits Review - 11/09/2020 - Complete directory for Pundits Review web application.
Resources - Data, images & Python dictionary of Premier League players & teams
Entity Extraction - Development of the process used to recognise Premier League player & club entities within a news article
Sentiment Prediction - Development of the prediction model used to predict the sentiment in football news articles

Any Questions ... Send me an email!