DeevanshiSharma/Bundesliga-Big-Data-Analysis-using-PySpark

Big Data Analysis on Bundesliga using PySpark

CONTEXT

The Bundesliga is a professional association football league in Germany. It sits at the top of the German football league system and is the country's primary football competition.

PySpark is the Python API for Apache Spark, an open source, distributed computing framework and set of libraries for real-time, large-scale data processing.

OBJECTIVE

The objective of this project is to perform Big Data Analysis on a Bundesliga (German football league) dataset using PySpark.

Questions-

Q1- Which teams won the D1 division of German football (Bundesliga) between 2000 and 2010?

Q2- Which teams have been relegated in the past 10 years?

Q3- Does Oktoberfest affect performance in the Bundesliga?

Q4- Which season of the Bundesliga was the most competitive in the last decade?

Q5- What's the best month to watch Bundesliga?
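
As a rough illustration of the kind of aggregation behind Q1, here is a minimal plain-Python sketch that builds a points table per season and picks the team with the most points. The row layout, the team names, and the helper names `season_points` and `season_winners` are assumptions made for this example, since the dataset schema is not shown here, and the toy rows are made up. In PySpark the same logic would typically be expressed with `groupBy` and `agg` on a DataFrame.

```python
from collections import defaultdict

# Made-up toy rows, NOT real results. Assumed layout (hypothetical):
# (season, home_team, away_team, home_goals, away_goals)
matches = [
    (2009, "Team A", "Team B", 2, 1),
    (2009, "Team B", "Team C", 0, 0),
    (2009, "Team C", "Team A", 1, 3),
    (2010, "Team B", "Team A", 2, 0),
    (2010, "Team A", "Team C", 1, 1),
    (2010, "Team C", "Team B", 0, 1),
]


def season_points(rows):
    """Return {season: {team: points}} with 3 points for a win and 1 for a draw."""
    table = defaultdict(lambda: defaultdict(int))
    for season, home, away, home_goals, away_goals in rows:
        if home_goals > away_goals:
            table[season][home] += 3
            table[season][away] += 0  # make sure the loser appears in the table
        elif home_goals < away_goals:
            table[season][away] += 3
            table[season][home] += 0
        else:
            table[season][home] += 1
            table[season][away] += 1
    return table


def season_winners(rows):
    """Return {season: team with the most points}.

    Simplification: ties on points are broken alphabetically, whereas the
    real league uses goal difference and further criteria.
    """
    winners = {}
    for season, points in season_points(rows).items():
        winners[season] = min(points, key=lambda team: (-points[team], team))
    return winners


print(season_winners(matches))
```

The other questions follow the same pattern with a different grouping key, for example grouping by month for Q5 or taking the bottom of each season's table for Q2.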

About

Big data analysis of a Bundesliga football league dataset using PySpark, Spark SQL, and NumPy, carried out in a Jupyter Notebook.
