Why did we do this project? We chose the World Cup dataset on Kaggle because we are both soccer fans and had just watched an exciting 2018 World Cup in Russia, so we picked this theme for our analysis.
Our tech stack? We used the pandas, seaborn, numpy, matplotlib, and plotly modules in a Python Jupyter Notebook for our analysis. We found these modules very useful, flexible, and interesting.
The tournament had at most 16 teams through 1978, was expanded to 24 teams in 1982, and then to 32 in 1998. Total attendance increases as the number of teams increases. For example, there is a big increase in total attendance in 1982 compared with 1978. The 1994 World Cup in the USA has the highest total attendance overall, probably because stadiums in the USA have higher capacities. The 2014 World Cup has the second highest total attendance.
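The attendance comparison above can be sketched as a simple group-and-sum over match rows. This is a minimal illustration, not the notebook's actual code: the column names `Year` and `Attendance` are assumptions about the dataset layout, and the numbers are made-up toy values, not real attendance figures.

```python
import pandas as pd

# Toy match rows (illustrative values only, NOT real attendance data).
# Column names "Year" and "Attendance" are assumed.
matches = pd.DataFrame({
    "Year": [1978, 1978, 1982, 1982, 1982],
    "Attendance": [40000, 35000, 45000, 50000, 30000],
})

def attendance_by_year(df):
    """Sum per-match attendance into one total per tournament year."""
    return df.groupby("Year")["Attendance"].sum().sort_index()

totals = attendance_by_year(matches)
print(totals)
```

With the real match table, the resulting series could be passed to a bar plot to compare tournaments side by side.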
In total, 79 national teams have played in the World Cup. Brazil has appeared most often, 21 times. Europe has the most national teams (34) that have ever played in the World Cup, and Africa is second. This suggests that in Europe and Africa many national teams are competitive and that the set of teams that qualify changes often.
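One way to count appearances per team is to collect every (year, team) pair from both the home and away columns and count distinct years per team. The sketch below assumes columns named `Year`, `Home Team Name`, and `Away Team Name`; the teams "A", "B", "C" and the rows are placeholders, not real results.

```python
import pandas as pd

# Placeholder rows; team names and pairings are illustrative only.
# Column names are assumptions about the dataset layout.
matches = pd.DataFrame({
    "Year": [1930, 1930, 1934, 1934],
    "Home Team Name": ["A", "B", "A", "C"],
    "Away Team Name": ["B", "A", "C", "A"],
})

def appearances(df):
    """Count the number of distinct tournaments each team played in."""
    # Stack home and away team columns into a single "Team" column.
    long = df.melt(
        id_vars="Year",
        value_vars=["Home Team Name", "Away Team Name"],
        value_name="Team",
    )
    # Keep one row per (year, team) so repeated matches count once.
    pairs = long[["Year", "Team"]].drop_duplicates()
    return pairs["Team"].value_counts()

counts = appearances(matches)
print(counts)
```

The number of distinct teams overall is then `len(counts)`.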
Brazil and Germany have played the most games (113), which implies that they usually advance from the group stage to the knockout stage. The number of games played is therefore a strong indicator of soccer strength.
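Games played per team can be computed by adding home and away match counts. This is a hedged sketch under the assumption that the match table has `Home Team Name` and `Away Team Name` columns; the rows use placeholder teams rather than real fixtures.

```python
import pandas as pd

# Placeholder fixtures; not real matches. Column names are assumed.
matches = pd.DataFrame({
    "Home Team Name": ["A", "B", "A", "C"],
    "Away Team Name": ["B", "C", "C", "A"],
})

def games_played(df):
    """Total games per team, counting both home and away appearances."""
    home = df["Home Team Name"].value_counts()
    away = df["Away Team Name"].value_counts()
    # fill_value=0 covers teams that appear only as home or only as away.
    total = home.add(away, fill_value=0).astype(int)
    return total.sort_values(ascending=False)

games = games_played(matches)
print(games)
```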
Europe has the most teams and also the most referees. Africa, Asia & Oceania, and South America have the second most teams. South America and Asia & Oceania have more referees than Africa. The difference might be related to high-level professional leagues.
If your national team is in the home position, it is more likely to win when the referee comes from your continent and your opponent does not. If your national team is in the away position, a referee from a neutral continent is best; otherwise, it is better to have a referee from your own continent.
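The referee comparison above can be sketched by labeling each match by how the referee's continent relates to the two teams, then averaging the home win indicator per label. Everything here is an assumption for illustration: the columns `Home Continent`, `Away Continent`, and `Referee Continent` are hypothetical (the raw dataset would need a country-to-continent mapping first), the continents "X", "Y", "Z" are placeholders, and the scores are toy values.

```python
import pandas as pd

# Toy rows with placeholder continents; NOT real results.
# The three "Continent" columns are hypothetical derived columns.
matches = pd.DataFrame({
    "Home Team Goals": [2, 0, 1, 3, 1, 0],
    "Away Team Goals": [1, 1, 1, 0, 2, 0],
    "Home Continent": ["X", "X", "Y", "X", "Y", "X"],
    "Away Continent": ["Y", "Y", "X", "X", "Y", "Y"],
    "Referee Continent": ["X", "Y", "Z", "X", "Z", "X"],
})

def referee_relation(row):
    """Label a match by which team(s) share the referee's continent."""
    same_home = row["Referee Continent"] == row["Home Continent"]
    same_away = row["Referee Continent"] == row["Away Continent"]
    if same_home and same_away:
        return "both"
    if same_home:
        return "home only"
    if same_away:
        return "away only"
    return "neutral"

def home_win_rate_by_relation(df):
    """Share of matches won by the home side, per referee relation."""
    out = df.copy()
    out["Relation"] = out.apply(referee_relation, axis=1)
    # Draws count as "not a home win" in this sketch.
    out["Home Win"] = out["Home Team Goals"] > out["Away Team Goals"]
    return out.groupby("Relation")["Home Win"].mean()

rates = home_win_rate_by_relation(matches)
print(rates)
```

The same grouping with an away win indicator would give the away-side view of the comparison.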
Link to our dataset: https://github.com/xzhang0529/WorldCup-Analysis/tree/master/Dataset
Link to IPython Notebook Viewer: https://nbviewer.jupyter.org/github/xzhang0529/WorldCup-Analysis/blob/master/World%20Cup%20Analysis%20Visualization%20Final.ipynb