PISA 2012 Data Analysis

by Nadine Amin

Dataset

PISA is a survey of students' skills and knowledge as they approach the end of compulsory education. It focuses on examining how well prepared the students are for life beyond school.

Around 510,000 students in 65 economies took part in the PISA 2012 assessment of reading, mathematics and science, representing about 28 million 15-year-olds globally. Of those economies, 44 took part in an assessment of creative problem solving and 18 in an assessment of financial literacy.

Files Used

'pisa2012.csv' is the original dataset. It is not included in this repository because its size exceeds GitHub's file size limit.

'pisadict2012.csv' is the data dictionary for the original 'pisa2012.csv' dataset.

'pisa_adj_cols.csv' is the data dictionary for only the columns chosen for this data analysis project.

'exploration_template.ipynb' is the Jupyter Notebook with both the exploratory and explanatory data analyses. An HTML export, 'exploration_template.html', is also included.

'slide_deck_template.ipynb' is the slideshow summarizing the most important findings. An HTML version of the slideshow, 'slide_deck_template.slides.html', is also included.

'output_toggle.tpl' is a template used by the 'slide_deck_template.ipynb' slideshow to hide the code cells from the slides.

'df.csv' contains the final results exported from the 'exploration_template.ipynb' notebook for use in the 'slide_deck_template.ipynb' slideshow. It is stored in the repository as 'df.csv.zip'.
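
Because 'pisa2012.csv' is too large to ship with the repository, it can help to read only the columns needed for the analysis. The following is a minimal sketch, assuming pandas. The helper name `load_selected_columns` and the tiny in-memory sample are hypothetical stand-ins, and the column names (`CNT`, `PV1MATH`, `PV1READ`, `PV1SCIE`) are assumed for illustration rather than taken from this project's notebook.

```python
import io

import pandas as pd


def load_selected_columns(source, columns, **read_csv_kwargs):
    """Read only the requested columns from a CSV path or file-like object."""
    # usecols skips the unused survey columns, which keeps memory use down.
    return pd.read_csv(source, usecols=columns, **read_csv_kwargs)


# Hypothetical miniature stand-in for 'pisa2012.csv' (values are made up).
sample_csv = io.StringIO(
    "CNT,PV1MATH,PV1READ,PV1SCIE,UNUSED\n"
    "Albania,400.0,410.0,405.0,x\n"
    "Brazil,390.0,402.5,398.0,y\n"
)

chosen = ["CNT", "PV1MATH", "PV1READ", "PV1SCIE"]
df_sample = load_selected_columns(sample_csv, chosen)
print(df_sample.shape)
```

With the real file, the call would look like `load_selected_columns('pisa2012.csv', chosen)`. Depending on how the file was exported, an explicit `encoding` argument may need to be passed through as well.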

Library Versions Used

python 3.7.10

numpy 1.20.2

matplotlib 3.3.4

pandas 1.2.4

seaborn 0.11.1
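
To check whether a local environment matches the versions listed above, a small script along these lines may help. It is only a sketch: the helper names `installed_version` and `compare_versions` are hypothetical, and it relies on `importlib.metadata`, which is part of the standard library from Python 3.8 onward, so on Python 3.7 it simply reports that the installed version is unknown.

```python
import sys

try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:  # Python 3.7: not available without a backport
    metadata = None

# Versions listed in this README.
expected = {
    "numpy": "1.20.2",
    "matplotlib": "3.3.4",
    "pandas": "1.2.4",
    "seaborn": "0.11.1",
}


def installed_version(package):
    """Return the installed version string, or None if it cannot be determined."""
    if metadata is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected_versions):
    """Map each package name to a (expected, installed) pair."""
    return {
        name: (wanted, installed_version(name))
        for name, wanted in expected_versions.items()
    }


print("python", sys.version.split()[0])
for name, (wanted, found) in compare_versions(expected).items():
    status = "ok" if found == wanted else "differs"
    print(f"{name}: expected {wanted}, found {found} ({status})")
```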

Summary of Findings

Before starting this study, I expected the features with the greatest effect on the total scores to be the teachers' influence, the students' immigration status, the class size, and the parents' highest level of schooling. However, almost none of these assumptions held up once I examined how the variables related to the total scores and to one another.

The variables that turned out to be related to the total scores were the number of cellphones, TVs, computers, and books, the parents' schooling and jobs, and the homework study time.

The more cellphones, TVs, computers, and books a student had, the higher the chance of a better total score. One possible explanation is that these families had a higher social status and could therefore provide better support for the students.

When the parents' schooling was ISCED level 3A or higher, the students had a good chance of getting higher scores. Furthermore, students whose parents had full-time jobs tended to score higher. One possible explanation is that having role models to look up to encourages students to work harder and believe in themselves more.

Finally, students who spent more hours studying had a higher chance of scoring better.
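
The comparisons described above come down to grouping students by a background variable and comparing total scores across the groups. The following is a minimal sketch, assuming pandas. The column names `books` and `total_score`, the category labels, and the toy values are all hypothetical and are not results from the PISA data.

```python
import pandas as pd

# Toy data: hypothetical column names and made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "books": ["0-10", "0-10", "11-100", "11-100", "100+", "100+"],
        "total_score": [1200.0, 1300.0, 1400.0, 1500.0, 1600.0, 1700.0],
    }
)


def mean_score_by(df, variable, score="total_score"):
    """Return the mean score for each level of a categorical variable."""
    # Sorting by the mean makes the ordering of the groups easy to read.
    return df.groupby(variable)[score].mean().sort_values()


score_by_books = mean_score_by(toy, "books")
print(score_by_books)
```

A difference in group means like this shows an association only. It does not by itself establish that the background variable causes the higher score.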

Key Insights for Presentation

In the presentation, I displayed the plots of the variables that were most strongly related to the total scores. Those included the bivariate plots of the variables mentioned above against the total score. I also included the multivariate plot of the father's and mother's jobs vs. the number of cellphones vs. the total score.
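
A three-variable view like the one described above can be summarized as a table of mean scores before it is plotted. This is a rough sketch, assuming pandas. The column names `father_job`, `cellphones`, and `total_score` and all values are hypothetical placeholders, not the notebook's actual columns or results.

```python
import pandas as pd

# Toy data: hypothetical column names and made-up values, for illustration only.
toy = pd.DataFrame(
    {
        "father_job": ["full-time", "full-time", "part-time", "part-time"],
        "cellphones": ["one", "two", "one", "two"],
        "total_score": [1400.0, 1500.0, 1300.0, 1350.0],
    }
)

# Rows: job status, columns: number of cellphones, cells: mean total score.
mean_scores = toy.pivot_table(
    index="father_job",
    columns="cellphones",
    values="total_score",
    aggfunc="mean",
)
print(mean_scores)
```

A table in this shape could then be passed to a plotting function such as seaborn's `heatmap` to produce the visual version.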
