Skip to content

The SAT, or Scholastic Aptitude Test, is a test that high school seniors in the U.S. take every year. The SAT has three sections, each is worth 800 points. Colleges use the SAT to determine which students to admit. High average SAT scores are usually indicative of a good school.

Notifications You must be signed in to change notification settings

phyllisgitu/SAT_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Analyzing Standardized Tests on NYC High School Data

Source: NYC OpenData

The SAT, or Scholastic Aptitude Test, is a test that high school seniors in the U.S. take every year. The SAT has three sections, each is worth 800 points. Colleges use the SAT to determine which students to admit. High average SAT scores are usually indicative of a good school.

New York City has published data on student SAT scores by high school, along with additional demographic datasets which can be found here.

We combined the following datasets into a single, clean pandas data frame:

  • SAT scores by school - SAT scores for each high school in New York City
  • School attendance - Attendance information for each school in New York City
  • Class size - Information on class size for each school
  • AP test results - Advanced Placement (AP) exam results for each high school (passing an optional AP exam in a particular subject can earn a student college credit in that subject)
  • Graduation outcomes - The percentage of students who graduated, and other outcome information
  • Demographics - Demographic information for each school
  • School survey - Surveys of parents, teachers, and students at each school

New York City is very diverse, so comparing demographic factors such as race, income, and gender with SAT scores is a good way to determine whether the SAT is a fair test. For example, if certain racial groups consistently perform better on the SAT, we would have some evidence that the SAT is unfair.

We can learn a few different things from these resources to better understand the dataset:

Our datasets include several different types of schools. We'll need to clean them so that we can focus on high schools only. Each school in New York City has a unique code called a DBN or district borough number. Aggregating data by district allows us to use the district mapping data to plot district-by-district differences.

survey_all.txt and survey_d75.txt are in more complicated formats than the other files. For now, we'll focus on reading in the CSV files only, and then explore them.

We'll read each file into a pandas.dataframe and then store all of the dataframes in a dictionary. This gives us a convenient way to store them and a quick way to reference them later on.

About

The SAT, or Scholastic Aptitude Test, is a test that high school seniors in the U.S. take every year. The SAT has three sections, each is worth 800 points. Colleges use the SAT to determine which students to admit. High average SAT scores are usually indicative of a good school.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published