This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Final Project for IDS - UE18CS203. Analysis of the Meteorite landing dataset, available at https://www.kaggle.com/nasa/meteorite-landings

IamShubhamGupto/EDA_Meteorite_Landing_Sites

EDA On Meteorite Landing Sites

This project was created as the final project for our 3rd-semester Introduction to Data Science course (UE18CS203). A lightly preprocessed copy of the dataset is hosted on Google Drive: https://drive.google.com/file/d/1nLCVDfQy8NUu9NnD55meyj54ynkxiowp/view

Modules utilized for the project

  • pandas
  • numpy
  • sklearn
  • statsmodels
  • seaborn
  • matplotlib
  • mpl_toolkits
  • scipy
  • cython
  • pydrive
  • google
  • oauth2client

Contents of the project

  • Data Cleaning
  • Data Normalization - using StandardScaler()
  • Visualizations
    • Box plot - Check for outliers
    • Histogram - Check for normality
    • Q-Q plot - Check for normality
    • Map visualizations - Visualize a heat map for landing sites
    • Pie charts - Distribution of different types of meteorites
    • Heat map - Correlation matrix between columns
    • Scatter plot - Visualization for correlation graph
  • Correlation Graph - Find correlations between columns using the generated heat map
  • Hypothesis testing -
    H0: The difference between the sample mean mass and the population mean mass is a statistical fluctuation.
    H1: The difference between the sample mean mass and the population mean mass is significant and not merely a statistical fluctuation.
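
The normalization and hypothesis-testing steps listed above can be sketched roughly as follows. This is only an illustration, not the project's actual notebook code: the mass values are synthetic (a log-normal draw standing in for the dataset's mass column), the helper `one_sample_z_test`, the sample size of 200, and the 0.05 significance level are all assumptions, and a two-sided one-sample z-test is just one reasonable way to compare a sample mean against a population mean.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mass column (assumed values, not the real dataset).
rng = np.random.default_rng(0)
population_mass = rng.lognormal(mean=5.0, sigma=2.0, size=10_000)

# Data normalization: StandardScaler rescales to zero mean and unit variance.
scaled = StandardScaler().fit_transform(population_mass.reshape(-1, 1)).ravel()


def one_sample_z_test(sample, pop_mean, pop_std):
    """Two-sided z-test of a sample mean against a known population mean.

    Hypothetical helper for illustration; returns (z statistic, p-value).
    """
    sample = np.asarray(sample, dtype=float)
    standard_error = pop_std / np.sqrt(sample.size)
    z = (sample.mean() - pop_mean) / standard_error
    p_value = 2 * stats.norm.sf(abs(z))  # both tails of the normal distribution
    return z, p_value


# Draw a random sample and test H0 (difference is a statistical fluctuation)
# against H1 (difference is significant).
sample = rng.choice(population_mass, size=200, replace=False)
z, p = one_sample_z_test(sample, population_mass.mean(), population_mass.std())

alpha = 0.05  # assumed significance level
print(f"z = {z:.3f}, p = {p:.3f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

If the p-value falls below the chosen significance level, H0 is rejected in favor of H1; otherwise the observed difference is treated as a statistical fluctuation.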

Team