nhuntwalker/udacity_projects

Udacity Projects


About

The directories in this repository hold the end results of projects completed toward the Udacity nanodegree in Data Analysis. Each project, including code and a summary, is available as either an IPython notebook or some form of HTML output. The projects are described below:

Project 0: Chopsticks

  • Languages/Skills: Statistics, Python
  • Libraries/Frameworks: Pandas, IPython Notebook, matplotlib

This project uses data from a simple experiment on optimal chopstick length to illustrate basic uses of Python and statistics.

Project 1: Test a Perceptual Phenomenon

  • Languages/Skills: Statistics, Python
  • Libraries/Frameworks: Pandas, IPython Notebook, matplotlib, NumPy

An analysis of the Stroop effect, using descriptive statistics to build intuition about the data and inferential statistics to draw a conclusion from the results.
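As a rough illustration of the inferential step, the sketch below computes a paired t-statistic for two conditions measured on the same participants. The choice of a paired t-test is an assumption (the project summary does not name the test used), the function name `paired_t_statistic` is hypothetical, and the timings are made up for the example rather than taken from the project's data.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for paired samples.

    Each participant contributes one value to `first` and one to `second`;
    the test is run on the per-participant differences.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up completion times in seconds, for illustration only.
congruent = [12.0, 15.0, 11.0, 14.0, 13.0]
incongruent = [19.0, 22.0, 17.0, 23.0, 20.0]

t_stat, dof = paired_t_statistic(congruent, incongruent)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The resulting statistic would then be compared against a critical value of the t-distribution for the given degrees of freedom to decide whether the difference between conditions is significant.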

Project 2: Analyzing the New York Subway Dataset

  • Languages/Skills: Statistics, Python
  • Libraries/Frameworks: Pandas, IPython Notebook, ggplot, NumPy, SciPy, statsmodels

Used data science techniques including data wrangling, applied statistics, machine learning, and effective visualization to answer questions and draw conclusions about subway and weather data for New York City.

Project 3: Wrangle OpenStreetMap Data

  • Languages/Skills: Python, XML, MongoDB, JSON, regex
  • Libraries/Frameworks: IPython Notebook, pymongo

Assessed a portion of the Austin, TX OpenStreetMap data for validity, accuracy, completeness, consistency, and uniformity, then cleaned up the problems found in the data.
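One common kind of cleanup in address data is making street names uniform. The sketch below shows how a regular expression might be used to expand abbreviated street types. It is a minimal example under stated assumptions: the function name `normalize_street_name`, the `ABBREVIATIONS` mapping, and the sample names are all hypothetical and are not taken from the project's code or data.

```python
import re

# Matches the last whitespace-delimited token of a street name,
# which is usually the street type ("Ave.", "Blvd", "Street", ...).
STREET_TYPE_RE = re.compile(r"(\S+)$")

# Hypothetical mapping of abbreviations to their full forms.
ABBREVIATIONS = {
    "St": "Street",
    "St.": "Street",
    "Ave": "Avenue",
    "Ave.": "Avenue",
    "Blvd": "Boulevard",
    "Blvd.": "Boulevard",
    "Rd": "Road",
    "Rd.": "Road",
}


def normalize_street_name(name):
    """Expand an abbreviated street type at the end of `name`, if known."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match is None:
        return name
    full_form = ABBREVIATIONS.get(match.group(1))
    if full_form is None:
        # Unknown or already-expanded street type: leave the name unchanged.
        return name
    return name[:match.start(1)] + full_form


for raw in ["Congress Ave.", "N Lamar Blvd", "Guadalupe Street"]:
    print(raw, "->", normalize_street_name(raw))
```

Leaving unknown endings untouched is a deliberate choice here: it avoids silently rewriting names that the mapping does not cover, so they can be audited by hand instead.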

Project 4: A Brief Foray into Prosper Loan Data

  • Languages/Skills: R
  • Libraries/Frameworks: RStudio, ggplot

This project is an exploratory data analysis of a data set containing loans from Prosper between 2005 and 2014. It is done exclusively in R and explores univariate, bivariate, and multivariate visualizations of loans in this time period. I end with a linear model that attempts to predict how much a lender will lose on a loan given certain borrower characteristics.

Project 5: Classifying and Predicting Enron Persons of Interest

  • Languages/Skills: Python
  • Libraries/Frameworks: Scikit-Learn

In this project, I delve into the email corpus from the infamous Enron. The goal is to use the email and financial data from known Enron persons of interest (POIs) to see whether anyone else warrants investigation. To do this, I use various machine learning algorithms and assessment methods to classify known POIs, then use the classifier to label other employees as either possible persons of interest or regular employees.
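When one class is likely to be a small minority, as persons of interest presumably are among all employees, plain accuracy can look good even for a classifier that flags nobody. The sketch below computes precision and recall by hand as one example of an assessment method. The choice of these two metrics is an assumption (the summary above does not say which were used), the function name `precision_recall` is hypothetical, and the labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks a POI."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up labels: 1 = person of interest, 0 = regular employee.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Scikit-Learn provides equivalent functions, `precision_score` and `recall_score` in `sklearn.metrics`, which would normally be used in place of a hand-written version.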