Udacity Projects

Author: Nicholas Hunt-Walker nhuntwalker@gmail.com

About

The directories in this repository contain the finished projects I completed toward the Udacity Nanodegree in Data Analysis. Each project, including its code and summary, is available as either an iPython notebook or some form of HTML output. The projects are described below.

Project 0: Chopsticks

Languages/Skills: Statistics, Python
Libraries/Frameworks: Pandas, iPython Notebook, matplotlib

This project uses data from a simple experiment on optimal chopstick length to illustrate basic uses of Python and statistics.

Project 1: Test a Perceptual Phenomenon

Languages/Skills: Statistics, Python
Libraries/Frameworks: Pandas, iPython Notebook, matplotlib, NumPy

This project analyzes the Stroop effect, using descriptive statistics to build intuition about the data and inferential statistics to draw a conclusion from the results.
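
As a rough illustration of the inferential step, the sketch below computes a paired-samples t statistic with only the Python standard library. It assumes a paired t-test is an appropriate choice for Stroop data, where each participant is timed under both conditions. The function name `paired_t_statistic` and the timing values are made up for this example and are not taken from the project notebook.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work with the per-participant differences between the two conditions.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up completion times in seconds, for illustration only.
congruent = [12.0, 15.0, 11.0, 14.0, 13.0]
incongruent = [20.0, 22.0, 18.0, 25.0, 21.0]

t_stat, dof = paired_t_statistic(congruent, incongruent)
print(f"t = {t_stat:.2f}, df = {dof}")
```

The resulting t statistic would then be compared against the critical value of the t distribution for the given degrees of freedom and chosen significance level.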

Project 2: Analyzing the New York Subway Dataset

Languages/Skills: Statistics, Python
Libraries/Frameworks: Pandas, iPython Notebook, ggplot, NumPy, SciPy, statsmodels

This project uses data science techniques, including data wrangling, applied statistics, machine learning, and effective visualization, to answer questions and draw conclusions about subway and weather data for New York City.

Project 3: Wrangle OpenStreetMap Data

Languages/Skills: Python, XML, MongoDB, JSON, regex
Libraries/Frameworks: iPython Notebook, pymongo

This project assesses a portion of the Austin, TX OpenStreetMap data for validity, accuracy, completeness, consistency, and uniformity, then cleans up the problems found in the data.
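
One common kind of cleanup for OpenStreetMap data is making street names uniform. The sketch below shows how that might look with a regular expression. The `ABBREVIATIONS` mapping, the `clean_street_name` helper, and the sample street names are hypothetical and are not the rules actually used in the project.

```python
import re

# Matches the last word of a street name, ignoring an optional trailing period.
STREET_TYPE_RE = re.compile(r"\b(\S+?)\.?$")

# Hypothetical mapping from abbreviated street types to their full forms.
ABBREVIATIONS = {
    "St": "Street",
    "Ave": "Avenue",
    "Blvd": "Boulevard",
    "Rd": "Road",
    "Dr": "Drive",
}


def clean_street_name(name):
    """Expand an abbreviated street type at the end of a street name."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match is None:
        return name
    full_form = ABBREVIATIONS.get(match.group(1))
    if full_form is None:
        # Unknown or already expanded street type: leave the name alone.
        return name
    return name[:match.start()] + full_form


for raw in ["Congress Ave.", "Lamar Blvd", "Main Street"]:
    print(raw, "->", clean_street_name(raw))
```

Names whose final word is not in the mapping are returned unchanged, so the function can be applied to every street tag without risk of mangling names that are already clean.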

Project 4: A Brief Foray into Prosper Loan Data

Languages/Skills: R
Libraries/Frameworks: RStudio, ggplot

This project is an exploratory data analysis of a data set containing loans from Prosper between 2005 and 2014. It is done exclusively in R and explores univariate, bivariate, and multivariate visualizations of loans in this time period. It ends with a linear model that attempts to predict how much a lender will lose on a loan given certain borrower characteristics.

Project 5: Classifying and Predicting Enron Persons of Interest

Languages/Skills: Python
Libraries/Frameworks: Scikit-Learn

In this project, I delve into the email corpus from the infamous Enron scandal. The goal is to use the email and financial data from known Enron persons of interest (POIs) to see whether anyone else warrants investigation. To do this, I use various machine learning algorithms and assessment methods to classify known POIs, then use the classifier to label other employees as either possible persons of interest or regular employees.
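
The assessment methods are not listed here. Since POIs are presumably a small fraction of all employees, precision and recall are a plausible choice because accuracy alone can look good for a classifier that never flags anyone. The sketch below computes both in plain Python. The `precision_recall` helper and the label lists are made up for illustration and are not results from the project.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks a POI."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Made-up labels for eight employees: 1 = person of interest, 0 = not.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision answers how many flagged employees are actually POIs, while recall answers how many of the real POIs the classifier managed to flag.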
