SzilvasiPeter/UdacityDataAnalystNanodegree

UdacityDataAnalystNanodegree

Udacity Data Analyst Nanodegree

Course 1: Introduction to Data Analysis

LESSON ONE Anaconda Learn to use Anaconda to manage packages and environments for use with Python.
LESSON TWO Jupyter Notebooks Learn to use this open-source web application to combine explanatory text, math equations, code, and visualizations in one shareable document.
LESSON THREE Data Analysis Process Learn about the key steps of the data analysis process. Investigate multiple datasets using Python and Pandas.
LESSON FOUR Pandas and NumPy: Case Study 1 Perform the entire data analysis process on a dataset. Learn to use NumPy and Pandas to wrangle, explore, analyze, and visualize data.
LESSON FIVE Pandas and NumPy: Case Study 2 Perform the entire data analysis process on a dataset. Learn more about NumPy and Pandas to wrangle, explore, analyze, and visualize data.
LESSON SIX Programming Workflow for Data Analysis Learn how to carry out analysis outside Jupyter notebooks using IPython or the command line interface.

Course 2: Practical Statistics

LESSON ONE Simpson’s Paradox Examine a case study to learn about Simpson’s Paradox.
LESSON TWO Probability Learn the fundamental rules of probability.
LESSON THREE Binomial Distribution Learn about the binomial distribution, where each observation represents one of two outcomes. Derive the probability of a binomial distribution.
LESSON FOUR Conditional Probability Learn about conditional probability, i.e., when events are not independent.
LESSON FIVE Bayes Rule Build on conditional probability principles to understand Bayes' rule. Derive Bayes' theorem.
LESSON SIX Standardizing Convert distributions into the standard normal distribution using the Z-score. Compute proportions using standardized distributions.
LESSON SEVEN Sampling Distributions and Central Limit Theorem Use normal distributions to compute probabilities. Use the Z-table to look up the proportions of observations above, below, or in between values.
LESSON EIGHT Confidence Intervals Estimate population parameters from sample statistics using confidence intervals.
LESSON NINE Hypothesis Testing Use critical values to make decisions on whether or not a treatment has changed the value of a population parameter.
LESSON TEN T-Tests and A/B Tests Test the effect of a treatment or compare the difference in means for two groups when we have small sample sizes.
LESSON ELEVEN Regression Build a linear regression model to understand the relationship between independent and dependent variables. Use linear regression results to make a prediction.
LESSON TWELVE Multiple Linear Regression Use multiple linear regression results to interpret coefficients for several predictors.
LESSON THIRTEEN Logistic Regression Use logistic regression results to make a prediction about the relationship between categorical dependent variables and predictors.
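
Two ideas from the lessons above, standardizing with the Z-score and Bayes' rule, can be illustrated with a minimal sketch. It is not taken from the course materials: the function names and the example numbers are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for a Z-table lookup.

```python
from statistics import NormalDist


def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


def proportion_below(x, mean, std):
    """Proportion of a normal distribution below x.

    Uses the standard normal CDF in place of a Z-table lookup.
    """
    return NormalDist().cdf(z_score(x, mean, std))


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) computed with Bayes' rule."""
    # Total probability of a positive test (the evidence).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    print(z_score(130, mean=100, std=15))
    print(proportion_below(130, mean=100, std=15))
    print(bayes_posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.1))
```

With these made-up inputs, a positive test still leaves the posterior probability well below one half, because the condition is rare to begin with.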

Course 3: Data Wrangling

LESSON ONE Intro to Data Wrangling Identify each step of the data wrangling process (gathering, assessing, and cleaning). Wrangle a CSV file downloaded from Kaggle using fundamental gathering, assessing, and cleaning code.
LESSON TWO Gathering Data Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs. Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files. Store gathered data in a PostgreSQL database.
LESSON THREE Assessing Data Assess data visually and programmatically using pandas. Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues). Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity.
LESSON FOUR Cleaning Data Identify each step of the data cleaning process (defining, coding, and testing). Clean data using Python and pandas. Test cleaning code visually and programmatically using Python.
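
The define, code, and test steps of the cleaning process might look like the following sketch. It assumes pandas is installed. The toy table, its column names, and the `clean` helper are hypothetical and do not come from the course.

```python
import pandas as pd

# Hypothetical toy data with two quality issues:
# inconsistent state names and a duplicated row.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "state": ["CA", "California", "California", "NY"],
})


def clean(df):
    """Return a cleaned copy of df, leaving the original untouched."""
    # Define: standardize state names to abbreviations, then drop duplicate rows.
    # Code:
    out = df.copy()
    out["state"] = out["state"].replace({"California": "CA"})
    out = out.drop_duplicates().reset_index(drop=True)
    return out


cleaned = clean(raw)

# Test: programmatic checks that the defined issues are gone.
assert set(cleaned["state"]) == {"CA", "NY"}
assert not cleaned.duplicated().any()
```

Working on a copy keeps the raw data available for comparison, which makes the visual part of the testing step easier.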

Course 4: Data Visualization with Python

LESSON ONE Data Visualization in Data Analysis Understand why visualization is important in the practice of data analysis. Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.
LESSON TWO Design of Visualizations Interpret features in terms of level of measurement. Know different encodings that can be used to depict data in visualizations. Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.
LESSON THREE Univariate Exploration of Data Use bar charts to depict distributions of categorical variables. Use histograms to depict distributions of numeric variables. Use axis limits and different scales to change how your data is interpreted.
LESSON FOUR Bivariate Exploration of Data Use scatterplots to depict relationships between numeric variables. Use clustered bar charts to depict relationships between categorical variables. Use violin and bar charts to depict relationships between categorical and numeric variables. Use faceting to create plots across different subsets of the data.
LESSON FIVE Multivariate Exploration of Data Use encodings like size, shape, and color to encode values of a third variable in a visualization. Use plot matrices to explore relationships between multiple variables at the same time. Use feature engineering to capture relationships between variables.
LESSON SIX Explanatory Visualizations Understand what it means to tell a compelling story with data. Choose the best plot type, encodings, and annotations to polish your plots. Create a slide deck using a Jupyter Notebook to convey your findings.
LESSON SEVEN Visualization Case Study Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.