GitHub - nvmoyar/datascience_notebooks: Datacamp's Data Science Track: comprehensions, plots, pandas, hacker statistics

Descriptive Statistics with Python

The purpose of this repo is to introduce some simple data manipulation and visualization tasks with Python and its libraries NumPy, pandas, Matplotlib and scikit-learn. All these Jupyter notebooks are notes taken from the exercises in DataCamp's Data Analyst track. During my journey through the Udacity Deep Learning and Artificial Intelligence Engineer Nanodegrees, I found that, besides struggling with the neural architectures, tuning, statistical models, calculus and algebra needed to build and train any model, I also needed to be fluent at manipulating data, since data usually has to be preprocessed before it is fed to a model. These notebooks are not an attempt to teach data analysis to anyone; they just cover some routine tasks in Python. Each notebook is named after its analogous DataCamp course.

DataCamp is a time-flexible online learning platform offering tutorials and courses in data science. Students can learn data analysis from the comfort of their browser, at their own pace, with content tailored to their needs and expertise. These Data Analyst courses are also available in R. The learning experience is really fast: you do not need to install anything, and from the very beginning you are encouraged to focus only on your goal, which is learning to code.

Since several people have asked me similar questions, here are some references in case you need a refresher in statistics:

ProbStat by Stanford Lagunita -> Awesome hands-on course, no programming experience needed. Udacity offers ud827, which is ok as well.

Statistical Thinking in Python I and Statistical Thinking in Python II -> Awesome courses by Justin Bois, offered by DataCamp. You go through EDA and CDA concept by concept and exercise by exercise. Some experience manipulating data frames is needed.

A month ago, on March 30th, 2018, Rachael Tatman from Kaggle ran a nice set of tutorials designed to be completed in 5 days, one per day. Now that the event is over, you can go through them at your own pace. They cover ordinary operations such as scaling and normalizing your data and dealing with time series, data inconsistencies, missing data, etc. If you do not have a Kaggle profile yet, this could be your chance: fork her notebook and start working without installing anything. See the Data Cleaning Challenge.
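To make "scaling and normalizing" concrete, here is a minimal sketch of two common rescaling operations, min-max scaling and z-score standardization, written with NumPy. It is not taken from the Kaggle notebooks; the helper names `min_max_scale` and `standardize` and the sample values are assumptions made up for illustration.

```python
import numpy as np


def min_max_scale(values):
    """Rescale values linearly so that they fall in the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are identical: there is nothing to scale.
        return np.zeros_like(values)
    return (values - values.min()) / span


def standardize(values):
    """Shift and rescale values to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)
    return (values - values.mean()) / std


# Made-up sample data, only for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

scaled = min_max_scale(heights_cm)
z_scores = standardize(heights_cm)

print(scaled)
print(z_scores)
```

Min-max scaling keeps the shape of the distribution and only changes its range, while standardization expresses each value as a number of standard deviations from the mean.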

Contents

Introduction to Databases in Python -> Connecting to and manipulating databases from a Python client.
Introduction to Data Visualization -> Creating and customizing your plots.
Manipulating DataFrames with Pandas -> Transform, extract, and filter data from DataFrames; work with pandas indexes and hierarchical indexes; reshape and restructure your data; clean data.
Merging DataFrames with pandas -> Merge data: one-to-one, one-to-many, many-to-many.
Statistical Thinking in Python I -> EDA (exploratory data analysis).
Statistical Thinking in Python II -> Inferential statistics, Pearson correlation, hypothesis testing, test statistics and p-values, and a case study that ties all of these together.
Supervised Learning with scikit-learn -> Regression and classification problems, error functions and regularization.
Natural Language Processing Fundamentals (with NLTK) -> NLTK is a leading platform for building Python programs that work with human language data. This notebook covers different ways of tokenizing words. It does not cover 100% of the contents of its analogous course.
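
As an illustration of the kind of tasks listed above, the following is a minimal sketch, not code taken from the notebooks. It performs a one-to-many merge with pandas and then runs a permutation test on the Pearson correlation, in the "hacker statistics" style. The `cities` and `readings` tables, their values, and the helper `pearson_r` are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical tables: one row per city, several readings per city.
cities = pd.DataFrame({"city": ["A", "B"], "region": ["north", "south"]})
readings = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "temp": [10.0, 12.0, 20.0, 22.0, 21.0],
    "sales": [100.0, 115.0, 190.0, 230.0, 205.0],
})

# One-to-many merge: each city row is repeated for every matching reading.
merged = pd.merge(cities, readings, on="city", how="inner")


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


temps = merged["temp"].values
sales = merged["sales"].values
observed_r = pearson_r(temps, sales)

# Permutation test: shuffle one variable to simulate "no correlation",
# then count how often the shuffled data looks at least as correlated
# as the observed data.
rng = np.random.RandomState(42)  # fixed seed so the run is repeatable
n_perm = 10000
perm_r = np.empty(n_perm)
for i in range(n_perm):
    perm_r[i] = pearson_r(rng.permutation(temps), sales)

# One-sided p-value: fraction of replicates with r >= observed r.
p_value = np.mean(perm_r >= observed_r)

print(merged)
print("observed r:", observed_r)
print("p-value:", p_value)
```

`np.random.RandomState` and `.values` are used instead of newer APIs so that the sketch should also run on the older package versions pinned in this repo.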

Requirements and environment

These are the packages you will need in your environment in order to run these notebooks. You are free to use conda (https://conda.io/docs/) or whatever tool you feel comfortable with, of course.

name: data-analytics
channels:
  - anaconda
  - defaults
dependencies:
  - matplotlib=2.1.1
  - jupyter=1.0.0
  - nb_conda=2.2*
  - pandas=0.22.0
  - python=3.5.4
  - scikit-learn=0.19.1
  - scipy=1.0.0
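
If you want to check whether your own environment matches the versions pinned above, a small script along these lines may help. It is only a sketch: the `PINNED` dictionary, `installed_version` and `compare_versions` are hypothetical names introduced here, and the check relies on each module exposing a `__version__` attribute, which is an assumption that holds for the libraries listed but not for every package.

```python
import importlib

# Pins copied from the environment listing. Keys are import names, which
# differ from the package name for scikit-learn (imported as sklearn).
PINNED = {
    "matplotlib": "2.1.1",
    "pandas": "0.22.0",
    "sklearn": "0.19.1",
    "scipy": "1.0.0",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(pinned, lookup=installed_version):
    """Map each module name to 'ok', 'missing', or a 'differs' message."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = "differs (found {}, pinned {})".format(found, wanted)
    return report


for name, status in sorted(compare_versions(PINNED).items()):
    print("{:<12} {}".format(name, status))
```

A newer version is not necessarily a problem; the report only tells you where your setup departs from the one these notebooks were written against.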
