RCS_ML_01_20

RCS Data Science and Machine Learning section, January 2020, in conjunction with Accenture

Binder (cloud-hosted Jupyter notebooks) Beta

Google Colab

- open all of current repo

Course Plan

Goal

Build a complete data analysis pipeline using the Python ecosystem

Define the problem
Gather the raw data
Process (clean) the data
Explore
Analyze (apply models, make predictions)
Report and visualize results in a form understandable to stakeholders
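
The pipeline steps above can be sketched end to end with nothing but the standard library. This is a minimal illustration, not course material: the CSV data (hours studied vs. exam score) and all function names are invented for the example, and the "model" is a plain least-squares line.

```python
import csv
import io
import statistics

# Hypothetical raw data (invented for illustration); one row is incomplete.
RAW_CSV = """hours,score
1,52
2,55
3,
4,61
5,64
"""


def gather(text):
    """Gather: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Process: drop rows with missing values, convert strings to floats."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def explore(pairs):
    """Explore: basic summary statistics of the cleaned data."""
    scores = [score for _, score in pairs]
    return {"n": len(pairs), "mean_score": statistics.mean(scores)}


def analyze(pairs):
    """Analyze: fit score = slope * hours + intercept by least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def report(summary, model):
    """Report: one plain-language line for stakeholders."""
    slope, intercept = model
    return (
        f"{summary['n']} valid rows, mean score {summary['mean_score']:.1f}; "
        f"each extra hour adds about {slope:.1f} points "
        f"(baseline {intercept:.1f})."
    )


pairs = clean(gather(RAW_CSV))
print(report(explore(pairs), analyze(pairs)))
```

Keeping each stage as its own function makes it easy to swap in pandas or scikit-learn later without changing the overall flow.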

Course Contents (50h)

Workplace Organization (~2h)

Git version control / command line
Jupyter / Anaconda environment for Data Science
Text Editors

Python (~10h)

Built-in Data Types
Control Structures
Functions and Classes
List/Dictionary Comprehensions
File Manipulation
Advanced Concepts (Generators/Decorators)
Useful Python standard libraries: collections, functools, etc.
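
A few of the Python topics listed above, shown in one short sketch. The word list and the helper names (`squares`, `count_calls`, `add`, `fib`) are made up for illustration; only the standard library is used.

```python
from collections import Counter
from functools import lru_cache, wraps

words = ["data", "science", "data", "python", "science", "data"]

# List comprehension: lengths of the distinct words, in alphabetical order.
lengths = [len(w) for w in sorted(set(words))]

# Dictionary comprehension: map each distinct word to its uppercase form.
upper = {w: w.upper() for w in set(words)}

# collections.Counter: frequency table of the words.
counts = Counter(words)


def squares(n):
    """Generator: lazily yield i * i for i in 0 .. n-1."""
    for i in range(n):
        yield i * i


def count_calls(func):
    """Decorator: record how many times the wrapped function is called."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci numbers; functools.lru_cache memoizes earlier results."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(lengths, counts.most_common(1), list(squares(4)), add(1, 2), fib(30))
```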

External Numerical Libraries (~8h)

NumPy/Pandas
scipy.stats
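
As a rough taste of how these three libraries fit together, the sketch below uses a tiny invented data set (two groups of three values). It assumes `numpy`, `pandas`, and `scipy` are installed; the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical measurements for two groups (invented for illustration).
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "b"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# NumPy: vectorized arithmetic on a whole array at once.
values = df["value"].to_numpy()
centered = values - values.mean()

# pandas: split-apply-combine with groupby.
group_means = df.groupby("group")["value"].mean()

# scipy.stats: independent two-sample t-test comparing the groups.
a = df.loc[df["group"] == "a", "value"]
b = df.loc[df["group"] == "b", "value"]
result = stats.ttest_ind(a, b)

print(group_means)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```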

NoSQL databases: HBase, MongoDB, Cassandra (~8h)

Principles, types, CAP theorem
Key-value DB, e.g., Redis
Wide-column DB, e.g., HBase, Cassandra
Document DB, e.g., MongoDB
Graph DB, e.g., Neo4j [some practical tasks on each]
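
The four store families differ mainly in the shape of the data they hold. The sketch below models those shapes with plain in-memory Python structures; it does not use any database client, and all keys, names, and values are invented for illustration.

```python
# Key-value (Redis-like): an opaque value looked up by a single key.
key_value = {"session:42": "alice"}

# Wide-column (HBase/Cassandra-like):
# row key -> column family -> column -> value.
wide_column = {
    "user#1": {
        "profile": {"name": "Alice", "city": "Riga"},
        "stats": {"logins": 3},
    },
}

# Document (MongoDB-like): self-contained nested records, queried by field.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["python", "ml"]},
    {"_id": 2, "name": "Bob", "tags": ["sql"]},
]

# Graph (Neo4j-like): nodes plus labeled, directed edges.
nodes = {"alice", "bob", "carol"}
edges = [("alice", "KNOWS", "bob"), ("bob", "KNOWS", "carol")]


def find_documents(docs, tag):
    """Return the documents whose 'tags' list contains the given tag."""
    return [d for d in docs if tag in d.get("tags", [])]


def neighbors(edge_list, node, label):
    """Return nodes reachable from `node` over one edge with `label`."""
    return [dst for src, lbl, dst in edge_list if src == node and lbl == label]


print(key_value["session:42"])
print(wide_column["user#1"]["profile"]["name"])
print(find_documents(documents, "python"))
print(neighbors(edges, "alice", "KNOWS"))
```

Real stores add persistence, distribution, and their own query languages on top of these shapes.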

Projects

Get data, transform data

Machine Learning using scikit-learn, Keras (w/ TensorFlow), PyTorch (~12h)

Data Preparation - preprocessing, tidy data
Training Data / Testing Data / splitting
Supervised / Unsupervised learning
Classification
Clustering
Regression
Dimensionality reduction (curse of dimensionality)
Post-processing
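
Several of the topics above (splitting, preprocessing, classification, dimensionality reduction, clustering) can be sketched in a few lines of scikit-learn. This is only an illustration under assumptions: it uses the bundled iris data set and a logistic regression classifier chosen for brevity, and it does not cover regression, Keras, or PyTorch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Training/testing split: hold out 25% of the rows; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised classification: feature scaling followed by logistic regression.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Unsupervised clustering: group the rows without looking at the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print(f"test accuracy: {accuracy:.2f}")
print(f"reduced shape: {X_2d.shape}, clusters found: {len(set(clusters))}")
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, which avoids leaking information from the test set.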

Data Visualization Techniques (~10h)

Visualization libraries in Python: Plotly, Matplotlib
Building your own dashboards with Flask web micro framework
Dashboards with Tableau / Power BI
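
A minimal Matplotlib sketch of the kind of chart such a dashboard might show. The data is invented for illustration, and the non-interactive `Agg` backend is an assumption made so the example runs without a display; the resulting PNG bytes are the sort of thing a web view (for example, a Flask route) could return.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 58, 61, 64]

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(hours, scores, marker="o", label="score")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Score vs. hours")
ax.legend()
fig.tight_layout()

# Render the figure to PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()

print(f"rendered {len(png_bytes)} bytes of PNG data")
```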

Folders and files

HomeWork
JSON
Keras_TensorFlow_Image_Recognition
Machine_Learning_intro_scikit-learn
NoSQL
NumPy
Pandas-CookBook
Pandas
PySpark
PyTorch_ML_Library
Python_Core
SCALA
SQL
Titanic
Visualizations
WebScraping
data
handson-ml
img
scikit-learn
.gitignore
Data_Analysis_Python_Introduction.pdf
Git_Workflow.md
Jupyter_tips.md
LICENSE
Python Introduction_01_2020.ipynb
Python Introduction_01_2020_in_class.ipynb
Python Learning Resources.ipynb
Python Learning Resources.md
README.md
Yak_Shaving.md
hw.py
requirements.txt

ValRCS/RCS_ML_01_20