
ValRCS/RCS_Data_Analysis_Python_11_2019


RCS_Data_Analysis_Python_11_2019

Python Data Analysis Course 11.2019

Binder (cloud-hosted Jupyter notebooks, beta)


Course Plan

Goal

Build a complete data analysis pipeline using the Python ecosystem:

  • Define the problem
  • Gather the raw data
  • Process (clean) the data
  • Explore
  • Analyze (apply models, make predictions)
  • Report visual results in a form stakeholders can understand
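The six pipeline steps above can be sketched end to end with only the standard library. This is a minimal illustration, not the course project: the inline CSV, the question being asked, the function names (`gather`, `clean`, `explore`, `analyze`, `report`) and the per-city mean standing in for a "model" are all hypothetical placeholders.

```python
import csv
import io
import statistics

# Define the problem (hypothetical): what is the mean temperature per city?

# Hypothetical raw data; a real project would gather it from files, APIs,
# scrapers or databases instead of an inline string.
RAW_CSV = """city,temp_c
Riga,12.5
Riga,
Tallinn,10.0
Vilnius,14.0
Tallinn,11.0
"""

def gather(text):
    """Gather: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: drop rows with a missing temperature and convert the rest to float."""
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"].strip()
    ]

def explore(rows):
    """Explore: basic summary statistics over all valid readings."""
    temps = [r["temp_c"] for r in rows]
    return {"count": len(temps), "mean": statistics.mean(temps)}

def analyze(rows):
    """Analyze: mean temperature per city (a stand-in for a real model)."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return {city: statistics.mean(temps) for city, temps in by_city.items()}

def report(summary, per_city):
    """Report: plain text a stakeholder can read."""
    lines = [f"{summary['count']} valid readings, overall mean {summary['mean']:.1f}"]
    for city, mean in sorted(per_city.items()):
        lines.append(f"  {city}: mean {mean:.1f}")
    return "\n".join(lines)

rows = clean(gather(RAW_CSV))
print(report(explore(rows), analyze(rows)))
```

Keeping each step in its own function makes it easy to swap one stage (for example, replacing the CSV string with a database query) without touching the others.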

Setup (2h)

  • Git and Github
  • short intro to the command line
  • Text Editors
  • Anaconda
  • cloud-based tools (Google Colab, myBinder, etc.)

General Python Introduction (10h)

  • basic data types
  • working with compound data (slicing)
  • structure (functions, classes)
  • program flow (conditionals)
  • input/output
  • importing external libraries
  • introduction to NumPy, Pandas
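A short sketch touching several of the topics above: slicing lists and strings, a function, a small class, an `if`/`elif`/`else` branch and file input/output. The variable and class names are illustrative assumptions, and the standard-library modules used here (`dataclasses`, `json`, `tempfile`, `pathlib`) stand in for the external libraries introduced later.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Basic data types and compound data: slicing works on lists and strings alike
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
first_three = numbers[:3]      # [3, 1, 4]
every_other = numbers[::2]     # [3, 4, 5, 2]
reversed_nums = numbers[::-1]  # [6, 2, 9, 5, 1, 4, 1, 3]
word = "data analysis"[5:]     # "analysis"

# Structure: a function returning several values as a tuple
def describe(values):
    """Return (min, max, mean) of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)

# Structure: a small class
@dataclass
class Measurement:
    name: str
    value: float

    def level(self, threshold=5):
        # Program flow: conditionals
        if self.value > threshold:
            return "high"
        elif self.value == threshold:
            return "at threshold"
        else:
            return "low"

levels = [Measurement(f"m{i}", v).level() for i, v in enumerate(numbers)]

# Input/output: write a summary to a JSON file and read it back
lo, hi, mean = describe(numbers)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "summary.json"
    path.write_text(json.dumps({"min": lo, "max": hi, "mean": mean}))
    loaded = json.loads(path.read_text())

print(levels)  # ['low', 'low', 'low', 'low', 'at threshold', 'high', 'low', 'high']
print(loaded)  # {'min': 1, 'max': 9, 'mean': 3.875}
```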

Gathering Data with Python (2-4h)

  • web scraping with Selenium, Beautiful Soup
  • using APIs
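The plan names Selenium and Beautiful Soup for scraping. As a dependency-free sketch of the same idea, the example below uses the standard library's `html.parser` instead, applied to a hardcoded HTML snippet rather than a live page. The second part shows the typical shape of consuming a JSON API response; the payload is a made-up placeholder, and a real call would fetch it over HTTP (for example with `urllib.request.urlopen` or the `requests` package).

```python
import json
from html.parser import HTMLParser

# Static HTML standing in for a downloaded page (no network access here)
PAGE = """
<html><body>
  <ul id="courses">
    <li><a href="/python">Python</a></li>
    <li><a href="/sql">SQL</a></li>
  </ul>
</body></html>
"""

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links.append((self._href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

parser = LinkCollector()
parser.feed(PAGE)
print(parser.links)  # [('/python', 'Python'), ('/sql', 'SQL')]

# Hypothetical JSON body, shaped like a typical API response
response_body = '{"items": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]}'
data = json.loads(response_body)
names = [item["name"] for item in data["items"]]
print(names)  # ['alpha', 'beta']
```

Beautiful Soup offers a much more convenient query interface (CSS selectors, tree navigation), and Selenium is needed when a page builds its content with JavaScript; the parsing principle is the same.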

Databases

SQL (2-4h)

  • reintroduction to SQL databases
  • ACID compliance
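ACID compliance is easiest to see through atomicity: a transaction either applies all of its changes or none of them. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table and the `transfer` and `balances` helpers are hypothetical. Using the connection as a context manager commits on success and rolls back if an exception is raised, so a transfer that would violate the `CHECK` constraint leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```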

NoSQL (4-6h)

  • NodeJS
  • MongoDB
  • other NoSQL databases

Big Data (2-4h)

  • The 4 Vs (volume, variety, velocity, veracity)
  • Apache Hadoop Ecosystem
  • Apache Lucene -> Elasticsearch
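Two ideas from this part of the plan can be sketched in plain Python with no cluster involved: the map, shuffle and reduce phases of a MapReduce word count (the programming model Hadoop popularized), and an inverted index, the data structure at the heart of Lucene and therefore Elasticsearch. This is an in-process illustration of the concepts only, not Hadoop's or Lucene's API; the documents and helper names are placeholders.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data is big", "data has volume and variety"]

def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)  # {'big': 2, 'data': 2, 'is': 1, 'has': 1, 'volume': 1, 'and': 1, 'variety': 1}

# Inverted index: term -> set of ids of the documents containing it
index = defaultdict(set)
for doc_id, doc in enumerate(documents):
    for term in doc.split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term (AND semantics)."""
    sets = [index.get(term, set()) for term in query.split()]
    return set.intersection(*sets) if sets else set()

print(search("data big"))  # {0}
```

In a real Hadoop job the map and reduce functions run on many machines and the shuffle moves data across the network; the division of work is what lets the same logic scale to large volumes.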

Cleaning Data (2-6h)

  • advancing your NumPy, Pandas skills

Analysis and Data Exploration (4-10h)

  • Pandas, Matplotlib, etc.

Social Network Analytics

  • Graph Analysis (Network Analysis)
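Two core graph-analysis ideas can be sketched on a small adjacency list: degree centrality (the fraction of the other nodes that a node is directly connected to) and a breadth-first search for a shortest path. Libraries such as NetworkX provide these ready-made; this standard-library version and the friendship network in it are hypothetical illustrations.

```python
from collections import deque

# Hypothetical undirected friendship network, stored as an adjacency list
edges = [("ann", "ben"), ("ann", "cat"), ("ben", "cat"), ("cat", "dan"), ("dan", "eve")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Degree centrality: number of neighbours divided by (number of nodes - 1)
n = len(graph)
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def shortest_path(graph, start, goal):
    """Breadth-first search; return the nodes on a shortest path, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None

print(degree_centrality["cat"])            # 0.75
print(shortest_path(graph, "ann", "eve"))  # ['ann', 'cat', 'dan', 'eve']
```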

Machine Learning with Python (6-10h)

Note: the ML section may be expanded if good progress is made in other sections :)

Principles of ML

  • test/train data
  • supervised/unsupervised learning
  • classifiers
  • regressors
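The train/test idea and a minimal supervised regressor can be illustrated without any ML library. This sketch generates hypothetical noisy data from y = 2x + 1, holds out 20% of it as a test set, fits a one-feature ordinary least squares line on the training rows only, and reports the mean squared error on the unseen test rows. In practice scikit-learn's `train_test_split` and `LinearRegression` cover the same ground; the helper names here are illustrative.

```python
import random

# Hypothetical supervised data: one feature x, target y = 2x + 1 plus uniform noise
rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(50)]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split off the last test_fraction as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[:len(rows) - n_test], rows[len(rows) - n_test:]

def fit_line(rows):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(rows, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)

train, test = train_test_split(data)
slope, intercept = fit_line(train)                     # fitted on training rows only
test_mse = mean_squared_error(test, slope, intercept)  # evaluated on unseen rows
print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={test_mse:.3f}")
```

Evaluating on rows the model never saw is what makes the error estimate honest; measuring it on the training rows would reward a model that merely memorizes them.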

ML Tools

  • scikit-learn
  • TensorFlow with Keras
  • PyTorch
  • Tesseract for OCR

Visualization (4-6h)

  • Power BI or Tableau
  • Python visualization libraries (Matplotlib, Seaborn)
  • Graphviz
  • Dash/Plotly

Useful Python Libraries (2-6h)

  • PDF processing
  • email
  • PyQt
  • NLTK
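For the email topic, the standard library's `email.message.EmailMessage` can build a message with a text body and an attachment. The addresses, subject and CSV content below are placeholders; actually sending the message would go through `smtplib` and a real mail server, which this sketch deliberately does not do.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Weekly data report"
msg["From"] = "analyst@example.com"
msg["To"] = "team@example.com"
msg.set_content("Hi team,\n\nThe weekly summary is attached as CSV.\n")

# Adding an attachment turns the message into multipart/mixed
msg.add_attachment(
    b"city,temp_c\nRiga,12.5\n",
    maintype="text",
    subtype="csv",
    filename="summary.csv",
)

print(msg.get_content_type())  # multipart/mixed

# Sending (not run here) would look roughly like:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:
#       server.send_message(msg)
```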

Building a complete data analysis pipeline (4-6h)

  • Course Project

Tools of the trade:

Anaconda Distribution (Python, R and more): https://www.anaconda.com/download/
