
mdsmith44/Army_ORSA_Python_Data_Analysis_Tutorial


Army ORSA Python Data Analysis Tutorial Outline

Lesson 1: Intro to Python and Jupyter Notebooks

  • Why Python
  • Jupyter Notebook Environment
  • Basic Python Syntax: code syntax, comments, assigning variables, installing packages
  • Data Types
    • Individual Types: numbers, strings, booleans, None type, Type-Casting
    • Composite Data Structures: lists, tuples, sets, dicts, comprehensions, Oh My!
  • String Matching: regular expressions, fuzzy matching
  • Code Structures
    • if-else controls
    • for/while loops
    • try-except
    • functions
    • classes
  • NumPy Basics
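For a quick taste of several Lesson 1 topics, here is a minimal sketch in plain Python covering data structures, a comprehension with if-else, a loop with try-except, a function, a class, a regular expression, and fuzzy matching. It uses only the standard library, and all values are made up for illustration. `difflib` serves here as a stand-in for fuzzy matching; the notebooks may use a different library.

```python
import re
from difflib import get_close_matches

# Composite data structures (hypothetical example values)
units = ["1st BDE", "2nd BDE", "3rd BDE"]        # list: ordered, mutable
grid = (38.8977, -77.0365)                        # tuple: ordered, immutable
ranks = {"PVT", "SGT", "CPT", "SGT"}              # set: duplicate "SGT" is dropped
strength = {"1st BDE": 3900, "2nd BDE": 4100}     # dict: key -> value

# Dict comprehension with a default of 0 for units missing from `strength`
full_strength = {u: strength.get(u, 0) for u in units}

# List comprehension with an inline if-else
status = ["under" if v < 4000 else "ok" for v in full_strength.values()]


def to_int(values):
    """Type-cast strings to int, using None where the cast fails."""
    out = []
    for v in values:
        try:
            out.append(int(v))
        except ValueError:
            out.append(None)
    return out


class Soldier:
    """A minimal class bundling data (name, rank) with behavior (__repr__)."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank

    def __repr__(self):
        return f"{self.rank} {self.name}"


# Regular expression: capture the first standalone 4-digit number
year = re.search(r"\b(\d{4})\b", "Exercise held at NTC in 2021").group(1)

# Fuzzy matching: find the closest known spelling of a messy entry
closest = get_close_matches("Ft Bragg", ["Fort Bragg", "Fort Hood", "Fort Bliss"], n=1)
```

For example, `to_int(["12", "x", "7"])` returns `[12, None, 7]`, and `closest` is `["Fort Bragg"]`.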

Lesson 2: Data Management with Pandas

  • Pandas Data Structures: DataFrame and Series
    • Pandas Overview
    • DataFrame and Series Objects
    • Working with Data by Index and Columns
    • Removing Data
    • Multi-Level Indexing
    • Common Series Methods
    • Filtering and Sorting Data
  • Getting Data
    • From Files
    • From User Generated Data
    • Webscraping
    • Writing and Saving Data
  • Cleaning Data
    • Handling Nulls
    • Using Where Functions
    • String Matching and Cleaning
  • Data Wrangling and Manipulation
    • Applying Functions and Transformations
    • Joining/Merging Data Sets
    • Aggregating Data: Groupby and Pivot Tables
  • Handling Dates and Times
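Here is a minimal pandas sketch that ties several Lesson 2 topics together: parsing dates, filling nulls, filtering and sorting, merging two tables, and aggregating with a pivot table. The readiness data, column names, and unit designations are invented for illustration and are not from the tutorial's data sets.

```python
import numpy as np
import pandas as pd

# Hypothetical readiness records (made-up data for illustration only)
readiness = pd.DataFrame({
    "unit": ["A Co", "B Co", "C Co", "A Co", "B Co"],
    "date": ["2021-01-15", "2021-01-15", "2021-01-15", "2021-02-15", "2021-02-15"],
    "vehicles_up": [12, np.nan, 9, 14, 11],
})
units = pd.DataFrame({
    "unit": ["A Co", "B Co", "C Co"],
    "battalion": ["1-1 IN", "1-1 IN", "2-1 IN"],
})

# Dates and times: parse strings into datetime64 values
readiness["date"] = pd.to_datetime(readiness["date"])

# Handling nulls: fill each missing count with that unit's own mean
readiness["vehicles_up"] = readiness.groupby("unit")["vehicles_up"].transform(
    lambda s: s.fillna(s.mean())
)

# Filtering and sorting: February rows, highest count first
feb = readiness[readiness["date"].dt.month == 2].sort_values("vehicles_up", ascending=False)

# Joining/merging: attach each unit's battalion
merged = readiness.merge(units, on="unit", how="left")
merged["month"] = merged["date"].dt.month

# Aggregating: mean vehicles up by battalion (rows) and month (columns)
summary = merged.pivot_table(index="battalion", columns="month",
                             values="vehicles_up", aggfunc="mean")
print(summary)
```

Because `C Co` has no February record, its cell in `summary` is `NaN`. That is a reminder that pivoting and merging can introduce new nulls even after cleaning.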

Lesson 3: Data Visualization

  • Matplotlib
    • Quick Matplotlib Orientation
    • Figures, Axes, and Subplots
    • Common Plot Types
    • Labeling Plots
  • Plotting with Pandas
    • Pandas Plotting Overview
    • Example: Plotting GFEBS Data
  • Seaborn
  • Interactive Graphics with Bokeh
    • Example: COVID19 Dashboard
  • Making GIFs
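The Figure/Axes/subplots model from the Matplotlib section can be sketched as follows. This is a minimal example with synthetic data that puts one line plot and one histogram on a single figure, each with its own labels. The `Agg` backend and the output filename `example_plot.png` are assumptions chosen so the script runs without a display. In a Jupyter notebook you would normally skip `matplotlib.use` and let the figure render inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# One Figure holding two Axes (subplots) side by side
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot with two labeled series
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax_line.set_title("Line plot")
ax_line.set_xlabel("x")
ax_line.set_ylabel("value")
ax_line.legend()

# Histogram of reproducible random draws
rng = np.random.default_rng(seed=0)
ax_hist.hist(rng.normal(size=500), bins=20)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

fig.suptitle("Figures, Axes, and Subplots")
fig.tight_layout()
fig.savefig("example_plot.png")  # hypothetical output path
```

Calling methods on the `Axes` objects (`ax_line`, `ax_hist`) rather than on `plt` keeps it clear which subplot each call affects. This matters once a figure has more than one panel.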

Lesson 4: Machine Learning and Other Topics

  • Machine Learning with scikit-learn
    • Scikit-Learn Overview: Linear Regression
    • Data Prep
      • Data Reduction with PCA
      • Train-Test Splits
    • Classification
      • Multi-Layer Perceptron Neural Nets (MLPs)
      • Logistic Regression
      • Random Forest
      • Decision Trees
      • K Nearest Neighbors
    • Cross Validation with GridSearch
    • Unsupervised Learning / Clustering
  • Other Data Analysis Tools
    • Natural Language Processing with nltk
    • Network Analysis with networkx
    • Statistical Modeling and Time Series Analysis with statsmodels
    • Optimization
    • Simulation
    • Association Analysis
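Several Lesson 4 steps fit together in one scikit-learn workflow: a train-test split, PCA for data reduction, a classifier, and cross-validated grid search. The sketch below is a minimal version of that workflow. It uses the Iris data set bundled with scikit-learn as a stand-in for real data, and the pipeline layout and parameter grid are illustrative choices rather than necessarily what the notebooks do.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 25% of rows as a final test set the search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale -> reduce with PCA -> classify; inside GridSearchCV every step is
# refit on each training fold, so no information leaks from validation folds
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validated search over PCA size and regularization strength
param_grid = {"pca__n_components": [2, 3, 4], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Any of the other classifiers in the outline (for example `RandomForestClassifier`, `DecisionTreeClassifier`, `KNeighborsClassifier`, or `MLPClassifier`) can be swapped in as the `"clf"` step. Their parameter names in `param_grid` would change accordingly.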

References

Thanks to an active development community, there are now many resources available for learning data analysis in Python. Depending on your learning style and the topics that interest you most, you may find some of the following resources useful.

Many good textbooks are available on the O'Reilly Learning platform (formerly Safari Books Online), which is free to anyone with a .mil email address through the DoD MWR Online Digital Library: https://www.militaryonesource.mil/recreation-travel-shopping/recreation/libraries/morale-welfare-and-recreation-digital-library. Here are some great Python data analysis references on that platform. If you are interested in another area, it's worth searching the O'Reilly library to see whether it has a book on that topic.

Another great reference is Coursera, specifically the Applied Data Science with Python specialization (https://www.coursera.org/specializations/data-science-python). It contains the following five courses:

  • Introduction to Data Science with Python (covers pandas)
  • Applied Plotting, Charting & Data Representation in Python (mainly focuses on matplotlib)
  • Applied Machine Learning in Python
  • Applied Text Mining in Python
  • Applied Social Network Analysis in Python

All the courses are great, and they also give you useful code you can reuse. Every course is taught in a Jupyter environment, which is great practice.

Some other freely available resources that could be useful:

  • W3 Python Tutorial (https://www.w3schools.com/python/). A good free tutorial that also makes a nice, easily accessible reference.
  • An Introduction to Statistical Learning: https://link.springer.com/book/10.1007/978-1-4614-7138-7. Technically, its examples are in R, but it's a great overview of the math and theory behind machine learning, so no list of data analysis references would be complete without it!
  • Kaggle Courses (https://www.kaggle.com/learn/overview) offer a variety of useful, succinct lessons covering many areas of data analysis in Python. All courses are presented through interactive Jupyter notebooks, similar to this tutorial.

Any and all feedback welcome!
-Matt Smith, mdsmith44@gmail.com

About

Provide interactive notebooks to learn and practice python data analysis tools.
