Datapolitan-Training/data-analysis-python

Summary

  • A full-day course on the key concepts of using the Python programming language to analyze open data. The course covers the Python syntax needed for basic exploratory data analysis and shows how to create charts, graphs, and other information visualizations from NYC Open Data to support operational decision making.

Terminal Learning Objectives

  • Participants will understand what Python is and why it's useful
  • Participants will understand how Python structures data, and how that differs from Excel
  • Participants will open a dataset in Python and shape it into a usable structure for analysis
  • Participants will create a visualization and calculate summary statistics of a dataset in Python
  • Participants will be exposed to elementary programming concepts and supplementary programming libraries in Python
  • Participants will apply skills to conduct a simple analysis of a dataset from the NYC Open Data Portal
  • Participants will model how Python can be used to build a data-driven culture in their workplace

Key Audience

  • Analysts working in city government with basic programming knowledge and/or experience performing advanced analysis in Excel (nested formulas with conditionals, PivotTables, and macros)

Outline

  • Introduction
    • Class Schedule and Expectations
    • Housekeeping
    • What is Analysis?
  • Getting Started
    • Using Jupyter Notebook
    • Python Syntax
    • Functions and packages
  • Old Faithful
    • Import, inspect data
    • Referencing columns
    • Making a histogram
    • 5 Data Analytics tasks
    • Filtering rows
    • Sorting
  • Wrap-up
  • BREAK: 15 minutes
  • What is Python?
    • Python vs Excel
    • NOLA modeling example
  • 311 data
    • Open 311 Dataset in Python
    • Summary statistics on 311 data
    • Data Structures and Types
    • Making simple maps
  • Today's Analysis
    • Introduce 311 derelict vehicles question
  • Wrap-up
  • LUNCH: 1 hour
  • Aggregating data
    • Calculate Summary Statistics with groupby
    • Function chaining
  • More filtering
    • Selecting multiple columns
    • Row filters with multiple conditions
    • Row filters with fuzzy matching on text
  • BREAK: 15 minutes
  • Final exercise
    • Think about a question that could be answered with 311 data
    • Work on question in a new notebook
  • Debugging
    • Understand Difference Between Syntax and Semantic Errors
    • Take a look at someone else's code, try to understand it and help debug if possible
  • Show and tell
  • Wrap-up
    • How is Python Different Than Excel?
    • What have/haven't we covered?
  • Resources
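
As a rough illustration of the afternoon topics in the outline (row filters with multiple conditions, fuzzy matching on text, groupby, and function chaining), here is a minimal sketch. It assumes the pandas library, which the outline does not name explicitly. The rows, the column names (`complaint_type`, `borough`, `days_open`), and the variable names are made up for illustration and are not taken from the real 311 dataset or the course notebooks.

```python
import pandas as pd

# Made-up rows standing in for 311 service requests.
# Column names and values are assumptions, not the real 311 schema.
requests = pd.DataFrame({
    "complaint_type": [
        "Derelict Vehicle",
        "Noise - Residential",
        "Derelict Vehicles",
        "Noise - Street",
        "Derelict Vehicle",
    ],
    "borough": ["QUEENS", "BROOKLYN", "QUEENS", "BRONX", "BROOKLYN"],
    "days_open": [12, 1, 30, 2, 7],
})

# Row filter with fuzzy matching on text: keep any complaint type
# containing "derelict", ignoring upper/lower case.
derelict = requests[requests["complaint_type"].str.contains("derelict", case=False)]

# Row filter with multiple conditions: each condition is wrapped in
# parentheses and combined with & (and).
long_queens = requests[(requests["borough"] == "QUEENS") & (requests["days_open"] > 10)]

# Aggregation with groupby plus function chaining: mean days open
# per borough, sorted from highest to lowest.
mean_days = (
    derelict
    .groupby("borough")["days_open"]
    .mean()
    .sort_values(ascending=False)
)

print(mean_days)
```

In a real session the DataFrame would typically come from reading a 311 export rather than being typed in by hand; the filtering and aggregation steps would look the same.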