CS 404/504 – Special Topics: Python Programming for Data Science

University of Idaho - Department of Computer Science

Instructor: Alex Vakanski (vakanski@uidaho.edu)

Teaching Assistant: Longze Li (li8975@vandals.uidaho.edu)

Semester: Fall 2023 (August 21 – December 15)

Course Syllabus

Course website: https://fall-2023-python-programming-for-data-science.readthedocs.io/en/latest/

GitHub repository: https://github.com/avakanski/Fall-2023-Python-Programming-for-Data-Science/blob/main/README.md

Lectures:

Lecture 1 - A Short History and Current State of Artificial Intelligence

Theme 1: Python Programming

Lecture 2 - Data Types in Python
Lecture 3 - Statements, Files
Lecture 4 - Functions, Iterators, Generators
Lecture 5 - Object-Oriented Programming
Lecture 6 - Exceptions, Modules and Packages
Tutorial 1 - Jupyter Notebooks
Tutorial 2 - Terminal and Command Line
Tutorial 3 - Python IDEs, VS Code

Theme 2: Data Engineering Pipelines

Lecture 7 - NumPy for Array Operations
Lecture 8 - Data Manipulation with Pandas
Lecture 9 - Data Visualization with Matplotlib
Lecture 10 - Databases and SQL
Lecture 11 - Data Exploration and Preprocessing
Lecture 12 - Data Visualization with Seaborn
Tutorial 4 - Virtual Environments
Tutorial 5 - Web Scraping
Tutorial 6 - Google Colab

Theme 3: Model Engineering Pipelines

Lecture 13 - Scikit-Learn Library for Data Science
Lecture 14 - Ensemble Methods
Lecture 15 - Artificial Neural Networks
Lecture 16 - Convolutional Neural Networks with Keras and TensorFlow
Lecture 17 - Model Selection, Hyperparameter Tuning
Lecture 18 - Neural Networks with PyTorch
Lecture 19 - Natural Language Processing
Lecture 20 - Transformer Networks
Lecture 21 - NLP with Hugging Face
Lecture 22 - Diffusion Models for Text-to-Image Generation
Lecture 23 - Large Language Models
Tutorial 8 - TensorFlow
Tutorial 9 - PyTorch
Tutorial 10 - TensorFlow Datasets
Tutorial 11 - Experiment Monitoring with CometML
Tutorial 12 - GitHub

Theme 4: Model Deployment Pipelines

Lecture 24 - Introduction to Data Science Operations (DSOps)
Lecture 25 - Deploying Projects as Web Applications
Lecture 26 - Deploying Projects to the Cloud
Lecture 27 - Reproducible Data Science Projects
Tutorial 13 - GitHub Actions

Course Description

The course is designed to introduce students to Python tools and libraries that are commonly used by organizations for managing the various phases in the life cycle of data science projects. The content is divided into four main themes. The first theme reviews the fundamentals of Python programming. The second theme focuses on data engineering and explores Python tools for data collection, exploration, and visualization. The next theme covers model engineering and includes topics related to model design, selection, and evaluation for image processing, natural language processing, and time series analysis. The last theme introduces Data Science Operations (DSOps) and encompasses techniques for model serving, performance monitoring, diagnosis, and reproducibility of data science projects deployed in production. Throughout the course, students will gain hands-on experience with various Python libraries for data science workflow management.

Textbooks

Joel Grus, “Data Science from Scratch: First Principles with Python,” 2nd Edition, O'Reilly Media, 2019, ISBN: 9781492041139.
Chip Huyen, “Designing Machine Learning Systems,” O'Reilly Media, 2022, ISBN: 9781098107963.

Learning Outcomes

Upon completion of the course, students should demonstrate the ability to:

Attain proficiency with commonly used Python frameworks for managing the life cycle of data science projects.
Develop pipelines for integrating data from multiple sources, designing predictive models, and deploying the models.
Apply Python tools for data collection, analysis, and visualization, such as NumPy, Pandas, Matplotlib, and Seaborn, to real-world datasets.
Implement machine learning algorithms for image processing, natural language processing, and time series analysis using Python-based frameworks, such as Scikit-Learn, Keras, TensorFlow, and PyTorch.
Understand the principles of model selection and evaluation, including hyperparameter tuning, cross-validation, and regularization.
Understand the primary characteristics of current Python libraries for deployment, continuous integration, and monitoring of data science projects.
Deploy data science projects as web applications using Flask, and to cloud servers using Microsoft’s Azure platform.

Prerequisites

The course requires basic programming skills in Python. Knowledge of data science methods is advantageous but not mandatory.

Grading

Student assessment will be based on 6 homework assignments (worth 60 points), 3 quizzes (worth 30 points), and class participation and engagement (worth 10 points), for a total of 100 points.
