Markov chain Monte Carlo with PyMC3

Chris Fonnesbeck

PyData London 2019 Tutorial

Bayesian methods are powerful tools for data science applications, complementing traditional statistical and machine learning methods. Importantly, Bayesian models generate predictions and inferences that fully account for uncertainty. The main tool for conducting Bayesian analysis is Markov chain Monte Carlo (MCMC), a computationally intensive numerical approach that allows a wide variety of models to be estimated. MCMC algorithms are available in several Python libraries, including PyMC3. Using real-world examples, I will teach users a practical, effective workflow for applying Bayesian statistics with MCMC in PyMC3.

This tutorial is intended for analysts, data scientists and machine learning practitioners. Anyone looking for effective ways of making predictions and drawing inferences from datasets should find it useful. The material assumes an intermediate level of Python familiarity. Ideally, attendees should be familiar with NumPy and Jupyter. No statistical background is expected. Having completed the tutorial, students should be able to build basic Bayesian statistical models using their own data, validate those models, and interpret their output.

Outline

Introduction to Bayes and PyMC3
- What is a Bayesian statistical model?
- The Bayesian workflow in three steps
- A high-level introduction to the PyMC3 API
- Motivating examples
Markov chain Monte Carlo
- Why is Bayesian analysis hard?
- If you can't calculate, simulate!
- The Metropolis algorithm
- A better way: Hamiltonian Monte Carlo
Building and Fitting Models with PyMC3
- Stochastic variables
- Custom distributions
- Deterministic variables
- Factor potentials
- MCMC sampling with step methods
Model Checking and Diagnostics
- Convergence diagnostics
- Autocorrelation
- Diagnostics for gradient-based samplers
- Posterior predictive checks
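
To give a flavor of the "If you can't calculate, simulate!" and Metropolis items in the outline, here is a minimal sketch of a random-walk Metropolis sampler in plain Python. It is not taken from the tutorial notebooks: the function names `log_target` and `metropolis`, the standard normal target, and the step size and seed are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (an assumed toy target)."""
    return -0.5 * x * x


def metropolis(log_p, n_samples, start=0.0, step=1.0, seed=42):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Returns the list of draws and the fraction of proposals accepted.
    """
    rng = random.Random(seed)
    x = start
    log_px = log_p(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_pp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), compared on the
        # log scale. 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
            accepted += 1
        # A rejected proposal repeats the current value in the chain.
        samples.append(x)
    return samples, accepted / n_samples


samples, rate = metropolis(log_target, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate={rate:.2f}, mean={mean:.2f}, variance={var:.2f}")
```

For this toy target the sample mean and variance should land near 0 and 1. In the tutorial itself, sampling is handled by PyMC3's step methods rather than by hand-written loops like this one.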

Setup

This tutorial assumes that you have Anaconda (Python 3.7 version) installed and set up on your system.

The next step is to clone or download the tutorial materials in this repository. If you are familiar with Git, run the clone command:

git clone https://github.com/fonnesbeck/mcmc_pydata_london_2019.git

Otherwise, you can download a zip file of its contents and unzip it on your computer.

The repository for this tutorial contains a file called environment.yml that includes a list of all the packages used for the tutorial. If you run:

conda env create

from the main tutorial directory, it will create the environment for you and install all of the packages listed. This environment can be enabled using:

conda activate mcmc_tutorial

Then, I recommend using JupyterLab to access the materials:

jupyter lab
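
If you want to confirm that the environment was created correctly before opening the notebooks, a quick check along these lines may help. It is only a sketch: the package list in `EXPECTED` is an assumption (the authoritative list is in environment.yml), and `check_packages` is a hypothetical helper, not part of the tutorial materials.

```python
import importlib.util

# Assumed package names; consult environment.yml for the real list.
EXPECTED = ["numpy", "pymc3", "jupyterlab"]


def check_packages(names):
    """Return {name: True/False} for whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(EXPECTED).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the mcmc_tutorial environment activated. Any package reported as missing suggests the environment was not created or activated as described above.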
