This Python package implements an algorithm for optimally compressing sampling algorithm outputs by minimising a kernel Stein discrepancy. Please see the accompanying paper "Optimal Thinning of MCMC Output" (arXiv) for details of the algorithm.
One can pip install the package directly from this repository:
```
pip install git+https://github.com/wilson-ye-chen/stein_thinning
```
For example, suppose correlated samples from a posterior distribution are obtained using an MCMC algorithm and stored in the NumPy array `smpl`, and the corresponding gradients of the log-posterior are stored in another NumPy array `grad`. One can then perform Stein Thinning to obtain a subset of 40 sample points by running the following code:
```python
from stein_thinning.thinning import thin
idx = thin(smpl, grad, 40)
```
The `thin` function returns a NumPy array containing the row indices in `smpl` (and `grad`) of the selected points. Please refer to `demo.py` for a starting example.
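If no MCMC output is to hand, the following minimal sketch shows the workflow end to end. It is not part of the package: it assumes a bivariate Gaussian target, for which the log-density gradient is available in closed form, and uses a simple random-walk Metropolis sampler to produce correlated draws.

```python
import numpy as np
from stein_thinning.thinning import thin

# Bivariate Gaussian target (assumed for this sketch): the gradient of
# its log-density at x is -inv(Sigma) @ x.
sigma_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
log_p = lambda x: -0.5 * x @ sigma_inv @ x  # unnormalised log-density

# Random-walk Metropolis to produce correlated samples.
rng = np.random.default_rng(0)
x = np.zeros(2)
smpl = np.empty((2000, 2))
for i in range(len(smpl)):
    prop = x + 0.5 * rng.standard_normal(2)
    if np.log(rng.uniform()) < log_p(prop) - log_p(x):
        x = prop
    smpl[i] = x

grad = -smpl @ sigma_inv    # log-posterior gradient at each sample
idx = thin(smpl, grad, 40)  # row indices of the 40 selected points
thinned = smpl[idx]         # the thinned sample itself
```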
The default usage requires no additional user input and is based on the identity (`id`) preconditioning matrix and a standardised sample. Alternatively, the user can specify which heuristic to use for computing the preconditioning matrix by setting the option string to one of `id`, `med`, `sclmed`, or `smpcov`. Standardisation can be disabled by setting `stnd=False`. For example, the default setting corresponds to:
```python
idx = thin(smpl, grad, 40, stnd=True, pre='id')
```
The details for each of the heuristics are documented in Section 2.3 of the accompanying paper.
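For instance, to compute the preconditioning matrix from the sample covariance and thin the raw (unstandardised) sample:

```python
idx = thin(smpl, grad, 40, stnd=False, pre='smpcov')
```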
As an illustration of how Stein Thinning can be used to post-process output from Stan, consider the following PyStan example, which produces correlated samples from a bivariate Gaussian model:
```python
from pystan import StanModel

mc = """
parameters {vector[2] x;}
model {x ~ multi_normal([0, 0], [[1, 0.8], [0.8, 1]]);}
"""
sm = StanModel(model_code=mc)
fit = sm.sampling(iter=1000)
```
The bivariate Gaussian model is used for illustration, but regardless of the complexity of the model being sampled, the output of Stan will always be a `fit` object (a `StanFit` instance). The sampled points and the log-posterior gradients can be extracted from the returned `fit` object:
```python
import numpy as np

# Extract the sampled points and evaluate the log-posterior
# gradient at each sampled point.
smpl = fit['x']
grad = np.apply_along_axis(fit.grad_log_prob, 1, smpl)
idx = thin(smpl, grad, 40)
```
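As a quick sanity check (a sketch, not part of the package), the gradients returned by `grad_log_prob` can be compared against the closed-form gradient of this Gaussian target, namely `-inv(Sigma) @ x`; since `x` is unconstrained, no Jacobian adjustment enters the gradient.

```python
# Closed-form log-posterior gradient of the bivariate Gaussian model.
sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
grad_exact = -smpl @ np.linalg.inv(sigma)
assert np.allclose(grad, grad_exact)
```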
The above example can be found in `pystan/demo.py`.