
Missing data amputation and exploration functions for Python


pyampute


A Python library for generating missing values in complete datasets (i.e. amputation) and for exploring incomplete datasets.

Check out the documentation and find examples!

Features

Amputation is the opposite of imputation: it generates missing values in complete datasets. This is useful for evaluating the effect of missing values on your model, mostly in experimental settings, but also as a preprocessing step when developing models.
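To make the idea concrete, here is a minimal NumPy sketch of the simplest form of amputation: missing completely at random (MCAR), where every row has the same chance of becoming incomplete. The helper name `ampute_mcar` and the choice to blank out one random cell per selected row are illustrative assumptions; this is not how pyampute amputes data.

```python
import numpy as np


def ampute_mcar(X, prop=0.5, seed=None):
    """Return a copy of X in which roughly `prop` of the rows contain one NaN.

    Hypothetical helper for illustration only: rows are selected completely
    at random (MCAR) and one random column per selected row is set to NaN.
    """
    rng = np.random.default_rng(seed)
    X_incompl = np.asarray(X, dtype=float).copy()
    n_rows, n_cols = X_incompl.shape
    # Select rows to make incomplete, independently of any data values.
    rows = np.flatnonzero(rng.random(n_rows) < prop)
    # For each selected row, pick one column to blank out.
    cols = rng.integers(0, n_cols, size=rows.size)
    X_incompl[rows, cols] = np.nan
    return X_incompl


X_compl = np.random.default_rng(0).standard_normal((1000, 10))
X_incompl = ampute_mcar(X_compl, prop=0.5, seed=1)
# Fraction of rows with at least one missing value, roughly 0.5.
print(np.isnan(X_incompl).any(axis=1).mean())
```

The complete array is left untouched; the amputed copy can then be passed to an imputation method and the results compared against `X_compl`.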

Our MultivariateAmputation class is compatible with the scikit-learn-style fit and transform paradigm and can be used in a scikit-learn Pipeline.
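The fit/transform convention itself can be illustrated without pyampute. The sketch below defines a hypothetical `MCARAmputer` class (not part of pyampute) that follows the scikit-learn transformer pattern: `fit` inspects the data and returns `self`, `transform` returns a new array, and `fit_transform` chains the two. It amputes completely at random and is therefore far simpler than `MultivariateAmputation`.

```python
import numpy as np


class MCARAmputer:
    """Hypothetical scikit-learn-style amputer (illustration only)."""

    def __init__(self, prop=0.5, seed=None):
        self.prop = prop  # target proportion of incomplete rows
        self.seed = seed

    def fit(self, X, y=None):
        # Remember the number of columns so transform can validate its input.
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of columns than during fit")
        rng = np.random.default_rng(self.seed)
        X_incompl = X.copy()
        # Each row becomes incomplete with probability `prop` (MCAR).
        rows = np.flatnonzero(rng.random(X.shape[0]) < self.prop)
        # Within each selected row, one random column is set to NaN.
        cols = rng.integers(0, X.shape[1], size=rows.size)
        X_incompl[rows, cols] = np.nan
        return X_incompl

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X_compl = np.random.default_rng(0).standard_normal((200, 4))
X_incompl = MCARAmputer(prop=0.3, seed=42).fit_transform(X_compl)
```

A production version would subclass `sklearn.base.BaseEstimator` and `TransformerMixin`, so that `Pipeline` and grid search can read, set and clone its parameters.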

The underlying methodology was proposed by Schouten, Lugtig and Vink (2018) and is also implemented in the R function mice::ampute. Compared to ampute, pyampute's parameters are easier to specify and allow for more variation. See this blog post to learn more.
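The core idea of multivariate amputation can be sketched for a single missing data pattern: compute a weighted sum score per row from the variables that stay observed, map the scores to probabilities with a logistic function, and make rows incomplete with those probabilities. Because the probabilities depend only on observed values, the result is missing at random (MAR). The function name `ampute_mar_single_pattern`, the standardization of scores, and the bisection search on a shift to hit `prop` are our own simplifying assumptions; this is not pyampute's implementation.

```python
import numpy as np


def ampute_mar_single_pattern(X, incomplete_vars, weights, prop=0.5, seed=None):
    """Simplified MAR amputation for one pattern (illustrative sketch).

    incomplete_vars: column indices that become missing in amputed rows.
    weights: one weight per column; weights of incomplete_vars should be 0 so
             the scores depend only on observed values (MAR). At least one
             weight must be nonzero.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Standardize columns so the weights act on comparable scales.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = Z @ np.asarray(weights, dtype=float)
    scores = (scores - scores.mean()) / scores.std()

    def expected_prop(shift):
        # Logistic mapping: higher scores -> higher probability of missingness.
        return (1.0 / (1.0 + np.exp(-(scores + shift)))).mean()

    # Bisection on the shift so the mean probability matches `prop`.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_prop(mid) < prop:
            lo = mid
        else:
            hi = mid
    probs = 1.0 / (1.0 + np.exp(-(scores + (lo + hi) / 2)))

    # Draw which rows receive this pattern, then blank its incomplete variables.
    amputed_rows = rng.random(X.shape[0]) < probs
    X_incompl = X.copy()
    X_incompl[np.ix_(amputed_rows, incomplete_vars)] = np.nan
    return X_incompl


rng = np.random.default_rng(0)
X_compl = rng.standard_normal((1000, 3))
# Columns 0 and 1 become missing; whether they do depends on column 2 (MAR).
X_incompl = ampute_mar_single_pattern(
    X_compl, incomplete_vars=[0, 1], weights=[0, 0, 1], prop=0.3, seed=1
)
```

Because the only nonzero weight is on column 2, rows with large values in column 2 are the most likely to lose columns 0 and 1, while column 2 itself stays fully observed.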

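import numpy as np
from pyampute.ampute import MultivariateAmputation

n = 1000  # number of rows (samples)
m = 10    # number of columns (variables)
rng = np.random.default_rng()
X_compl = rng.standard_normal((n, m))  # complete data with shape (n, m)

# Ampute with the default settings; missing values appear as NaN
ma = MultivariateAmputation()
X_incompl = ma.fit_transform(X_compl)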

Among other tools, we also provide an mdPatterns class, which displays the missing data patterns in an incomplete dataset.

from pyampute.exploration.md_patterns import mdPatterns
mdp = mdPatterns()
patterns = mdp.get_patterns(X_incompl)
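Conceptually, a missing data pattern is a distinct combination of observed and missing variables that occurs in one or more rows. A minimal NumPy sketch of counting such patterns follows; the helper name `count_patterns` is hypothetical, and the sketch does not reproduce the table layout or plot produced by `mdPatterns`.

```python
import numpy as np


def count_patterns(X):
    """Return the unique missingness patterns of X and how many rows have each.

    In each pattern row, 1 marks an observed value and 0 a missing value.
    """
    observed = (~np.isnan(np.asarray(X, dtype=float))).astype(int)
    patterns, counts = np.unique(observed, axis=0, return_counts=True)
    # List the most frequent patterns first.
    order = np.argsort(-counts, kind="stable")
    return patterns[order], counts[order]


X_incompl = np.array([
    [1.0, 2.0, np.nan],
    [4.0, np.nan, np.nan],
    [7.0, 8.0, 9.0],
    [1.5, 2.5, np.nan],
])
patterns, counts = count_patterns(X_incompl)
for pattern, count in zip(patterns, counts):
    print(pattern, count)
```

Here two rows miss only the last variable, one row misses the last two, and one row is complete.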

Installation

Python Package Index (PyPI)

pip install pyampute

From source

git clone https://github.com/RianneSchouten/pyampute.git
pip install ./pyampute

License

BSD 3-Clause License

Citation

@misc{schouten_rianne_m_2022_6946887,
author       = {Schouten, Rianne M and
               Zamanzadeh, Davina and
               Singh, Prabhant},
title        = {pyampute: a Python library for data amputation},
month        = aug,
year         = 2022,
publisher    = {Zenodo},
doi          = {10.25080/majora-212e5952-03e},
url          = {https://doi.org/10.25080/majora-212e5952-03e}
}

@article{Schouten2018,
title={Generating missing values for simulation purposes: {A} multivariate amputation procedure},
author={Schouten, Rianne M. and Lugtig, Peter and Vink, Gerko},
journal={Journal of Statistical Computation and Simulation},
volume={88},
number={15},
pages={2909--2930},
year={2018}
}

Watch our SciPy'22 presentation here.

Contact details

If you have questions or comments, or would like to contribute, please do not hesitate to contact us. You can find our contact details here.

Cheers,
