Python module to impute missing values by prediction using machine learning algorithms.
Full documentation can be found on ReadTheDocs or in docs/_build/html/index.html. The symlink documentation.html in the root directory leads to this file.
For additional tutorials and usage scenarios, please head over to tutorials/, where you'll find a static version of the tutorial as well as an interactive Jupyter notebook.
One essential problem for anyone working with data is missing values. There are several ways to deal with missing information, ranging from dropping data points to estimating the value from other values in that column (e.g. the mean or median). A more recent approach uses machine-learning algorithms to predict the missing values. This module offers a lightweight Python solution that estimates missing information based on the underlying relationships between data points.
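To make the difference between these strategies concrete, here is a minimal sketch using only pandas and NumPy. It does not use impyte, and the toy data, the column names and the simple linear fit (standing in for a machine-learning model) are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "weight" follows "height" linearly; one weight is missing.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 200.0],
    "weight": [50.0, 60.0, 70.0, 80.0, np.nan],
})

# 1) Drop every row that contains a missing value.
dropped = df.dropna()

# 2) Fill the gap with the median of the observed values in that column.
median_filled = df.fillna({"weight": df["weight"].median()})

# 3) Predict the gap from the other column, using a model fit on the complete rows.
#    A straight-line fit stands in for a real machine-learning estimator here.
complete = df.dropna()
slope, intercept = np.polyfit(
    complete["height"].to_numpy(), complete["weight"].to_numpy(), deg=1
)
predicted = df.copy()
missing = predicted["weight"].isna()
predicted.loc[missing, "weight"] = slope * predicted.loc[missing, "height"] + intercept
```

In this toy case the median fills in 65, while the fitted relationship yields roughly 100 for the tall data point, which shows why prediction-based imputation can be closer to the truth when columns are related.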
Below are the most important files, each with a one-line summary:
docs/
    _build/html/index.html - static documentation
impyte/
    impyte.py - contains main classes
requirements.txt - requirements file, install dependencies with pip install -r requirements.txt
tests/
    testing.ipynb - interactive testing notebook
    testing.html - HTML version of the Jupyter notebook
    test_impyte.py - automated pytest script
tools/ - contains scripts for development (e.g. fake data generation)
tutorials/
    tutorials.ipynb - notebook with common tutorial tasks
    tutorials.html - static HTML version of the notebook
impyte focuses on two main goals:
- Easy-to-interpret visualization of missing-value patterns
- Easy imputation of missing values
import pandas as pd
import impyte  # assumes the impyte package is importable from your environment

df = pd.read_csv("missing_values.csv")
imp = impyte.Impyter(df)

# show the nan-patterns of the data in one data frame
imp.pattern()

# impute all single-nan patterns using a random forest
imp.impute(estimator='rf')

# impute all nan-patterns, including those with multiple nans
imp.impute(estimator='rf', multi_nans=True)

# apply r2 and f1_macro score thresholds
imp.impute(estimator='rf', threshold={"r2": .7, "f1_macro": .7})
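For readers unfamiliar with the term, a nan-pattern is a distinct combination of columns that are missing together in a row. The following is a rough sketch of how such a summary could be computed with plain pandas. It is not impyte's implementation, and the example data and the names mask and patterns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with gaps in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, np.nan, 2.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Mark each cell as missing (True) or present (False).
mask = df.isna()

# Count how often each distinct combination of missing columns occurs,
# most frequent pattern first.
patterns = (
    mask.groupby(list(mask.columns))
    .size()
    .reset_index(name="count")
    .sort_values("count", ascending=False, ignore_index=True)
)
```

Each row of patterns describes one pattern (True marks a missing column) together with the number of rows that share it. A pattern with a single True corresponds to what the comments above call a single-nan.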
The current version is a work in progress. If you discover any errors or bugs, don't hesitate to reach out!