GitHub - UBC-MDS/PyPunisher: A Python package that performs stepwise forward and backward feature selection

PyPunisher

PyPunisher is a Python implementation of forward and backward feature selection. Stepwise feature selection is a common step in the data science pipeline that reduces model complexity by keeping only the most relevant features from the original dataset. This package implements two stepwise feature selection methods:

forward_selection(): starts with a null model and iteratively adds the most useful remaining feature at each step
backward_elimination(): starts with a full model and iteratively removes the least useful feature at each step

These methods are greedy search algorithms that yield nested subsets of features. The size of the final feature subset depends on your stopping criterion, which can be either a threshold that you define or a fixed number of features to include in your model. If you set a threshold (min_change), the selection process stops when the AIC or BIC score no longer improves by at least that amount. Alternatively, if you want a specific number of features in your model, the process stops once it reaches n_features.
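The greedy search and its two stopping criteria can be illustrated with a minimal, self-contained sketch. This is not PyPunisher's implementation or API: the helper names `gaussian_aic` and `forward_select` are hypothetical, the model is assumed to be ordinary least squares, and the score is a Gaussian AIC computed from the residual sum of squares. Only the parameter names `n_features` and `min_change` are borrowed from the description above.

```python
import numpy as np


def gaussian_aic(X, y, cols):
    """AIC (up to a constant) of a least-squares fit on the chosen columns."""
    n = len(y)
    # Always include an intercept, then the selected feature columns.
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k


def forward_select(X, y, n_features=None, min_change=None):
    """Greedy forward selection; exactly one stopping criterion is required."""
    if (n_features is None) == (min_change is None):
        raise ValueError("pass exactly one of n_features or min_change")
    selected = []
    remaining = list(range(X.shape[1]))
    best_score = gaussian_aic(X, y, selected)  # null model: intercept only
    while remaining:
        if n_features is not None and len(selected) >= n_features:
            break  # reached the requested subset size
        # Score every feature that could be added at this step.
        scores = {j: gaussian_aic(X, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)  # lower AIC is better
        if min_change is not None and best_score - scores[j_best] < min_change:
            break  # improvement is smaller than the threshold
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected


# Synthetic data (an assumption for illustration): only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=200)

print(forward_select(X, y, n_features=2))
print(forward_select(X, y, min_change=10.0))
```

Because each step only ever adds to the previous subset, the subsets are nested: the features chosen for a model of size 2 are always contained in the model of size 3. Backward elimination follows the same loop in reverse, starting from all columns and dropping the feature whose removal gives the best score.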

To measure model quality during the selection procedures, we have also implemented the Akaike and Bayesian Information Criteria, both of which punish complex models:

aic(): computes the Akaike Information Criterion (AIC)
bic(): computes the Bayesian Information Criterion (BIC)

In general, adding parameters to a model improves its fit to the training data but makes it more susceptible to overfitting. AIC and BIC add a penalty for the number of features in a model. The penalty is 2 per parameter in AIC and ln(n) per parameter in BIC, where n is the number of observations, so the BIC penalty is larger whenever n is at least 8. A lower AIC or BIC score indicates a better trade-off between fit and complexity, relative to competing models.
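As a rough illustration of how the two penalties differ, the sketch below uses the textbook definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations, and L the maximized likelihood. The function names `aic_score` and `bic_score` are hypothetical and deliberately differ from the package's aic() and bic(), whose signatures are not shown here; the log-likelihood values are made up purely for the arithmetic.

```python
import math


def aic_score(log_likelihood, k):
    """Akaike Information Criterion: penalty of 2 per parameter."""
    return 2 * k - 2 * log_likelihood


def bic_score(log_likelihood, k, n):
    """Bayesian Information Criterion: penalty of ln(n) per parameter."""
    return k * math.log(n) - 2 * log_likelihood


# Two hypothetical models fitted to the same n = 100 observations.
n = 100
small = {"log_likelihood": -120.0, "k": 3}
large = {"log_likelihood": -118.0, "k": 6}

for name, m in (("small", small), ("large", large)):
    a = aic_score(m["log_likelihood"], m["k"])
    b = bic_score(m["log_likelihood"], m["k"], n)
    print(f"{name}: AIC={a:.1f}  BIC={b:.1f}")
```

In this made-up case the larger model fits slightly better, yet both criteria prefer the smaller one because the gain in likelihood does not cover the cost of three extra parameters. BIC separates the two models by a wider margin, since ln(100) is well above 2.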

Installation

pip3 install git+https://github.com/UBC-MDS/PyPunisher@master

Requires Python 3.6+.

Documentation

The documentation for PyPunisher can be viewed here.

How to run unit tests

From the repository's root directory, run all test files from the terminal:

python -m pytest

You can also run an individual test file by passing its path. For example:

python -m pytest tests/test_forward_selection.py

Contributions

Instructions and guidelines on how to contribute can be found here.

