MIQUBO Method of Feature Selection

This demo illustrates the MIQUBO method by finding an optimal feature set for predicting the survival of Titanic passengers. It uses records provided in the file formatted_titanic.csv, a feature-engineered version of a public database of passenger information recorded by the ship's crew. In addition to a column showing survival for each passenger, the file contains information on gender, title, class, port of embarkation, and so on. The output is a ranking of subsets of features that have high mutual information (MI) with the variable of interest (survival) and low redundancy.

Note: This example solves a binary quadratic model (BQM) directly on a D-Wave quantum computer's quantum processing unit (QPU). The Feature Selection for CQM example solves this same problem using a constrained quadratic model on a Leap™ quantum-classical hybrid solver.

Usage

python titanic.py

Code Overview

Statistical and machine-learning models use a set of input variables (features) to predict output variables of interest. Feature selection, which can be part of the model design process, simplifies the model and reduces dimensionality by selecting, from a given set of potential features, a subset of highly informative ones. One statistical criterion that can guide this selection is mutual information (MI).
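As a rough illustration of the MI criterion, the sketch below estimates I(X;Y) in bits from paired discrete samples using only the Python standard library. The function name `mutual_information` and the toy data are assumptions made for this example; they are not taken from titanic.py or formatted_titanic.csv.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Toy data, invented for illustration only.
survived = [1, 1, 0, 0, 1, 0, 1, 0]
copy_of_y = list(survived)             # a feature that fully determines survival
coin = [0, 1, 0, 1, 0, 0, 1, 1]        # a feature statistically independent of survival

print(mutual_information(copy_of_y, survived))  # one full bit of information
print(mutual_information(coin, survived))       # no information
```

A feature with high MI is a good predictor on its own; the redundancy between features is what the conditional terms described later account for.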

Ideally, to select the k most relevant features, you might maximize I(Xs;Y), the MI between a set of k features, Xs, and the variable of interest, Y. This is a hard calculation because the number of joint states grows exponentially with k.
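To make the exponential growth concrete, this small sketch counts the joint states of a feature set and the number of candidate subsets. The helper name `joint_states` and the sizes used (binary features, n = 10) are illustrative assumptions.

```python
from math import comb, prod

def joint_states(cardinalities):
    """Number of joint states of a feature set: the product of each feature's cardinality."""
    return prod(cardinalities)

# For k binary features, the joint distribution has 2**k states to estimate.
for k in (2, 5, 10, 20):
    print(k, joint_states([2] * k))

# Number of candidate subsets when choosing k = 3 of n = 10 features (made-up sizes).
print(comb(10, 3))
```

With a fixed number of records, most of those joint states are observed rarely or never, which is why an exact estimate of I(Xs;Y) quickly becomes impractical.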

The Mutual Information QUBO (MIQUBO) method of feature selection formulates a quadratic unconstrained binary optimization (QUBO) problem based on an approximation of I(Xs;Y) and submits it to the D-Wave quantum computer for solution.
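For readers new to QUBOs, here is a minimal sketch of what such a problem looks like: a set of coefficients over binary variables whose weighted sum is minimized. It solves a hypothetical three-variable QUBO by exhaustive search rather than on a QPU; the coefficients and the function names are assumptions chosen for illustration.

```python
from itertools import product

def qubo_energy(Q, x):
    """Energy sum over (i, j) of Q[i, j] * x[i] * x[j] for a binary assignment x."""
    return sum(coef * x[i] * x[j] for (i, j), coef in Q.items())

def brute_force_minimum(Q, n):
    """Try all 2**n assignments and return the lowest-energy one; feasible only for tiny n."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# Hypothetical QUBO: diagonal entries reward selecting a variable,
# the off-diagonal entry penalizes selecting variables 0 and 1 together.
Q = {(0, 0): -1.0, (1, 1): -0.8, (2, 2): -0.5, (0, 1): 3.0}

best = brute_force_minimum(Q, 3)
print(best, qubo_energy(Q, best))
```

In feature selection, each binary variable indicates whether a feature is in the selected set, so the lowest-energy assignment corresponds to the chosen subset.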

Code Specifics

MIQUBO

There are different methods of approximating the hard calculation of optimally selecting k of n features to maximize MI. The approach followed here assumes conditional independence of features and limits conditional MI calculations to permutations of three features. The optimal set of features, S, is then approximated by:

$$\arg\max_{S} \sum_{i \in S} \left\{ I(X_i;Y) + \sum_{j \in S,\, j \neq i} I(X_j;Y \mid X_i) \right\}$$

The left-hand component, I(Xi;Y), represents MI between the variable of interest and a particular feature; maximizing it selects features that best predict the variable of interest. The right-hand component, I(Xj;Y|Xi), represents conditional MI between the variable of interest and a feature given the prior selection of another feature; maximizing it selects features that complement information about the variable of interest rather than provide redundant information.
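The two components described above can be sketched as QUBO coefficients: -I(Xi;Y) on the diagonal and -I(Xj;Y|Xi) on the off-diagonal entries, so that minimizing energy maximizes the approximation. The sketch below is an assumption-laden illustration in plain Python, not the code in titanic.py: the function names, the toy data, the penalty term strength * (sum(x) - k)**2 used to favor exactly k features, and the value strength=10.0 are all hypothetical, and the QUBO is solved by exhaustive search instead of on a QPU.

```python
from collections import Counter, defaultdict
from itertools import combinations, permutations, product
from math import log2

def mi(xs, ys):
    """I(X;Y) in bits, estimated from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def cmi(xs, ys, zs):
    """I(X;Y|Z) in bits, estimated from discrete samples."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * log2(c * pz[z] / (pxz[x, z] * pyz[y, z]))
               for (x, y, z), c in pxyz.items())

def build_miqubo(features, y, k, strength):
    """Build QUBO coefficients; keys are (name, name) pairs in sorted order."""
    names = sorted(features)
    Q = defaultdict(float)
    for i in names:
        # Linear term -I(Xi;Y), plus the linear part of strength * (sum(x) - k)**2.
        Q[i, i] += -mi(features[i], y) + strength * (1 - 2 * k)
    for i, j in permutations(names, 2):
        # Quadratic term -I(Xj;Y|Xi), accumulated over both orderings of the pair.
        Q[min(i, j), max(i, j)] -= cmi(features[j], y, features[i])
    for i, j in combinations(names, 2):
        # Quadratic part of the penalty that favors exactly k selected features.
        Q[i, j] += 2 * strength
    return dict(Q)

def select_features(features, y, k, strength=10.0):
    """Minimize the QUBO by exhaustive search (tiny problems only)."""
    Q = build_miqubo(features, y, k, strength)
    names = sorted(features)

    def energy(bits):
        x = dict(zip(names, bits))
        return sum(coef * x[i] * x[j] for (i, j), coef in Q.items())

    best = min(product((0, 1), repeat=len(names)), key=energy)
    return sorted(name for name, bit in zip(names, best) if bit)

# Toy data, invented for illustration: y encodes two bits, "hi" and "lo".
y = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "hi":       [0, 0, 1, 1, 0, 0, 1, 1],  # high bit of y
    "hi_noisy": [0, 0, 1, 1, 0, 0, 1, 0],  # near-duplicate of "hi" (redundant)
    "lo":       [0, 1, 0, 1, 0, 1, 0, 1],  # low bit of y (complementary)
}

print(select_features(features, y, k=2))
```

In this toy case the redundant near-duplicate is passed over in favor of the complementary feature, which is the behavior the conditional MI term is meant to encourage.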

This approximation is still a hard calculation. MIQUBO formulates it for solution on the D-Wave quantum computer, following the 2014 paper Effective Global Approaches for Mutual Information Based Feature Selection by Nguyen, Chan, Romano, and Bailey, published in the Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

References

X. V. Nguyen, J. Chan, S. Romano, and J. Bailey, "Effective global approaches for mutual information based feature selection", https://dl.acm.org/citation.cfm?id=2623611

License

Released under the Apache License 2.0. See LICENSE file.
