Skip to content

euxhenh/phenotype-cover

Repository files navigation

This repository provides two algorithms for the phenotype cover (PC) biomarker selection problem introduced in the paper: "Multiset multicover methods for discriminative marker selection". GreedyPC is based on the extended greedy algorithm to set cover, and CEM-PC is based on the cross-entropy-method.

Install via

pip install multiset-multicover
pip install phenotype-cover

Other packages that phenotype-cover depends on are numpy, matplotlib, and scikit-learn.

Import GreedyPC or CEMPC from phenotype_cover.

Example

>>> from phenotype_cover import GreedyPC >>> from sklearn.datasets import make_classification

>>> # You may need to log-transform X if working with raw counts >>> X, y = make_classification(1000, 200, n_informative=5, n_classes=5, scale=100)

>>> gpc = GreedyPC() >>> gpc.fit(X, y) >>> features = gpc.select(100) # coverage of 100

Some other functionality implemented in GreedyPC

>>> # Number of elements reamining and coverage attained after every iteration >>> gpc.plot_progress() >>> gpc.n_elements_remaining_per_iter >>> gpc.coverage_per_iter

>>> # Heatmap of the coverage provided by some feature i >>> gpc.feature_coverage(i)

>>> # Maximum possible coverage for evey class pair >>> gpc.max_coverage()

>>> # Pairs that could not be covered to the desired coverage >>> gpc.pairs_with_incomplete_cover