
Marvin

This package implements semi-supervised learning algorithms, which learn a binary classifier from independently drawn sets of labeled and unlabeled data. In particular, it contains implementations of the Marvin and Hedge-Mower algorithms from the paper:

Muffled Semi-Supervised Learning. Akshay Balsubramani and Yoav Freund. Link to arXiv.

The "muffled" formulation implemented here gives a rigorous theoretical justification for using the unlabeled data differently from typical approaches. Here, the guidance provided by the labeled data is muffled on the unlabeled data by hallucinating the opposite labels to the majority prediction.

This package is under constant revision and expansion, so the implementation will differ from the code used to generate the paper's results. The performance of the latest version should be at least as good as the paper's reported results.

System Requirements

  • Python v2.7.11, NumPy v1.11.2, SciPy v0.18.1 (All are standard latest stable releases. Earlier versions may work, as the algorithms only use basic matrix and linear algebra functionality.)
  • scikit-learn v0.18.0 (The latest stable release. Versions <= 0.17 will not work without changes; see the source code comments of composite_feature.py.)

Example Usage

Working examples are provided in the following scripts, each of which generates a basic CSV log file (details in the scripts' header comments):

  1. slack_minimizer.py: This contains code for a generic classifier (feature) aggregator in the muffled formulation. Included are a class that minimizes the slack function and a working example of such aggregation (a hedged sketch of the idea appears after this list).

  2. marvin.py: This runs the Marvin family of algorithms, which incrementally learn an ensemble of classifiers while simultaneously learning how best to aggregate them -- a concept similar to supervised boosting. The file contains a class for running such algorithms and a working example of its usage.
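For orientation, here is a minimal sketch of the kind of slack-function minimization performed in slack_minimizer.py. The form of the slack function follows the aggregation papers listed under "Further Information" below; the names, shapes, and the plain projected-subgradient loop are illustrative assumptions, not the module's actual API.

    # Hypothetical sketch of muffled aggregation by slack-function minimization
    # (illustrative only; see slack_minimizer.py for the package's actual code).
    # b: estimated correlations of the p classifiers with the labels (from labeled data);
    # F: (p x n) matrix of the classifiers' {-1, +1} predictions on unlabeled data;
    # sigma: nonnegative aggregation weights.
    import numpy as np

    def slack(sigma, b, F):
        # V(sigma) = -b . sigma + mean_j max(1, |F[:, j] . sigma|)
        scores = F.T.dot(sigma)
        return -b.dot(sigma) + np.maximum(1.0, np.abs(scores)).mean()

    def minimize_slack(b, F, step=0.01, iters=2000):
        sigma = np.zeros_like(b)
        n = F.shape[1]
        for _ in range(iters):
            scores = F.T.dot(sigma)
            # Subgradient of the mean max(1, |score|) term, plus -b from the linear term
            g = F.dot(np.where(np.abs(scores) > 1.0, np.sign(scores), 0.0)) / n - b
            sigma = np.maximum(0.0, sigma - step * g)  # project onto sigma >= 0
        return sigma

    # The aggregated prediction on a point with ensemble predictions x is then
    # clipped to [-1, 1]:  g(x) = np.clip(x.dot(sigma), -1.0, 1.0)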

Implementation Notes:

ssb-benchmarks.py provides code for running the benchmarks we compare against, using scikit-learn to fit and evaluate ensemble and non-ensemble classification methods.
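To sketch what such a benchmark run looks like (the synthetic dataset and classifier choice here are stand-in assumptions, not what ssb-benchmarks.py actually runs):

    # Hedged sketch of a scikit-learn benchmark in the spirit of ssb-benchmarks.py.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("benchmark accuracy: %.3f" % accuracy_score(y_te, clf.predict(X_te)))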

Further Information:

  • For more on the "muffled" approach to semi-supervised learning, please refer to the following papers:

    • Optimal Binary Classifier Aggregation for General Losses. Akshay Balsubramani and Yoav Freund. NIPS 2016. Link to arXiv.
    • Scalable Semi-Supervised Aggregation of Classifiers. Akshay Balsubramani and Yoav Freund. NIPS 2015. Link to arXiv.
    • Optimally Combining Classifiers Using Unlabeled Data. Akshay Balsubramani and Yoav Freund. COLT 2015. Link to arXiv.

Contact:

Akshay Balsubramani (email listed in the paper and on GitHub).
