
Marvin

This package implements semi-supervised learning algorithms, which learn a binary classifier from independently drawn sets of labeled and unlabeled data. In particular, it contains implementations of the Marvin and Hedge-Mower algorithms from the paper:

Muffled Semi-Supervised Learning. Akshay Balsubramani and Yoav Freund. Link to arXiv.

The "muffled" formulation implemented here gives a rigorous theoretical justification for using the unlabeled data differently from typical approaches. Here, the guidance provided by the labeled data is muffled on the unlabeled data by hallucinating the opposite labels to the majority prediction.

This package is under constant revision and expansion, so the implementation will differ from the code used to generate the paper's results. The performance of the latest version should be at least as good as the paper's reported results.

System Requirements

  • Python v2.7.11, NumPy v1.11.2, SciPy v0.18.1 (All are standard latest stable releases. Earlier versions may work, as the algorithms only use basic matrix and linear algebra functionality.)
  • scikit-learn v0.18.0 (The latest stable release. Versions <= 0.17 will not work without changes; see the source code comments of composite_feature.py.)

Example Usage

Working examples are provided in the following scripts, each of which generates a basic CSV log file (details in the scripts' header comments):

  1. slack_minimizer.py: This contains code for a generic classifier (feature) aggregator in the muffled formulation. Included are a class that minimizes the slack function and a working example of such aggregation (a hedged sketch of the idea appears after this list).

  2. marvin.py: This runs the Marvin family of algorithms, which incrementally learn an ensemble of classifiers while simultaneously learning how best to aggregate them -- a concept similar to supervised boosting. The file contains a class for running such algorithms and a working example of its usage.
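For orientation, here is a minimal sketch of the kind of slack-function minimization performed in slack_minimizer.py. The form of the slack function follows the aggregation papers listed under "Further Information" below; the names, shapes, and the plain projected-subgradient loop are illustrative assumptions, not the module's actual API.

    # Hypothetical sketch of muffled aggregation by slack-function minimization
    # (illustrative only; see slack_minimizer.py for the package's actual code).
    # b: estimated correlations of the p classifiers with the labels (from labeled data);
    # F: (p x n) matrix of the classifiers' {-1, +1} predictions on unlabeled data;
    # sigma: nonnegative aggregation weights.
    import numpy as np

    def slack(sigma, b, F):
        # V(sigma) = -b . sigma + mean_j max(1, |F[:, j] . sigma|)
        scores = F.T.dot(sigma)
        return -b.dot(sigma) + np.maximum(1.0, np.abs(scores)).mean()

    def minimize_slack(b, F, step=0.01, iters=2000):
        sigma = np.zeros_like(b)
        n = F.shape[1]
        for _ in range(iters):
            scores = F.T.dot(sigma)
            # Subgradient of the mean max(1, |score|) term, plus -b from the linear term
            g = F.dot(np.where(np.abs(scores) > 1.0, np.sign(scores), 0.0)) / n - b
            sigma = np.maximum(0.0, sigma - step * g)  # project onto sigma >= 0
        return sigma

    # The aggregated prediction on a point with ensemble predictions x is then
    # clipped to [-1, 1]:  g(x) = np.clip(x.dot(sigma), -1.0, 1.0)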

Implementation Notes:

ssb-benchmarks.py provides code for running the benchmarks we compare against, using scikit-learn to fit and evaluate ensemble and non-ensemble classification methods.
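To sketch what such a benchmark run looks like (the synthetic dataset and classifier choice here are stand-in assumptions, not what ssb-benchmarks.py actually runs):

    # Hedged sketch of a scikit-learn benchmark in the spirit of ssb-benchmarks.py.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("benchmark accuracy: %.3f" % accuracy_score(y_te, clf.predict(X_te)))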

Further Information:

  • For more on the "muffled" approach to semi-supervised learning, please refer to the following papers:

    • Optimal Binary Classifier Aggregation for General Losses. Akshay Balsubramani and Yoav Freund. NIPS 2016. Link to arXiv.
    • Scalable Semi-Supervised Aggregation of Classifiers. Akshay Balsubramani and Yoav Freund. NIPS 2015. Link to arXiv.
    • Optimally Combining Classifiers Using Unlabeled Data. Akshay Balsubramani and Yoav Freund. COLT 2015. Link to arXiv.

Contact:

Akshay Balsubramani (email listed in the paper and on GitHub).
