
GSoC_2018_project_modelselection


Flexible modelselection

Following up on one of our very first GSoC (2011) projects, this project intends to clean up, unify, extend, and scale up Shogun's model-selection and hyper-parameter tuning framework. It is a cool mixture of modernizing existing code, using multi-threaded (and potentially distributed) concepts, and playing with black-box optimization frameworks.

Mentors

Difficulty & Requirements

Medium to advanced. The exact scope depends on the student's ambitions, and we are flexible regarding abilities.

You need to know about

  • Model-selection basics (x-validation, search algorithms, implementation)
  • Shogun's modelselection framework
  • Shogun's parameter framework
  • C++
  • Optimisation frameworks like MOE or CMA-ES
  • Knowledge of other libraries' approaches (sklearn, MLPack)

Details

X-validation v2

Every learning algorithm (CMachine subclass) should work with x-validation ... fast! This is completely independent of any hyper-parameter tuning.

  • All model classes should be systematically tested with x-validation, see issue. This is similar to the trained model tests.
  • Identify models that only perform read-only operations on the features (this will eventually be all models, depending on the progress of the features-detox project).
  • Enable multi-core x-validation using OpenMP, via cloning of the underlying learning machine, but with shared features (memory efficiency!); see the sketch after this list.
  • Carefully test the chosen models for race-conditions, memory errors, etc.
  • Add algorithms on a one-by-one basis.
  • Generalise the code of the "trained model serialization" tests into generic "trained model" tests, where multiple things can be checked on the trained models (serialization and x-validation for now).
  • Make sure model-selection has a progress bar, is stoppable, continue-able, etc. See also the black-box optimisation section below.
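
To make the threading pattern concrete, here is a minimal sketch assuming a hypothetical Machine interface -- the real implementation would use Shogun's CMachine::clone() and CSplittingStrategy instead, but the core idea (private cloned training state per thread, one shared read-only feature object) is the same.

```cpp
// Minimal sketch: a hypothetical Machine interface standing in for CMachine.
#include <omp.h>

#include <memory>
#include <numeric>
#include <vector>

struct Fold { std::vector<int> train_idx, test_idx; };

struct Machine
{
    virtual ~Machine() = default;
    // clone() gives each thread its own training state
    virtual std::unique_ptr<Machine> clone() const = 0;
    // features are passed by const reference: shared and read-only
    virtual void train(const std::vector<double>& features,
                       const std::vector<int>& idx) = 0;
    virtual double evaluate(const std::vector<double>& features,
                            const std::vector<int>& idx) const = 0;
};

double parallel_xval(const Machine& prototype,
                     const std::vector<double>& features, // shared, never copied
                     const std::vector<Fold>& folds)
{
    std::vector<double> scores(folds.size());

#pragma omp parallel for
    for (int i = 0; i < static_cast<int>(folds.size()); ++i)
    {
        auto m = prototype.clone(); // private state per thread
        m->train(features, folds[i].train_idx);
        scores[i] = m->evaluate(features, folds[i].test_idx);
    }
    return std::accumulate(scores.begin(), scores.end(), 0.0) /
           static_cast<double>(scores.size());
}
```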

Cleaning up after the old parameter framework

We recently changed our internal parameter framework ... or rather: we are in the process of changing it. The new framework is cleaner, neater, and as such easier to handle (a rough sketch of new-style parameter registration follows the list below).

Before we start tuning parameters in an automatic way, we need to remove the traces of the old framework. This is a messy task that requires diving deeply into the Shogun core, but don't worry -- we will help you :)

  • Remove the m_modelselection_parameters field from CSGObject
  • This will break many things (most of all the current model-selection framework). Fix the problems (a good initial task).
  • Remove the TParameter construct and eventually the class Parameter
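
For orientation, here is a rough sketch of the direction the new framework takes: a parameter is registered once in the constructor and then accessed generically by name, which makes the separate m_modelselection_parameters tree redundant. The macro and method signatures below are assumptions and may not match the current code exactly.

```cpp
// Rough sketch only -- SG_ADD / put / get signatures are assumptions here.
#include <shogun/base/SGObject.h>
#include <shogun/machine/Machine.h>

class CMyMachine : public shogun::CMachine
{
public:
    CMyMachine() : CMachine(), m_tau(1.0)
    {
        // one registration in the new framework replaces both the old
        // Parameter entry and the m_modelselection_parameters entry
        SG_ADD(&m_tau, "tau", "Regularisation strength");
    }

private:
    float64_t m_tau;
};

// generic, string-based access, e.g. from a model-selection loop:
//   machine->put("tau", 0.5);
//   auto tau = machine->get<float64_t>("tau");
```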

A clean API

We want to build a better way to specify free parameters to learn, which overlaps with the user experience project. The current way is to build parameter trees whose structure matches the learning machine, see e.g. here. We would like to shop around other libraries for ideas on how to specify this.

Sergey, could you put some API ideas here?

Some steps:

  • Review and compare other libraries' approaches
  • Collect the most common use cases (random search, grid search, gradient search (e.g. in our Gaussian Process framework))
  • Come up with a set of clean API examples / user stories for those cases (one hypothetical example follows this list)
  • Draft code showing how to implement this API. This will include ways to annotate the spaces that parameters live in, as well as whether gradients are available.
  • Implement and test systematically
  • Make sure it works nicely in all target languages.
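
As one possible answer to the API question above, here is a purely hypothetical user story: none of the classes or functions below exist in Shogun, and the string-keyed grid is just one candidate shape, loosely inspired by sklearn's param_grid.

```cpp
// Purely hypothetical API sketch for the grid-search use case.
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// a search space: parameter name -> candidate values
using SearchSpace = std::map<std::string, std::vector<double>>;

std::map<std::string, double>
grid_search(const SearchSpace& space,
            const std::function<double(const std::map<std::string, double>&)>&
                xval_score)
{
    std::map<std::string, double> best, current;
    double best_score = -1e300;

    // recursively enumerate the Cartesian product of all parameter lists
    std::function<void(SearchSpace::const_iterator)> recurse =
        [&](SearchSpace::const_iterator it)
    {
        if (it == space.end())
        {
            double score = xval_score(current); // one x-validation run
            if (score > best_score) { best_score = score; best = current; }
            return;
        }
        for (double v : it->second)
        {
            current[it->first] = v;
            recurse(std::next(it));
        }
    };
    recurse(space.begin());
    return best;
}

// usage: nested parameters could be addressed via keys like "kernel.width":
//   SearchSpace space{{"C", {0.1, 1, 10}}, {"kernel.width", {0.5, 1, 2}}};
//   auto best = grid_search(space, run_xval);
```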

Black box optimisation

Bayesian optimisation and stochastic optimisation are powerful frameworks for black-box optimisation. We aim to integrate bindings for both during the project. There are plenty of external libraries that implement the algorithms for us, so this task is mostly about designing interfaces that tell Shogun to cross-validate the algorithm on the next set of parameters and report its performance. We aim for both MOE and CMA-ES.
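
A minimal sketch of that interface shape, assuming nothing about MOE's or CMA-ES's actual APIs: the optimiser only ever sees a "vector of parameters in, score out" callback, which Shogun would implement by running a full x-validation. A trivial random search stands in for the external optimiser here.

```cpp
// Illustrative only: random search as a stand-in for an external optimiser.
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// one call = one full cross-validation run inside Shogun
using Objective = std::function<double(const std::vector<double>&)>;

std::vector<double> optimise(const Objective& xval_score,
                             const std::vector<std::pair<double, double>>& bounds,
                             int budget)
{
    std::mt19937 rng(0);
    std::vector<double> best(bounds.size());
    double best_score = -1e300;

    for (int trial = 0; trial < budget; ++trial)
    {
        // sample a candidate uniformly inside the box constraints
        std::vector<double> x(bounds.size());
        for (std::size_t d = 0; d < bounds.size(); ++d)
        {
            std::uniform_real_distribution<double> u(bounds[d].first,
                                                     bounds[d].second);
            x[d] = u(rng);
        }
        double score = xval_score(x);
        if (score > best_score) { best_score = score; best = x; }
    }
    return best; // best parameter vector found within the budget
}
```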

Why this is cool

There is hardly any algorithm without free parameters. Currently, Shogun only has brute-force search to tune them automatically. While this works for SVMs, it is hopeless for anything with more than 2 parameters. A clean and easy way to quickly tune parameters would certainly boost Shogun's usability massively. The project spans a huge range of topics within and outside of Shogun, including framework internals as well as cutting-edge algorithms for optimisation. Super interesting even for ourselves. Be ready to learn a lot.

Useful resources
