GSoC_2017_applications
This year, we would like to try something new: application-based projects that focus more on using Shogun than on modifying it. In practice, they might be a mix of the two. The idea is that the projects are stand-alone and the result is something really cool.
Totally depends on what you are after.
Most important are
- that you are extremely motivated and ready to work independently
- some Shogun and Machine Learning basics
- Knowledge of the context of the chosen application
Every line of code in SHOGUN has a long history and has gone through many brains and hands. This made SHOGUN what it is today: a powerful toolbox with a lot of features. But most of the code was written by researchers for their studies. Usually the focus was on "getting things done", proving awesome ideas, and optimizing them "as fast as possible".
As a drawback, people didn't care too much about software engineering aspects. In addition, lots of new technologies have shown up since some parts of the code were written, which allow us to do even cooler things with less code now.
We want this project to improve maintainability, stability, and beauty:
- Making heavily used base classes more lightweight to improve performance and memory consumption.
- Use new and cool technologies
- New language features (think of C++1x)
- and more
The target group for this project is people with a C/C++ background, a sense of good software engineering, and an interest in reliable software. In return, you'll learn a lot about basic machine learning algorithms. There are some low-hanging fruits, but if you're an advanced hacker, we have a lot of great ideas for how to push the project forward.
GSoC is a marathon, not a sprint. We expect good performance over the whole project, and we expect you to stay in contact with us. Get on board, commit to contributing actively, and we promise to bring you up to speed on the magic internals hidden in SHOGUN. :)
Here are some sub-projects. We are open to more:
tl;dr: Dirty work with binary data. Beat the NIH out of here!
- Working title: Dirty deeds done with binary data.
- Alternative working title: Beat the NIH out of here!
Last year, we implemented a new cerealisation framework, which needs some love. And the old one needs to die! Deep-copy of objects? Checking equality? Dump objects to disk and get 'em back? All of this used to be done in here. Thousands of lines of code, countless switch-case statements, and more special per-class and per-data-type code than we want to maintain. There is only one good reason why we haven't tackled it yet: it used to work.
tl;dr: We want to stop making use of SG_REF and use C++11 magic instead.
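One way to read "C++11 magic" here is std::shared_ptr taking over the bookkeeping that manual reference counting does today. The following is a minimal sketch under that assumption; Kernel, Machine, and owners_while_shared are hypothetical stand-ins for illustration, not Shogun's actual classes or API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a Shogun object; not the real class.
struct Kernel
{
    explicit Kernel(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Hypothetical owner of a kernel.
class Machine
{
public:
    // Taking a shared_ptr by value documents shared ownership: there is no
    // manual reference-count increment here and no decrement in a destructor.
    void set_kernel(std::shared_ptr<Kernel> k) { m_kernel = std::move(k); }
    std::shared_ptr<Kernel> get_kernel() const { return m_kernel; }

private:
    std::shared_ptr<Kernel> m_kernel;
};

// Returns the number of owners while one local handle and one machine
// both hold the same kernel.
long owners_while_shared()
{
    auto kernel = std::make_shared<Kernel>("gaussian");
    Machine machine;
    machine.set_kernel(kernel);
    return kernel.use_count();
}
```

The kernel is destroyed automatically when its last owner goes away, so a forgotten release can no longer leak it and a double release can no longer crash.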
tl;dr: We want to get rid of the old threading code, which is unusable, unmaintainable, and uncool, and replace it with OpenMP or similar.
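To give a feel for what replacing hand-written thread management with OpenMP can look like, here is a minimal sketch of a parallel reduction. The function dot is an illustrative example, not existing Shogun code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product with an OpenMP reduction. If the compiler is not given an
// OpenMP flag (for example -fopenmp), the pragma is ignored and the loop
// simply runs serially.
double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i)
    {
        const std::size_t idx = static_cast<std::size_t>(i);
        sum += a[idx] * b[idx];
    }
    return sum;
}
```

The loop body stays the same as in the serial version; thread creation, work splitting, and combining the partial sums are left to the OpenMP runtime.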
tl;dr: We want to have unified progress bars in Shogun (using SG_PROGRESS). It should be possible to prematurely stop algorithms in Shogun (and still get some results if that makes sense).
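The idea of stopping early while still returning something useful can be sketched with a shared stop flag and a progress callback. Everything below (PartialResult, run_cancellable, the callback signature) is a hypothetical illustration and does not reflect how SG_PROGRESS works.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical result type: the estimate so far plus whether the run finished.
struct PartialResult
{
    double value;
    int iterations_done;
    bool completed;
};

// Sums 1/k^2 for k = 1..max_iter. Before each iteration it checks a shared
// stop flag, so a caller (another thread, a signal handler, the progress
// callback itself) can end the run early and still get the partial sum.
PartialResult run_cancellable(int max_iter,
    const std::atomic<bool>& stop_requested,
    const std::function<void(int, int)>& on_progress)
{
    PartialResult r{0.0, 0, false};
    for (int k = 1; k <= max_iter; ++k)
    {
        if (stop_requested.load())
            return r; // premature stop: hand back what we have
        r.value += 1.0 / (static_cast<double>(k) * k);
        r.iterations_done = k;
        if (on_progress)
            on_progress(k, max_iter); // one place to drive a progress bar
    }
    r.completed = true;
    return r;
}
```

A caller can pass a callback that updates a progress bar and set the flag from anywhere; the completed field tells it whether the returned value is final or partial.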
Shogun has many, many bugs, and we could actually fix some of them. Pick your favourite! https://github.com/shogun-toolbox/shogun/labels/BUG https://github.com/shogun-toolbox/shogun/labels/bugfixing
tl;dr: File IO and parsing done right using modern C++.
SHOGUN contains tons (how many lines?) of code just to parse input data formats. The code basically works, but some of the parsers have minor bugs, most of them read like "C89 with classes", and static code analysis tells us we need to do something here.
Lots of things are possible here: refactoring, deduplication, a new API, less code, less NIH.
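As a small illustration of the direction, a delimiter-separated line of numbers can be parsed with standard library facilities alone, with no manual buffer handling. parse_line is a hypothetical helper written for this sketch, not an existing Shogun function.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses one delimiter-separated line of numbers into a vector.
// Throws std::invalid_argument if a field is empty, is not a number, or has
// trailing non-whitespace characters (std::stod may also throw
// std::out_of_range for values that do not fit into a double).
std::vector<double> parse_line(const std::string& line, char delimiter = ',')
{
    std::vector<double> values;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, delimiter))
    {
        std::size_t consumed = 0;
        const double v = std::stod(field, &consumed); // skips leading whitespace
        if (field.find_first_not_of(" \t", consumed) != std::string::npos)
            throw std::invalid_argument("trailing characters in field: " + field);
        values.push_back(v);
    }
    return values;
}
```

Errors surface as exceptions with a message instead of silently producing garbage, and the function owns no raw memory that a caller could forget to free.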
tl;dr: Being a software architect.
The foundation of every learning problem is the data structures used by all algorithms. Take Dense/Sparse Features, for instance, or Dense/Sparse Streaming features: duplicated functionality, special handling of feature classes in algorithm code, and online algorithms that are not possible on non-streaming features.
Buzzword bingo: separation of concerns; finding invariants in the existing classes; redesigning the features APIs; going back to the drawing board and analyzing what's really needed; gaining flexibility.
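To make "separation of concerns" a bit more concrete, here is one possible shape of such a redesign: algorithms see only a small interface, and dense and sparse storage sit behind it. All names (FeatureVector, DenseVector, SparseVector, dot_with_weights) are hypothetical and chosen for this sketch; they are not Shogun's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal interface: algorithms only "visit non-zero entries".
class FeatureVector
{
public:
    using Visitor = std::function<void(std::size_t, double)>;
    virtual ~FeatureVector() = default;
    virtual void for_each_nonzero(const Visitor& visit) const = 0;
};

class DenseVector : public FeatureVector
{
public:
    explicit DenseVector(std::vector<double> v) : m_values(std::move(v)) {}
    void for_each_nonzero(const Visitor& visit) const override
    {
        for (std::size_t i = 0; i < m_values.size(); ++i)
            if (m_values[i] != 0.0)
                visit(i, m_values[i]);
    }

private:
    std::vector<double> m_values;
};

class SparseVector : public FeatureVector
{
public:
    using Entry = std::pair<std::size_t, double>; // (index, value)
    explicit SparseVector(std::vector<Entry> e) : m_entries(std::move(e)) {}
    void for_each_nonzero(const Visitor& visit) const override
    {
        for (const auto& entry : m_entries)
            visit(entry.first, entry.second);
    }

private:
    std::vector<Entry> m_entries;
};

// Written once against the interface; no per-feature-class special cases.
double dot_with_weights(const FeatureVector& x, const std::vector<double>& w)
{
    double sum = 0.0;
    x.for_each_nonzero([&](std::size_t i, double v) {
        if (i < w.size())
            sum += v * w[i];
    });
    return sum;
}
```

The virtual call per entry has a cost, so a real design would likely need a batched or templated variant; the point of the sketch is only that the algorithm code no longer needs to know which storage it is given.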
What's to be done here depends on you. The minimal goal is a small prototype to prove the idea of the topic you are working on. The full-fledged solution is, well, you guessed it: hard work and a lot of fame.
Whatever you can imagine
It attempts to address one of the biggest open problems we have in Shogun: being unable to move because we are chained to the framework. A modern, slim Shogun is the dream of every one of our developers :)
- All core developers
Github issues, in particular
- https://github.com/shogun-toolbox/shogun/issues/1991
- https://github.com/shogun-toolbox/shogun/issues/2824
- https://github.com/shogun-toolbox/shogun/issues/3605
- https://github.com/shogun-toolbox/shogun/issues/3509
- ...
Data structures:
- CSGObject
- openmp and pthreads
- tags project from last year
- Parameter.cpp
- TParameter
- all FileReader instances
- CFeatures
Get back to the main projects page.