Hackathon 2017 base API

Machine

Supervised learning, unsupervised learning

Methods

  • Machine fit(Features, Labels)

  • Machine fit(Features) # might do "empty" labels

  • Labels predict(Features)

  • no predict_log_proba

  • no fit_predict
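
Possible call pattern under this proposal (SVM and the variable names are only illustrative; nothing here is an existing implementation):

svm = SVM()
svm.fit(feats_train, labels_train)     # Machine fit(Features, Labels)
predictions = svm.predict(feats_test)  # Labels predict(Features)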

Examples

With labels

  • SVM
  • KRR
  • KNN
  • Metric Learning

No labels

  • GMM
  • KDE
  • KMeans
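
Unsupervised sketch of the same interface (KMeans and the parameter k=3 are illustrative assumptions):

kmeans = KMeans(k=3)
kmeans.fit(feats)                 # Machine fit(Features), no Labels
clusters = kmeans.predict(feats)  # cluster indices as multiclass Labels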

Transformer

  • Transformer fit(Features)
  • fit might also accept Labels here
  • Features transform(Features)

Examples

  • PCA
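
Sketch of the fit/transform pattern with PCA (the num_components parameter name is an assumption):

pca = PCA(num_components=2)
pca.fit(feats_train)                       # Transformer fit(Features)
feats_reduced = pca.transform(feats_test)  # Features transform(Features)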

Features

No more splitting into various feature types; a global factory method generates them:

  • Features features(Matrix)
  • Features features(SparseMatrix)
  • Features features(FileStream)
  • Features features(ArrowBuffer)
  • Features features(Strings)
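
Sketch of the factory in use, assuming a NumPy array as the dense matrix source:

import numpy as np

X = np.random.randn(5, 100)  # 5 dimensions x 100 samples
feats = features(X)          # backend (dense here) is picked by the factory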

Option:

This impacts all downstream API calls for features

  • Features add_transformer(Transformer)
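
Sketch of this option (ZeroMean as in the Pipeline example below; whether the transform is applied eagerly or lazily is not decided here):

feats = features(X)
feats = feats.add_transformer(ZeroMean())  # Features add_transformer(Transformer)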

Meta Machines

Wraps every Machine type in Shogun.

Pipeline

To chain preprocessors and machines

Pipeline : Machine

  • Pipeline with(Transformer)
  • Composite composite()
  • Machine then(Machine) # accepts the thing that should be wrapped
trans = ZeroMean()
trans.fit(feats)
svm = SVM()
svm.C = ...

pipeline().with(trans, IS_FITTED).then(svm) # this returns a Machine interface

some cool stuff

Composite : Pipeline

  • Composite with(Machine)
  • Machine then(Machine) # accepts the thing that should be wrapped
pipeline().with(trans)
          .composite()
             .with(kernel_machine('LibSVM'))
             .with(distance_machine('NearestNeighbor')) # averaging multiple predictions
          .then(Bagging) # returns Machine API

Kernel

Stateless

  • matrix(Features, Features)
  • matrix(Features, Features, idx_a, idx_b)
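
Sketch of the stateless interface (GaussianKernel(sigma=1) as in the GP example further down; idx_a/idx_b select a sub-block):

kernel = GaussianKernel(sigma=1)
K = kernel.matrix(feats_a, feats_b)                   # full kernel matrix
K_ab = kernel.matrix(feats_a, feats_b, idx_a, idx_b)  # restricted to the given indices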

Distance

Same interface as Kernel.
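
Same call pattern, e.g. (EuclideanDistance as an illustrative example):

distance = EuclideanDistance()
D = distance.matrix(feats_a, feats_b)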

Testing

float64_t test(Features, Labels) # two-/three-sample test, independence test - sample membership encoded via the labels
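
Sketch, assuming both samples are merged into one Features object and the labels encode sample membership (TwoSampleTest is an illustrative name):

test = TwoSampleTest()
result = test.test(feats_merged, sample_labels)  # single float64_t (statistic or p-value)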

Not part of interface

  • optimization
  • NNs: just expose fit/predict, but they are actually Keras underneath; we have a Machine/Transformer that wraps Keras (a cool GSoC project, see the sketch below). Delete the existing NN code (apart from RBM, DBN).
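
Rough sketch of what such a Keras-wrapping Machine could look like; KerasMachine, features.matrix() and labels.vector() are placeholders, not part of the proposal:

class KerasMachine(Machine):
    def __init__(self, keras_model):
        self.model = keras_model  # a compiled Keras model
    def fit(self, features, labels):
        self.model.fit(features.matrix(), labels.vector())  # delegate training to Keras
        return self
    def predict(self, features):
        return Labels(self.model.predict(features.matrix()))  # wrap raw Keras predictions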

Distribution

  • fit(Features)
  • log_pdf()
  • score() # gradient of log density
gmm = GMM()
gmm.fit(feats) # runs EM
gmm.predict(feats_test) # returns cluster index (multiclass)
gmm.as(Distribution).log_pdf(feats_test) # returns log-densities

Lazy evaluation of auxiliary methods, e.g. Gaussian process probabilities that are not computed during "fit"

gp.sets("param1", ...)
gp.sets("param2", ...)
gp.train(feats, labels).gets("crazy_covariance")

In a Jupyter notebook:

gp.fit()
>>> GaussianProcessesRegressor(kernel=GaussianKernel(sigma=1), crazy_covariance=Lazy())

Hidden inside train(): setsMatrix('crazy_covariance', this.computeCrazyCovariance, param1, param2)

GP method

  • Matrix computeCrazyCovariance(param, param)