Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the undocumented graphical python examples #4989

Merged
merged 25 commits into from
Apr 11, 2020

Conversation

vinnik-dmitry07
Copy link
Contributor

No description provided.

pos = np.array([x_pos, y_pos])
neg = np.array([x_neg, y_neg])
features = RealFeatures( np.array(np.concatenate([pos, neg], 1)) )

features_ = features(np.array(np.concatenate([pos, neg], 1)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf maybe we should stop from shogun import * to avoid this kind of thing happening?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes absolutely, we should use import shogun as sg and then sg.features (actually also in meta examples, Ill open an issue)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1,32 +1,31 @@
from pylab import figure,pcolor,scatter,contour,colorbar,show,subplot,plot,connect
import matplotlib.pyplot as plt
Copy link
Member

@karlnapf karlnapf Apr 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put changes to python graphical example in a different PR than the C++ code below. EDIT. sorry I only now realised that those are cleanups....fine to have a single PR then

Also, note that graphical python examples are pretty much the bottom of our priority list, so let's get these changes in but let's not do more of these. Instead, I suggest you focus on sending some C++ code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made these changes for vigsterkr can run this example, because there, and I guessed in some other places, was also the problem with const std::shared_ptr<DenseFeatures<float64_t>>&. I mean I did this, not because of the priorities. But now I have understood that it is a SWIG problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing. the patch is welcome, let's get it in, just wanted to make sure you are on the same page as us

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it does make sense to improve the whole listing and make it ready for the new api, so then we don't have to touch it again anytime soon.

import numpy as np
import util
from shogun import features
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I just looked through the changes here, the only functional change is replacing RealFeatures with Features.
The file above has no functional changes as far as I can see.

@@ -1,63 +1,57 @@
from shogun import RealFeatures
from shogun import MulticlassLabels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have a import shogun as sg as per the issue I referenced?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and remove all the other imports?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye

@@ -1,10 +1,14 @@
""" Utilities for matplotlib examples """

import pylab
from numpy import ones, array, double, meshgrid, reshape, linspace, \
concatenate, ravel, pi, sinc
from numpy import ones, array, double, meshgrid, linspace, concatenate, ravel, pi, sinc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import numpy as np

@@ -50,8 +50,8 @@ enum EEvaluationMode

/** @brief This class implements the kernel density estimation technique. Kernel density estimation is a non-parametric
* way to estimate an unknown pdf. The pdf at a query point given finite training samples is calculated using the
* following formula : \\
* \f$pdf(x')= \frac{1}{nh} \sum_{i=1}^n K(\frac{||x-x_i||}{h})\f$ \\
* following formula: \n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are centred math tex commands in doxygen, so you dont have to do newlines by hand. See e.g. GaussianKernel.h

 * \f[
 * k({\bf x},{\bf x'})= exp(-\frac{||{\bf x}-{\bf x'}||^2}{\tau})
 * \f]

@@ -40,7 +40,7 @@ namespace shogun
{

/** @brief This class implements KD-Tree.
* cf. http://www.autonlab.org/autonweb/14665/version/2/part/5/data/moore-tutorial.pdf
* cf. https://web.archive.org/web/20140327060654if_/http://www.autonlab.org:80/autonweb/14665/version/2/part/5/data/moore-tutorial.pdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really a good idea? Why not put a reference as a citation (a different/better reference, or wikipedia)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not the author of code, so I decided to put what the author meant when was writing this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that link from the archive works, I do not really investigate content. Sorry, will do better next time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries at all. I just think dead-links should be removed rather than pulled out from the internet archive ;) wikepedia is probably fine here anyways.

from shogun import *
import util

util.set_title('LDA')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the record. next time please make a smaller PR title. This one is way too epic ;)

features=util.get_realfeatures(pos, neg)
lda=LDA(gamma, features, labels)
labels = util.get_labels()
features = util.get_realfeatures(pos, neg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we are at it get_real_features

lda=LDA(gamma, features, labels)
labels = util.get_labels()
features = util.get_realfeatures(pos, neg)
lda = LDA(gamma, features, labels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

white we are at it. Could you port this to the new API usage, something in the lines of

lda = machine("LDA", gamma=gamma, labels=labels)
lda.train(features)


# train qda
labels = MulticlassLabels( np.concatenate([np.zeros(N), np.ones(N)]) )
labels = MulticlassLabels(np.concatenate([np.zeros(N), np.ones(N)]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use sg.labels factory (new api)

pos = np.array([x_pos, y_pos])
neg = np.array([x_neg, y_neg])
features = RealFeatures( np.array(np.concatenate([pos, neg], 1)) )

features_ = features(np.array(np.concatenate([pos, neg], 1)))

lda = MCLDA()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new API

pos = np.array([x_pos, y_pos])
neg = np.array([x_neg, y_neg])
features = RealFeatures( np.array(np.concatenate([pos, neg], 1)) )

features_ = features(np.array(np.concatenate([pos, neg], 1)))

lda = MCLDA()
lda.set_labels(labels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use put (or rather a kwarg when using the machine factory)


x1 = np.linspace(x1_min, x1_max, size)
x2 = np.linspace(x2_min, x2_max, size)

x, y = np.meshgrid(x1, x2)

dense = RealFeatures( np.array((np.ravel(x), np.ravel(y))) )
dense = features(np.array((np.ravel(x), np.ravel(y))))
dense_labels = lda.apply(dense).get_labels()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use get

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lda.apply(dense).get('labels')

@vinnik-dmitry07 vinnik-dmitry07 changed the title Fix k-d tree link. Fix formula in KernelDensity description. Remove RealFeatures from util.py, mclda.py. Update matplotlib import in mclda.py. Minor change in mclda.py: reformat. Update matplotlib calls in lda.py. Minor changes to lda.py: reformat. util.py, mclda.py, lda.py recovery Apr 3, 2020
train_features = RealFeatures(X_train)
train_labels = BinaryLabels(y_train)
train_features = sg.features(X_train)
train_labels = sg.BinaryLabels(y_train)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg.labels should work


# create zero mean function
mean = ZeroMean()
mean = sg.ZeroMean()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a factory for mean functions? I think so as we have a gp meta example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find.


# create and train GP classifier, which uses Laplace approximation
gp = GaussianProcessClassification(inf)
gp = sg.GaussianProcessClassification(inf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all those should have a factory, meta examples should guide you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to run this method from the machine? (get_probabilities)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checked, it is not possible currently, and we need to refactor this stuff (meta example is also not yet ported to new api I realised). Ignore my comment then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.
I'd like to encourage you (again) to focus on c++ code though. These graphical example are not super important. But since you have made the effort in polishing them, let's finish the job (on those you touched), but please don't add new ones to the list for now.
I made a few comments, mostly regarding factories. Try to use them where you can. If none exist, open an issue for it (you don't have to do it here)

Finally, I think we should add those examples to the CI (with a fake matplotlib), so that they get at least executed. Thoughts @vigsterkr ?

@@ -7,12 +7,9 @@
Kevin Hughes 2013
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: could we group all the ica examples into one? The code is more or less the same. We could just have a loop over all the different ica methods

return data
else:
if type == 'binary':
return sg.BinaryLabels(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels factory pls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then you shouldn't need the if then else, as the factory does that

@@ -29,19 +28,19 @@ void StochasticProximityEmbedding::init()
{
SG_ADD(&m_k, "m_k", "Number of neighbors");
SG_ADD(&m_tolerance, "m_tolerance", "Regularization parameter");
SG_ADD(&m_max_iteration, "max_iteration", "maximum number of iterations");
SG_ADD(&m_num_updates, "m_num_updates", "SPE number of updates");
SG_ADD(&m_max_iteration, "max_iteration", "Maximum number of iterations");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gf712 are we still using SG_ADD; or is the templated watch already there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is mostly ready I think. I can have a look tomorrow!


/** number of apdates per SPE iteration */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"apdates" 😆

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, few more minor fixes and then we can merge it

@vinnik-dmitry07
Copy link
Contributor Author

Sorry, I remember that I do not have to touch more files but to test nothing breaks after changing this: #4989 (comment) I had to update kernel_ridge_regression.

@vinnik-dmitry07
Copy link
Contributor Author

Accidentally-closed

(cherry picked from commit ab53d98)
(cherry picked from commit 65e3ca2)
@vinnik-dmitry07 vinnik-dmitry07 changed the title util.py, mclda.py, lda.py recovery Update the undocumented graphical python examples Apr 4, 2020
lab = sg.labels(Y.flatten())
gk = sg.kernel('GaussianKernel', log_width=width)
gk.init(feat, feat)
krr = sg.machine('KernelRidgeRegression')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you use python **kwargs, i.e.
krr = sg.machine('KernelRidgeRregression", labels=lab, kernel=gk, tau=1e-3) (and for all the other instantiations as well). Makes everything look more like Python :)

show()
plt.plot(XE[0], YE, label='test output')
plt.plot([XE[0, 200]], [YE200], '+')
# print(YE[200], YE200)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

/** constructor */
StochasticProximityEmbedding();

/** destructor */
virtual ~StochasticProximityEmbedding();
~StochasticProximityEmbedding() override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no virtual destructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm but this is a destructor. You are overriding the base class destructor so there will be leaks..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"override" keyword simply means "this function is marked as virtual in some base class"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go for it if you think that is worth it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinnik-dmitry07 could you hold your horses for now and just follow shogun's style? We don't want to mix up ways we do things, as this leads to redundant discussions and compiler warnings. If you are keen in changing the way this is done in shogun, please open an issue and a separate PR that applies what you want to do to all of shogun.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course such a PR is extremely welcome. (Although I suggest you defer this to later and focus on sending something related to your project proposal for now)


/** setter for the maximum number of iterations
*
* @param max_iteration the maximum number of iterations
*/
void set_max_iteration(const int32_t max_iteration);
void set_max_iteration(int32_t max_iteration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the const it causing issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. OK!

@@ -135,39 +137,36 @@ class StochasticProximityEmbedding : public EmbeddingConverter
int32_t get_max_iteration() const;

/** get name */
virtual const char* get_name() const;
const char* get_name() const override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure what our policy here is, we used to keep the virtual. But it is not needed with override I guess.
@gf712 @vigsterkr thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the correct way to do it, but I think it causes lots of compiler warnings... because we always use virtual... We should write a libtooling script to replace all the virtual get_name with override, or even just use sed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinnik-dmitry07 whilst you are at it can you also fix all of these, please?

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
thanks a lot for the tidy up!

@karlnapf
Copy link
Member

karlnapf commented Apr 8, 2020

somebody should eyeball this and then we can merge it

@gf712
Copy link
Member

gf712 commented Apr 11, 2020

To be on the safe side can you trigger CI, please? Now all the builds are working again!

@gf712 gf712 merged commit 9d625d8 into shogun-toolbox:develop Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants