
Add callback to monitor the training process #55

Merged · 25 commits · Apr 11, 2020

Conversation

@shrit shrit commented Feb 20, 2020

Signed-off-by: Omar Shrit <shrit@lri.fr>


shrit commented Feb 23, 2020

@zoq Do you think it would be better to have some program options in this example, so that users can pass the path of the model dynamically instead of recompiling the model each time? It seems that the model zoo already requires Boost program options as a dependency. If that is the case, I can use it for this, or I can do it the old-fashioned way with (argc, argv)?


zoq commented Feb 24, 2020

I think that is a great idea; I guess we can also use mlpack's CLI functionality for the same. That would bring us one step closer to exposing those models to other languages as well (bindings for Python, Julia, etc.).


shrit commented Feb 24, 2020

Great, I will push with mlpack's CLI as soon as possible.

@kartikdutt18 (Member)

@shrit, will you be pursuing the same for LSTM, or do you mind if I take that up?


shrit commented Feb 26, 2020

I am only going to finish the Kaggle models. You can start working on LSTM.

@kartikdutt18 (Member)

Great, I hope to open a PR soon.


shrit commented Feb 29, 2020

@zoq I think there is something missing in the current CMake configuration. Since I added mlpack::CLI, it has not been possible to link against Boost program options (my version is 1.71.0). CMake is able to find the Boost headers, but not the compiled libraries.
In order to compile on my machine, I had to use `set(program_options "/usr/lib64/libboost_program_options.so")` and link against `program_options`.
Travis is also complaining about the same issue.


rcurtin commented Mar 5, 2020

Ahhh, sorry to be a little late to the party here and reply after a lot of the code has already been done. My understanding of the models repo is that it contains self-contained examples that people can use and adapt for their tasks. Using the mlpack CLI framework, in my opinion, makes it a good bit harder to actually do that---instead of being able to just copy-paste the code in the example that concerns the model training, now they have to understand what all the CLI stuff is. Also, it would be a lot harder to compile the program by hand, since the BINDING_TYPE macro would need to be defined correctly.

I guess what I mean is, the use of the CLI class here does mean that we can nicely compile a program that can do digit recognition with various parameters. But I think that more people are using this as a simple example than actually doing digit recognition with it.

I think that it would be better to have a "generic" binding in the main mlpack repository that can work with neural networks of arbitrary shape and size (defined in a text file or something). There's an issue open for that: mlpack/mlpack#1254. Then, the models repository can primarily be a set of minimal-code examples that people can adapt into their own work.


shrit commented Mar 5, 2020

@rcurtin I understand.
No problem, I can remove mlpack::CLI. However, it is a little strange to put the path to the training dataset directly in the code: the user will have to recompile for every new path, not forgetting that the compilation time is considerable.
We could at least use the old C++ style (argc, argv) for these paths, or we can leave it as it is. Please tell me what you think is the best option for the user.
For the above issue mlpack/mlpack#1254, creating something similar might be possible, but there are some points that need to be detailed:

  1. The issue seems to be related to "Create a parser to parse the model from a json" (mlpack#1837). Is @sreenikSS still working on it?
  2. @kunakl07 Are you working on it?
  3. @sreenikSS stated that mlpack#1837 can go stale since it would be part of a model converter repository. Does this repository exist? If yes, is it public? Is there any open issue detailing this converter?

Finally, if the converter is ready, we will only have to add mlpack::CLI and write the mlpack_ann main example, which should be easier since the hard part (the parser) will have been accomplished.

@rcurtin (Member) left a comment

Ok, I took a look through the code; thanks for making these changes, I think they make the example a lot more understandable. 👍 I left out any comments on the input parsing for now until we get through #61 and figure out what to do. In any case, whatever we decide, it should be pretty easy to adapt that for here.

I'm not sure of the status of mlpack/mlpack#1837 or the model converter repository; perhaps @sreenikSS can add a bit more. What I do know is that I am excited about that effort. :)

<< std::endl;
}

int main(int argc, char* argv[])
Member

Thanks for going through and changing from the CLI framework; I hope it wasn't too much effort. Let's see where the discussion on #61 goes and I think that will clarify what the best option is here. So for now I won't add any comments there. :)

Member Author

I agree, let us wait for the discussion in mlpack/models#61 to see what to do with (argc, argv).
You do not have to worry about it; it was not much effort to remove all the CLI code.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)

using namespace mlpack;
Member

Hmm, removing this seems to make the later code a bit more verbose with the mlpack:: namespacing bits. Would it be good to leave using namespace mlpack; in, or did I overlook the reason for removing it? :)

Member Author

In fact, I do not know mlpack's convention for namespaces. If you would like, I will keep all of them. However, I would not use any `using namespace` in this case, since there are several libraries involved. Actually, when I started copying the code as a simple user, I had trouble knowing which library each function belongs to, especially since the Kaggle helper functions do not belong to any namespace.
I would only keep a `using namespace` directive if we were using mlpack alone. What do you think?

Member Author

@rcurtin Tell me what you think about the namespaces; I would remove all of them if that is fine with you.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
using namespace mlpack::ann;

using namespace arma;
using namespace std;
Member

I'd also be okay with it if you wanted to leave this in. It would at least make the std::cout calls a few characters shorter. :)

Member Author

Same as before; I would even declare a namespace for these helpers (https://github.com/mlpack/models/blob/master/Kaggle/kaggle_utils.hpp) if we decide to keep them. This file defines a save function, which can be ambiguous since there are both mlpack::data::Save and arma::save.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
@shrit shrit requested a review from rcurtin March 20, 2020 13:30

shrit commented Mar 20, 2020

@rcurtin I will remove (argc, argv) from this code, in order to be consistent with the other examples.
I left some comments on questions concerning the namespaces.
I will apply callbacks to the VAE example. Also, I will re-open the same pull request on the new models repository, since the title only refers to callbacks.
We will look into the details related to mlpack::CLI later in the models repository.

@rcurtin rcurtin left a comment


Awesome, thanks @shrit for continuing to work on this. I think it's basically ready. Do you want to open another version in the new models/ repository that can turn these into bindings? (No hurry of course.)

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
vae/vae_mnist/src/vae.cpp (outdated; resolved)
vae/vae_mnist/src/vae.cpp (outdated; resolved)

shrit commented Mar 24, 2020

@rcurtin Yes, I am going to open a new pull request for models with all the CLI bindings for the MNIST models.

@shrit
Copy link
Member Author

shrit commented Mar 28, 2020

@rcurtin I think you can merge this one now.
If there is any problem, we can handle it in a new pull request.
The objective of this pull request has been accomplished.

@rcurtin rcurtin left a comment


Agreed, we can go ahead and merge it. I have one comment left, so if you want to handle that before merge, that would be great; otherwise, I can open an issue for it. 👍

// As before, it's necessary to get rid of header.
// should be sent to kaggle website).
arma::mat testingDataset;
mlpack::data::Load("../Kaggle/data/test.csv", testingDataset, true);
Member

What should we do about the data? Now it's in ../data/kaggle_train_test_dataset.zip. I guess we could assume that the user has unpacked the zip in that directory? (Or maybe we provide a quick script to unpack all the data or something?)

Member Author

I would prefer removing the entire data directory and providing a small script that downloads the necessary data for each example. We do not need to keep the data tracked by git; that would increase the size of the repository for no good reason.
In addition, we might use bigger datasets in the future, which cannot be added to this repository.

Member

Sounds good, that works for me. Do you want to open an issue pointing out that we should add a script to download any datasets?

Member Author

I will handle the issue with a pull request directly.
I will try to find the sources for the datasets and download them using wget or curl.

@mlpack-bot (bot) left a comment

Second approval provided automatically after 24 hours. 👍

model.Predict(validX, predOut);
// Calculate accuracy on validating data points.
predLabels = getLabels(predOut);
double validAccuracy = arma::accu(predLabels == validY) / validY.n_elem;
Member

Hey @shrit, I think this might not be correct, as they will never be exactly equal. Do you mind running some tests on this? Thanks a lot.

Member

I think I am wrong here; they are labels, so they should be correct. However, when I tried them I got an accuracy of 0, while with the existing function I got an accuracy of 96%. Maybe I did something wrong.

Member Author

@kartikdutt18 You are right, I missed this file. I need to remove the getLabels function, since I have removed the kaggle_utils.hpp file. I rebased and did not build locally.
I was working on a set of Makefiles for all the examples. The problem is that if we use only Makefiles instead of CMake, we will lose support for Visual Studio. What do you think, @rcurtin @kartikdutt18?

Member

@kartikdutt18 the code here looks right to me, but casting can make a difference here---size_t divided by size_t will be 0 in this case, but if one of the types is double (or if the result type is double?) then I think it will work fine. If the current code gives good accuracy numbers then I think it's fine. Also, I realized, we should multiply trainAccuracy and validAccuracy by 100. :)

@shrit good point about Visual Studio. The real complexity of build systems like CMake is just in getting all the dependencies right and everything. The basic Makefile we'd provide would just assume that everything was available on the system, so even that is not "bulletproof"---the user will still have to make sure that their system is set up right. For Visual Studio, I think that if the user sets up a new project correctly (like in the first part of https://www.mlpack.org/doc/mlpack-3.2.2/doxygen/sample_ml_app.html), then they can copy-paste this code directly and it should work.

Member

Also, we will be adding notebooks, so users can use those, and we can add instructions for gcc etc. So maybe that isn't an issue. Not sure though.

This function is required to convert classification probabilities into labels.


rcurtin commented Apr 6, 2020

@shrit @kartikdutt18 let me know when you guys are good with merging this, and I'm happy to hit the button. :)

@kartikdutt18 (Member)

> @shrit @kartikdutt18 let me know when you guys are good with merging this, and I'm happy to hit the button. :)

Just one last thing: if @shrit could add some results about accuracy in the comments, I am good to go as well.


shrit commented Apr 6, 2020

@kartikdutt18 Agreed, I will try to test this before tomorrow. (I will multiply the accuracy by 100.)

@kartikdutt18 (Member)

Great. Thanks a lot.

@rcurtin rcurtin mentioned this pull request Apr 7, 2020

shrit commented Apr 8, 2020

@kartikdutt18 This is part of the output of the mnist_batch_norm training process. It seems that the accuracies are 0, and the model does not learn well. I do not know if these examples gave better results before.
The getLabels function seems to work, since the output is between 0 and 9. I need to double-check everything in this code.

Epoch 114/0
0.0137785====================================================================================================] 100% - ETA: 0s - loss: 0.0137465
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0137785
Validation loss: 1113.26
Epoch 115/0
0.0162865====================================================================================================] 100% - ETA: 0s - loss: 0.0162486
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0162865
Validation loss: 1096.42
Epoch 116/0
0.0172006====================================================================================================] 100% - ETA: 0s - loss: 0.0171606
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0172006
Validation loss: 1072.54
Epoch 117/0
0.018189[====================================================================================================] 100% - ETA: 0s - loss: 0.0181468
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.018189
Validation loss: 1066.69
Epoch 118/0
0.017634[====================================================================================================] 100% - ETA: 0s - loss: 0.017593
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.017634
Validation loss: 1040.55
Epoch 119/0
0.016979[====================================================================================================] 100% - ETA: 0s - loss: 0.0169395
591/591 [====================================================================================================] 100% - 12s 20ms/step - loss: 0.016979
Validation loss: 1048.35
Epoch 120/0
0.0186793====================================================================================================] 100% - ETA: 0s - loss: 0.0186359
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.0186793
Validation loss: 1142.12
Accuracy: train = 0%, valid = 0%
Predicting ...
Saving predicted labels to "results.csv"
Neural network model is saved to "model.bin"
Finished

@kartikdutt18 (Member)

Hey @shrit, I have a notebook with an MNIST example using a CNN, and it gives decent results. Here is the link. The results can also be seen in the #74 examples. I will try the BatchNorm example as well and get back to you.


shrit commented Apr 8, 2020

Yes, but your example uses the old functions from kaggle_utils.hpp. I am also using an early-stop callback on the validation set.

@kartikdutt18 (Member)

I think I also used ensmallen callbacks, but yes, I used the kaggle_utils functions.


rcurtin commented Apr 9, 2020

@shrit maybe you need to cast trainY.n_elem to double? I wonder if the division being done is still integer division. (I don't remember the C++ type promotion rules for mathematical operations.)


shrit commented Apr 9, 2020

@rcurtin Yep, it was related to the double cast; I think we can hit the magic button now.

Epoch 18/0
0.0479495====================================================================================================] 100% - ETA: 0s - loss: 0.0478381
591/591 [====================================================================================================] 100% - 18s 30ms/step - loss: 0.0479495
Validation loss: 798.352
Accuracy: train = 99.5026%,      valid = 97.381%
Predicting ...
Saving predicted labels to "results.csv" ...
Neural network model is saved to "model.bin"
Finished

@rcurtin rcurtin merged commit bd5f9ef into mlpack:master Apr 11, 2020

rcurtin commented Apr 11, 2020

Awesome. Magic button pressed. :)

@shrit shrit deleted the digit branch April 11, 2020 10:39