
Add callback to monitor the training process #55

Merged · 25 commits · Apr 11, 2020

Conversation

@shrit shrit commented Feb 20, 2020

Signed-off-by: Omar Shrit <shrit@lri.fr>


shrit commented Feb 23, 2020

@zoq Do you think it would be better to have some program options in this example, so that users can pass the path of the model dynamically instead of recompiling the model each time? It seems that the model zoo already requires Boost program options as a dependency. If that is the case, I can use it for this, or I can do it the old-fashioned way with (argc, argv)?


zoq commented Feb 24, 2020

I think that is a great idea; I guess we can also use mlpack's CLI functionality for the same. That would bring us one step closer to exposing those models to other languages as well (bindings for Python, Julia, etc.).


shrit commented Feb 24, 2020

Great, I will push with mlpack's CLI as soon as possible.

@kartikdutt18 (Member)

@shrit, will you be pursuing the same for LSTM, or do you mind if I take that up?


shrit commented Feb 26, 2020

I am only going to finish the Kaggle models. You can start working on LSTM.

@kartikdutt18 (Member)

Great, I hope to open a PR soon.


shrit commented Feb 29, 2020

@zoq I think there is something missing in the current CMake configuration. Since I added mlpack::CLI, it has not been possible to link against Boost program options (my version is 1.71.0). CMake is able to find the Boost headers, but not the compiled libraries.
In order to compile on my machine, I had to use `set(program_options "/usr/lib64/libboost_program_options.so")` and link against `program_options`.
Travis is also complaining about the same issue.


rcurtin commented Mar 5, 2020

Ahhh, sorry to be a little late to the party here and reply after a lot of the code has already been done. My understanding of the models repo is that it contains self-contained examples that people can use and adapt for their tasks. Using the mlpack CLI framework, in my opinion, makes it a good bit harder to actually do that---instead of being able to just copy-paste the code in the example that concerns the model training, now they have to understand what all the CLI stuff is. Also, it would be a lot harder to compile the program by hand, since the BINDING_TYPE macro would need to be defined correctly.

I guess what I mean is, the use of the CLI class here does mean that we can nicely compile a program that can do digit recognition with various parameters. But I think that more people are using this as a simple example than actually doing digit recognition with it.

I think that it would be better to have a "generic" binding in the main mlpack repository that can work with neural networks of arbitrary shape and size (defined in a text file or something). There's an issue open for that: mlpack/mlpack#1254. Then, the models repository can primarily be a set of minimal-code examples that people can adapt into their own work.


shrit commented Mar 5, 2020

@rcurtin I understand.
No problem, I can remove mlpack::CLI. However, it is a little strange to put the path to the training dataset directly in the code: the user will have to recompile for every new path, not forgetting that the compilation time is considerable.
We could at least use the old C++ style (argc, argv) for these paths, or we can leave it as it is. Please tell me what you think is the best option for the user.
For the above issue mlpack/mlpack#1254, creating something similar might be possible, but there are some points that need to be detailed:

  1. The issue seems to be related to "Create a parser to parse the model from a json" (mlpack#1837). Is @sreenikSS still working on it?
  2. @kunakl07 Are you working on it?
  3. @sreenikSS stated that mlpack#1837 can go stale since it would be part of a model converter repository. Does this repository exist? If yes, is it public? Is there any open issue detailing this converter?

Finally, if the converter is ready, we will only have to add mlpack::CLI and write the mlpack_ann main example, which should be easier since the hard part (the parser) will have been accomplished.

@rcurtin (Member) left a comment

Ok, I took a look through the code; thanks for making these changes, I think they make the example a lot more understandable. 👍 I left out any comments on the input parsing for now until we get through #61 and figure out what to do. In any case, whatever we decide, it should be pretty easy to adapt that for here.

I'm not sure of the status of mlpack/mlpack#1837 or the model converter repository; perhaps @sreenikSS can add a bit more. What I do know is that I am excited about that effort. :)

<< std::endl;
}

int main(int argc, char* argv[])
Member

Thanks for going through and changing from the CLI framework; I hope it wasn't too much effort. Let's see where the discussion on #61 goes and I think that will clarify what the best option is here. So for now I won't add any comments there. :)

Member Author

I agree, let us wait for the discussion in mlpack/models#61 to see what to do with (argc, argv).
You do not have to worry about it; it was not much effort to remove all the CLI code.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)

using namespace mlpack;
Member

Hmm, removing this seems to make the later code a bit more verbose with the mlpack:: namespacing bits. Would it be good to leave using namespace mlpack; in, or did I overlook the reason for removing it? :)

Member Author

In fact, I do not know mlpack's convention for namespaces. If you would like, I will keep all of them. However, I would not use any `using namespace` in this case, since there are several libraries involved. Actually, when I started copying the code as a simple user, I had trouble knowing which library each function belongs to, especially since the Kaggle helper functions do not belong to any namespace.
I would only keep a `using namespace` directive if we were using mlpack alone. What do you think?

Member Author

@rcurtin Tell me what you think about the namespaces; I would remove all of them if that is fine with you.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
using namespace mlpack::ann;

using namespace arma;
using namespace std;
Member

I'd also be okay with it if you wanted to leave this in. It would at least make the std::cout calls a few characters shorter. :)

Member Author

Same as before; I would even declare a namespace for these helpers (https://github.com/mlpack/models/blob/master/Kaggle/kaggle_utils.hpp) if we decide to keep them. This file defines a save function, which can be ambiguous since there are both mlpack::data::Save and arma::save.

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
@shrit shrit requested a review from rcurtin March 20, 2020 13:30

shrit commented Mar 20, 2020

@rcurtin I will remove (argc, argv) from this code, in order to be consistent with the other examples.
I left some comments on questions concerning the namespaces.
I will apply callbacks to the VAE example. Also, I will re-open the same pull request on the new models repository, since the title only refers to callbacks.
We will look into the details related to mlpack::CLI later in the models repository.

@rcurtin rcurtin left a comment


Awesome, thanks @shrit for continuing to work on this. I think it's basically ready. Do you want to open another version in the new models/ repository that can turn these into bindings? (No hurry of course.)

Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
Kaggle/DigitRecognizer/src/DigitRecognizer.cpp (outdated; resolved)
vae/vae_mnist/src/vae.cpp (outdated; resolved)
vae/vae_mnist/src/vae.cpp (outdated; resolved)

shrit commented Mar 24, 2020

@rcurtin Yes, I am going to open a new pull request for models with all the CLI bindings for the MNIST models.

@shrit
Copy link
Member Author

shrit commented Mar 28, 2020

@rcurtin I think you can merge this one now.
If there is any problem, we can handle it in a new pull request.
The objective of this pull request has been accomplished.

@rcurtin rcurtin left a comment


Agreed, we can go ahead and merge it. I have one comment left, so if you want to handle that before merge, that would be great; otherwise, I can open an issue for it. 👍

// As before, it's necessary to get rid of header.
// should be sent to kaggle website).
arma::mat testingDataset;
mlpack::data::Load("../Kaggle/data/test.csv", testingDataset, true);
Member

What should we do about the data? Now it's in ../data/kaggle_train_test_dataset.zip. I guess we could assume that the user has unpacked the zip in that directory? (Or maybe we provide a quick script to unpack all the data or something?)

Member Author

I would prefer removing the entire data directory and providing a small script that downloads the necessary data for each example. We do not need to keep the data tracked by git; that would increase the size of the repository for no good reason.
In addition, we might use bigger datasets in the future, which cannot be added to this repository.

Member

Sounds good, that works for me. Do you want to open an issue pointing out that we should add a script to download any datasets?

Member Author

I will handle the issue with a pull request directly.
I will try to find the sources for the datasets and download them using wget or curl.

@mlpack-bot (bot) left a comment

Second approval provided automatically after 24 hours. 👍

model.Predict(validX, predOut);
// Calculate accuracy on validating data points.
predLabels = getLabels(predOut);
double validAccuracy = arma::accu(predLabels == validY) / validY.n_elem;
Member

Hey @shrit, I think this might not be correct, as they will never be exactly equal. Do you mind running some tests on this? Thanks a lot.

Member

I think I am wrong here; they are labels, so they should be correct. However, when I tried them I got an accuracy of 0, while with the existing function I got an accuracy of 96%. Maybe I did something wrong.

Member Author

@kartikdutt18 You are right, I missed this file. I need to remove the getLabels function, since I have removed the kaggle_utils.hpp file. I rebased and did not build locally.
I was working on a set of Makefiles for all the examples. The problem is that if we use only Makefiles instead of CMake, we will lose support for Visual Studio. What do you think, @rcurtin @kartikdutt18?

Member

@kartikdutt18 the code here looks right to me, but casting can make a difference here---size_t divided by size_t will be 0 in this case, but if one of the types is double (or if the result type is double?) then I think it will work fine. If the current code gives good accuracy numbers then I think it's fine. Also, I realized, we should multiply trainAccuracy and validAccuracy by 100. :)

@shrit good point about Visual Studio. The real complexity of build systems like CMake is just in getting all the dependencies right and everything. The basic Makefile we'd provide would just assume that everything was available on the system, so even that is not "bulletproof"---the user will still have to make sure that their system is set up right. For Visual Studio, I think that if the user sets up a new project correctly (like in the first part of https://www.mlpack.org/doc/mlpack-3.2.2/doxygen/sample_ml_app.html), then they can copy-paste this code directly and it should work.

Member

Also, we will be adding notebooks, so users can use those, and we can add instructions for gcc etc. So maybe that isn't an issue. Not sure though.

This function is required to convert classification probabilities into labels.


rcurtin commented Apr 6, 2020

@shrit @kartikdutt18 let me know when you guys are good with merging this, and I'm happy to hit the button. :)

@kartikdutt18 (Member)

> @shrit @kartikdutt18 let me know when you guys are good with merging this, and I'm happy to hit the button. :)

Just one last thing: if @shrit could add some results about accuracy in the comments, I am good to go as well.


shrit commented Apr 6, 2020

@kartikdutt18 Agreed, I will try to test this before tomorrow. (I will multiply the accuracy by 100.)

@kartikdutt18 (Member)

Great. Thanks a lot.

@rcurtin rcurtin mentioned this pull request Apr 7, 2020

shrit commented Apr 8, 2020

@kartikdutt18 This is part of the output of the mnist_batch_norm training process. It seems that the accuracies are 0, and the model does not learn well. I do not know if these examples gave better results before.
The getLabels function seems to work, since the output is between 0 and 9. I need to double-check everything in this code.

Epoch 114/0
0.0137785====================================================================================================] 100% - ETA: 0s - loss: 0.0137465
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0137785
Validation loss: 1113.26
Epoch 115/0
0.0162865====================================================================================================] 100% - ETA: 0s - loss: 0.0162486
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0162865
Validation loss: 1096.42
Epoch 116/0
0.0172006====================================================================================================] 100% - ETA: 0s - loss: 0.0171606
591/591 [====================================================================================================] 100% - 10s 17ms/step - loss: 0.0172006
Validation loss: 1072.54
Epoch 117/0
0.018189[====================================================================================================] 100% - ETA: 0s - loss: 0.0181468
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.018189
Validation loss: 1066.69
Epoch 118/0
0.017634[====================================================================================================] 100% - ETA: 0s - loss: 0.017593
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.017634
Validation loss: 1040.55
Epoch 119/0
0.016979[====================================================================================================] 100% - ETA: 0s - loss: 0.0169395
591/591 [====================================================================================================] 100% - 12s 20ms/step - loss: 0.016979
Validation loss: 1048.35
Epoch 120/0
0.0186793====================================================================================================] 100% - ETA: 0s - loss: 0.0186359
591/591 [====================================================================================================] 100% - 10s 18ms/step - loss: 0.0186793
Validation loss: 1142.12
Accuracy: train = 0%, valid = 0%
Predicting ...
Saving predicted labels to "results.csv"
Neural network model is saved to "model.bin"
Finished

@kartikdutt18 (Member)

Hey @shrit, I have a notebook with an MNIST example using a CNN, and it gives decent results. Here is the link. The results can also be seen in the #74 examples. I will try the BatchNorm example as well and get back to you.


shrit commented Apr 8, 2020

Yes, but your example uses the old functions from kaggle_utils.hpp. I am also using an early-stop callback on the validation set.

@kartikdutt18 (Member)

I think I also used ensmallen callbacks, but yes, I used the kaggle_utils functions.


rcurtin commented Apr 9, 2020

@shrit maybe you need to cast trainY.n_elem to double? I wonder if the division being done is still integer division. (I don't remember the C++ type promotion rules for mathematical operations.)


shrit commented Apr 9, 2020

@rcurtin Yep, it was related to the double cast; I think we can hit the magic button now.

Epoch 18/0
0.0479495====================================================================================================] 100% - ETA: 0s - loss: 0.0478381
591/591 [====================================================================================================] 100% - 18s 30ms/step - loss: 0.0479495
Validation loss: 798.352
Accuracy: train = 99.5026%,      valid = 97.381%
Predicting ...
Saving predicted labels to "results.csv" ...
Neural network model is saved to "model.bin"
Finished

@rcurtin rcurtin merged commit bd5f9ef into mlpack:master Apr 11, 2020

rcurtin commented Apr 11, 2020

Awesome. Magic button pressed. :)

@shrit shrit deleted the digit branch April 11, 2020 10:39