
CLI for ANN #1254

Open
arjunmenon opened this issue Feb 17, 2018 · 57 comments

@arjunmenon

Hey
Are ANN classes available from the CLI? From the docs it isn't apparent.
How can we use it otherwise?

@zoq
Member

zoq commented Feb 18, 2018

You are right, there is no CLI for the ANN classes; you would have to use them from within your C++ code. I think it would be great to have an executable for the network code. However, since there is no single architecture, it's somewhat complex to provide all the necessary settings and at the same time keep it easy to use.

@arjunmenon
Author

I am not proficient with C++. I was hoping to add a Ruby wrapper to the CLI. So, two things:

  1. Is there support planned for a Ruby binding?
  2. If I have to make an attempt, where do I start?

@zoq
Member

zoq commented Feb 19, 2018

For more information about how to add bindings to other languages, please take a look at http://www.mlpack.org/docs/mlpack-git/doxygen/bindings.html. Let us know if we should clarify anything.

@rcurtin
Member

rcurtin commented Feb 20, 2018

Hi Arjun,

There isn't currently any planned support for Ruby. If you are interested in writing a Ruby binding generator, that would be great, and @zoq has given a link to some useful documentation. But unfortunately, proficiency with C++ is going to be necessary to write this binding generator, so you may want to study C++ a little more closely before diving in too deep.

@desai-aditya

Hi rcurtin,

I was planning to work on this in small pieces. Could I first implement a CLI that achieves this -
https://github.com/mlpack/models/blob/master/Kaggle/DigitRecognizer/src/DigitRecognizer.cpp
and then later improve it?

@rcurtin
Member

rcurtin commented Feb 21, 2018

We need to wait for @zoq's input also, but my opinion is that any CLI program needs the following functionality:

  • specify a neural network architecture arbitrarily (possibly by loading it from a YAML file or something? We should probably discuss that part)
  • train on some given training data, with the user able to specify the optimizer and optimizer parameters
  • evaluate a trained model on some given testing data
  • print performance measures on training/test data
  • load an existing model or save a newly trained model

So writing something for a digit recognizer might be a good warmup exercise, but anything that we merge into mlpack should at least have the requirements above, in my opinion.

@akhandait
Member

akhandait commented Feb 21, 2018

@zoq, since we don't have any CLI for the ANN classes, do you think it would be a good idea to start with a CLI for a fully connected ANN of the required configuration, with options for all other required parameters?
If yes, I would like to work on it.
@rcurtin I think it will be possible to incorporate all the other functionality mentioned above.
Maybe we could later extend it to support any neural network architecture.

@desai-aditya

desai-aditya commented Feb 21, 2018

@rcurtin :
What do you mean by 'loading it from a YAML file or something'?
Also, how about adding more activation functions?
Why is LeakyReLU in the layers directory?

@zoq
Member

zoq commented Feb 21, 2018

I agree with @rcurtin, we should not merge anything that doesn't support:

  • specify a neural network architecture arbitrarily (possibly by loading it from a YAML file or something? We should probably discuss that part)
  • train on some given training data, with the user able to specify the optimizer and optimizer parameters
  • evaluate a trained model on some given testing data
  • print performance measures on training/test data
  • load an existing model or save a newly trained model

YAML for the config file sounds like a good idea to me; however, parsing isn't as simple as e.g. CSV, and I don't want to include another dependency (perhaps boost::spirit could be helpful).

What do you mean by 'loading it from a YAML file or something'?

Instead of specifying the architecture (e.g. the layers) via command-line parameters, we use a config file; YAML is just the format, we could also use JSON or XML.

Also how about adding more activation functions?

If you have something in mind, we are open to suggestions.

why is leakyReLU in the layers directory?

The LeakyReLU layer doesn't follow the same interface as e.g. the TanH layer, since it holds an extra parameter, alpha.

@rcurtin
Member

rcurtin commented Feb 21, 2018

YAML for the config file sounds like a good idea to me; however, parsing isn't as simple as e.g. CSV, and I don't want to include another dependency (perhaps boost::spirit could be helpful).

What do you mean by 'loading it from a YAML file or something'?

Instead of specifying the architecture (e.g. the layers) via command-line parameters, we use a config file; YAML is just the format, we could also use JSON or XML.

Agreed, I want to avoid new dependencies also. It's also possible we could come up with a very simple custom format that we can parse ourselves. I do think boost::spirit could be used to parse YAML but it looks like implementing a YAML parser might be complex:

https://github.com/cierelabs/yaml_spirit

So maybe some plain text file is best?

linear 50 50
sigmoid
linear 50 50
sigmoid
linear 50 10
sigmoid

could be one way to do it for a 3-layer linear network with 50 hidden neurons in each hidden layer and 10 output neurons. I don't have much of a preference; personally, I think we could do just about anything here, and as long as it is well documented and simple enough, nobody will have a problem with it.

@zoq
Member

zoq commented Feb 21, 2018

Agreed, that's super simple and easy to read.

@akhandait
Member

I really like this idea. Also, I have worked with neural networks before and would love to write a CLI for it.
I think it will be really neat to input the network configuration as a text file.
Can I go ahead and start implementing this, keeping all five requirements in mind? :)

@akhandait
Member

akhandait commented Feb 21, 2018

@desai-aditya Do you want to work on this as well, since you commented on this first?
It also doesn't seem to be a small project; would you like to work on it together?

@desai-aditya

Yes we can surely collaborate @akhandait . How do you suggest we split the work?

@akhandait
Member

Okay, so I think we will need a day or two just to get really comfortable with the ANN method. The next step, in my opinion, will be to come up with a concrete plan for how we will implement it. If you have already dived in and are comfortable, maybe you could look at the CLI bindings we already have and come up with a basic structure for our implementation.
After that, maybe we can discuss how to split the work.
Let's keep each other updated here on this issue.
Let's do this!
@rcurtin @zoq If you feel we could do this a better way, please tell us.

@rcurtin
Member

rcurtin commented Feb 21, 2018

This all sounds good to me. Probably we will have to flesh out the file configuration format somewhat, but if we can keep it of the form

<layer type> <input size> <output size> <other parameters...>

I think that would be simplest to parse and work with. Sometimes the input or output size may not be needed. It'll also be really important to document exactly what the format is so that people can assemble simple networks just by looking at the documentation.

@akhandait
Member

Yes,
@desai-aditya Documenting it well should be one of our top priorities.
Maybe we could also add some tutorials, but that is thinking quite far into the future.

@desai-aditya

desai-aditya commented Feb 21, 2018

So, help me out if I forgot something.
First we'll just focus on linear models, and later add RNNs.

batch size: any int (-b)
train-test split ratio: double between 0 and 1 (-r)
optimizer: type of optimizer, e.g. sgd; I don't know which others fit, I need to read the docs (-op)
step size of SGD: double (-s)
test file: filename (-t)
output file: filename (-o)
train file: filename (-i)
print performance: (-v)
num epochs: any int

And the following in the network.conf file, for each layer:

layer type: linear
input size: any int, but the first layer must match the dimensions of the dataset
output size: any int

Is this fine? @zoq, @rcurtin, @akhandait

@zoq
Member

zoq commented Feb 21, 2018

Looks good to me, and I agree, let's start with the linear models. About the network.conf file: ideally we interpret everything after the layer name as parameters; for the linear layer this is the input and output size, but for the convolution layer there are more parameters we could set.

We could easily split this up into two parts: writing the parser and writing the CLI. The CLI could just pretend the parser already works and use some artificial settings.

@desai-aditya

@akhandait, @zoq, @rcurtin - I think the parser will parse the file and return the values to the main CLI program. The values will then be used to build the model and train on the dataset inside the CLI. There are some general parameters that will be needed as arguments to the CLI. The parser should only parse the values and not compile the model. It could be that the user supplies the model too. This is what I understand. I have already started working on the CLI. @akhandait You can go ahead with the parser and join me as soon as you finish it.

@akhandait
Member

@desai-aditya Can you please have a look at the CLIs we already have? You can find them here:
src/mlpack/methods/<method_name>/<method_name>_main.cpp.
All of them do the following job:

  1. Give info about the program, shown when somebody uses --help.
  2. Define all the parameters that the program will take from the terminal / Python.
  3. Throw appropriate errors/warnings for invalid parameters.
  4. Pass the required parameters to a model of that method.
  5. Run the model.
  6. Return/store the required parameters/model.

Hope this helps. :)

@desai-aditya

desai-aditya commented Feb 22, 2018

@akhandait Don't worry, I have already taken a look at how CLIs work and experimented with them.
The only part where trouble might start is running and assembling the model, since there is a huge variety of ways in which it can be built. Maybe you could help me with that once you've built the parser. Also, you'll need to tell me in what form the parser will return the input, i.e. the interface.

@zoq
Member

zoq commented Feb 22, 2018

@desai-aditya is right: the parser will just parse the file that defines the structure, and the CLI will build and run the model. So, if we follow @rcurtin's idea of <layer type> <input size> <output size> <other parameters...>, a simple example could be:

Linear 10 10
Sigmoid
Linear 10 5
Softmax

The CLI builds the network based on the provided information:

FFN<NegativeLogLikelihood<> > model;
model.Add<Linear<> >(inputSizeA, outputSizeA);
model.Add<SigmoidLayer<> >();
model.Add<Linear<> >(inputSizeB, outputSizeB);
model.Add<LogSoftMax<> >();

and performs the requested action.

I hope this makes sense, let me know if I should clarify anything. A good starting point for the parser is: https://github.com/mlpack/mlpack/blob/master/src/mlpack/core/data/load_csv.hpp

@akhandait
Member

akhandait commented Feb 22, 2018

@desai-aditya It's good you have already started with the CLI program. I have my exams over the next week, but I will still try to take out as much time as I can and work on the parser. I am also working on another issue, so I won't claim too many tasks, as that would just delay necessary work.
When I am done with the parser and the other issue, I will be happy to help with the CLI program. :)

@zoq
Member

zoq commented Feb 22, 2018

@akhandait don't worry, and best of luck with your exams.

@Namrata96
Contributor

Hi @desai-aditya @akhandait , are you still working on this? If yes, is there any way I could help?

@akhandait
Member

Hi @Namrata96. Yeah, I have been a little slow with regard to this issue but am working on it. I think I will open a WIP pull request in the coming days, and maybe you can help me by reviewing it extensively.

@desai-aditya

@akhandait @zoq I am extremely sorry for the delay in replying. I had my exams this past week. I am free now and will continue work on the CLI.
@akhandait How's the parser coming along?
@Namrata96 As @akhandait said, you could help by reviewing the PR.

@sreenikSS
Contributor

sreenikSS commented Mar 22, 2019

I thought about it a bit more; it may be useful to think about using YAML instead of JSON first since it's a lot easier to handwrite YAML. But I am not too picky---both work, and we can also write other tools on top of it all that produce JSON/YAML from an "easier" representation.

@rcurtin you are right, YAML is more user-friendly, but I have just finished the JSON implementation, with a boost::property_tree storing it and categorising the information into a number of maps, storing the data in a dictionary-like format. I am all for YAML, but I currently want to focus on making the whole thing work; I shall provide YAML support (not much code, though) after the main part is done. What do you say?

Do you think the new http://www.mlpack.org/doc/mlpack-3.0.4/cli_documentation.html might be a better thing to do here? We could embed a full description of the language and format either in the Detailed documentation section of the binding, like this:

http://mlpack.org/doc/mlpack-3.0.4/cli_documentation.html#approx_kfn_detailed-documentation

Or have a couple excerpts of "common" usage, and then refer to a tutorial that's written somewhere else. What do you think?

I have tried following the format of the original documentation, but there are some exceptions (for better readability), for example the linear layer has the original documentation:
Linear(const size_t inSize, const size_t outSize)
But we have abstracted the inSize, so the user only needs to specify outSize, which is better expressed as:
"type": "linear", "units": 100
because "units" sounds more appropriate than "outSize". But such instances are few in number, so maintaining a separate set of documentation is probably not needed; a separate tutorial mentioning them should be sufficient.

I'd agree, I feel like we can just have some C++ code that, perhaps, extracts the allowed layer types from LayerTypes in src/mlpack/methods/ann/layer/layer_types.hpp. This might be worth thinking about a little bit more: given the template type LayerTypes, can we extract a list of the members, and then also extract related documentation for each of the parameters to that layer? Or will we have to maintain that separately?

You'll be glad to hear that part is already done (the code, not the doc part, though), and I have done it exactly as you have mentioned here. I am structuring it in such a way that bindings and other stuff will be much easier to add. Actually, I am learning a lot (a lot, actually!) while building this CLI.

One more thing regarding the documentation: are we planning to participate in Google's newly announced Season of Docs this year?

@rcurtin
Member

rcurtin commented Mar 24, 2019

I am all for YAML, but I currently want to focus on making the whole thing work; I shall provide YAML support (not much code, though) after the main part is done. What do you say?

That's fine with me. 👍

I have tried following the format of the original documentation, but there are some exceptions (for better readability), for example the linear layer has the original documentation:
Linear(const size_t inSize, const size_t outSize)
But we have abstracted the inSize, so the user only needs to specify outSize, which is better expressed as:
"type": "linear", "units": 100
because "units" sounds more appropriate than "outSize". But these instances are less in number, so maintaining a separate set of documentation is probably not needed. A separate tutorial mentioning those instances should be sufficient.

Ah, what I was suggesting here is automatically generating the documentation for each layer, as opposed to manually writing it. It may be complex to come up with a good solution for that.

You'll be glad to hear, that part is already done (the code, not the doc part though) and I have done it exactly as you have mentioned here. I am structuring it in such a way that bindings and other stuff would be much easier to add. Actually I am learning a lot (A lot actually!!) while building this CLI.

Great! If there is any part of the code I can explain, just let me know. There are a lot of complex pieces that fit together in complex ways. :)

One more thing, regarding the documentation are we planning to participate in Google's newly announced Season of Docs this year?

Not sure, I think there is some interest in it in the community, so we'll see. 👍 I think it is a cool program, the issue for me at least is always time. (That said I don't have to run the effort, so if there's critical mass to do it, I think we should!) :)

@sreenikSS
Contributor

Thanks. I shall let you know if I need any further clarification anywhere.

@sreenikSS
Contributor

@rcurtin the parser is almost done. There are a number of critical parameters that need to be addressed. I shall post them tomorrow (extremely tired right now :( )

The PR is over here:
#1837

@sreenikSS
Contributor

Critical comments on it:

  1. The init type doesn't take the given parameters into account, because it is passed as a template parameter but not as an argument. The fix is quite simple; I am a little tight on schedule but will fix it as soon as I find some time.
  2. For model training and testing, the code needs to be modified every time one switches from a classification problem to a regression problem. This can be solved either by taking a user input (when the CLI is complete) or by determining it from the last layer and the given labels (i.e., the matrix 'trainY').
  3. Not all features have been implemented yet. Additionally, some init types and layer types have been commented out because my mlpack installation somehow ignored them, so the required files are not found and hence I cannot compile and test them.
  4. Programs in general have multiple methods that are called in sequence from a separate method (main() in most standalone applications). But here, due to the absence of suitable return types, the functions call one another in a chain-like fashion, i.e., main() calls A(), A() calls B(), B() calls C(), and so on. Maybe an inheritance-based solution could help, by having a superclass for all init types, optimizer types, and loss types respectively, just like LayerTypes is a superclass to all layers (does it already exist?).
  5. The functions are named like getters, e.g. getLossType() and getInitType(), whereas they are simple void functions and are even part of the chain discussed in point 4. These names were chosen to reflect the essence of these methods and the ideal behaviour they could exhibit (if a suitable return type were available).
  6. The style checks do not currently pass; I hope to update the code soon.

I will update this if something else comes to my mind.

@mlpack-bot

mlpack-bot bot commented Jul 19, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@kunakl07

I would like to work on this issue. Can I take it?

@bkmgit
Contributor

bkmgit commented Jul 12, 2020

This would be a useful addition. A CLI is already part of Neural Network Libraries.

Is anybody still working on this?

@RishabhGarg108
Member

Hey, I was recently looking at this issue. After reading the discussion here, I figured out that we need two things:

  1. A utility to parse the config file to build the model.
  2. An actual CLI that would implement that model.

In my opinion, a simple text file would not suffice, because it is easier to think of the model in a structured way (JSON/XML).

Then the key question is: how do we parse that structured config file?

@sreenikSS has almost completed the parser in #1837, but it is done using boost::property_tree. Since we are trying to reduce our dependence on the Boost libraries, I thought we needed to do it some other way. I discussed it on IRC with @zoq. He suggested using cereal for it. So, I gave it a shot to see whether that works, because cereal can be used to save or load objects in different formats like JSON/XML.

So, to test it out, I created a small model and saved it in the following way.
FFN<> model;
model.Add<Linear<> >(20, 3);
model.Add<SigmoidLayer<> >();
model.Add<LogSoftMax<> >();

data::Save("model.json", "model", model, false);

The model.json generated looks like this. This looks pretty bad, because it actually saves all the parameters and everything associated with the model. To use cereal, the user would need to write a config file similar to it, which is practically impossible.

So, now we are back to the same question, i.e. how do we parse the config file.
We can do it in two ways:

  1. By writing our own JSON/XML parser (which in my opinion would be a tedious task).
  2. By using some external dependency (which would increase our external dependence).

So, I want to ask: what would be a good way to approach this? Thanks.

@zoq
Member

zoq commented Dec 1, 2020

What I had in mind was to have some sort of config class that we could serialize and use as an intermediate format to construct the model. Isn't that what the parser would do as well? I would expect that a config class something like:

struct LayerConfig { std::string name; size_t inSize, outSize; };

class Config
{
 public:
  std::vector<LayerConfig> layers;
};

would look easy enough.

@bkmgit
Contributor

bkmgit commented Dec 1, 2020

@RishabhGarg108 The following may be relevant:
https://www.khronos.org/nnef/
https://github.com/onnx/onnx/blob/master/docs/IR.md

@RishabhGarg108
Member

@bkmgit, thanks for these resources. I will definitely look at them.
The ONNX thing looks great, because that way we can import models from all the other libraries that ONNX supports.

I have no experience with parsing and such. So currently, I am trying and exploring various things, like the Config class @zoq mentioned. I will also give ONNX a try, and we will see what works best. 👍

@RishabhGarg108
Member

@RishabhGarg108 The following may be relevant:
https://www.khronos.org/nnef/
https://github.com/onnx/onnx/blob/master/docs/IR.md

@bkmgit, I looked at both of these. It turns out that both of these tools are for interoperability and optimization of trained models across different frameworks and devices, for making inferences. This is not exactly what we are looking for. In fact, all we want is a simple way to define the architecture of an ANN model and then be able to translate that into an mlpack ANN model. I hope that makes sense.
Correct me if I overlooked something :D

@rcurtin
Member

rcurtin commented Dec 28, 2021

We should leave this open---it is still important functionality we should add at some point.
