
ANN Input/Output Size #1267

Closed
zoq opened this issue Feb 21, 2018 · 30 comments

@zoq
Member

zoq commented Feb 21, 2018

Currently, if we build a network, we have to specify the input and output size for each layer. Ideally, a user should only specify the input/output size for the first layer, and only the output size for each subsequent layer. Here is a simple example:

FFN<NegativeLogLikelihood<> > model;
model.Add<Linear<> >(inputSizeA, outputSizeA);
model.Add<SigmoidLayer<> >();
model.Add<Linear<> >(inputSizeB, outputSizeB);
model.Add<LogSoftMax<> >();

which should transform into this:

FFN<NegativeLogLikelihood<> > model;
model.Add<Linear<> >(inputSize, outputSize);
model.Add<SigmoidLayer<> >();
model.Add<Linear<> >(layerSize);
model.Add<LogSoftMax<> >();

There is something similar already in place to set the input and output width for the convolution operator:

if (!reset)
{
  if (boost::apply_visitor(outputWidthVisitor, network.front()) != 0)
  {
    width = boost::apply_visitor(outputWidthVisitor, network.front());
  }

  if (boost::apply_visitor(outputHeightVisitor, network.front()) != 0)
  {
    height = boost::apply_visitor(outputHeightVisitor, network.front());
  }
}

for (size_t i = 1; i < network.size(); ++i)
{
  if (!reset)
  {
    // Set the input width.
    boost::apply_visitor(SetInputWidthVisitor(width), network[i]);

    // Set the input height.
    boost::apply_visitor(SetInputHeightVisitor(height), network[i]);
  }

  // Forward pass through the current layer.
  boost::apply_visitor(ForwardVisitor(std::move(boost::apply_visitor(
      outputParameterVisitor, network[i - 1])), std::move(
      boost::apply_visitor(outputParameterVisitor, network[i]))), network[i]);

  if (!reset)
  {
    // Get the output width.
    if (boost::apply_visitor(outputWidthVisitor, network[i]) != 0)
    {
      width = boost::apply_visitor(outputWidthVisitor, network[i]);
    }

    // Get the output height.
    if (boost::apply_visitor(outputHeightVisitor, network[i]) != 0)
    {
      height = boost::apply_visitor(outputHeightVisitor, network[i]);
    }
  }
}
What this code does is read and set the input/output width during the first forward pass. Once we know the output size of the previous layer, we know the input size of the current layer, and so on.

A first step would be to look at https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/visitor/output_height_visitor.hpp and https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/visitor/set_input_height_visitor_impl.hpp, which should be helpful for implementing another visitor for the input/output size.
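
To make the idea concrete, here is a rough, self-contained sketch of what such a visitor could look like. The layer types and the InputSize() accessor below are hypothetical stand-ins for illustration; the real mlpack visitors use SFINAE to detect which layer types actually expose these accessors:

#include <boost/variant.hpp>
#include <cstddef>

// Hypothetical minimal layer types standing in for mlpack layers.
struct Linear
{
  std::size_t inSize = 0;
  std::size_t& InputSize() { return inSize; }
};

struct Sigmoid { };  // Has no input size to set.

using LayerTypes = boost::variant<Linear*, Sigmoid*>;

// Modeled on SetInputHeightVisitor: set the input size on layers that
// expose InputSize(), and do nothing otherwise.
class SetInputSizeVisitor : public boost::static_visitor<bool>
{
 public:
  explicit SetInputSizeVisitor(const std::size_t inSize) : inSize(inSize) { }

  bool operator()(Linear* layer) const
  {
    layer->InputSize() = inSize;
    return true;
  }

  bool operator()(Sigmoid* /* layer */) const { return false; }

 private:
  const std::size_t inSize;
};

// Usage: boost::apply_visitor(SetInputSizeVisitor(previousOutSize), layer);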

@desai-aditya

desai-aditya commented Feb 22, 2018

Instead of getting this changed in the codebase, why not change it in the main program of the CLI? I mean, the user could directly specify the following in the network.conf file:
linear 50
linear 20
linear 100

which would essentially be transformed by the CLI to:
linear <dataset dimension> 50
linear 50 20
linear 20 100

@akhandait
Member

@desai-aditya I think it's very important to make the changes in the codebase first. Not everybody will use the CLI. The changes should be universal, so that it also works when we use this directly from C++.

@zoq
Member Author

zoq commented Feb 22, 2018

I agree with @akhandait, this should be implemented in the ann codebase.

@daiviknema
Contributor

I'd like to take this, please.

@zoq
Member Author

zoq commented Feb 24, 2018

I think @desai-aditya and @akhandait are already working on the issue.

@akhandait
Member

Sadly, it has been very difficult for me to take time out due to tightly scheduled exams.
@desai-aditya please leave a comment if you are working on this.
@PlantsAndBuildings maybe you could start on this if he doesn't. Good luck. :)

@daiviknema
Contributor

@akhandait : Thanks for letting me know.

@zoq: should I wait for @desai-aditya, or should I go ahead and start working on this?

@desai-aditya

@PlantsAndBuildings: you can start working with me too.
@zoq, @akhandait: sorry for the late reply, I was sick. I'll get back to work right away.
@akhandait: best of luck with your exams.

@zoq
Member Author

zoq commented Feb 26, 2018

@desai-aditya take all the time you need, we don't have to rush anything here.

@PlantsAndBuildings I think @akhandait was working on the parser and @desai-aditya on the CLI?

@Manthan-R-Sheth
Contributor

Manthan-R-Sheth commented Feb 27, 2018

Can it be made clear who all is working on this issue? I am interested in working on this as well.
@PlantsAndBuildings @desai-aditya @akhandait.
I guess if @PlantsAndBuildings and @desai-aditya are taking up this issue, maybe I can work on the #1254 (CLI for ANN) issue.

@akhandait
Member

I am working on CLI for ANN along with @desai-aditya.

@Manthan-R-Sheth
Contributor

@PlantsAndBuildings, can we collaborate over this issue then?

@desai-aditya

I have completed approximately half of the CLI.
I am also working on this issue simultaneously, but anyone wanting to join is more than welcome.

@akhandait How's the parser going? Also, if possible, make it so that the input to the parser is a CSV file in the same format instead of a space-separated one. It's a lot easier to handle it that way.

Also, this issue and the CLI for ANN are very closely related: if this issue is solved, the parser will need to be updated. So we'll have to see how things go regarding that.

Any input from @zoq and @rcurtin would be appreciated.

@zoq
Member Author

zoq commented Feb 27, 2018

In the best case, the parser is agnostic of the number of parameters. Let's say the layer is defined as:

LayerName(parameterA, parameterB, parameterC, parameterD)

the config would look something like:

LayerName 10 20 30 40

If the layer requires only two parameters, we just pass those two.

@daiviknema
Contributor

I think there are already enough people wanting to work on this, so I'll find another issue. @Manthan-R-Sheth, please ask @desai-aditya if he needs help.

@desai-aditya

Also, I need a little bit of help with compiling the main file using make:
I have to put the main file in the CMakeLists of the ann folder.
Where else do I have to make changes? Also, there's something called add_cli_executable() and add_python_binding() at the end of other CMakeLists files; do I have to do something with those?

@Manthan-R-Sheth
Contributor

Manthan-R-Sheth commented Feb 27, 2018

Based on my understanding (correct me if I am wrong):
add_cli_executable("name") adds a command-line executable with the corresponding "name". This enables the command mlpack_name from the CLI.
add_python_binding("name") builds a Python binding for the corresponding "name" (it builds a .pyx file for Python to run).
These assume that the file containing main() is named "name"_main.cpp, and they make the executables/.pyx files accordingly.

So, given that you are making the CLI for the ANN module, you will have to add add_cli_executable(ann) and create an ann_main.cpp in which the main() function that parses the input and builds the model is defined.

@desai-aditya

I mean other than what you described, @Manthan-R-Sheth. I understand that, but do we need to make any changes to any other file? Also, what about CMake changes?

@Manthan-R-Sheth
Contributor

Manthan-R-Sheth commented Feb 27, 2018

I think just adding ann_main.cpp to the folder and add_cli_executable(ann) should do the work.

@zoq
Member Author

zoq commented Feb 27, 2018

Yes, that should do the trick.

@daiviknema
Contributor

It has been a while since I've seen any activity on this issue, and there are no open PRs either. Is someone working on this, or can I take it up?

@akhandait
Member

@PlantsAndBuildings I am working on this. I also saw the inactivity here, and hence started working on it.

@akhandait
Member

If we implement a new visitor for setting the input size of the layers in the first forward pass, how do we deal with the constructors of the layer objects?
For example:
model.Add<Linear<> >(layerSize);
In this case, how do we construct the Linear object with only one parameter? Will this require code modifications for all the layer objects?

@zoq
Member Author

zoq commented Apr 1, 2018

An easy option would be to implement another constructor for each layer that needs the input/output size.
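
For the Linear layer, such an additional constructor might look like this. This is a sketch, not actual mlpack code; it assumes the weight initialization is deferred until the input size is known, as discussed below:

// Hypothetical extra constructor that takes only the output size.
template<typename InputDataType, typename OutputDataType>
Linear<InputDataType, OutputDataType>::Linear(const size_t outSize) :
    inSize(0),  // Unknown until set during the first forward pass.
    outSize(outSize)
{
  // weights.set_size(...) is deferred here, since inSize is not
  // known yet.
}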

@akhandait
Member

In the Linear layer implementation, or any other layer for that matter, the weights are initialized (size is set) in the constructor.

template <typename InputDataType, typename OutputDataType>
Linear<InputDataType, OutputDataType>::Linear(
    const size_t inSize,
    const size_t outSize) :
    inSize(inSize),
    outSize(outSize)
{
  weights.set_size(outSize * inSize + outSize, 1);
}

In our case, if we define a constructor which only takes the output size, how do we deal with this? For each layer, the input size will only be known during the first forward pass.

@zoq
Member Author

zoq commented Apr 4, 2018

We could move the initialization into the Reset function, which is called after the parameters are set.
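
A minimal sketch of that idea, assuming Reset() runs once inSize has been set (as the discussion below shows, this is not the final solution):

// Sketch only: size the weights in Reset() instead of the constructor,
// once inSize is known.
template<typename InputDataType, typename OutputDataType>
void Linear<InputDataType, OutputDataType>::Reset()
{
  weights.set_size(outSize * inSize + outSize, 1);
}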

@akhandait
Member

akhandait commented Apr 11, 2018

Sorry for being late; I implemented the required visitors but am facing some issues. The Reset function is called after the parameters are set, but the inSize for the second layer onwards is set during the first forward pass. So, the Reset function ends up initializing the weights with inSize as 0.
EDIT:
I got the solution: we can call the Reset function after setting the inSize in set_input_size_visitor.hpp. I will open a PR.
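
In other words, the setter would look roughly like this (illustrative only; the point is the Reset() call right after the input size is assigned):

// Inside the hypothetical SetInputSizeVisitor sketched earlier:
bool operator()(Linear* layer) const
{
  layer->InputSize() = inSize;
  layer->Reset();  // Re-initialize the weights now that inSize is known.
  return true;
}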

@zoq
Member Author

zoq commented Apr 13, 2018

Thanks for looking into it.

@akhandait
Member

@zoq moving the initialization to the Reset function doesn't work. It throws "error: addition: incompatible matrix dimensions" while training; the parameters are not getting initialized properly. We need another way to initialize the weights, because we have to initialize the weights of each layer before the first forward pass. Do you think we need a function which just initializes the parameters without actually doing a forward pass?

@mlpack-bot

mlpack-bot bot commented Feb 18, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
