
Convolutional network, annealing and epochs #135

Open · CherryGoose opened this issue Jan 17, 2019 · 15 comments

@CherryGoose commented Jan 17, 2019

I'm trying to create a convolutional network. What am I doing wrong? It seems that there is no difference between training the net with a larger or smaller number of examples. Also, can you tell me what kind of training method is used for each type of network? I'm using your framework for research purposes, and references to the papers or algorithms you used would be great.

net.AddLayer(new InputLayer(UserData[0].GetLength(0), 1, 1));

for (int i = 0; i < NumberOfHiddenLayers; i++)
{
  int size;
  if (UserData[0].GetLength(0) < NumberOfHiddenLayers)
  {
    size = UserData[0].GetLength(0);
  }
  else
  {
    size = UserData[0].GetLength(0) / NumberOfHiddenLayers;
  }

  if (size < 2)
    size = 2;

  net.AddLayer(new ConvLayer((UserData[0].GetLength(0) - i * size), 1, 1));
  net.AddLayer(new ReluLayer());
}

net.AddLayer(new ConvLayer(2, 1, 1));
net.AddLayer(new SoftmaxLayer(2));
@cbovar (Owner) commented Jan 19, 2019

  1. It seems that size can be computed outside the for loop (see the sketch after this list).
  2. I understand your training accuracy doesn't get better when you provide more training data. Have you tried a simpler network? I'm not sure I understand the way you compute the kernel size of the convolution layers.
  3. If you have full source code that I could run, it would be easier for me to help you.
  4. The training algorithms (Sgd and Adam) are inspired by the original implementation of ConvNetJS. You could look at https://arxiv.org/pdf/1609.04747.pdf
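
For illustration, point 1 could look like this (a sketch only, reusing the variable names from your snippet):

// `size` does not depend on the loop variable `i`, so compute it once.
int inputLength = UserData[0].GetLength(0);
int size = inputLength < NumberOfHiddenLayers
    ? inputLength
    : inputLength / NumberOfHiddenLayers;
if (size < 2)
    size = 2;

for (int i = 0; i < NumberOfHiddenLayers; i++)
{
    net.AddLayer(new ConvLayer(inputLength - i * size, 1, 1));
    net.AddLayer(new ReluLayer());
}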

@CherryGoose (Author) commented Jan 19, 2019

I have an array of doubles that comes from processed features of subjects. Right now I'm trying to run this code, but the training accuracy is off.

SgdTrainer Tr = new SgdTrainer(net)
{
    LearningRate = 0.01,
    BatchSize = 500,
    L2Decay = 0.001,
    Momentum = 0.9
};
net.AddLayer(new InputLayer(28, 28, 1));
net.AddLayer(new ConvLayer(5, 5, 8) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
net.AddLayer(new ConvLayer(5, 5, 16) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(3, 3) { Stride = 3 });
net.AddLayer(new FullyConnLayer(10));
net.AddLayer(new SoftmaxLayer(10));

double[] d = new double[12 * 63];
for (int k = 0; k < 10; k++)
{
    int count = 0;
    for (int i = 0; i < 12; i++)
    {
        for (int j = 0; j < UserDATA[k].GetLength(1); j++)
        {
            d[count] = UserDATA[k][i, j];
            count++;
        }
    }
    var x = BuilderInstance.Volume.From(d, new Shape(12, 63, 1));
    double[] z = new double[10];
    for (int t = 0; t < z.Length; t++)
    {
        z[t] = 0.0;
    }
    z[k] = 1.0;
    var zx = BuilderInstance.Volume.From(z, new Shape(1, 1, 10, 1));
    for (int g = 0; g < Convert.ToInt32(NumberOfTrainingSteps.Text); g++)
    {
        Tr.Train(x, zx); // train the network, specifying that x is class k
    }
}

double[] ts = new double[12 * 63];
double[] testd = new double[12 * 63];

for (int k = 0; k < 10; k++)
{
    int count = 0;
    for (int i = 0; i < 12; i++)
    {
        for (int j = 0; j < UserDATA[k].GetLength(1); j++)
        {
            testd[count] = UserDATA[k][i, j];

            if (k == 0)
                ts[count] = UserDATA[k][i, j];
            count++;
        }
    }

    var x = BuilderInstance.Volume.From(testd, new Shape(12, 63, 1));

    var prob = net.Forward(x);
    TestCON.Text += "\r\n" + " " + k + "            " + prob.Get(k);
    TestCON.Text += "\r\n" + k + " cl 0 prob " + prob.Get(0);
}

It seems that NumberOfTrainingSteps does not give me any increase in accuracy, but that can be expected because I'm not feeding any new data to the network. The thing is, even if I do train it on other examples, nothing changes. Also, what is the BatchSize in the trainer responsible for? And, as I understand it, the input layer size should correspond to the amount of data points I feed to the network, i.e. 28x28x1 should take no more than 784 data points?

@cbovar (Owner) commented Jan 19, 2019

  1. You should present a different input every time you call the Train method. It seems you call Train NumberOfTrainingSteps times with the same data; this will make the network forget the previous data in the dataset (see the sketch after this list).
  2. BatchSize is used in the trainers to normalize the gradients. The BatchSize information is currently duplicated: in the trainer and in the 4th dimension of the input volume. I think it is possible to get rid of the one on the trainer, but I haven't done it yet (it's a relic of the original ConvNetJS implementation).
  3. You should feed the network, during both training and inference, with a volume of shape 28x28x1xBatchSize. With BatchSize = 1 it should take exactly 784 data points (it seems you feed less data than that).
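
A minimal sketch of points 1 and 3 combined (samples, labels, trainer and numberOfTrainingSteps are hypothetical names; samples[n] holds the 784 input values and labels[n] a 10-element one-hot target):

// Sketch: cycle through the whole dataset instead of repeating one example.
for (int step = 0; step < numberOfTrainingSteps; step++)
{
    for (int n = 0; n < samples.Length; n++)
    {
        var input = BuilderInstance.Volume.From(samples[n], new Shape(28, 28, 1, 1));
        var target = BuilderInstance.Volume.From(labels[n], new Shape(1, 1, 10, 1));
        trainer.Train(input, target); // a different example on every call
    }
}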

@CherryGoose (Author) commented

Can you tell me what LearningRate, L2Decay, and Momentum represent? Also, is it possible to use the same data samples to train the network? Do you have functions that mutate the weights (simulated annealing, freezing, evolutionary multidimensional optimisation), or functions that separate epochs in the training of the network? Also, if I use different trainers to train the network on the same data samples, will it change anything performance-wise?

@cbovar (Owner) commented Feb 2, 2019

The learning rate determines the size of the steps we take to reach a (local) minimum. Basically, the gradients are multiplied by the learning rate before being used to update the parameters being optimized (see here in the code).

L1Decay and L2Decay are supposed to be used for regularization. You've made me realize that I still haven't implemented them, so these parameters are currently useless. I will get rid of them in the meantime.
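
For reference, if it were implemented, L2 decay would typically pull each weight towards zero by adding a penalty proportional to the weight to its gradient; a sketch with hypothetical names, not the library's code:

// Sketch of what L2 decay would do: gradient is dLoss/dw from backprop,
// l2Decay and lr are the trainer's hyperparameters.
double ApplyL2Step(double w, double gradient, double lr, double l2Decay)
{
    gradient += l2Decay * w; // regularization term pulls w towards zero
    return w - lr * gradient;
}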

Momentum is a method that helps accelerate SGD. You can look at section 4.1 of https://arxiv.org/pdf/1609.04747.pdf (see here in the code).
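
As a rough sketch of that update rule for a single parameter (plain C#, not the library's actual code; ComputeGradient is a hypothetical stand-in for backprop):

// SGD with momentum (section 4.1 of the paper above): the velocity v
// accumulates a decaying sum of past gradients. mu (e.g. 0.9) is the
// momentum coefficient, lr the learning rate.
double w = 0.0, v = 0.0, lr = 0.01, mu = 0.9;
for (int step = 0; step < 1000; step++)
{
    double grad = ComputeGradient(w); // dLoss/dw at the current w
    v = mu * v + lr * grad;           // accumulate a decaying sum of gradients
    w -= v;                           // step against the accumulated direction
}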

The functions that mutate the weights are called Trainers in ConvNetSharp: SgdTrainer / AdamTrainer for ConvNetSharp.Core, and SgdTrainer / AdamTrainer for ConvNetSharp.Flow.

Using different trainers will impact the performance of the network: some training algorithms are better suited to certain kinds of tasks.

I am not sure I understand "functions that separate epochs in training of the network". If it's a function to split a dataset into training/testing/validation sets, there is no such function in this library.
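
If by "separate epochs" you mean full passes over the training set, there is no helper for that either, but a manual loop is straightforward; a sketch, assuming dataset is a hypothetical list of (input, target) volume pairs and trainer one of the trainers above:

// Sketch: one epoch = one shuffled pass over the whole training set.
var rand = new Random(0);
for (int epoch = 0; epoch < numberOfEpochs; epoch++)
{
    // Fisher-Yates shuffle so each epoch sees the data in a new order.
    for (int i = dataset.Count - 1; i > 0; i--)
    {
        int j = rand.Next(i + 1);
        (dataset[i], dataset[j]) = (dataset[j], dataset[i]);
    }

    foreach (var (input, target) in dataset)
    {
        trainer.Train(input, target);
    }
}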

@CherryGoose (Author) commented

Thank you for the information! I have another question: I'm testing the network after every training step and having problems with the probability output. 100 different test samples output the same probability. As I understand it, the output should be different for every new test sample. What may cause that? It seems to me that the network forgets previous training data, or I simply can't see the errors in my code. Here is the code:

Net net = new Net();

AdamTrainer Trex = new AdamTrainer(net)
{
    LearningRate = LR,
    BatchSize = 1
};

net.AddLayer(new InputLayer(999, 705, 1));
net.AddLayer(new ConvLayer(11, 11, 5) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
net.AddLayer(new ConvLayer(5, 5, 16) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
net.AddLayer(new ConvLayer(3, 3, 20) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(2, 2) { Stride = 2 });
net.AddLayer(new ConvLayer(2, 2, 30) { Stride = 1, Pad = 1 });
net.AddLayer(new ReluLayer());
net.AddLayer(new FullyConnLayer(2));
net.AddLayer(new SoftmaxLayer(2));

Random rand = new Random(); // created once, outside the loop

for (int yh = 0; yh < 100; yh++)
{
    var x = BuilderInstance.Volume.From(TrueSamp[yh], new Shape(999, 705, 1));
    var y = BuilderInstance.Volume.From(FalseSamp[yh], new Shape(999, 705, 1));

    var zx = BuilderInstance.Volume.From(new[] { 1.0, 0.0 }, new Shape(1, 1, 2, 1));
    var zy = BuilderInstance.Volume.From(new[] { 0.0, 1.0 }, new Shape(1, 1, 2, 1));

    Trex.Train(x, zx); // train the network, specifying that x is class 0
    avloss += Trex.Loss;
    loss += "\r\n" + Trex.Loss;
    Trex.Train(y, zy); // train the network, specifying that y is class 1
    avloss += Trex.Loss;
    loss += "\r\n" + Trex.Loss;

    for (int i = 0; i < 100; i++)
    {
        double[] truesamp = TrueSampTest[rand.Next(0, 100)];
        var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705, 1, 1));
        var probq = net.Forward(rq);
        trueoutput += "\r\n" + probq.Get(0);
    }
    trueoutput += "yh = " + yh;

    for (int i = 0; i < 100; i++)
    {
        double[] falsesamples = FalseSampTest[rand.Next(0, 100)];
        var rx = BuilderInstance.Volume.From(falsesamples, new Shape(999 * 705, 1, 1));
        var proby = net.Forward(rx);
        falseoutput += "\r\n" + proby.Get(0);
    }
    falseoutput += "yh = " + yh;
}

@cbovar (Owner) commented Apr 30, 2019

Does the loss decrease?
It should output the same proba when yh is low, but it should not once yh starts to grow.

@CherryGoose (Author) commented May 1, 2019

The loss decreases as it should, according to the paper you cited.

[image: loss plot]

@cbovar (Owner) commented May 1, 2019

The input shape you use for testing seems odd: new Shape(999 * 705, 1, 1) instead of new Shape(999, 705, 1). I'm not sure that's the source of the problem, but it would be interesting to fix that.

@cbovar (Owner) commented May 5, 2019

Also, could you try decreasing the learning rate and posting a new plot of the loss? Maybe divide it by 10.

@CherryGoose (Author) commented

I've changed the shape in the testing method; no change in the proba occurred. I also tried decreasing the learning rate; here is the plot.

[image: loss plot with the reduced learning rate]

@cbovar (Owner) commented May 5, 2019

What is the value of LR?

Any chance to have the full code so I can run it? I think I just need FalseSamp, TrueSamp, FalseSampTest, and TrueSampTest.

@CherryGoose (Author) commented May 6, 2019

Right now LR is 0.001. Here is the main code. I'm reading data from files like the one attached: each line in a file is a set of coordinates with a corresponding value, and one file is one training sample. I've changed the layer architecture a bit, as it gives slightly better results.

1.zip

string[] arr = Directory.GetFiles(@"C:\Users\USER\Desktop\Norm_podp\YST", "*.*");
string[] arrTest = Directory.GetFiles(@"C:\Users\USER\Desktop\BaS_PFS\NORM", "*.*");
string[] arrSig = Directory.GetFiles(@"C:\Users\USER\Desktop\Norm_podp\NORM", "*.*");

double[,,] parce = new double[200, 1000, 1000];
double[,,] parceTest = new double[100, 1000, 1000];
double[,,] parceSig = new double[200, 1000, 1000];

double[][] TrueSamp = new double[200][];
double[][] TrueSampTEst = new double[100][];
double[][] FalseSamp = new double[200][];
int dimentionSize = 999 * 705;

// Initialize every grid cell to -1 (cells not present in the files keep this value).
for (int k = 0; k < 200; k++)
{
    for (int i = 0; i < 1000; i++)
    {
        for (int j = 0; j < 1000; j++)
        {
            parce[k, i, j] = -1;
            if (k < 100)
                parceTest[k, i, j] = -1;
            parceSig[k, i, j] = -1;
        }
    }
}

// Each line of a file is "x;y;value"; one file is one training sample.
int count = 0;
foreach (string file in arr)
{
    string[] text = System.IO.File.ReadAllLines(file);
    for (int i = 0; i < text.Length; i++)
    {
        string[] sp = text[i].Split(';');
        int x = Convert.ToInt32(sp[0]);
        int y = Convert.ToInt32(sp[1]);
        parce[count, x, y] = Convert.ToInt32(sp[2]);
    }
    count++;
}

count = 0;
foreach (string file in arrTest)
{
    string[] text = System.IO.File.ReadAllLines(file);
    for (int i = 0; i < text.Length; i++)
    {
        string[] sp = text[i].Split(';');
        int x = Convert.ToInt32(sp[0]);
        int y = Convert.ToInt32(sp[1]);
        parceTest[count, x, y] = Convert.ToInt32(sp[2]);
    }
    count++;
}

count = 0;
foreach (string file in arrSig)
{
    string[] text = System.IO.File.ReadAllLines(file);
    for (int i = 0; i < text.Length; i++)
    {
        string[] sp = text[i].Split(';');
        int x = Convert.ToInt32(sp[0]);
        int y = Convert.ToInt32(sp[1]);
        parceSig[count, x, y] = Convert.ToInt32(sp[2]);
    }
    count++;
}
TestCON.Text += "Files loaded \r\n";

int numberoftrainingex = 200;
double LR = 0.001;

string trueoutput = "";
string trueoutputfalsesamp = "";
string falseoutput = "";
string falseoutputfalsesamp = "";

for (int k = 0; k < numberoftrainingex; k++)
{
    TrueSamp[k] = new double[dimentionSize];
    if (k < 100)
        TrueSampTEst[k] = new double[dimentionSize];
    FalseSamp[k] = new double[dimentionSize];
}

// Flatten each 999x705 grid into a 1D sample array.
int com = 0;
for (int k = 0; k < numberoftrainingex; k++)
{
    for (int i = 0; i < 999; i++)
    {
        for (int j = 0; j < 705; j++)
        {
            TrueSamp[k][com] = parce[k, i, j];
            if (k < 100)
                TrueSampTEst[k][com] = parceTest[k, i, j];
            FalseSamp[k][com] = parceSig[k, i, j];
            com++;
        }
    }
    com = 0;
}

var avloss = 0.0;
string loss = "";

Net<double> net = new Net<double>();

AdamTrainer Trex = new AdamTrainer(net)
{
    LearningRate = LR,
    BatchSize = 1
};

net.AddLayer(new InputLayer(999, 705, 1));
net.AddLayer(new ConvLayer(11, 11, 5) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(3, 3) { Stride = 2 });
net.AddLayer(new ConvLayer(5, 5, 16) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(3, 3) { Stride = 2 });
net.AddLayer(new ConvLayer(3, 3, 40) { Stride = 1, Pad = 2 });
net.AddLayer(new ReluLayer());
net.AddLayer(new PoolLayer(3, 3) { Stride = 2 });
net.AddLayer(new ConvLayer(3, 3, 80) { Stride = 1, Pad = 1 });
net.AddLayer(new FullyConnLayer(2));
net.AddLayer(new SoftmaxLayer(2));

for (int yh = 0; yh < 100; yh++)
{
    var x = BuilderInstance.Volume.From(TrueSamp[yh], new Shape(999, 705));
    var y = BuilderInstance.Volume.From(FalseSamp[yh], new Shape(999, 705));

    var zx = BuilderInstance.Volume.From(new[] { 1.0, 0.0 }, new Shape(1, 1, 2, 1));
    var zy = BuilderInstance.Volume.From(new[] { 0.0, 1.0 }, new Shape(1, 1, 2, 1));

    Trex.Train(x, zx); // train the network, specifying that x is class 0
    avloss += Trex.Loss;
    loss += "\r\n" + Trex.Loss;

    Trex.Train(y, zy); // train the network, specifying that y is class 1
    avloss += Trex.Loss;
    loss += "\r\n" + Trex.Loss;

    double[] truesamp = TrueSamp[yh];
    var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705));
    var probq = net.Forward(rq);
    trueoutput += "\r\n" + probq.Get(0);
    trueoutputfalsesamp += "\r\n" + probq.Get(1);

    double[] falsesamples = FalseSamp[yh];
    var rx = BuilderInstance.Volume.From(falsesamples, new Shape(999 * 705));
    var proby = net.Forward(rx);
    falseoutputfalsesamp += "\r\n" + proby.Get(0);
    falseoutput += "\r\n" + proby.Get(1);
}

avloss = avloss / 200;
TestCON.Text += "av loss" + avloss;
TestCON.Text += "loss by steps" + "\r\n" + loss;
TestCON.Text += " test samples of class1 being class1" + "\r\n";
TestCON.Text += trueoutput;
TestCON.Text += " test samples of class1 being class2" + "\r\n";
TestCON.Text += trueoutputfalsesamp;
TestCON.Text += " test samples of class2 being class1" + "\r\n";
TestCON.Text += falseoutput;
TestCON.Text += "test samples of class2 being class2" + "\r\n";
TestCON.Text += falseoutputfalsesamp;

TestCON.Text += "TEST 1";
for (int j = 0; j < 100; j++)
{
    double[] truesamp = TrueSamp[j + 100];
    var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705));
    var probq = net.Forward(rq);
    TestCON.Text += "\r\n" + probq.Get(0);
}

TestCON.Text += "TEST 1 1";
for (int j = 0; j < 100; j++)
{
    double[] truesamp = TrueSamp[j + 100];
    var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705));
    var probq = net.Forward(rq);
    TestCON.Text += "\r\n" + probq.Get(1);
}

TestCON.Text += "TEST 2";
for (int j = 0; j < 100; j++)
{
    double[] truesamp = TrueSampTEst[j];
    var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705));
    var probq = net.Forward(rq);
    TestCON.Text += "\r\n" + probq.Get(0);
}

TestCON.Text += "TEST 2 2";
for (int j = 0; j < 100; j++)
{
    double[] truesamp = TrueSampTEst[j];
    var rq = BuilderInstance.Volume.From(truesamp, new Shape(999 * 705));
    var probq = net.Forward(rq);
    TestCON.Text += "\r\n" + probq.Get(1);
}

@CherryGoose (Author) commented

Here are the loss plot and the probability plot after each epoch:

[image: loss plot]
[image: probability plot]

@ren85 commented Jul 14, 2019

I had similar problems, and the net started working when I normalized the input to [0, 1].
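
Something like this (min-max scaling each flattened sample in place; a sketch only, matching the double[] samples used in the snippets above):

// Rescale a flattened sample into [0, 1] before wrapping it in a volume.
static void NormalizeInPlace(double[] sample)
{
    double min = double.MaxValue, max = double.MinValue;
    foreach (var v in sample)
    {
        if (v < min) min = v;
        if (v > max) max = v;
    }

    double range = max - min;
    if (range == 0) return; // constant sample: nothing to rescale

    for (int i = 0; i < sample.Length; i++)
        sample[i] = (sample[i] - min) / range;
}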
