discussion about "pretraining" #130

Open
mikerabat opened this issue Feb 14, 2024 · 13 comments
Labels: documentation (Improvements or additions to documentation)

@mikerabat

A colleague of mine made the comment that I should pretrain the models to yield more robust models and
better accuracy. Now... how can I do that - or... what are possible avenues here?

My models are all based on ECG (1-dimensional, up to 3 channels, which I basically encode as "RGB").
He hinted that one should first train on the dataset differently... basically use a few convolutional layers to
"compress" the signal and then "expand" the layers again. The pretraining goal is to "reconstruct" the input signal (am I right here???),
so input is output. (I know that this can be done with an old-fashioned 3-layer NN approach, which yields the PCA - is that the same here?)

After pretraining, cut off some of the layers (how many is good?), add new ones so the real classification task can be achieved, and train again on the real classification task.

Is this something reasonable? And is there maybe an example out there that shows an efficient way to do that?

@joaopauloschuler (Owner)

Hello @mikerabat ,
There is a curse in computer science: algorithms can always be improved, making the job never-ending.

Regarding "A colleague of mine made the comment that I should pretrain the models to yield more robust models and
better accuracy. Now... how can I do that - or... what are possible avenues here?
", it sounds like a concept called "Transfer Learning":
https://machinelearningmastery.com/transfer-learning-for-deep-learning/

Regarding "He hinted that one should first train on the dataset differently... basically use a few convolutional layers to
"compress" the signal and then "expand" the layers again. The pretraining goal is to "reconstruct" the input signal (am I right here???), so input is output. (I know that this can be done with an old-fashioned 3-layer NN approach, which yields the PCA - is that the same here?)
", he is talking about training an autoencoder first, before the actual classification:
https://github.com/joaopauloschuler/neural-api/tree/master/examples/VisualAutoencoder

After training the autoencoder, the "decoder" is then removed and the NN is trained for classification.
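
As a rough and untested sketch of that flow (LoadFromFile and the per-layer CopyWeights are the names I would reach for here - treat the exact API as an assumption, and the 2-layer encoder as a placeholder):

```
program EncoderReuseSketch;
// Rough sketch (untested): train an autoencoder, drop the decoder, then reuse
// the encoder weights in a classifier. LoadFromFile and TNNetLayer.CopyWeights
// are assumed API names; the encoder is shortened to 2 layers for brevity.
uses neuralnetwork;
const
  EncoderLayerCnt = 3; // input layer + 2 convolutional encoder layers
var
  AutoEnc, Classifier: TNNet;
  i: integer;
begin
  AutoEnc := TNNet.Create();
  AutoEnc.LoadFromFile('AutoEncoder.nn');      // previously trained autoencoder
  Classifier := TNNet.Create();
  Classifier.AddLayer([
    TNNetInput.Create(1024, 1, 3),             // same input shape as the autoencoder
    TNNetConvolutionReLU.Create(32, 3, 1, 2),  // encoder layer 1 (same definition)
    TNNetConvolutionReLU.Create(64, 3, 1, 2),  // encoder layer 2 (same definition)
    TNNetFullConnectReLU.Create(64),           // new classification head
    TNNetFullConnectLinear.Create(4),          // e.g. 4 beat classes
    TNNetSoftMax.Create()
  ]);
  // copy the pretrained weights into the shared encoder prefix
  for i := 0 to EncoderLayerCnt - 1 do
    Classifier.Layers[i].CopyWeights(AutoEnc.Layers[i]);
  // then train Classifier on the classification labels as usual
  AutoEnc.Free;
  Classifier.Free;
end.
```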

Although I'm skeptical about a single solution that works well for all problems, if you google for solutions using transfer learning and autoencoding, you'll find them. From experience, speculating "this will improve things" is easy; whether it really improves after days of work and retraining for a particular application is a completely different matter. I think it comes down to the question of what accuracy you need and how much you intend to spend improving the neural models.

In the case that your colleague is certain about what he is saying, you could ask for the actual scientific papers (blog posts/web pages are not sufficient scientific evidence) and why he believes that these solutions are applicable to your use case.

I did some experimentation myself comparing convolutional layers with a PCA output, and I found high similarity between convolution and PCA except for the activation function.

If you like, I can code an example using an autoencoder for image classification.

@joaopauloschuler joaopauloschuler self-assigned this Feb 17, 2024
@joaopauloschuler joaopauloschuler added the documentation Improvements or additions to documentation label Feb 17, 2024
@mikerabat (Author)

mikerabat commented Mar 14, 2024

Dear Joaopaulo!

Thanks for the valuable input!
Actually, we would like to do ECG classification... We have 1 to 12 "channels" (like RGB in images) and the positions of "beats".
The idea is to use ±2 seconds (more or less... that is still up for discussion) around each beat as input for the beat classifier.

The data is currently organized as
1024x1x3 (for 3 channels) as input (X x Y x Depth)

So the idea was to first do some pretraining, then split off the encoder part and use it as the "first stage" of
the actual classifier that later does the classification into 3 to 4 classes (depending on what we decide to do...)
I also created a little class that contains the ECG and shuffles the data into the training process via the event handlers...

So here I am currently (stripped version)

```
 numNeurons := 128;
 // Default: 70% training, 15% validation 15% test
 trainSetPercent := 70;
 evalSetPercent := 15; // rest is test set..

 if FindCmdLineSwitch('trainset', value, true) then
    TryStrToInt(value, trainSetPercent);
 if FindCmdLineSwitch('evalset', value, true) then
    TryStrToInt(value, evalSetPercent);
 if FindCmdLineSwitch('numepochs', value, true ) then
    TryStrToInt(value, numEpochs );
 if FindCmdLineSwitch('batchsize', value, true) then
    TryStrToInt(value, batchSize);
 if FindCmdLineSwitch('NumNeurons', value, true) then
    TryStrToInt(value, numNeurons);

 trainSetPercent := Max(0, Min(90, trainSetPercent));
 evalSetPercent := Max(0, Min(100 - trainSetPercent, evalSetPercent));

 useOpenCL := False; // not FindCmdLineSwitch('noopencl', ['-'], True);

 Writeln('dB Description: ' + beatDB.Description );

 Writeln('Extracting beat data set feature set');

 Writeln('Finished reading db');


 // ###########################################
 // #### Create an autoencoder net mapping the same
 // features as the corresponding net on BeatAnalysisNeralNet64.dpr
 clFile := ParamStr(2);

 // ###########################################
 // #### Create neural net
 WriteLn('Creating Neural Network...');

 NN := THistoricalNets.Create;

 if (ParamCount > 1) and FindCmdLineSwitch('Net:Beat1', ['-'], True) then
 begin
      inputLayer := TNNetInput.Create(numFeatures, 1, numChanPerRow);       // either single chan or 3 chan (encoded as "RGB")
      NN.AddLayer( inputLayer );
      if not isNormalized then
         NN.AddMovingNorm(False, inputLayer);

      // Encoder
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 48, 0, 1 ) );           //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 48, 0, 2 ) );           //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 2 ) );        // stride 2 maybe can be exchanged by maxpool
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 2 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 2 ) );

      // middle layer
      // that does not work... starting with just upsampling...
      // NN.AddLayer( TNNetFullConnectSigmoid.Create( numNeurons, 1, NN.Layers[ NN.Layers.Count - 1].Output.Depth ) );

      // decoder upsample until we have enough layers
      NN.AddLayer( TNNetUpsample.Create() );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 1 ) );
      NN.AddLayer( TNNetUpsample.Create() );

      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 1 ) );
      NN.AddLayer( TNNetUpsample.Create() );

      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );
      NN.AddLayer( TNNetUpsample.Create() );

      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );

      // heuristic: optionally add one more upsample depending on numNeurons vs. numFeatures
      if numNeurons * (2 shl (5 - 2*Integer( (numNeurons >= 100) ) ) ) < numFeatures then
         NN.AddLayer( TNNetUpsample.Create() );

      // final layer that recreates the original output.
      NN.AddLayer( TNNetFullConnectLinear.Create( numFeatures, 1, numChanPerRow ) );
      NN.AddLayer( TNNetReLUL.Create(-10, +10, 0) ); // Protection against overflow
 end;

 NN.InitWeights;

 NN.DebugStructure();

 Writeln( 'Training for ', numEpochs, ' epochs');
 Writeln( 'Press <ENTER> to proceed...');
 readln;
 TotalStart := GetTickCount;

 // ###########################################
 // #### Now create the specialized ecg fitting class
 NeuralFit := TAutoEncoderECGLoading.Create( beatDB, trainSetPercent, evalSetPercent );

 {$IFDEF DEBUG}
 //NeuralFit.MaxThreadNum := 1;
 {$ENDIF}

 Writeln('Train Examples ', NeuralFit.NumTrain );
 Writeln('Validation Examples ', NeuralFit.NumEval );
 Writeln('Test Examples ', NeuralFit.NumTest );

 NeuralFit.FileNameBase :=
       'AutEncoder_' + FormatDateTime( 'ddmmyy_hhnn', now );

 // params from the visual autoencoder example:
 // https://github.com/joaopauloschuler/neural-api/blob/master/examples/VisualAutoencoder/uvisualautoencodertinyimagenet.pas
 NeuralFit.InferHitFn := @LocalFloatCompare;
 NeuralFit.LearningRateDecay := 0.0;
 NeuralFit.L2Decay := 0.0;
 NeuralFit.AvgWeightEpochCount := 1;
 NeuralFit.InitialLearningRate := 0.0001;
 NeuralFit.ClipDelta := 0.01;
 NeuralFit.EnableBipolar99HitComparison;

 //NeuralFit.MinLearnRate := 0.00001;
 //NeuralFit.StaircaseEpochs := 5;
 NeuralFit.MaxCropSize := 0;

 //NeuralFit.LossFn := TNeuralFitHack(NeuralFit).DefaultLossFn;

 netSave := TNotifyEvtObj.Create(NeuralFit);
 NeuralFit.OnAfterEpoch := netSave.OnAfterEpoch;

 EasyOpenCL := TEasyOpenCL.Create();
 if useOpenCL then
    EasyOpenCL.LoadPlatforms;

 if EasyOpenCL.GetPlatformCount() > 0 then
 begin
      WriteLn('Setting platform to: ', EasyOpenCL.PlatformNames[0]);
      EasyOpenCL.SetCurrentPlatform(EasyOpenCL.PlatformIds[0]);
      if EasyOpenCL.GetDeviceCount() > 0 then
      begin
           EasyOpenCL.SetCurrentDevice(EasyOpenCL.Devices[0]);
           WriteLn('Setting device to: ', EasyOpenCL.DeviceNames[0]);
           NeuralFit.EnableOpenCL(EasyOpenCL.PlatformIds[0], EasyOpenCL.Devices[0]);
      end
      else
      begin
           WriteLn('No OpenCL capable device has been found for platform ',EasyOpenCL.PlatformNames[0]);
           WriteLn('Falling back to CPU.');
      end;
 end
 else
 begin
      WriteLn('No OpenCL platform has been found. Falling back to CPU.');
 end;

 NeuralFit.Fit(NN, {batchsize=}batchSize, {epochs=}numEpochs);
 NeuralFit.Free;
 EasyOpenCL.Free;
 netSave.Free;

 NN.SaveToFile('NN.dat');
 NN.Free;
 beatDB.Free;
```

Actually, I think I have a bit of a misunderstanding here...
I thought that there could be a middle layer (e.g. a sigmoid one), so I tried some fully connected layers.
So how would I need to set this up?
Also, could I replace some of the convolutional layers with max or average pooling layers that do the same?

Also, is it a good idea to change the filter lengths to non-powers of 2? My initial beat analysis tests had some made-up
sizes that I basically took from a few other papers...

Also, the OpenCL code does not work - the kernels can be created, but every time the calculation gets called there is an error -30...
@joaopauloschuler (Owner)

Dear @mikerabat !

Feel free to use 512 input channels (instead of 3 for RGB) and beyond if you need. I have already used more than 512 channels with hyperspectral images.

In the following code:

      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 48, 0, 1 ) );           //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 48, 0, 2 ) );           //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 32, 0, 2 ) );        // stride 2 maybe can be exchanged by maxpool
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 48, 0, 2 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 48, 0, 2 ) );

I would decrease the filter size to 3 (as per https://medium.com/@siddheshb008/vgg-net-architecture-explained-71179310050f ) .
Using stride instead of MaxPool is a good idea. Keep it.
I would use padding. The code would look like this:

      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 3, 1, 1 ) );        //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 3, 1, 2 ) );        //
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 3, 1, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 32, 3, 1, 2 ) );        // stride 2 maybe can be exchanged by maxpool
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 3, 1, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 64, 3, 1, 2 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 3, 1, 1 ) );
      NN.AddLayer( TNNetConvolutionReLU.Create( 128, 3, 1, 2 ) );

Using NN.AddLayer( TNNetUpsample.Create() ); is also a good idea. Keep it.

While decoding, use the same filter sizes and padding as in the encoder.

I would first code a standard image classifier and use it as a baseline. Once you have a more or less good architecture, you can use it as a benchmark to measure the improvement gained by the encoder/decoder architecture.
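
For instance, a minimal baseline along these lines could look like the code below (a sketch only - the layer counts and the 4 output classes are placeholders, not a tuned architecture):

      NN := TNNet.Create();
      NN.AddLayer([
        TNNetInput.Create(1024, 1, 3),             // 4 s of ECG, 3 channels
        TNNetConvolutionReLU.Create(32, 3, 1, 2),  // stride 2 halves the width
        TNNetConvolutionReLU.Create(64, 3, 1, 2),
        TNNetConvolutionReLU.Create(128, 3, 1, 2),
        TNNetMaxPool.Create(2),
        TNNetFullConnectReLU.Create(64),
        TNNetFullConnectLinear.Create(4),          // e.g. 4 beat classes
        TNNetSoftMax.Create()
      ]);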

I'll code an example showing how to load just the encoder of a trained neural network and post it here over the weekend.

@joaopauloschuler (Owner)

Doubling the number of filters at each maxpool or stride is a good idea.

@joaopauloschuler (Owner)

I would use fully connected layers with maxpool only for the actual image classification (not for the autoencoder).

@mikerabat (Author)

Dear Joaopaulo! MANY thanks for the valuable input!

First, I need to deal with ECG, so... 1D signals with at most 12 channels (we mostly deal with 3, hence the RGB equivalent ;) )
Regarding the filter size - my intuition was to have kernels the size of the features a human would be interested in. In ECG that would be the QRS complex, the T wave and the P wave. These are around 80-120 ms (around 32 samples), or the whole complex, which would be 400 to 500 ms (around 120 samples). Is that reasonable, or are these networks really so powerful that we can use smaller kernels?

@joaopauloschuler (Owner)

joaopauloschuler commented Mar 16, 2024

Dear @mikerabat ,
have a look at this link please: https://poe.com/s/zm7ERt1OgV8SsRw4WWX8 .

The reply was given by Claude (not me - although Claude signed as me).

In my opinion, my first attempt would be to start with 3x3 kernels and let the NN learn the features in deeper layers. Each time you add a stride, you also double the receptive field of the neurons in the deeper layers. I would expect only the last layers of the NN to learn humanly meaningful concepts.
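
To make the receptive-field arithmetic concrete, here is a tiny back-of-the-envelope program (my own sketch, not library code; it assumes width-3 kernels with a stride of 2 on every second layer, as in the encoder above):

```
program ReceptiveField;
// Receptive field of stacked width-3 convolutions where every second layer
// uses stride 2: rf grows by (k-1)*jump per layer; jump multiplies by stride.
const
  k = 3; // kernel width
var
  layer, stride, rf, jump: integer;
begin
  rf := 1;
  jump := 1;
  for layer := 1 to 8 do
  begin
    if (layer mod 2) = 0 then
       stride := 2
    else
       stride := 1;
    rf := rf + (k - 1) * jump;
    jump := jump * stride;
    WriteLn('layer ', layer, ': receptive field = ', rf, ' samples');
  end;
end.
```

After 8 such layers the receptive field is 61 samples, which at 256 Hz (1024 samples / 4 s) is roughly 240 ms - already wider than a QRS complex, even though every individual kernel is only 3 samples wide.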

@mikerabat (Author)

mikerabat commented Mar 18, 2024

Ok, this makes sense... Thank you very much...

I have now played around a bit to start the autoencoder learning process.
Here is one of the simplest nets I could imagine:

         inputLayer := TNNetInput.Create(numFeatures, 1, numChanPerRow);       // either single chan or 3 chan (encoded as "RGB")
         NN.AddLayer( inputLayer );

         // Test: one-layer encoder, one-layer decoder
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 2, 1 ) );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );

         NN.AddLayer( TNNetUpsample.Create() );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetConvolutionLinear.Create( 3, 1, 0, 1, 0 ) );

         // final layer that recreates the original output.
        // NN.AddLayer( TNNetFullConnectLinear.Create( numFeatures, 1, numChanPerRow ) );
         NN.AddLayer( TNNetReLUL.Create(-10, +10, 0) ); // Protection against overflow

NumFeatures = 1024 (around 4 seconds of ECG), Y is 1 since I have 1-dimensional data, and there are 3 ECG channels (numChanPerRow).

Forgive my stupid question here, but
my problem is actually that the output layer size does not match the input size, so the standard autoencoder training approach, input = output, cannot be applied. I always needed to introduce a fully connected (linear) layer that represents the input
dimension.

Maybe my problem is my understanding of the upsample class... As far as I can see, the upsample class assumes that the x and y dimensions of the input space are the same, and it doubles in the y direction as well...

@joaopauloschuler (Owner)

I would change this:

         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetUpsample.Create() );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );

to this:

         NN.AddLayer( TNNetConvolutionReLU.Create( 8*4, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetUpsample.Create() );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );

I would multiply the number of input channels by 4 before each TNNetUpsample. Does this answer the question?
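
The arithmetic behind the factor of 4 (TNNetUpsample moves depth into space, doubling X and Y and dividing the depth by 4; the concrete numbers below are only an illustration):

      // TNNetUpsample mapping: (X, Y, D) -> (2X, 2Y, D div 4)
      // with  8 channels: (256, 1,  8) -> (512, 2, 2)   // depth collapses
      // with 32 channels: (256, 1, 32) -> (512, 2, 8)   // depth preserved at 8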

@mikerabat (Author)

Unfortunately not...

My problem is that the input dimensions are 1024x1x3 (i.e. 4 s of ECG x 1-dim input x 3 ECG channels)
and that the output after the layers:

         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 2, 1 ) );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );

         NN.AddLayer( TNNetUpsample.Create() );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetConvolutionReLU.Create( 8, 3, 1, 1, 1 ) );
         NN.AddLayer( TNNetConvolutionLinear.Create( 3, 1, 0, 1, 0 ) );

actually is:

1024 x 6 x 8 instead of the 1024 x 1 x 3 I anticipated...

I found a resize layer, but I think that only works if the product of all dimensions stays the same, e.g.
512 x 2 x 3 could be resized to 1024 x 1 x 3, right?

But 1024 x 6 x 8 cannot be resized to 1024 x 1 x 3, since that would lose some data points (49152 vs. 3072 values), right?

I also tried a fully connected linear layer, but that would actually result in quite a large parameter space...

@joaopauloschuler (Owner)

AH! Now I understand what you are saying. I see: the padding is making the second dimension grow.

On the encoder side, what you can do is:

  • Before each convolution, add padding on the X axis only with "TNNetPadXY.Create(1,0)". Then remove the padding from the convolutional layer itself. I do this trick with NLP. This is an example:
      NN.AddLayer([
        TNNetPadXY.Create(1,0),
        TNNetConvolution.Create(8, 3, 0, 1, 1)
      ]);

For the decoder side, I don't have a 1D upsampler. BUT, you could do something like this:

NN.AddLayer( TNNetPadXY.Create(1,0) );
PreviousLayer := NN.AddLayer( TNNetConvolutionReLU.Create( 8*2, 3, 0, 1, 1 ) );
NN.AddLayer( TNNetReshape.Create(PreviousLayer.Output.SizeX*2, PreviousLayer.Output.SizeY, PreviousLayer.Output.Depth div 2) );
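
To trace the shapes (my own arithmetic, assuming a (512, 1, 8) input into this block and that TNNetReshape just reinterprets the buffer):

      // TNNetPadXY.Create(1,0):                  (512, 1,  8) -> (514, 1,  8)
      // TNNetConvolutionReLU.Create(8*2,3,0,1):  (514, 1,  8) -> (512, 1, 16)
      // TNNetReshape(SizeX*2, SizeY, D div 2):   (512, 1, 16) -> (1024, 1, 8)   // X*2, depth/2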

Do you think it could work?

@joaopauloschuler (Owner)

joaopauloschuler commented Mar 21, 2024

In this message, I'm not suggesting anything. I'm just bringing existing layers to your attention.

Via transposing outputs and running pointwise convolutions, you can transform a 2D output into a cube.

Assume that your input (1024, 1, 3) is transposed into (1024, 3, 1) via TNNetTransposeYD.Create(). Then, via a standard ReLU convolution with 32 neurons, you can create 32 channels, giving (1024, 3, 32). Then, if you transpose with TNNetTransposeXD.Create(), you'll get (32, 3, 1024). If you then run a pointwise convolution with 32 neurons, you'll get (32, 3, 32). If you then call TNNetTransposeYD.Create() and do another pointwise convolution with 32 neurons, you'll end up with (32, 32, 32). I'm not saying that this is useful in your case. I'm just saying that it's possible.
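
Assembled as code, that chain would look like this (my untested assembly of the steps above; a plain 1x1 convolution stands in for the pointwise step):

      NN.AddLayer([
        TNNetInput.Create(1024, 1, 3),
        TNNetTransposeYD.Create(),                  // (1024, 3, 1)
        TNNetConvolutionReLU.Create(32, 3, 1, 1),   // (1024, 3, 32)
        TNNetTransposeXD.Create(),                  // (32, 3, 1024)
        TNNetConvolutionReLU.Create(32, 1, 0, 1),   // pointwise: (32, 3, 32)
        TNNetTransposeYD.Create(),                  // (32, 32, 3)
        TNNetConvolutionReLU.Create(32, 1, 0, 1)    // pointwise: (32, 32, 32)
      ]);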

Another super crazy idea would be calling THistoricalNets.AddTransformerBlock(false, 2) (still in beta / not fully tested). On second thought, better not to use THistoricalNets.AddTransformerBlock(false, 2) until it's fully tested.

Another idea: if you get "overflows" while training, you can call NN.MulWeights(0.1) just before calling Fit. This method multiplies all weights of the NN by 0.1.

This is just a brainstorm. I have never experimented with transposes in an autoencoder... But resizing the 1024,1,3 into 32,32,3 could work...

@mikerabat (Author)

omg... thank you for your valuable input :)
I will check as soon as I find time ;)
