This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

Segmentation fault when reshaping the net's input layer between forward passes, even when the net consists only of conv and pool layers #282

Open
yflv-yanxia opened this issue May 13, 2019 · 4 comments



yflv-yanxia commented May 13, 2019

I compiled intel/caffe with

```
USE_MKLDNN_AS_DEFAULT_ENGINE := 1
CPU_ONLY := 1
```
For example, first forward pass:

```cpp
caffe::Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, 1, 32, 100);
net_->Reshape();
// ...
net_->Forward();
```

Second forward pass:

```cpp
caffe::Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, 1, 32, 200);
net_->Reshape();
// ...
net_->Forward();
```

A segmentation fault then occurs in the second forward pass.

If I use MKL2017 instead, i.e. compile Caffe with

```
USE_MKL2017_AS_DEFAULT_ENGINE := 1
CPU_ONLY := 1
```

the problem does not occur.


ftian1 commented May 16, 2019

MKL2017 is deprecated and should not be used.

As for net_->Reshape(): it is already invoked inside net_->Forward(). Your code causes it to be invoked twice, which puts the reshape state machine into a wrong state: the second invocation concludes that no reshape happened, so it does not create new MKLDNN primitives for the new shape, and that causes the crash.

@yflv-yanxia


yflv-yanxia commented May 16, 2019

Thank you so much! It works now.
Another question: if I set DISABLE_CONV_RELU_FUSION := 1 in Makefile.config, I get correct results. However, with DISABLE_CONV_RELU_FUSION := 0 the results are not always right. So I suspect there is a bug in the Conv + ReLU fusion.
@ftian1


ftian1 commented May 16, 2019

@yflv-yanxia could you let me know the detailed steps to reproduce the accuracy issue?


yflv-yanxia commented May 17, 2019

Yes, I will extract the essential code and share the details in a few days. Meanwhile, I have another problem:
There is a pooling layer with global_pooling in my net.prototxt. If I use MKLDNN for this layer, I get an error:

```
F0517 12:36:37.562144 12851 blob.cpp:73] Check failed: shape[i] >= 0 (-125 vs. 0)
*** Check failure stack trace: ***
Aborted (core dumped)
```

When I set engine: CAFFE for this layer, the core dump goes away, but the final result of the net is still incorrect. I then replaced intel-caffe/src/caffe/layers/pooling_layer.cpp with the same file from the latest master branch of BVLC caffe (https://github.com/BVLC/caffe), still with engine: CAFFE, and got the correct result. So I suspect there are bugs in both intel-caffe/src/caffe/layers/mkldnn_pooling_layer.cpp and intel-caffe/src/caffe/layers/pooling_layer.cpp.
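For reference, the per-layer fallback described above can be expressed in net.prototxt roughly like this (the layer name and bottom/top blob names are placeholders, not taken from the uploaded model, and AVE pooling is assumed):

```
layer {
  name: "global_pool"    # placeholder name
  type: "Pooling"
  bottom: "conv_out"     # placeholder blob names
  top: "pool_out"
  pooling_param {
    pool: AVE
    global_pooling: true
    engine: CAFFE        # force the reference CPU path instead of MKLDNN
  }
}
```

This overrides the default engine for just this one layer, so the rest of the net can keep using MKLDNN.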
Here are my net.prototxt, net.caffemodel, and an input sample:
Link: https://pan.baidu.com/s/1RkZh3m7PoujWr-EXg22soA
Code: 55hp
Maybe this will help you solve these problems.
@ftian1
