I read in your Readme that you're having issues with getting NaNs as a result.
I've had similar issues and found that after the first iteration, sometimes my activations would be negative. This causes a problem in the next iteration, since the activations are used to compute the variance in the M-step, which can then become negative. Taking the square root of that negative number to get the standard deviation yields a NaN.
I believe the issues went away when I fiddled with the initial poses and transformation matrices and initialized them differently (make sure the transformation matrices are initialized randomly and the initial poses aren't all identical). I still have to investigate further, but I believe the trouble starts when all votes lie close together.
All of this might be a bug in my code, but I thought I'd mention it in case your NaNs come from a similar issue.
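The failure mode described above can be sketched in a few lines. This is not the repo's code, just an illustrative numpy version of an M-step variance computation (function and variable names are mine): when any activation goes negative, the weighted "variance" can itself go negative, and the square root then produces NaN.

```python
import numpy as np

def mstep_sigma(a, r, votes):
    """Weighted std-dev of votes, as in an EM-routing M-step (illustrative).

    a: activations, r: routing assignments, votes: vote values.
    If any entry of a is negative, the weighted variance can be negative,
    and np.sqrt then returns NaN.
    """
    w = (a * r)[:, None]                                  # combined weights
    mu = (w * votes).sum(axis=0) / w.sum(axis=0)          # weighted mean
    var = (w * (votes - mu) ** 2).sum(axis=0) / w.sum(axis=0)
    return np.sqrt(var)                                   # NaN if var < 0

votes = np.array([[0.0], [1.0], [2.0]])
r = np.array([0.5, 0.3, 0.2])

ok = mstep_sigma(np.array([0.9, 0.8, 0.7]), r, votes)     # all a >= 0: finite
bad = mstep_sigma(np.array([0.9, -2.0, 0.7]), r, votes)   # negative a: NaN
```

With all activations non-negative the result stays finite; a single negative activation is enough to flip the variance's sign and poison the standard deviation.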
About the initialization of the poses: the paper uses a simple convolution to produce the initial poses in the primary layer, which I find confusing. Why throw away the spatial information if you can use it? I'm currently testing an initialization that simply expresses the pose as a transformation matrix:
1    0    0    x/w - 0.5
0    1    0    y/h - 0.5
0    0    1    0
0    0    0    1
w: image width
h: image height
x: x-component of pixel's position in image (0...w)
y: y-component of pixel's position in image (0...h)
I then initialize the transformation matrices with random values (similar to the weights in a normal convolutional network).
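The proposed initialization can be sketched as follows. This is my own illustrative numpy version (names are not from the paper or the repo): every spatial position gets a 4x4 identity matrix whose translation column encodes the pixel's normalized position.

```python
import numpy as np

def initial_poses(w, h):
    """Initial pose per pixel: identity with position-dependent translation.

    Returns an array of shape (h, w, 4, 4) where entry [y, x] is the
    4x4 matrix with x/w - 0.5 and y/h - 0.5 in the translation column.
    """
    poses = np.tile(np.eye(4), (h, w, 1, 1))   # identity at every position
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    poses[..., 0, 3] = xs / w - 0.5            # x-translation in [-0.5, 0.5)
    poses[..., 1, 3] = ys / h - 0.5            # y-translation in [-0.5, 0.5)
    return poses

poses = initial_poses(8, 6)                    # e.g. an 8-wide, 6-high grid
```

The transformation matrices themselves would then be drawn randomly, like ordinary convolution weights; only the initial poses carry the spatial prior.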
@Germanunkol Thank you for the comments and suggestions.
I indeed found that the NaN is due to the activations from the primary capsules, i.e., capsule activations initialized from a regular convolution layer. From my understanding, the activations should always be >= 0, because they are the results of tf.sigmoid(). The issue, however, could be that too many activations become zero.
The above issue is partially confirmed:
I ran a test initializing all primary capsule layer activations to a fixed value of 1, and there is no NaN issue up to 120K steps.
I did see the NaN issue occasionally, even with num_iterations=1.
I am still working on finding better solutions than initializing all primary capsule layer activations to a fixed value of 1.
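The two initializations compared above can be sketched side by side. This is an illustrative numpy version (tf.sigmoid is mentioned in the source; the numpy equivalent is mine): sigmoid keeps activations in (0, 1) but lets many of them saturate near zero, while the workaround simply pins them all to 1.

```python
import numpy as np

def primary_activations(logits, fixed=False):
    """Primary capsule activations (illustrative).

    fixed=False: sigmoid of the logits (numpy stand-in for tf.sigmoid),
    so values lie in (0, 1) but can be vanishingly small.
    fixed=True: the fixed-value workaround, all activations set to 1.
    """
    if fixed:
        return np.ones_like(logits)
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([-20.0, 0.0, 20.0])
sig = primary_activations(logits)                 # ~0, 0.5, ~1
fixed = primary_activations(logits, fixed=True)   # all exactly 1
```

Note that the sigmoid branch never goes negative, which matches the observation that the problem here is activations collapsing to (near) zero rather than turning negative.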