"NaN" and the initialization of poses and transformation matrices #2

Open
Germanunkol opened this issue Dec 6, 2017 · 1 comment

Germanunkol commented Dec 6, 2017

I read in your README that you're having issues with NaNs showing up in the results.
I've had similar issues and found that, after the first iteration, some of my activations would occasionally be negative. This causes a problem in the next iteration: the activations are used as weights when computing the variance in the M-step, so the variance can become negative. Taking the square root of that negative variance to get the standard deviation then yields a NaN.
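
For illustration, here is a minimal numpy sketch of that failure mode (the values and variable names are made up, not taken from any particular implementation):

```python
import numpy as np

# Hypothetical M-step snippet: an output capsule's variance is an
# activation-weighted average of squared deviations of the votes.
votes = np.array([1.0, 1.0, 0.0])         # votes from three input capsules
activations = np.array([0.3, 0.3, -0.5])  # note the (buggy) negative activation
r = np.ones_like(votes)                   # routing assignments r_ij

w = r * activations                       # per-vote weights
mu = np.sum(w * votes) / np.sum(w)        # weighted mean of the votes
var = np.sum(w * (votes - mu) ** 2) / np.sum(w)  # goes negative here
sigma = np.sqrt(var)                      # sqrt of a negative number -> nan

print(mu, var, sigma)                     # ~6.0, ~-30.0, nan
```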

I believe the issues went away when I fiddled with the initial poses and transformation matrices and initialized them differently (making sure the transformation matrices are initialized randomly and the initial poses are not all identical). I still have to investigate further, but I believe the trouble arises when all votes lie close together.

All of this might be a bug in my code, but I thought I'd mention it in case your NaNs come from a similar issue.

About the initialization of the poses: the paper uses a simple convolution to produce the initial poses in the primary layer, which I find confusing. Why throw away the spatial information if you can use it? I'm currently testing an initialization where each pose is just the pixel's position expressed as a transformation matrix:

    1 0 0 x/w - 0.5
    0 1 0 y/h - 0.5
    0 0 1 0
    0 0 0 1

where
w: image width
h: image height
x: x-component of the pixel's position in the image (0...w)
y: y-component of the pixel's position in the image (0...h)

I then initialize the transformation matrices as random values (similar to the weights in a normal convolutional network).
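
Here is a minimal numpy sketch of this scheme, assuming a w × h grid of primary capsules with 4×4 poses (the function names and the 0.1 stddev are illustrative):

```python
import numpy as np

def initial_poses(h, w):
    """Identity pose per pixel, with the pixel's normalized position
    written into the translation column of the 4x4 matrix."""
    poses = np.tile(np.eye(4), (h, w, 1, 1))  # shape (h, w, 4, 4)
    ys, xs = np.mgrid[0:h, 0:w]
    poses[..., 0, 3] = xs / w - 0.5           # x-translation in [-0.5, 0.5)
    poses[..., 1, 3] = ys / h - 0.5           # y-translation in [-0.5, 0.5)
    return poses

def initial_transforms(n_in, n_out, rng=np.random.default_rng(0)):
    """Random transformation matrices between capsule layers,
    analogous to the weight initialization of an ordinary conv net."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out, 4, 4))

poses = initial_poses(h=14, w=14)
print(poses[0, 0])  # identity with translation (-0.5, -0.5)
```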

gyang274 (Owner) commented Dec 19, 2017

@Germanunkol Thank you for the comments and suggestions.

I indeed found that the NaN is due to the activations from the primary capsules, i.e., capsule activations initialized from a regular convolution layer. From my understanding, the activations should always be >= 0, because they are the results of tf.sigmoid(). The issue, however, could be that too many activations become zero.

The above issue is partially confirmed:

  1. I ran a test initializing all primary capsule layer activations to the fixed value 1, and there was no NaN issue up to 120K steps.
  2. I still saw the NaN issue occasionally, even with num_iterations=1.

I am still working on finding better solutions than initializing all primary capsule layer activations to the fixed value 1.
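
For concreteness, here is a sketch of the two variants being compared (the function and its arguments are hypothetical, not taken from the repository):

```python
import tensorflow as tf

def primary_capsule_activations(logits, fixed_one=False):
    """logits: pre-activation output of the primary-capsule conv layer."""
    if fixed_one:
        # workaround tested above: every primary capsule starts fully
        # active, so activation-weighted sums in routing cannot vanish
        return tf.ones_like(logits)
    # standard approach: tf.sigmoid() keeps activations in (0, 1),
    # but many of them can saturate near zero
    return tf.sigmoid(logits)
```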
