
I have a question about the loss function. Something wrong, I think? #71

Open
hhjung1202 opened this issue May 16, 2018 · 3 comments

@hhjung1202

There seems to be a problem with v_k when calculating the loss function.

In capsNet.py, in the loss function (line 106?):

max_l = tf.square(tf.maximum(0., cfg.m_plus - self.v_length))
max_r = tf.square(tf.maximum(0., self.v_length - cfg.m_minus))

You use self.v_length when calculating max_l and max_r:
self.v_length = tf.sqrt(reduce_sum(tf.square(self.caps2),
axis=2, keepdims=True) + epsilon)
self.caps2 is v_J, and v_J's shape is [batch_size, 10, 16, 1].
v_J = squash(s_J), so every element of v_J lies in (-1, 1).

If self.v_length is computed from that, its shape is [batch_size, 10, 1, 1] and its values lie in (-4, 4).

So I think it has to be changed like this:
max_l = tf.square(tf.maximum(0., cfg.m_plus - (self.v_length)/4))
max_r = tf.square(tf.maximum(0., (self.v_length)/4 - cfg.m_minus))

If this is wrong, can you tell me what's wrong?
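
For reference, this is how I read the margin loss from the paper (a minimal numpy sketch with the paper's default margins; the variable names here are mine, not necessarily the ones used in capsNet.py):

import numpy as np

def margin_loss(v_length, labels, m_plus=0.9, m_minus=0.1, lambda_val=0.5):
    # v_length: [batch_size, num_classes], the length ||v_k|| of each digit capsule
    # labels:   [batch_size, num_classes], one-hot targets T_k
    max_l = np.square(np.maximum(0., m_plus - v_length))   # penalizes a short vector for the true class
    max_r = np.square(np.maximum(0., v_length - m_minus))  # penalizes a long vector for the other classes
    L_k = labels * max_l + lambda_val * (1 - labels) * max_r
    return np.mean(np.sum(L_k, axis=1))

With m_plus = 0.9, the true-class term can only reach zero if ||v_k|| can actually reach 0.9, which is why the range of self.v_length matters so much here.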

@hhjung1202
Copy link
Author

I'm sorry, I know that v_j is like a unit vector in 16-D.

@naturomics
Owner

You do not understand squashing correctly. Squashing ensures the length (Euclidean norm) of its output vector is in [0, 1]; it says nothing about the individual elements of the vector. Have fun with the following code:

import numpy as np
from matplotlib import pyplot as plt

def squash(vector, axis=None):
    # Scale the vector so its Euclidean norm is squashed into [0, 1)
    # while its direction is preserved: v = ||s||^2 / (1 + ||s||^2) * s / ||s||
    norm = np.linalg.norm(vector, axis=axis, keepdims=True)
    norm_squared = np.square(norm)
    scalar_factor = norm_squared / (1 + norm_squared)
    return scalar_factor * (vector / norm)

# Create 10000 samples, each with 3 elements drawn from different distributions
num_samples = 10000
c1 = np.random.uniform(size=(num_samples, 1))
c2 = np.random.normal(size=(num_samples, 1))
c3 = np.random.logistic(size=(num_samples, 1))
vectors = np.hstack((c1, c2, c3))  # [num_samples, 3]
squashed_vector = squash(vectors, axis=1)
length = np.sqrt(np.sum(np.square(squashed_vector), axis=1))
plt.plot(np.sort(length))
plt.show()
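
If you run this, every sorted length stays strictly below 1, no matter how large the individual components are, because the scaling factor ||s||^2 / (1 + ||s||^2) is bounded above by 1.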

@parinaya-007

Look at the reduce_sum function in utils.py carefully: after it is applied with axis=2 and the square root is taken, self.v_length is the square root of the sum of the squared elements of the (squashed) vector, i.e. its Euclidean norm, so the range is not (-4, 4), it is (0, 1).
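
A quick numpy check of that claim (a self-contained sketch with a compact copy of the squash function from the previous comment; the shapes follow the [batch_size, 10, 16, 1] layout discussed above, and the batch size of 8 is arbitrary):

import numpy as np

def squash(vector, axis=None):
    norm = np.linalg.norm(vector, axis=axis, keepdims=True)
    return (np.square(norm) / (1 + np.square(norm))) * (vector / norm)

epsilon = 1e-9
caps2 = squash(np.random.normal(size=(8, 10, 16, 1)), axis=2)  # [batch_size, 10, 16, 1]
v_length = np.sqrt(np.sum(np.square(caps2), axis=2, keepdims=True) + epsilon)
print(v_length.shape)                  # (8, 10, 1, 1)
print(v_length.min(), v_length.max())  # both strictly between 0 and 1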
