
Dense layer without activation? #8

Open
owen8877 opened this issue Nov 24, 2020 · 7 comments

Comments

@owen8877

In

global_features2 = keras.layers.Dense(1024)(global_features2)

no activation is specified for the Dense layer. In Keras 2.3.1, the default activation is linear (i.e. no activation at all):

tf.keras.layers.Dense(
    units,
    activation=None,
    ...
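
For reference, a minimal sketch (not code from this repo) confirming that the default really is a linear/identity activation, and what an explicit nonlinearity would look like:

from tensorflow import keras

# With no `activation` argument, Dense falls back to a linear (identity) activation.
linear_dense = keras.layers.Dense(1024)
print(linear_dense.get_config()["activation"])  # "linear"

# Adding an explicit nonlinearity, e.g. ReLU:
relu_dense = keras.layers.Dense(1024, activation="relu")
print(relu_dense.get_config()["activation"])    # "relu"
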
@motionlife

motionlife commented Dec 17, 2020

I just wanted to send the class semantics to the colorization head (a.k.a. the decoder), so the logits alone are enough, and probably more stable than a normalized (softmax) result.

@owen8877
Author

But it makes no sense to stack multiple fully-connected hidden layers without activations: the composition is equivalent to (or potentially less expressive than) a single fully-connected layer.
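
A quick numerical check of that equivalence (a minimal sketch, with biases dropped for brevity):

import numpy as np
from tensorflow import keras

# Two stacked Dense layers with no activation (and, for brevity, no bias)...
x_in = keras.Input(shape=(8,))
h = keras.layers.Dense(16, use_bias=False)(x_in)
y = keras.layers.Dense(4, use_bias=False)(h)
stacked = keras.Model(x_in, y)

# ...compute exactly the same function as one linear map with kernel W1 @ W2.
W1 = stacked.layers[1].kernel.numpy()
W2 = stacked.layers[2].kernel.numpy()

x = np.random.randn(3, 8).astype("float32")
np.testing.assert_allclose(stacked(x).numpy(), x @ W1 @ W2, rtol=1e-4, atol=1e-5)
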

@motionlife

motionlife commented Dec 17, 2020

@owen8877 Yes, you are right. I just took another look at the code: even the classification head's dense layers have no activations, whereas the original VGG uses ReLU on both 4096-unit dense layers.
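
For comparison, a small sketch (independent of this repo) checking the stock keras.applications VGG16 head:

import tensorflow as tf

# The reference VGG16 uses ReLU on both 4096-unit FC layers and softmax on the output.
vgg = tf.keras.applications.VGG16(weights=None, include_top=True)
for name in ("fc1", "fc2", "predictions"):
    print(name, vgg.get_layer(name).get_config()["activation"])
# fc1 relu / fc2 relu / predictions softmax
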

@owen8877
Author

@motionlife That's true. I hope they fix this mistake; it would probably yield better performance.

@motionlife

@owen8877 Or just use one dense layer to make the model smaller.

@owen8877
Author

@motionlife Well, we might as well stick to the vanilla VGG16 design since there is a classification loss against the pre-trained VGG16 model.

outputs=[ predAB, classVector, discPredAB])

I suspect there might be a performance regression if we cut the FC layers thin.
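
For the sake of discussion, a purely hypothetical sketch of a three-output setup like the outputs=[predAB, classVector, discPredAB] line above; the shapes, heads, and loss choices here are my assumptions, not this repo's actual code:

from tensorflow import keras

# Hypothetical three-output model; the backbone and shapes are placeholders.
inp = keras.Input(shape=(32, 32, 1))
feat = keras.layers.Conv2D(8, 3, padding="same", activation="relu")(inp)
pooled = keras.layers.GlobalAveragePooling2D()(feat)

predAB = keras.layers.Conv2D(2, 3, padding="same", name="predAB")(feat)  # predicted ab channels
classVector = keras.layers.Dense(1000, name="classVector")(pooled)       # class logits (no softmax)
discPredAB = keras.layers.Dense(1, name="discPredAB")(pooled)            # discriminator logit

model = keras.Model(inp, [predAB, classVector, discPredAB])
model.compile(
    optimizer="adam",
    loss={
        "predAB": "mse",
        # raw logits can be trained directly against the pre-trained VGG16 soft labels:
        "classVector": keras.losses.CategoricalCrossentropy(from_logits=True),
        "discPredAB": keras.losses.BinaryCrossentropy(from_logits=True),
    },
)
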

@motionlife

@owen8877 Yes, understood. I mean, if adding the activations back gives no performance boost, then, as you said, why not just use one dense layer.
