This repository has been archived by the owner on Feb 7, 2023. It is now read-only.

XavierFill correctness #2531

Open
Tezirg-Wrld3D opened this issue Jun 26, 2018 · 0 comments


I was looking at the implementation of the XavierFill operator: https://github.com/caffe2/caffe2/blob/0dd3284525079f3870df92f61bed3b94eb45ff53/caffe2/operators/filler_op.h#L434

However, according to formula 16 in the original paper (http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf), the values should be sampled uniformly from

[
-sqrt(6.0 / (output->size() + output->dim32(0))),
sqrt(6.0 / (output->size() + output->dim32(0)))
]

In my particular use case, the range above and the range the operator actually samples from differ by a factor of roughly 10^2. Can someone clarify why this XavierFill implementation doesn't look correct?
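
To make the comparison concrete, here is a minimal standalone sketch of the bounds involved. All three helper names are hypothetical and do not exist in Caffe2. It rests on two assumptions: that the linked code computes `fan_in = output->size() / output->dim32(0)` and samples from `[-sqrt(3 / fan_in), sqrt(3 / fan_in)]`, and that formula 16 is read with `fan_out = dim32(0)` and `fan_in = size() / dim32(0)`. Please check both against the linked source and the paper.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of Caffe2.

// Bound exactly as written in this issue: sqrt(6 / (size + dim0)).
inline double IssueBound(std::int64_t size, std::int64_t dim0) {
  return std::sqrt(6.0 / static_cast<double>(size + dim0));
}

// Bound from formula 16 of the paper: sqrt(6 / (fan_in + fan_out)).
// ASSUMPTION: fan_out = dim0 and fan_in = size / dim0.
inline double GlorotBound(std::int64_t size, std::int64_t dim0) {
  const double fan_out = static_cast<double>(dim0);
  const double fan_in = static_cast<double>(size / dim0);
  return std::sqrt(6.0 / (fan_in + fan_out));
}

// ASSUMPTION about the linked operator: it uses fan_in only,
// i.e. sqrt(3 / fan_in) with fan_in = size / dim0.
inline double FanInBound(std::int64_t size, std::int64_t dim0) {
  const double fan_in = static_cast<double>(size / dim0);
  return std::sqrt(3.0 / fan_in);
}
```

For a square 256 x 256 weight (`size = 65536`, `dim0 = 256`), `GlorotBound` and `FanInBound` coincide, because `fan_in == fan_out` makes `6 / (2 * fan_in)` equal to `3 / fan_in`. `IssueBound` comes out roughly an order of magnitude smaller, because `size()` is the total element count rather than a fan count. If these assumptions hold, part of the discrepancy may come from using `output->size()` in place of `fan_in` in the expression above.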
