
mistake in Bayes by Backprop from scratch #564

Open · Toooodd opened this issue Apr 25, 2019 · 6 comments

Comments

Toooodd commented Apr 25, 2019

```python
def evaluate_accuracy(data_iterator, net, layer_params):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        output = net(data, layer_params)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
```

I think layer_params should not be a fixed value when you make predictions with the model; the sampled weights should change every time you predict.
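For example (just a sketch of the idea, not the tutorial's code; it assumes the tutorial's `net`, `ctx`, and variational parameter lists `mus` and `rhos`, whose actual names in the notebook may differ), resampling the weights and averaging a few forward passes could look like:

```python
from mxnet import nd

def sample_layer_params(mus, rhos):
    # reparameterization from the tutorial: w = mu + sigma * eps,
    # with sigma = log(1 + exp(rho)) (softplus) and eps ~ N(0, 1)
    return [mu + nd.log(1. + nd.exp(rho)) * nd.random.normal(shape=mu.shape, ctx=ctx)
            for mu, rho in zip(mus, rhos)]

def evaluate_accuracy_mc(data_iterator, net, mus, rhos, n_samples=10):
    numerator = 0.
    denominator = 0.
    for data, label in data_iterator:
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        # average the predictive distribution over n_samples weight draws
        output = nd.zeros((data.shape[0], 10), ctx=ctx)  # 10 = MNIST classes
        for _ in range(n_samples):
            output = output + nd.softmax(net(data, sample_layer_params(mus, rhos)))
        predictions = nd.argmax(output / n_samples, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
```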

ykun91 commented Apr 25, 2019

I guess the author did that purposely. When evaluating accuracy on a classification problem, we just take the class with the maximum output mean as the network's answer and ignore the output variance, so the author skipped sampling the weights and made the network output just the mean prediction.
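Something like this, I mean (a sketch; `mus` is assumed to be the tutorial's list of variational means):

```python
# deterministic evaluation: skip the sigma * eps term entirely and run one
# forward pass with the weights fixed at their means
output = net(data, mus)
predictions = nd.argmax(output, axis=1)
```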

Toooodd commented Apr 25, 2019

You are right, Yang. But I still think that when we predict, we should include the perturbation term on W, compute the results multiple times, and then average them. That is more in line with the original intent of the article.

ykun91 commented Apr 25, 2019

Yeah, but in my opinion the mean of the network output is determined by the μ parameters of the network, and the variance of the output is determined by the σ parameters, where σ = log(1 + exp(ρ)). We can get the average simply by disabling weight sampling and predicting once using only μ.

If you take μ + σ·ε to make predictions multiple times and then average, I think the average will eventually converge back to the prediction given by μ, so it may be pointless to do that, I think...
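A quick toy check of that point (plain MXNet, not from the tutorial): the weight draws themselves do average back to μ, though for a nonlinear network the averaged outputs only approximately equal the output at μ.

```python
import math
from mxnet import nd

mu, rho = 0.5, -1.0
sigma = math.log(1. + math.exp(rho))                    # softplus(rho)
draws = mu + sigma * nd.random.normal(shape=(100000,))  # w = mu + sigma * eps
print(draws.mean().asscalar())                          # ~0.5, i.e. back to mu
```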

Toooodd commented Apr 25, 2019

Yeah, you are absolutely right, and I accept your point. But what I want to say is that resampling may be more in line with the original intent of the article, and would show the advantages of this method when predicting on unseen data and plotting the results.
Haha, nice to meet you, Yang! You are so active :)

ykun91 commented Apr 25, 2019

Nice to meet you too. :) And I think the problem is, if you want to exploit the σ·ε term in a classification problem, you need a way to evaluate accuracy that takes the variance into account. For example, if the network's top answer comes with a large variance and the runner-up comes with a small variance, take the runner-up as the final answer.
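A rough sketch of such a rule, reusing the hypothetical `sample_layer_params` helper from the earlier comment; the penalty weight `kappa` is made up for illustration:

```python
from mxnet import nd

def predict_with_variance(data, net, mus, rhos, n_samples=10, kappa=1.0):
    # stack n_samples softmax outputs: shape (n_samples, batch, n_classes)
    probs = nd.stack(*[nd.softmax(net(data, sample_layer_params(mus, rhos)))
                       for _ in range(n_samples)])
    mean = probs.mean(axis=0)
    std = nd.sqrt(((probs - mean) ** 2).mean(axis=0))
    # prefer classes whose probability is both high and stable across samples
    return nd.argmax(mean - kappa * std, axis=1)
```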

Toooodd commented Apr 25, 2019

That's a great solution, and I suddenly realized you are right from both a practical and an academic perspective.
