
mistake in Bayes by Backprop from scratch #564

Open · Toooodd opened this issue Apr 25, 2019 · 6 comments

Comments

Toooodd commented Apr 25, 2019

```python
def evaluate_accuracy(data_iterator, net, layer_params):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        output = net(data, layer_params)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
```

I think layer_params should not be a fixed value when you make predictions with the model; the sampled weights should change every time you predict.
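For example (just a sketch of the idea, not the tutorial's code; it assumes the tutorial's `net`, `ctx`, and variational parameter lists `mus` and `rhos`, whose actual names in the notebook may differ), resampling the weights and averaging a few forward passes could look like:

```python
from mxnet import nd

def sample_layer_params(mus, rhos):
    # reparameterization from the tutorial: w = mu + sigma * eps,
    # with sigma = log(1 + exp(rho)) (softplus) and eps ~ N(0, 1)
    return [mu + nd.log(1. + nd.exp(rho)) * nd.random.normal(shape=mu.shape, ctx=ctx)
            for mu, rho in zip(mus, rhos)]

def evaluate_accuracy_mc(data_iterator, net, mus, rhos, n_samples=10):
    numerator = 0.
    denominator = 0.
    for data, label in data_iterator:
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        # average the predictive distribution over n_samples weight draws
        output = nd.zeros((data.shape[0], 10), ctx=ctx)  # 10 = MNIST classes
        for _ in range(n_samples):
            output = output + nd.softmax(net(data, sample_layer_params(mus, rhos)))
        predictions = nd.argmax(output / n_samples, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
```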

ykun91 commented Apr 25, 2019

I guess the author did that purposely. When evaluating accuracy on a classification problem, we just take the class with the maximum output mean as the network's answer and ignore the output variance, so the author skipped sampling the weights and made the network output just the mean prediction.
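Something like this, I mean (a sketch; `mus` is assumed to be the tutorial's list of variational means):

```python
# deterministic evaluation: skip the sigma * eps term entirely and run one
# forward pass with the weights fixed at their means
output = net(data, mus)
predictions = nd.argmax(output, axis=1)
```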

Toooodd commented Apr 25, 2019

You are right, Yang. But I still think that when we predict, we should include the perturbation term on W, compute the results multiple times, and then average them. That is more in line with the original intent of the article.

ykun91 commented Apr 25, 2019

Yeah, but in my opinion the mean of the network output is determined by the μ parameters of the network, and the variance of the output is determined by the σ parameters, where σ = log(1 + exp(ρ)). We can get the average simply by disabling weight sampling and predicting once using only μ.

If you take μ + σ·ε to make predictions multiple times and then average, I think the average will eventually converge back to the prediction given by μ, so it may be pointless to do that, I think...
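A quick toy check of that point (plain MXNet, not from the tutorial): the weight draws themselves do average back to μ, though for a nonlinear network the averaged outputs only approximately equal the output at μ.

```python
import math
from mxnet import nd

mu, rho = 0.5, -1.0
sigma = math.log(1. + math.exp(rho))                    # softplus(rho)
draws = mu + sigma * nd.random.normal(shape=(100000,))  # w = mu + sigma * eps
print(draws.mean().asscalar())                          # ~0.5, i.e. back to mu
```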

Toooodd commented Apr 25, 2019

Yeah, you are absolutely right, and I accept your point. But what I want to say is that resampling may be more in line with the original intent of the article, and would show the advantages of this method when predicting on unseen data and plotting the results.
Haha, nice to meet you, Yang! You are so active :)

ykun91 commented Apr 25, 2019

Nice to meet you too. :) And I think the problem is, if you want to exploit the σ·ε term in a classification problem, you need a way to evaluate accuracy that takes the variance into account. For example, if the network's top answer comes with a large variance and the runner-up comes with a small variance, take the runner-up as the final answer.
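A rough sketch of such a rule, reusing the hypothetical `sample_layer_params` helper from the earlier comment; the penalty weight `kappa` is made up for illustration:

```python
from mxnet import nd

def predict_with_variance(data, net, mus, rhos, n_samples=10, kappa=1.0):
    # stack n_samples softmax outputs: shape (n_samples, batch, n_classes)
    probs = nd.stack(*[nd.softmax(net(data, sample_layer_params(mus, rhos)))
                       for _ in range(n_samples)])
    mean = probs.mean(axis=0)
    std = nd.sqrt(((probs - mean) ** 2).mean(axis=0))
    # prefer classes whose probability is both high and stable across samples
    return nd.argmax(mean - kappa * std, axis=1)
```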

Toooodd commented Apr 25, 2019

That's a great solution, and I suddenly realized you are right from both a practical and an academic perspective.
