
Test error results reported are actually loss figures and are not comparable with paper reported accuracies. #11

Open
geefer opened this issue Jun 13, 2018 · 3 comments

Comments

@geefer

geefer commented Jun 13, 2018

Thanks for a very nicely presented implementation of Capsule Networks. I especially appreciate the tensorboard plots.

Unfortunately I believe you have mixed up "test error" with "test loss" when reporting your best results and comparing with the results from the paper.

The paper shows a table of test classification accuracy (Table 1) and reports a best error of 0.25%. This will have been calculated as:

(number of incorrectly classified test images) / (total number of test images) * 100%

Thus, since there are 10,000 test images, this equates to 25 misclassified images for a 0.25% error.

This is equivalent to an accuracy of 99.75%.
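In code, that computation looks something like this (a minimal sketch, not taken from your repo; the tensors are placeholders just to illustrate the arithmetic):

```python
import torch

# Placeholder class scores and ground-truth digits for the 10,000 MNIST test images.
logits = torch.randn(10000, 10)
labels = torch.randint(0, 10, (10000,))

predictions = logits.argmax(dim=1)
num_wrong = (predictions != labels).sum().item()

test_error = num_wrong / labels.size(0) * 100.0  # e.g. 25 wrong out of 10,000 -> 0.25%
test_accuracy = 100.0 - test_error               # e.g. 99.75%
print(f"test error: {test_error:.2f}%  test accuracy: {test_accuracy:.2f}%")
```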

Unfortunately, you list test accuracy and test error figures that do not sum to 100%, because you are actually listing the test loss, which is not a useful measure of the classification accuracy of the network.

Although I have not seen an independent implementation on the net that claims to achieve this 99.75% figure, I have seen several that achieve greater than 99.6% (my own implementation has achieved 99.68% in 50 epochs). Since your best test accuracy is 99.32%, it is possible that you have some error in your implementation, as this is quite a way from the 99.75% achieved by the authors of the paper.

@cedrickchee
Owner

Hi,

Apologies for the delayed response. I was really tied up with my studies. Nevertheless, a late reply is better than no reply. 🙂

very nicely presented implementation of Capsule Networks. I especially appreciate the tensorboard plots.

Thank you.

Unfortunately I believe you have mixed up "test error" with "test loss" when reporting your best results and comparing with the results from the paper.

The paper shows a table of test classification accuracy (Table 1) and reports a best error of 0.25%.

I am 😕 about this. See my analysis of the paper below:

[screenshot: excerpt of the test error table from the CapsNet paper]

Just to be very clear, my understanding is that "test error" is usually used interchangeably with "test loss" (or "validation/test loss"). What this means is that, to me, test error is the same as test loss.

Now, I am not sure whether it is me or the paper that has mixed up test error and test accuracy.

Please correct me if I am wrong. BTW, my background is not in research and the goal of this project is educational. I am not trying to be rigorous in replicating the results, so the 99.XX% accuracy is not a big deal for this project.

@geefer
Author

geefer commented Jun 28, 2018

Hi,

Thanks for explaining your thinking. However, I do not think the test error reported in the paper and the test loss are the same thing at all (though I do understand why, when looking at loss values in the context of SGD, some people speak of the loss as measuring the error - but in that case it is a somewhat imprecise use of the term).

I believe that the test error reported in the paper is a measure of the fraction of incorrectly classified images (and thus can sensibly be represented as a percentage). That is why, in my way of thinking, accuracy and error are essentially two ways of looking at the same thing (accuracy being the fraction of correctly classified images).

The test loss that you are referring to (and which is shown on your tensorboard plots) is the result of applying the loss function to the output of the network - and it is what is minimised by the back-propagation process. The loss function is somewhat arbitrary and different loss functions could be chosen (they may or may not work well). For instance, if the reconstruction loss downscaling factor (set to 0.0005 in the paper) were changed, the absolute loss value would change. Note that the loss is a number, not a fraction - so it would not make sense to list it in the paper's performance table as a percentage.
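To make the distinction concrete, here is a rough sketch (not your repo's code; the names and shapes are illustrative, assuming class probabilities of shape [batch, 10] and flattened MNIST reconstructions) of how the two quantities are computed separately:

```python
import torch
import torch.nn.functional as F

def margin_plus_reconstruction_loss(class_probs, recon, images, targets_onehot,
                                    recon_weight=0.0005):
    # Margin loss from the paper (m+ = 0.9, m- = 0.1, lambda = 0.5),
    # summed over classes and averaged over the batch.
    margin = (targets_onehot * F.relu(0.9 - class_probs) ** 2
              + 0.5 * (1.0 - targets_onehot) * F.relu(class_probs - 0.1) ** 2)
    margin_loss = margin.sum(dim=1).mean()
    # Reconstruction loss, downscaled by 0.0005 as in the paper. Changing this
    # factor changes the loss value but has no effect on the error rate below.
    recon_loss = F.mse_loss(recon, images.view(images.size(0), -1), reduction='sum')
    return margin_loss + recon_weight * recon_loss

def classification_error(class_probs, targets):
    # Fraction of misclassified images, expressed as a percentage -
    # this is the figure that Table 1 of the paper reports.
    preds = class_probs.argmax(dim=1)
    return (preds != targets).float().mean().item() * 100.0
```

The first function returns a number on an arbitrary scale that depends on the chosen loss terms and weights; the second returns a percentage of the test set, which is the figure that can be compared across papers.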

It does not make sense for a paper to report the performance of a classification network as a loss value, as that gives no absolute information about how well the network performs. Nor are loss values comparable with results from other people's networks on the same dataset, as their loss functions could be utterly different. For instance, in the authors' paper "Matrix Capsules with EM Routing" they use a completely different loss function but still present test error results, which can be directly compared with the results in the "Dynamic Routing Between Capsules" paper and with networks built by other researchers - because the error measurement is an absolute measure of how well the network performs on the classification task.

I hope this clarifies my thinking for you. I am no expert in this field, but I believe my explanation is sensible.

@mightydeveloper

mightydeveloper commented Feb 6, 2019

I agree with @geefer.
The error rate should simply be (1 - accuracy), and (test error) != (test loss).
The following sentence in README.md was very confusing to me.

The current test error is 0.21% and the best test error is 0.20%. The current test accuracy is 99.31% and the best test accuracy is 99.32%.

I suggest reporting that the best error rate for this implementation is 0.68%, while the paper's is 0.25%.
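For example, using the best accuracy already quoted from the README (a trivial sketch, just to show the arithmetic):

```python
best_accuracy = 99.32             # best test accuracy from the README
error_rate = 100.0 - best_accuracy
print(f"{error_rate:.2f}%")       # prints 0.68%, versus 0.25% reported in the paper
```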
