VQA: Is LSTM Q+I info correct? #49

Open
CalvinLeGassick opened this issue Jul 12, 2017 · 2 comments
Comments

CalvinLeGassick commented Jul 12, 2017

I'm concerned that the reported LSTM Q + I data may be incorrect. Looking at the different versions, I don't think LSTM Q + I was included in the analysis until v3 of the initial VQA dataset paper, published October 15, 2015 (https://arxiv.org/abs/1505.00468v3). Additionally, v3 is the first version of the paper that explicitly reports any result on the test-standard split used throughout the VQA 1.0 evaluations. LSTM Q + I is reported (because of its superior performance on test-dev), and it achieves 54.06% accuracy in the open-answer framework. The 58.2% currently reported in this repo for LSTM Q + I seems to come from v5 onward, which was not posted to arXiv until March 7, 2016.

On the multiple-choice side of things, it is not until v5 of the initial VQA dataset paper (https://arxiv.org/abs/1505.00468v5) that a best result on "test-standard" is explicitly stated. The 63.1% figure currently reported in this repo lines up with the results in the v5 paper, but again, the v5 paper was not submitted to arXiv until March 7, 2016.

I believe the following actions should be taken (a concrete sketch of the corrected data points follows the list):

  • v3 of the initial VQA paper should be the starting point of the open-ended evaluations (posted to arXiv on October 15, 2015).
  • The graph should reflect that this v3 VQA paper reports an accuracy of 54.06% on test-standard, not 58.2%.
  • LSTM Q + I should not be the starting point of the multiple-choice evaluations; its data point should instead appear on March 7, 2016. The earliest test-standard results I have for the multiple-choice framework are from the DPPnet paper (https://arxiv.org/abs/1511.05756), which has just one version on arXiv, dated November 18, 2015.
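To make those corrections concrete, here is a rough sketch of the three entries I have in mind. The list-of-dicts layout below is purely hypothetical (I haven't checked this repo's actual data format), and DPPnet's accuracy isn't quoted in this thread, so it is left as `None`:

```python
# Hypothetical layout -- not this repo's actual schema -- just to make the
# proposed corrections concrete.
from datetime import date

vqa_test_standard_corrections = [
    {
        "metric": "VQA 1.0 open-ended (test-standard)",
        "model": "LSTM Q+I",
        "accuracy": 54.06,              # value reported in v3, not 58.2
        "date": date(2015, 10, 15),     # v3 posted to arXiv
        "source": "https://arxiv.org/abs/1505.00468v3",
        "note": "new starting point for the open-ended graph",
    },
    {
        "metric": "VQA 1.0 multiple-choice (test-standard)",
        "model": "DPPnet",
        "accuracy": None,               # not quoted in this thread
        "date": date(2015, 11, 18),     # single arXiv version
        "source": "https://arxiv.org/abs/1511.05756",
        "note": "earliest multiple-choice test-standard result I know of",
    },
    {
        "metric": "VQA 1.0 multiple-choice (test-standard)",
        "model": "LSTM Q+I",
        "accuracy": 63.1,               # first explicitly stated in v5
        "date": date(2016, 3, 7),       # v5 posted to arXiv
        "source": "https://arxiv.org/abs/1505.00468v5",
        "note": "move this point to the v5 date, not the graph start",
    },
]
```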
pde (Member) commented Jul 12, 2017

I think this is a duplicate of issue #48.

pde closed this as completed on Jul 12, 2017
CalvinLeGassick (Author) commented:

Got it. I think it is still worth noting that all of the other data points correctly reflect the accuracies and dates of the papers with respect to the open-ended (OE) and multiple-choice (MC) evaluation metrics on test-standard.

From what I can see, the only place where confusion between real images and abstract scenes affected the reported results is the starting points of both the OE and MC graphs. If my suggestions above are implemented, I think the graphs will be correct.
