I'm concerned that the reported LSTM Q + I data may be incorrect. Looking at the different versions, I don't think LSTM Q + I was included in the analysis until v3 of the initial VQA dataset paper, published on October 15, 2015 (https://arxiv.org/abs/1505.00468v3). Additionally, v3 is the first version of the paper that explicitly reports any result on the test-standard split used throughout the VQA 1.0 evaluations. LSTM Q + I is reported (because of its superior performance on test-dev), and it achieves 54.06% accuracy in the open-ended framework. The 58.2% currently reported in the repo for LSTM Q + I seems to come from v5 onward, which did not appear on arXiv until March 7, 2016.
On the multiple-choice side of things, it is not until v5 (https://arxiv.org/abs/1505.00468v5) that the initial VQA dataset paper explicitly states a best result on test-standard. The 63.1% figure currently reported in this repo lines up with the results in the v5 paper, but again, v5 was not submitted to arXiv until March 7, 2016.
I believe the following actions should be taken:

- v3 of the initial VQA paper (October 15, 2015 on arXiv) should be the starting point of the open-ended evaluations.
- The graph should reflect that this v3 paper reports an accuracy of 54.06% on test-standard, not 58.2%.
- LSTM Q + I should not be the starting point of the multiple-choice evaluations; it should instead enter the graph on March 7, 2016. The earliest test-standard results I have for the multiple-choice framework are from the DPPnet paper (https://arxiv.org/abs/1511.05756), which has just one version on arXiv, dated November 18, 2015.
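To make the proposed corrections concrete, here is a minimal sketch of what the corrected data points for the two graphs would look like. The field names and structure are purely illustrative assumptions, not the repo's actual schema; only the accuracies, dates, and model names come from the discussion above.

```python
from datetime import date

# Hypothetical corrected data points for the open-ended (test-standard) graph.
open_ended = [
    # VQA paper v3 (arXiv:1505.00468v3), 15 Oct 2015: first LSTM Q + I
    # result on test-standard.
    {"model": "LSTM Q + I", "date": date(2015, 10, 15), "accuracy": 54.06},
    # VQA paper v5, 7 Mar 2016: updated LSTM Q + I result.
    {"model": "LSTM Q + I", "date": date(2016, 3, 7), "accuracy": 58.2},
]

# Hypothetical corrected data points for the multiple-choice (test-standard)
# graph. The DPPnet accuracy is not quoted in this thread, so it is left out.
multiple_choice = [
    # DPPnet (arXiv:1511.05756), 18 Nov 2015: earliest MC test-standard result.
    {"model": "DPPnet", "date": date(2015, 11, 18)},
    # VQA paper v5, 7 Mar 2016: LSTM Q + I enters the MC graph.
    {"model": "LSTM Q + I", "date": date(2016, 3, 7), "accuracy": 63.1},
]

# The open-ended graph would then start at 54.06%, not 58.2%, and the
# multiple-choice graph would start with DPPnet rather than LSTM Q + I.
print(open_ended[0]["accuracy"])        # 54.06
print(multiple_choice[0]["model"])      # DPPnet
```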
Got it. I think it is still worth noting that all of the other data points correctly reflect the accuracies and dates of the papers with respect to the open-ended (OE) and multiple-choice (MC) evaluation metrics on test-standard.

From what I can see, the only place where confusion between real images and abstract scenes affected the reported results is the starting points of both the OE and MC graphs. I think if my suggestions above are implemented, the graphs will be correct.