I'm concerned that the reported LSTM Q + I data may be incorrect. Looking at the different versions, I don't think LSTM Q + I was included in the analysis until v3 of the initial VQA dataset paper, published on October 15, 2015 (https://arxiv.org/abs/1505.00468v3). Additionally, v3 is the first version of the paper that explicitly reports any result on the test-standard split used throughout the VQA 1.0 evaluations. LSTM Q + I is reported (because of its superior performance on test-dev), and it achieves 54.06% accuracy in the open-ended framework. The 58.2% currently reported in the repo for LSTM Q + I seems to come from v5 onward, which did not appear on arXiv until March 7, 2016.
On the multiple-choice side of things, it is not until v5 (https://arxiv.org/abs/1505.00468v5) that the initial VQA dataset paper explicitly states a best result on test-standard. The 63.1% figure currently reported in this repo lines up with the results in the v5 paper, but again, v5 was not submitted to arXiv until March 7, 2016.
I believe the following actions should be taken:

- v3 of the initial VQA paper (October 15, 2015 on arXiv) should be the starting point of the open-ended evaluations.
- The graph should reflect that this v3 paper reports an accuracy of 54.06% on test-standard, not 58.2%.
- LSTM Q + I should not be the starting point of the multiple-choice evaluations; it should instead enter the graph on March 7, 2016. The earliest test-standard results I have for the multiple-choice framework are from the DPPnet paper (https://arxiv.org/abs/1511.05756), which has just one version on arXiv, dated November 18, 2015.
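To make the proposed corrections concrete, here is a minimal sketch of what the corrected data points for the two graphs would look like. The field names and structure are purely illustrative assumptions, not the repo's actual schema; only the accuracies, dates, and model names come from the discussion above.

```python
from datetime import date

# Hypothetical corrected data points for the open-ended (test-standard) graph.
open_ended = [
    # VQA paper v3 (arXiv:1505.00468v3), 15 Oct 2015: first LSTM Q + I
    # result on test-standard.
    {"model": "LSTM Q + I", "date": date(2015, 10, 15), "accuracy": 54.06},
    # VQA paper v5, 7 Mar 2016: updated LSTM Q + I result.
    {"model": "LSTM Q + I", "date": date(2016, 3, 7), "accuracy": 58.2},
]

# Hypothetical corrected data points for the multiple-choice (test-standard)
# graph. The DPPnet accuracy is not quoted in this thread, so it is left out.
multiple_choice = [
    # DPPnet (arXiv:1511.05756), 18 Nov 2015: earliest MC test-standard result.
    {"model": "DPPnet", "date": date(2015, 11, 18)},
    # VQA paper v5, 7 Mar 2016: LSTM Q + I enters the MC graph.
    {"model": "LSTM Q + I", "date": date(2016, 3, 7), "accuracy": 63.1},
]

# The open-ended graph would then start at 54.06%, not 58.2%, and the
# multiple-choice graph would start with DPPnet rather than LSTM Q + I.
print(open_ended[0]["accuracy"])        # 54.06
print(multiple_choice[0]["model"])      # DPPnet
```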
Got it. I think it is still worth noting that all of the other data points correctly reflect the accuracies and dates of the papers with respect to the open-ended (OE) and multiple-choice (MC) evaluation metrics on test-standard.

From what I can see, the only place where confusion between real images and abstract scenes affected the reported results is the starting points of both the OE and MC graphs. I think if my suggestions above are implemented, the graphs will be correct.