Sort out VQA real vs abstract #48

Closed
pde opened this issue Jul 12, 2017 · 7 comments

Comments

pde commented Jul 12, 2017

At the moment, the VQA data assumes that results are for a combined real+abstract dataset. But at least some of the results are in fact just for "real". So we need to fix that...

CalvinLeGassick commented Jul 12, 2017

I have been a little stumped about this: are most papers using a combined real + abstract dataset?
Can you point me to any specific papers that use a combined real and abstract dataset in their reported results?

It is often not stated explicitly, but I was under the impression that most of the papers were only using real images. I had this assumption for a few reasons:

1. From v7 of the original VQA paper: "For abstract scenes, we created splits for standardization, separating the scenes into 20K/10K/20K for train/val/test splits, respectively. There are no subsplits (test-dev, test-standard, test-challenge, test-reserve) for abstract scenes."
2. I have not seen Abstract Scenes in the "example pictures".
3. Sometimes papers explicitly refer to the questions as coming from MS COCO images. I took this to mean, "explicitly not the Abstract Scenes from the VQA dataset".

VQA papers often cite the number of questions/images they use. Maybe it can be definitively worked out from those numbers whether or not they are using abstract scenes? A rough sketch of that check is below.
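
Something like the following could run that check. The question counts here are from memory of the VQA v1.0 release and are assumptions to verify against visualqa.org; `guess_dataset` is just an illustrative helper, not anything in the repo:

```python
# Rough sanity check: given the question total a paper reports, does it look
# like real-only or real+abstract? Counts are my recollection of the VQA v1.0
# release and should be verified against visualqa.org.
REAL_QUESTIONS = {"train": 248_349, "val": 121_512, "test": 244_302}
ABSTRACT_QUESTIONS = {"train": 60_000, "val": 30_000, "test": 60_000}  # 3 questions per scene, 20K/10K/20K scenes

def guess_dataset(reported_total, splits=("train", "val"), tolerance=1_000):
    """Guess which VQA 1.0 dataset a reported question total corresponds to."""
    real = sum(REAL_QUESTIONS[s] for s in splits)
    combined = real + sum(ABSTRACT_QUESTIONS[s] for s in splits)
    if abs(reported_total - real) <= tolerance:
        return "real only"
    if abs(reported_total - combined) <= tolerance:
        return "real + abstract"
    return "unclear"

# A paper training on ~369,861 questions (real train+val) would come back "real only".
print(guess_dataset(369_861))
```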

Do you believe (HQI+ResNet: https://arxiv.org/pdf/1606.00061v1.pdf) uses a combined real+abstract dataset in their analysis? This one is an example of point 3: "VQA dataset is the largest dataset for this problem, containing human annotated questions and answers on Microsoft COCO dataset [11]."

pde commented Jul 12, 2017

@CalvinLeGassick yes, I think I was confused when reading the original VQA paper; in particular, the human performance number in the notebook is currently wrong. It should just be whatever the number was for VQA 1.0 real rather than the weighted average of real+abstract.
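
To make the correction concrete, here is a minimal sketch of the weighted average the notebook should stop using. The accuracies and question counts are hypothetical placeholders, not the actual VQA 1.0 figures:

```python
# Hypothetical placeholder values, NOT the actual VQA 1.0 numbers.
acc_real, n_real = 0.80, 600_000          # human accuracy / question count, real images
acc_abstract, n_abstract = 0.88, 150_000  # human accuracy / question count, abstract scenes

# The notebook currently reports something like this combined figure...
combined = (acc_real * n_real + acc_abstract * n_abstract) / (n_real + n_abstract)

# ...but for a "VQA real" entry it should just report acc_real.
print(f"combined (wrong): {combined:.4f}   real only (right): {acc_real:.4f}")
```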

We should do a sweep of the literature to see if there are any VQA abstract results out there (though certainly the VQA survey paper doesn't include any).

@CalvinLeGassick

@pde agreed, I haven't seen any use of the weighted average of real+abstract in the literature and it probably should not be used here.

For Abstract Scenes:
Graph-Structured Representations for Visual Question Answering reports results on abstract scenes and references four others with results on test. It also references another paper that focuses only on abstract binary questions: Yin and Yang: Balancing and Answering Binary Visual Questions.

pde commented Jul 13, 2017

I was just looking at the "Graph-Structured Representations" paper, and noticed that it reports some LSTM results that it attributes to the original VQA paper (e.g. 61.41 for LSTM blind on abstract-mc, 69.21 for LSTM + global features)... but I can't find those results in the VQA paper. So unless they contradict me (hi @dteney), I'm going to tentatively conclude that the Graph-Structured Representations team retrained to get those numbers themselves.

dteney commented Jul 13, 2017

Hi @pde, those numbers were submitted by the VT for the 2016 challenge. They're still up on the leaderboards:
http://visualqa.org/aoe.html
http://visualqa.org/amc.html

Damien Teney

pde commented Aug 16, 2017

I've merged and pushed what I think is a fix to this, plus new results for VQA 2.0 real OE. @CalvinLeGassick @dteney lmk if what you see on the live site isn't what you'd expect for VQA.

@CalvinLeGassick

Great. In terms of seeing what I expect in general, I still think my suggestions from this closed issue need to be implemented:

#49
