Sort out VQA real vs abstract #48

Closed
pde opened this issue Jul 12, 2017 · 7 comments

Comments

pde commented Jul 12, 2017

At the moment, the VQA data assumes that results are for a combined real+abstract dataset. But at least some of the results are in fact just for "real". So we need to fix that...

CalvinLeGassick commented Jul 12, 2017

I have been a little stumped about this: are most papers using a combined real + abstract dataset?
Can you point me to any specific papers that use a combined real and abstract dataset in their reported results?

It is often not stated explicitly, but I was under the impression that most of the papers were only using real images. I had this assumption for a few reasons:

1. From v7 of the original VQA paper: "For abstract scenes, we created splits for standardization, separating the scenes into 20K/10K/20K for train/val/test splits, respectively. There are no subsplits (test-dev, test-standard, test-challenge, test-reserve) for abstract scenes."
2. I have not seen Abstract Scenes in the "example pictures".
3. Sometimes papers explicitly refer to the questions as coming from MS COCO images. I took this to mean, "explicitly not the Abstract Scenes from the VQA dataset".

VQA papers often cite the number of questions/images they use. Maybe it can be definitively worked out from those numbers whether or not they are using abstract scenes? A rough sketch of that check is below.
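
Something like the following could run that check. The question counts here are from memory of the VQA v1.0 release and are assumptions to verify against visualqa.org; `guess_dataset` is just an illustrative helper, not anything in the repo:

```python
# Rough sanity check: given the question total a paper reports, does it look
# like real-only or real+abstract? Counts are my recollection of the VQA v1.0
# release and should be verified against visualqa.org.
REAL_QUESTIONS = {"train": 248_349, "val": 121_512, "test": 244_302}
ABSTRACT_QUESTIONS = {"train": 60_000, "val": 30_000, "test": 60_000}  # 3 questions per scene, 20K/10K/20K scenes

def guess_dataset(reported_total, splits=("train", "val"), tolerance=1_000):
    """Guess which VQA 1.0 dataset a reported question total corresponds to."""
    real = sum(REAL_QUESTIONS[s] for s in splits)
    combined = real + sum(ABSTRACT_QUESTIONS[s] for s in splits)
    if abs(reported_total - real) <= tolerance:
        return "real only"
    if abs(reported_total - combined) <= tolerance:
        return "real + abstract"
    return "unclear"

# A paper training on ~369,861 questions (real train+val) would come back "real only".
print(guess_dataset(369_861))
```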

Do you believe (HQI+ResNet: https://arxiv.org/pdf/1606.00061v1.pdf) uses a combined real+abstract dataset in their analysis? This one is an example of point 3: "VQA dataset is the largest dataset for this problem, containing human annotated questions and answers on Microsoft COCO dataset [11]."

pde commented Jul 12, 2017

@CalvinLeGassick yes, I think I was confused when reading the original VQA paper; in particular, the human performance number in the notebook is currently wrong. It should just be whatever the number was for VQA 1.0 real rather than the weighted average of real+abstract.
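
To make the correction concrete, here is a minimal sketch of the weighted average the notebook should stop using. The accuracies and question counts are hypothetical placeholders, not the actual VQA 1.0 figures:

```python
# Hypothetical placeholder values, NOT the actual VQA 1.0 numbers.
acc_real, n_real = 0.80, 600_000          # human accuracy / question count, real images
acc_abstract, n_abstract = 0.88, 150_000  # human accuracy / question count, abstract scenes

# The notebook currently reports something like this combined figure...
combined = (acc_real * n_real + acc_abstract * n_abstract) / (n_real + n_abstract)

# ...but for a "VQA real" entry it should just report acc_real.
print(f"combined (wrong): {combined:.4f}   real only (right): {acc_real:.4f}")
```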

We should do a sweep of the literature to see if there are any VQA abstract results out there (though certainly the VQA survey paper doesn't include any).

@CalvinLeGassick

@pde agreed, I haven't seen any use of the weighted average of real+abstract in the literature and it probably should not be used here.

For Abstract Scenes:
Graph-Structured Representations for Visual Question Answering reports results on abstract scenes and references four others with results on test. It also references another paper that focuses only on abstract binary questions: Yin and Yang: Balancing and Answering Binary Visual Questions.

pde commented Jul 13, 2017

I was just looking at the "Graph-Structured Representations" paper, and noticed that it reports some LSTM results that it attributes to the original VQA paper (e.g. 61.41 for LSTM blind on abstract-mc, 69.21 for LSTM + global features)... but I can't find those results in the VQA paper. So unless they contradict me (hi @dteney), I'm going to tentatively conclude that the Graph-Structured Representations team retrained to get those numbers themselves.

dteney commented Jul 13, 2017

Hi @pde, those numbers were submitted by the VT for the 2016 challenge. They're still up on the leaderboards:
http://visualqa.org/aoe.html
http://visualqa.org/amc.html

Damien Teney

pde commented Aug 16, 2017

I've merged and pushed what I think is a fix to this, plus new results for VQA 2.0 real OE. @CalvinLeGassick @dteney lmk if what you see on the live site isn't what you'd expect for VQA.

@CalvinLeGassick

Great. In terms of seeing what I expect in general, I still think my suggestions from this closed issue need to be implemented:

#49
