Sort out VQA real vs abstract #48
I have been a little stumped about this: are most papers using a combined real + abstract dataset? It is often not stated explicitly, but I was under the impression that most of the papers were only using real images. I had this assumption for a few reasons: VQA papers often cite the number of questions / images they use. Maybe it can be worked out definitively from those numbers whether or not they are using abstract scenes? Do you believe (HQI+ResNet: https://arxiv.org/pdf/1606.00061v1.pdf) uses a combined real+abstract dataset in their analysis? This one has an example of 3: "VQA dataset is the largest dataset for this problem, containing human annotated questions and answers on Microsoft COCO dataset [11]."
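As a rough sanity check along those lines, something like the sketch below could compare a paper's reported question count against the split sizes. The counts are from my recollection of the VQA v1.0 paper and should be verified before relying on them:

```python
# Hedged sketch: guess which VQA v1.0 split a paper is using from the
# question count it reports. The counts below are from memory of the
# VQA v1.0 paper and should be double-checked against it.
VQA1_REAL_QUESTIONS = 614_163      # real (MS COCO) images, train+val+test
VQA1_ABSTRACT_QUESTIONS = 150_000  # abstract scenes, 3 questions per scene

def guess_split(reported_questions, tolerance=0.01):
    """Return the split whose question count is within `tolerance` of the reported one."""
    candidates = {
        "real": VQA1_REAL_QUESTIONS,
        "abstract": VQA1_ABSTRACT_QUESTIONS,
        "real+abstract": VQA1_REAL_QUESTIONS + VQA1_ABSTRACT_QUESTIONS,
    }
    for name, count in candidates.items():
        if abs(reported_questions - count) / count <= tolerance:
            return name
    return "unknown"

print(guess_split(614_163))  # -> real
```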
@CalvinLeGassick yes, I think I was confused when reading the original VQA paper, and in particular the human performance number in the notebook is currently wrong; it should just be whatever the number was for VQA 1.0 real, rather than the weighted average of real+abstract. We should do a sweep of the literature to see if there are any VQA abstract results out there (though certainly the VQA survey paper doesn't include any).
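To make the distinction concrete, here is a purely illustrative sketch of the two quantities; the accuracies are placeholders, not the actual VQA human-performance figures, and the question counts are again from memory:

```python
# Illustrative only: the (incorrect) question-count-weighted average of
# real+abstract vs. the real-only number the notebook should report.
real_q, abstract_q = 614_163, 150_000   # VQA v1.0 question counts (from memory)
acc_real, acc_abstract = 0.83, 0.88     # hypothetical human accuracies

weighted_avg = (acc_real * real_q + acc_abstract * abstract_q) / (real_q + abstract_q)
real_only = acc_real                    # what "VQA 1.0 real" should show

print(f"weighted real+abstract: {weighted_avg:.4f}, real only: {real_only:.4f}")
```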
@pde agreed, I haven't seen any use of the weighted average of real+abstract in the literature and it probably should not be used here. For Abstract Scenes
I was just looking at the "Graph Structured Representations" paper, and noticed that it reports some LSTM results that it attributes to the original VQA paper (e.g. 61.41 for LSTM blind on abstract-MC, 69.21 for LSTM + global features)... but I can't find those results in the VQA paper. So unless they contradict me (hi @dteney), I'm going to tentatively conclude the Graph Structured Representations team retrained to get those numbers themselves.
Hi @pde, those numbers were submitted by the VT team for the 2016 challenge. They're still up on the leaderboards.
Damien Teney
I've merged and pushed what I think is a fix to this, plus new results for VQA 2.0 real OE. @CalvinLeGassick @dteney lmk if what you see on the live site isn't what you'd expect for VQA. |
Great. In terms of seeing what I expect in general, I still think my suggestions from this closed issue need to be implemented: |
At the moment, the VQA data assumes that results are for a combined real+abstract dataset. But at least some of the results are in fact just for "real". So we need to fix that...