Scene graph baseline for GQA #22

Closed
yjimmyy opened this issue Apr 5, 2019 · 3 comments

Comments

@yjimmyy

yjimmyy commented Apr 5, 2019

Hello,

Is there a way to run the scene graph baseline reported in the paper or are there any available details on how to implement it?

@dorarad
Collaborator

dorarad commented Apr 5, 2019

Hi,
Thanks a lot for the interest in the dataset!

What do you mean by scene graph baseline? The baseline reported in the supplementary?
I currently don't provide it -- will add it after NeurIPS. But the implementation is very simple and straightforward: instead of using MAC over the set of object features from the image, I embed each node in the graph based on its symbol (similarly to how the question or text is treated). For each node I have a vector:
Concat(Embedding(ObjectName),
       Avg(Embedding(Attribute)) over all attributes,
       Avg(Concat(Linear(Embedding(Relation), Embedding(RelationTarget)))) over all relations)

Here the relation target is the object name of the other node participating in the relation. I then take the set of all object vectors in the image and run standard MAC over that (instead of using visual features). I may release the code sooner.
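
For readers who want to try this before an official release, the node encoding described above can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that are not stated in the thread: the embedding size `DIM`, random (untrained) embedding tables, a bias-free linear layer applied to the concatenation of the relation and target embeddings, a shared embedding table for object names and relation targets, a zero vector when a node has no attributes or relations, and the hypothetical helper names `Embedding`, `encode_node`, and `encode_graph`. In a real model the embeddings and the linear layer would be learned together with MAC.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical embedding size (assumption)


class Embedding:
    """Toy lookup table: assigns a fixed random vector to each new symbol."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __call__(self, symbol):
        if symbol not in self.table:
            self.table[symbol] = rng.normal(size=self.dim)
        return self.table[symbol]


name_emb = Embedding(DIM)  # object names; reused for relation targets (assumption)
attr_emb = Embedding(DIM)  # attributes
rel_emb = Embedding(DIM)   # relation labels
# Linear layer over [relation; target], mapping 2*DIM -> DIM (assumed, no bias).
W = rng.normal(size=(DIM, 2 * DIM))


def mean_or_zeros(vectors, dim):
    """Average a list of vectors; fall back to zeros for an empty list (assumption)."""
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)


def encode_node(node):
    """Concat(name embedding, averaged attribute embeddings, averaged relation vectors)."""
    name_vec = name_emb(node["name"])
    attr_vec = mean_or_zeros([attr_emb(a) for a in node["attributes"]], DIM)
    rel_vec = mean_or_zeros(
        [W @ np.concatenate([rel_emb(r), name_emb(t)]) for r, t in node["relations"]],
        DIM,
    )
    return np.concatenate([name_vec, attr_vec, rel_vec])


def encode_graph(nodes):
    """Stack per-node vectors into a (num_objects, 3 * DIM) array."""
    return np.stack([encode_node(n) for n in nodes])


# Made-up example scene graph: relations are (relation, target object name) pairs.
graph = [
    {"name": "cat", "attributes": ["black", "small"], "relations": [("on", "sofa")]},
    {"name": "sofa", "attributes": [], "relations": []},
]
knowledge_base = encode_graph(graph)
print(knowledge_base.shape)
```

The resulting array has one row per object and would take the place of the set of visual object features that MAC attends over.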

*Note also that the accuracy there is when testing on the ground-truth scene graphs, whereas all other baselines work over the images directly, so the scores shouldn't really be compared directly to each other, as obviously the direct image task is more difficult. I included this experiment to show a simple ~upper bound of "how well would we do if vision was perfect", and an ideal model should in principle achieve 100% in that specific setting.

@dorarad dorarad closed this as completed Apr 5, 2019
@ronsoohyeong

Hi,
After encoding each node as a vector as above, did you run a CNN-based stem function before the MAC network?

@dorarad
Collaborator

dorarad commented Sep 19, 2019

Nope, the CNN stem part was for the original MAC version that worked over the older grid features (extracted from ResNet).
