I am pursuing my PhD on the topic of visual perception and reasoning in the open world.
I have recently been focusing on scene graph generation, vision language models, and embodied AI.
Octopus, an embodied vision-language model trained with RLEF that excels at embodied visual planning and programming.
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
Relate Anything Model takes an image as input and uses SAM to identify the corresponding mask within the image.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)