[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Code for studying OpenAI's CLIP explainability
Instruction Following Agents with Multimodal Transformers
A list of research papers on knowledge-enhanced multimodal learning
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
VizWiz Challenge Term Project for Multimodal Machine Learning @ CMU (11777)
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
VTC: Improving Video-Text Retrieval with User Comments
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LAVIS - A One-stop Library for Language-Vision Intelligence