VTC: Improving Video-Text Retrieval with User Comments
Updated May 1, 2024 - Python
VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
A list of research papers on knowledge-enhanced multimodal learning
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
code for studying OpenAI's CLIP explainability
Instruction Following Agents with Multimodal Transformers
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
LAVIS - A One-stop Library for Language-Vision Intelligence