A list of research papers on knowledge-enhanced multimodal learning
Updated Dec 8, 2022
Matching questions to correct answers using pre-trained BERT models.
The Unified Code of Image-Text Retrieval for Further Exploration.
Searching Images: From CLIP and Beyond
Course repository for Modern Image Search, part of the Super AI Engineer Development Program SS4
[TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on im…
Cross-modal retrieval using Transformer Encoder Reasoning Networks (TERN). Uses metric learning and FAISS for fast similarity search on GPU.
Research Code for Multimodal-Cognition Team in Ant Group
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
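Deploys Chinese CLIP with OpenCV + onnxruntime for text-to-image search: given a sentence describing the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions of the program.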
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
Image captioning using Python and BLIP
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"