[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Code for studying OpenAI's CLIP explainability
Instruction Following Agents with Multimodal Transformers
A list of research papers on knowledge-enhanced multimodal learning
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
VizWiz Challenge Term Project for Multimodal Machine Learning @ CMU (11777)
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
VTC: Improving Video-Text Retrieval with User Comments
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LAVIS - A One-stop Library for Language-Vision Intelligence