An end-to-end vision and language model incorporating explicit knowledge graphs and OOD detection.
Updated May 3, 2024 · Python
Karpathy splits JSON files for image captioning
Microsoft COCO: Common Objects in Context for Hugging Face Datasets
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
Create a YOLO-format subset of the COCO dataset
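Converting COCO boxes to YOLO format mostly comes down to one coordinate transform: COCO stores `[x, y, width, height]` in pixels with `(x, y)` the top-left corner, while YOLO label files expect `[x_center, y_center, width, height]` normalized by the image size. A minimal sketch of that conversion (function name is illustrative, not from any specific repo above):

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x, y, width, height] pixel box to YOLO's
    normalized [x_center, y_center, width, height] format."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / img_w,  # x_center, normalized to [0, 1]
        (y + h / 2) / img_h,  # y_center
        w / img_w,            # width
        h / img_h,            # height
    ]

# A 100x50 box with top-left corner at (50, 100) in a 640x480 image:
print(coco_bbox_to_yolo([50, 100, 100, 50], 640, 480))
```

A real subset tool would also remap COCO category IDs (which are sparse) to the contiguous 0-based class indices YOLO expects, and write one `.txt` label file per image.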
Mixed vision-language Attention Model that gets better by making mistakes
Object Detection Dataset Format Converter
PyTorch implementation of image captioning using a transformer-based model.
PyTorch implementation of paper: "Self-critical Sequence Training for Image Captioning"
Real-time semantic image segmentation on mobile devices
Image captioning with pretrained encoder on MSCOCO
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Adds SPICE metric to coco-caption evaluation server codes
Clone of COCO API - Dataset @ http://cocodataset.org/ - with changes to support Windows build and python3
COCO-Stuff dataset for Hugging Face Datasets
NLP: descriptive statistics of COCO annotations via the Python COCO API
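Descriptive statistics over COCO captions need little more than the `"annotations"` list from a captions JSON file, where each entry pairs an `image_id` with a `caption` string. A stdlib-only sketch on a toy stand-in for that list (the example sentences are invented; real entries also carry an `"id"` field):

```python
from statistics import mean

# Toy stand-in for the "annotations" list of a COCO captions JSON file.
annotations = [
    {"image_id": 1, "caption": "A dog runs on the beach."},
    {"image_id": 1, "caption": "A brown dog playing in sand."},
    {"image_id": 2, "caption": "Two people riding bicycles down a street."},
]

# Words per caption, and a crude lowercased vocabulary.
lengths = [len(a["caption"].split()) for a in annotations]
vocab = {w.strip(".,").lower() for a in annotations for w in a["caption"].split()}

print(f"captions: {len(lengths)}")
print(f"mean length (words): {mean(lengths):.2f}")
print(f"vocabulary size: {len(vocab)}")
```

On the real dataset one would load the JSON with `json.load` (or the COCO API's `loadRes`-style helpers) and use a proper tokenizer instead of `str.split`.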
Convert segmentation binary mask images to COCO JSON format.
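The mask-to-COCO direction is mechanical once you know the target schema: each COCO annotation records an `id`, `image_id`, `category_id`, a `[x, y, width, height]` bounding box, an `area`, and an `iscrowd` flag. A minimal stdlib sketch that derives the box and area from a binary mask (the function name is illustrative; the `segmentation` field is omitted here, since real converters emit polygons or RLE, e.g. via pycocotools):

```python
def mask_to_coco_annotation(mask, image_id, category_id, ann_id):
    """Build a COCO-style annotation dict from a binary mask given as
    a list of rows of 0/1 values."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for row in mask for c, v in enumerate(row) if v]
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0 + 1, max(ys) - y0 + 1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x0, y0, w, h],  # COCO convention: [x, y, width, height]
        "area": sum(v for row in mask for v in row),  # foreground pixel count
        "iscrowd": 0,
    }

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_coco_annotation(mask, image_id=7, category_id=1, ann_id=1))
```

The resulting dicts go into the top-level `"annotations"` list of a COCO JSON file alongside matching `"images"` and `"categories"` entries.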
A Python-based tool for looking things up in COCO
A simple Python API (built on top of TensorFlow) for neural image captioning with MSCOCO data.
Using an LSTM or Transformer to solve image captioning in PyTorch