Video Foundation Models & Data for Multimodal Understanding
Code release for "Training a Large Video Model on a Single Machine in a Day"
FreeVA: Offline MLLM as Training-Free Video Assistant
SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Official code for MiniGPT4-video
Official Repo for CVPR 2024 Paper "FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation"
[IJCNN 2024] Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
A Large Short-video Recommendation Dataset with Raw Text/Audio/Image/Video (invited talk at DeepMind).
[NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking
VTC: Improving Video-Text Retrieval with User Comments
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
[NAACL 2024] Official Implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Graph learning framework for long-term video understanding
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.