microsoft/Phi-3-vision-128k-instruct for Apple MLX
Updated Jun 3, 2024 · Jupyter Notebook
A reading list on the safety, security, and privacy of large models.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
[Tobig's Conference] An interactive outfit-coordination recommendation system using a VLM
MICCAI 2024 - Disease-informed Adaptation of Vision-Language Models
Official implementation of the paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
This repository is part of a GSoC '24 project and demonstrates video annotation by integrating a multimodal vision-language model with spatiotemporal analysis.
Famous Vision Language Models and Their Architectures
Python scripts for captioning images with VLMs
DSPy with Ollama and llama.cpp on Google Colab
PsyDI: An MBTI agent that helps you understand your personality type through relaxed multi-modal interaction.