microsoft/Phi-3-vision-128k-instruct for Apple MLX
Updated Jun 3, 2024 · Jupyter Notebook
A reading list on the safety, security, and privacy of large models.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
[Tobig's Conference] An interactive outfit-coordination recommendation system using a VLM
MICCAI 2024 - Disease-informed Adaptation of Vision-Language Models
Official implementation of the paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
This repository is part of a GSoC '24 project and demonstrates video annotation by integrating a multimodal vision-language model with spatiotemporal analysis.
Famous Vision Language Models and Their Architectures
Python scripts for captioning images with VLMs
DSPy with Ollama and llama.cpp on Google Colab
PsyDI: An MBTI agent that helps you understand your personality type through relaxed multi-modal interaction.